Tip

High-availability solutions in nonvirtualised environments

High-availability solutions have long provided data centres the redundancy needed to circumvent operating system snafus, server faults and other infrastructure problems. Now server virtualisation has raised the stakes, allowing numerous virtual machines (VMs) to operate in isolated instances on the same physical server.

Although virtualisation supports enormous consolidation in data centres, it also increases the risk for organisations: when many VMs share one physical server, a single hardware fault can take all of them down at once. Companies that deploy server virtualisation need to re-assess their high-availability solutions, drawing on traditional network architecture while exploiting the flexibility and features made possible by virtualisation platforms.

The basics of high-availability solutions

To understand the changing opportunities of high availability (HA) in a virtualised server environment, it's important to appreciate the characteristics and tradeoffs of traditional high-availability solutions in a nonvirtualised setting.

In their most basic form, high-availability solutions provide redundancy while eliminating single points of failure. An HA installation, for example, may include two or more identical servers connected to two independent Ethernet network switches. These servers, in turn, may connect to two independent Fibre Channel storage area network (SAN) switches that link to two redundant storage devices.

Each set of devices is ideally powered by different electrical distribution circuits supported with independent uninterruptible power supply systems. Redundant servers are also connected to one another through redundant health monitoring -- or "heartbeat" -- signals.

The servers themselves typically incorporate their own resilient design characteristics, including multiple processor cores, extensive memory, redundant power supplies and redundant network/storage connectivity.
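To see why duplicating every layer matters, the following is a minimal sketch of the standard availability arithmetic for redundant and chained components. It assumes failures are independent, which shared power, firmware or racks can undermine in practice. The function names and the 99% figures are hypothetical and chosen only for illustration.

```python
def combined_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components where any one suffices.

    Assumes independent failures (an idealisation).
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("need at least one component")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


def chain_availability(layer_availabilities: list[float]) -> float:
    """Availability of layers in series: every layer must be up."""
    result = 1.0
    for availability in layer_availabilities:
        result *= availability
    return result


# Hypothetical figures: server, LAN switch, SAN switch and storage
# array are each assumed to be 99% available.
single_path = chain_availability([0.99] * 4)
redundant_path = chain_availability([combined_availability(0.99, 2)] * 4)

print(f"single path:    {single_path:.4%}")
print(f"redundant path: {redundant_path:.4%}")
```

With these made-up inputs, the single path works out to roughly 96.06% and the fully duplicated path to roughly 99.96%, which illustrates how a chain of individually reliable components still benefits from redundancy at each layer.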

How high-availability solutions work

Each server hosts duplicate operating systems, applications and HA failover software. Platform-specific failover products include Windows Server 2003 Cluster Server, the Solaris Cluster from Sun Microsystems Inc. (now owned by Oracle Corp.) and IBM PowerHA for AIX (HACMP); cross-platform offerings include Symantec Corp.'s Veritas Cluster Server and IBM Tivoli System Automation.

When disruptions in a heartbeat indicate problems within a server, the failover software switches automatically to an alternate server where SAN and LAN access continues with little -- if any -- interruption. Redundant LAN switching, SAN switching and duplicated SAN storage ensure that traffic can find an alternate path around any disruption outside the servers.
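The heartbeat decision described above can be sketched in a few lines. This is an illustrative model only, not how any of the named cluster products work internally: the class name, the one-second interval and the three-missed-beats threshold are all assumptions, and real failover software also has to deal with issues such as fencing and split-brain, which are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks the last heartbeat from a peer and decides when it is down."""

    interval_s: float = 1.0   # assumed: peer sends a heartbeat every second
    missed_limit: int = 3     # assumed: tolerate this many missed beats
    last_seen_s: float = 0.0  # timestamp of the most recent heartbeat

    def record_heartbeat(self, now_s: float) -> None:
        self.last_seen_s = now_s

    def peer_is_down(self, now_s: float) -> bool:
        # Require several consecutive missed beats so that one delayed
        # packet does not trigger an unnecessary failover.
        return (now_s - self.last_seen_s) > self.interval_s * self.missed_limit


def choose_active(primary_down: bool, current_active: str) -> str:
    """Return which node should own the workload."""
    if primary_down and current_active == "primary":
        return "standby"
    return current_active


monitor = HeartbeatMonitor()
monitor.record_heartbeat(now_s=10.0)

active = "primary"
# 2.0 s of silence: still within the tolerated window.
active = choose_active(monitor.peer_is_down(now_s=12.0), active)
# 4.5 s of silence: more than three missed beats, so fail over.
active = choose_active(monitor.peer_is_down(now_s=14.5), active)
print(active)
```

Passing timestamps in explicitly, rather than reading the clock inside the class, keeps the sketch deterministic and easy to test.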

Once the trouble is isolated and repaired, the system "fails back" to its original configuration. This is often called an "active/passive" configuration. In "active/active" setups, the second server operates in tandem with the first -- rather than as a spare -- to provide greater processing power for a workload. When one server fails, its companion can continue operation, but at a diminished level. With countless variations to this basic approach, the cluster itself can potentially include three, four or more servers.
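The difference between the two configurations after a failure can be illustrated with a small capacity model. This is a simplified sketch: the function names, the capacity units and the demand figures are hypothetical, and real workloads rarely scale this linearly.

```python
def surviving_capacity(node_capacities: list[float], failed: set[int]) -> float:
    """Total capacity left after the nodes at the given indexes fail."""
    return sum(cap for i, cap in enumerate(node_capacities) if i not in failed)


def served_fraction(demand: float, node_capacities: list[float], failed: set[int]) -> float:
    """Share of the workload the surviving nodes can carry, capped at 1.0."""
    if demand <= 0:
        raise ValueError("demand must be positive")
    return min(1.0, surviving_capacity(node_capacities, failed) / demand)


# Hypothetical units of work per second for a two-node cluster.
# Active/passive: demand is sized for one node, the spare sits idle.
active_passive = served_fraction(demand=100, node_capacities=[100, 100], failed={0})
# Active/active: both nodes share a larger workload.
active_active = served_fraction(demand=180, node_capacities=[100, 100], failed={0})

print(f"active/passive after one failure: {active_passive:.0%}")
print(f"active/active after one failure:  {active_active:.0%}")
```

In this model the active/passive pair still serves the full workload after losing a node, while the active/active pair serves only part of it, which is the "diminished level" trade-off for using the second server's capacity during normal operation.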

The cost of nonvirtualised high-availability solutions

Of course, the traditional, nonvirtualised approach to high availability also carries a high price tag. Redundant servers, LAN networking, SAN networking and storage, and OS/application software licensing dramatically bump up the cost of high-availability solutions within enterprises.

Defining availability requirements is the first order of business, according to Dave Sobel, the CEO of Evolve Technologies in Fairfax, Va. Sobel said that management needs to define uptime -- that is, the number of "nines" -- in relation to business needs and budget. This table shows what the number of nines can mean for an organisation's downtime over the course of a week, month and year:

Rated availability   Annual downtime            Monthly downtime           Weekly downtime
90.0%                876 hours (36.5 days)      73 hours (3 days)          16.8 hours (0.7 day)
92.0%                700.8 hours (29.2 days)    58.4 hours (2.4 days)      13.5 hours (0.6 day)
95.0%                438 hours (18.3 days)      36.5 hours (1.5 days)      8.4 hours (0.35 day)
98.0%                175.2 hours (7.3 days)     14.6 hours (0.61 day)      3.4 hours (0.14 day)
99.0%                87.6 hours (3.7 days)      7.3 hours (0.3 day)        1.7 hours (0.071 day)
99.5%                43.8 hours (1.83 days)     3.7 hours (0.15 day)       0.84 hour (50.5 mins)
99.8%                17.5 hours (0.73 day)      1.46 hours (87.6 mins)     0.34 hour (20.4 mins)
99.9% (three 9s)     8.8 hours (0.37 day)       0.73 hour (43.8 mins)      0.17 hour (10.2 mins)
99.95%               4.4 hours (0.18 day)       0.37 hour (22.0 mins)      0.085 hour (5.1 mins)
99.99% (four 9s)     0.88 hour (52.8 mins)      0.073 hour (4.4 mins)      0.017 hour (1.0 min)
99.999% (five 9s)    0.088 hour (5.3 mins)      0.0073 hour (26.4 secs)    0.0017 hour (6.1 secs)
99.9999% (six 9s)    0.0088 hour (31.6 secs)    negligible (2.63 secs)     negligible (0.61 sec)

These calculations are approximations based on 8,760 hours in a year (730 hours in a month and 168 hours in a week). The figures describe unplanned downtime only; the regular planned downtime that all systems schedule is not figured into the table.
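The arithmetic behind the table is simple enough to reproduce for any availability target. The following is a minimal sketch using the same 8,760-hour year; the function and constant names are hypothetical, and results may differ slightly from the table, which rounds intermediate values.

```python
HOURS_PER_YEAR = 8760                  # 365 days x 24 hours
HOURS_PER_MONTH = HOURS_PER_YEAR / 12  # 730 hours
HOURS_PER_WEEK = 7 * 24                # 168 hours


def downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Unplanned downtime, in hours, allowed within a period."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_hours * (1.0 - availability_percent / 100.0)


def format_duration(hours: float) -> str:
    """Express a duration in the largest unit that keeps the value >= 1."""
    if hours >= 24:
        return f"{hours / 24:.1f} days"
    if hours >= 1:
        return f"{hours:.1f} hours"
    if hours * 60 >= 1:
        return f"{hours * 60:.1f} mins"
    return f"{hours * 3600:.1f} secs"


for target in (99.0, 99.9, 99.99, 99.999):
    yearly = format_duration(downtime_hours(target, HOURS_PER_YEAR))
    monthly = format_duration(downtime_hours(target, HOURS_PER_MONTH))
    weekly = format_duration(downtime_hours(target, HOURS_PER_WEEK))
    print(f"{target}%: {yearly} per year, {monthly} per month, {weekly} per week")
```

A calculation like this makes it easy to translate a proposed number of nines into the outage time the business would actually have to tolerate.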

Business requirements must drive the technology, but the expense of high-availability solutions limits the number of applications that can be protected affordably. As a consequence, only a few critical applications receive HA protection, while other applications are relegated to periodic snapshots or backups.

About the Author
Stephen J. Bigelow, a senior technology writer at TechTarget, has more than 15 years of technical writing experience in the technology industry. He has written hundreds of articles and more than 15 feature books on computer troubleshooting, including Bigelow's PC Hardware Desk Reference and Bigelow's PC Hardware Annoyances. Contact him at sbigelow@techtarget.com.


This was first published in March 2010
