The other day, a hardware failure brought down our Exchange server. This failure created a panic
in our user community because we consider email availability as important as a dial tone. Had we
been using a Windows NT cluster, we users would never have noticed the problem. By providing
continuous availability through replication, an NT cluster could have saved us a lot of frustration
and prevented the loss in productivity.
Today's NT clustering solutions solve one business computing problem: availability. By
replicating data, applications, and even entire systems, clustering lets two or more systems watch
each other's back and take over the workload (user connections, applications, and services) in case
one system fails. This article defines clusters, categorizes the clustering solutions currently
available, and illustrates the types of business computing problems clustering can help solve now.
So What's a Cluster Anyway?
A cluster is a group of whole, standard computers that work together as a unified
computing resource and that can create the illusion of being one machine, a single system image.
(With NT clusters, the term whole computer, which is synonymous with node, means a
system that can run on its own, apart from the cluster. If you're not familiar with clustering
terms, you can refer to "Clustering Terms and Technologies.") This unified
computing resource ensures availability because any node can take on the workload of any other node
that happens to fail.
Clusters come in three configuration types: active/active, active/standby, and fault tolerant.
Let's examine each:
- Active/active: All nodes in the cluster perform meaningful work. If any node fails,
the remaining node (or nodes) continues handling its workload and takes on the workload from the
failed node. Failover time is between 15 seconds and 90 seconds.
- Active/standby: One node (the primary node) performs work, and the other (the standby,
or secondary node) stands by waiting for a failure in the primary node. If the primary node fails,
the clustering solution transfers the primary node's workload to the standby node, disconnecting any
users and stopping any workload already on the standby node. Failover time is between 15 seconds and 90 seconds.
- Fault tolerant: A fault-tolerant cluster is a completely redundant system (disk and CPU) whose
goal is to be available 99.999 percent of the time. That goal translates to fewer than 6 minutes of
downtime per year. Both nodes of a fault-tolerant cluster simultaneously perform identical tasks;
the nodes' workloads are redundant. Failover time is less than 1 second.
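The 99.999 percent figure is easy to check with a little arithmetic. The following Python sketch converts an availability target into the downtime it allows per year. The function name is our own, and the calculation assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_minutes_per_year(availability_percent):
    """Return the yearly downtime (in minutes) that an availability target allows."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% available -> {minutes:.2f} minutes of downtime per year")
```

For 99.999 percent, the result is about 5.26 minutes, which agrees with the figure of fewer than 6 minutes of downtime per year.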
To illustrate the definition of a cluster, let's say you have one group of users using file and print
services on Server A and another group of users accessing an Oracle database on Server B. Servers A and B are
nodes in an active/active cluster. If Server A fails, Server B continues handling its workload and
picks up Server A's workload. The users accessing the Oracle database do not notice any change in
their service; the file and print users experience, at most, a short delay.
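The Server A and Server B scenario can be sketched as a toy model in Python. This is only an illustration of the active/active idea, not how any clustering product is implemented: the Node class and the fail_over function are hypothetical names, and the sketch says nothing about how a real product detects the failure or moves user connections.

```python
class Node:
    """A whole computer in the cluster that owns a set of workloads."""

    def __init__(self, name, workloads):
        self.name = name
        self.workloads = list(workloads)
        self.alive = True


def fail_over(failed, survivor):
    """Mark one node as failed and move its workloads to the surviving node."""
    failed.alive = False
    # The survivor keeps its own workloads and takes on the failed node's.
    survivor.workloads.extend(failed.workloads)
    failed.workloads = []


# Both nodes do meaningful work, as in an active/active cluster.
server_a = Node("Server A", ["file and print"])
server_b = Node("Server B", ["Oracle database"])

# Server A fails; Server B continues its own work and picks up Server A's.
fail_over(server_a, server_b)
print(server_b.name, "now handles:", server_b.workloads)
```

After the failover, Server B's workload list still starts with the Oracle database, which is why those users see no change, and the file and print workload is added behind it.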
NT Clustering Solutions
As the need for availability becomes ever more crucial in the NT environment, many third-party
vendors and Microsoft have introduced or are about to introduce clustering solutions for NT. To help
you evaluate these clustering solutions, let me briefly explain Microsoft's clustering initiative,
Wolfpack, and categorize its capabilities in comparison with those of some prominent third-party
clustering solutions. (For reviews of several individual clustering products, including Wolfpack,
see Lab Reports.)
Wolfpack
Wolfpack is Microsoft's two-node, active/active clustering solution and set of APIs for NT.
Wolfpack's purpose is to provide high availability to your NT Server environment.
Wolfpack will have an effect in several significant areas. First, you can expect all server
manufacturers who want to reach NT customers to offer Wolfpack-based clustering support this year.
Even a year before its release, Wolfpack had the backing of Digital Equipment, Compaq Computer,
Tandem, Intel, Hewlett-Packard, NCR, and IBM.
Theoretically, Wolfpack will work on any two Intel-based or any two Alpha-based servers, but
you can't mix Intel and Alpha. In practical terms, however, the number of supported systems will be
very restricted: to get on the Wolfpack Hardware Compatibility List (WHCL), each manufacturer
must test complete configurations (system, disk subsystem, and SCSI adapter) for compatibility. This
approach stands in contrast to NT's existing Hardware Compatibility List (HCL), which lets
manufacturers list individual system components. For the WHCL's first release, Microsoft will let
each manufacturer list only two configurations. Microsoft will support Wolfpack only for systems on
the WHCL, so don't try to build your own Wolfpack clustering solution. Although these requirements
will initially limit the selection of Wolfpack-compliant configurations, the WHCL will grow over
time.
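The difference between listing individual components and listing complete configurations can be sketched in Python. Every list entry below is invented for illustration (no real product names or actual HCL or WHCL contents are implied), and the two check functions are hypothetical.

```python
# Hypothetical list entries, invented for illustration only.
HCL_COMPONENTS = {"System X", "Disk Array Y", "SCSI Adapter Z", "SCSI Adapter W"}
WHCL_CONFIGURATIONS = {("System X", "Disk Array Y", "SCSI Adapter Z")}


def on_hcl(system, disk_subsystem, scsi_adapter):
    """HCL-style check: each component only has to be listed individually."""
    parts = (system, disk_subsystem, scsi_adapter)
    return all(part in HCL_COMPONENTS for part in parts)


def on_whcl(system, disk_subsystem, scsi_adapter):
    """WHCL-style check: the exact combination must have been tested and listed."""
    return (system, disk_subsystem, scsi_adapter) in WHCL_CONFIGURATIONS


# Swapping in a different adapter that is listed on its own
# still produces a combination nobody has tested as a whole.
mixed = ("System X", "Disk Array Y", "SCSI Adapter W")
print("HCL-style check:", on_hcl(*mixed))
print("WHCL-style check:", on_whcl(*mixed))
```

The mixed configuration passes the component-by-component check but fails the complete-configuration check, which is why assembling your own cluster from individually listed parts does not yield a supported Wolfpack system.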