-- by Art Brieva
Microsoft is at it again. This time, the company's Windows NT developers are preparing new clustering software, code-named Wolfpack, with an eye to improving NT's reliability and scalability and maybe devouring the high-end UNIX market at the same time.
Wolfpack is a set of software programs and APIs that will let you cluster, or interlink, two NT servers; if the first server fails, users and applications are seamlessly switched over (or failed over) to the secondary server. You can also manage applications on clustered servers. Sometime in 1998, Wolfpack is expected to add support for application load balancing and for larger clusters that can link four or more NT servers. (See sidebar "A Crash Course in Clustering.") If it delivers on its promise, Wolfpack could be an ideal technology for large businesses seeking to move UNIX and minicomputer applications onto less expensive PC servers running Windows NT.
Wolfpack is one of several Windows NT clustering technologies currently available or under development. (See sidebar "Wolfpack Alternatives.") Other server operating systems have long supported some form of clustering capability. For instance, Digital Equipment Corp. has offered clustering technology for VMS since 1982. And Novell introduced System Fault Tolerance (SFT) for NetWare in the mid-1980s. But while Novell's approach requires two NetWare servers configured identically in almost every way, Wolfpack will let you cluster disparate NT servers from multiple vendors. At least, that's what Microsoft claims.
Wolfpack is just beginning its beta cycle, and only a handful of large customers have taken it out for a trot. NT Enterprise Edition tested a fully configured Wolfpack cluster for this article.
We tested Wolfpack on two Tandem S100 series servers. Each was configured with a single 200MHz Pentium Pro processor and 32MB of RAM. Each server had two SCSI controllers: one embedded on the system board and the other plugged into a PCI slot. Wolfpack requires PCI SCSI controllers for the shared disk subsystem. The Tandem S100's embedded Adaptec AIC-7880 PCI-to-Ultra Wide SCSI controller supports the server's two internal disk drives and SCSI CD-ROM, while the Adaptec AHA-2940UW PCI-to-Ultra Wide SCSI host adapter supports the shared SCSI disks.
During setup, you should disable "Boot Time SCSI Reset" on the controller connected to the shared disks. If you don't do this, when one downed server in an NT cluster is brought back up, the boot time reset could disrupt services running on the other clustered server.
SCSI controller IDs are another setup concern. Each controller that connects to the shared bus requires a different SCSI ID. For example, the default ID for most SCSI controllers is 7. With Wolfpack, you'd simply change the ID on the shared bus controller to 6 on one of the servers. You usually do this from BIOS setup when the server boots.
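The ID rule is easy to check on paper before you cable anything. The following is only a minimal sketch in Python, not part of Wolfpack: the device labels and the disk IDs are hypothetical, and the function simply reports any SCSI ID claimed by more than one device on the shared bus.

```python
from collections import Counter


def find_scsi_id_conflicts(bus_ids):
    """Return the SCSI IDs claimed by more than one device on a shared bus.

    bus_ids maps a device label (hypothetical names) to its SCSI ID.
    """
    counts = Counter(bus_ids.values())
    return sorted(scsi_id for scsi_id, n in counts.items() if n > 1)


# Both shared-bus controllers left at the common default of 7: a conflict.
default_bus = {"node1-controller": 7, "node2-controller": 7, "disk0": 0, "disk1": 1}

# One controller moved to ID 6, as described above: no conflict.
fixed_bus = {"node1-controller": 7, "node2-controller": 6, "disk0": 0, "disk1": 1}

conflicts_before = find_scsi_id_conflicts(default_bus)
conflicts_after = find_scsi_id_conflicts(fixed_bus)
print(conflicts_before, conflicts_after)
```

The first list names the clashing ID; the second comes back empty once one controller has been moved off the default.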
The shared disk subsystem we used for our tests was a tower setup that had four Seagate Hawk 2XL Ultra SCSI disk drives. The storage subsystem had redundant power supplies, a separate power cord for each power supply and redundant fans.
On the shared bus, only disk drives can be part of the SCSI chain, and those disks must be formatted with NTFS. Microsoft strongly recommends that each disk have only one partition because logical partitions cannot be independently failed over. Our cluster's shared disks were configured as four disk drives without any additional redundancy. In a real-world configuration, disk mirroring or a RAID 5 array would improve the cluster's reliability. Fortunately, all of Windows NT's disk fault-tolerance features carry over to Wolfpack.
The cluster we received for our tests was fully prepared, but we approached the installation as if starting from scratch in order to simulate the experience a typical network administrator might have with Wolfpack. And, although we worked with pre-beta software, we expect most of the details shared here will apply to the final Wolfpack release.
When preparing for a Wolfpack installation, carefully consider drive letter assignments. Shared disks require the same drive letter on both server nodes. During Windows NT setup, it's important to assign drive letters to all unshared disk drives. To make sure you do this correctly, install NT on each node without connecting the cable to the disk subsystem. Then use the Disk Administrator on each node to assign drive letters and format other unshared disks for each node. For the same reason, it's a good idea to have the same number of unshared disks in each node, so that the same letters remain free for the shared disks on both.
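A drive letter plan like this can be sanity-checked before the shared cable is ever connected. The sketch below is an illustration only; the node names and letters are hypothetical. It flags any letter planned for a shared disk that is already taken by an unshared disk on either node.

```python
def check_drive_letters(node_local, shared):
    """Return (node, letter) pairs where a shared-disk letter is already in use.

    node_local maps a node name to the set of letters used by its unshared
    disks; shared is the set of letters planned for the shared disks.
    """
    problems = []
    for node, letters in sorted(node_local.items()):
        for letter in sorted(letters & shared):
            problems.append((node, letter))
    return problems


# Hypothetical plan: node2 has one more unshared disk than node1.
local_letters = {"node1": {"C", "D"}, "node2": {"C", "D", "E"}}
planned_shared = {"E", "F"}

collisions = check_drive_letters(local_letters, planned_shared)
print(collisions)
```

Here the extra unshared disk on node2 has already claimed E, so the shared disks cannot use the same letters on both nodes until the plan changes.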
Next, shut down both servers and connect the cables to the disk subsystem. Turn on the disk subsystem and boot only one of the servers (it doesn't matter which one). Wolfpack installs with a Setup Wizard and copies the Wolfpack system files to the Windows NT local disk drive. Then the Setup Wizard prompts you to create a new cluster or join an existing one and to choose the SCSI-attached disks to use as the shared disks.
You'll have to tell Wolfpack whether each network adapter is used to communicate with the cluster, clients or both. You'll also need to enter the IP address for the cluster.
Installing the second node in the cluster is simple, as the configuration from the first node is replicated to the second. To complete the installation routine, you run Wolfpack Setup and select Join Cluster, followed by the name.
Wolfpack uses resources and groups to manage applications that could potentially fail. A resource is either physical or logical and provides a service to clients that use the cluster. A resource is "owned" by only one node at a time and must be available on the node that hosts it. Resources are distributed between the nodes to load balance the cluster.
The basic resources under Wolfpack are NT Services, SCSI-attached disks, TCP/IP addresses, network names, file shares, print spoolers, Internet Information Server (IIS) Virtual Roots and time services. Additional resources can be added to Wolfpack in the form of a "resource DLL." Software and hardware vendors will be able to create their own resource DLLs so that Wolfpack can manage additional resources. Also, a cluster API will let vendors develop "richer cluster-aware fail-over and recovery options," according to Microsoft. For instance, Microsoft plans to use the APIs to make SQL Server and other applications cluster-aware.
Every resource is placed in a group, even if it's the only resource in the group. One resource may have dependencies on other resources, meaning it may rely on services from other resources. For example, you can create a group containing a file share resource that's dependent on a disk drive. Using this example, assume a person using the share is connected to a node that has an IP address. Now, the file share has dependencies on three resources: the physical disk drive in the quorum (one or more common SCSI disk arrays shared by the clustered servers), the node name and the IP address.
If dependencies are not accounted for, Wolfpack can't properly maintain resource availability. Naturally, the setup and maintenance of resources can get complicated.
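To see why dependencies matter, it helps to work out the order in which a group's resources would have to start. The sketch below is our own illustration, not Wolfpack code: it takes the file share example and uses a plain depth-first walk to list every resource after the things it relies on. The resource names are labels we chose.

```python
def bring_online_order(dependencies):
    """Return resources ordered so each one follows everything it depends on.

    dependencies maps a resource name to the set of resources it needs.
    Raises ValueError if the dependencies form a loop.
    """
    order, visiting, done = [], set(), set()

    def visit(resource):
        if resource in done:
            return
        if resource in visiting:
            raise ValueError(f"circular dependency at {resource!r}")
        visiting.add(resource)
        for dep in sorted(dependencies.get(resource, ())):
            visit(dep)
        visiting.discard(resource)
        done.add(resource)
        order.append(resource)

    for resource in sorted(dependencies):
        visit(resource)
    return order


# The file share example: three resources must be up before the share.
file_share_group = {
    "file share": {"physical disk", "network name", "ip address"},
    "physical disk": set(),
    "network name": set(),
    "ip address": set(),
}

start_order = bring_online_order(file_share_group)
print(start_order)
```

The file share always lands last in the list. Leave a dependency out of the table and the share could be started before its disk or address is available, which is the kind of failure the paragraph above warns about.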
Groups and resources have several operational states. A group or resource can be online, online-pending or offline, and a group can be partially online. If a resource is online, it is working properly on the alternative node. If it's offline, the resource is unavailable because the services haven't launched yet.
We found that offline resources can be brought online manually by right-clicking on the object and choosing Bring Online. If the resource is online-pending, more than likely, it is waiting for a dependent resource to come up first. When groups are partially online, only selected resources within the group are completely running.
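One way to picture how a group's state relates to the states of its resources is the small sketch below. The roll-up rule is our assumption for illustration, not documented Wolfpack behavior: a group counts as online only when every resource is online, as online-pending when anything is still coming up, and as partially online when some resources run and others don't.

```python
def group_state(resource_states):
    """Derive a group's state from its resources' states (assumed rule).

    Each resource state is "online", "online-pending" or "offline".
    """
    states = list(resource_states)
    if not states or all(s == "offline" for s in states):
        return "offline"
    if all(s == "online" for s in states):
        return "online"
    if "online-pending" in states:
        return "online-pending"
    return "partially online"


print(group_state(["online", "online", "online"]))
print(group_state(["online", "online-pending", "offline"]))
print(group_state(["online", "offline"]))
```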
When groups and resources are configured correctly, they can be transferred from node to node with no impact on networked clients. You could, for instance, shut down server A for a memory upgrade and maintain network access by failing users over to server B.
Wolfpack includes a configuration tool called Cluster Administrator. As you might expect, you can launch this GUI-driven tool from NT's Administrative Tools group. Navigating Cluster Administrator is similar to working with Windows NT Explorer. All objects are contained within a few folders. Click on a folder in the left pane, and you'll view its contents in the right pane.
Cluster Administrator has three views that simplify administration considerably. The first offers a glimpse at every group and resource within the cluster. You can also view each node, as well as groups and resources within a node. Finally, groups and resources are broken down into two folders so you can see which server controls which resource.
You can accomplish every administrative task by right-clicking on an object and choosing the appropriate option. For instance, right-clicking on folders will display options that let you change the Cluster Administrator view, create a new group or resource, or view the properties of the object you clicked on.
When you right-click on a group and select Properties, three tabs (General, Fail over and Fail back) will appear. The General tab lets you choose the preferred node for the grouped resources and displays the state of the group. The Fail over tab has two options that determine the group's fail-over threshold and the fail-over period in hours. The Fail back tab has three options: prevent fail back, allow fail back and allow fail back after a specified time. This last option lets you bring resources back to a node during a window of opportunity that you choose.
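The pre-beta documentation doesn't spell out how the threshold and period interact, so the sketch below rests on an assumption: a group may fail over at most "threshold" times within any "period" of hours, after which it stays put. The fail-back check models the three options as "prevent", "allow" and a daily window. All names here are ours, not Wolfpack's.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverPolicy:
    threshold: int          # assumed: maximum fail-overs allowed per period
    period_hours: float     # assumed: length of the sliding period
    history: list = field(default_factory=list)  # hours at which fail-overs occurred

    def may_fail_over(self, now_hours):
        """Record and allow a fail-over unless the threshold is already used up."""
        recent = [t for t in self.history if now_hours - t < self.period_hours]
        if len(recent) >= self.threshold:
            return False
        self.history = recent + [now_hours]
        return True


def may_fail_back(mode, hour_of_day, window=None):
    """mode is "prevent", "allow" or "window"; window is (start_hour, end_hour)."""
    if mode == "prevent":
        return False
    if mode == "allow":
        return True
    if mode == "window":
        start, end = window
        if start <= end:
            return start <= hour_of_day < end
        # The window wraps past midnight, for example 22:00 to 04:00.
        return hour_of_day >= start or hour_of_day < end
    raise ValueError(f"unknown fail-back mode: {mode!r}")


policy = FailoverPolicy(threshold=2, period_hours=6)
decisions = [policy.may_fail_over(t) for t in (0, 1, 2, 7)]
print(decisions)
print(may_fail_back("window", 23, (22, 4)), may_fail_back("window", 12, (22, 4)))
```

With a threshold of 2 and a period of 6 hours, the third fail-over in quick succession is refused, while one attempted after the period has passed is allowed again.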
Taking Wolfpack for a walk
Wolfpack runs as a single service under Windows NT 4.0 Service Pack 2. In our tests, it ran fairly efficiently, although it occasionally pushed CPU usage above 60 percent without any user activity. Our tests also showed, however, that this level of utilization didn't hurt performance.
Specifically, we ran several throughput tests to the server with and without the cluster service started. Starting the cluster service spawned three processes: CLUSSVC.EXE, RESRCMON.EXE and TIMESERV.EXE. The combination of these processes used 5.7MB of memory. Considering Wolfpack's capabilities, that memory footprint is more than reasonable.
We examined Wolfpack's fail-over capabilities using IIS, a simple DOS-based batch file, Notepad and a test network built in WINDOWS Magazine's lab. First, we tested IIS by configuring a browser running on a networked Dell OptiPlex GXi P200 desktop PC. Next, we instructed the browser to fetch sample Web pages from Node 1 in the server cluster. Finally, we turned off Node 1 and tried to refresh the Web page viewed on our client. Within 5 seconds, our client browser successfully refreshed the page. All the resources IIS required had migrated from Node 1 to Node 2, and the change was transparent to our client browser.
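We timed the browser refresh by hand, but the same measurement could be scripted. The sketch below is a hypothetical stand-in for that test, not the tool we used: it polls a page until it answers and reports how long that took. The URL in the usage comment is made up, and the clock and sleep functions are passed in so the loop can be exercised without a network.

```python
import time
import urllib.request


def page_is_up(url, timeout=2.0):
    """Return True if the page answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP error responses.
        return False


def seconds_until_up(probe, interval=0.5, max_wait=30.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Call probe() every interval seconds until it returns True.

    Returns the elapsed seconds, or None if max_wait passes first.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= max_wait:
            return None
        sleep(interval)


# Usage against a real cluster (hypothetical address), started right after
# powering off Node 1:
#   outage = seconds_until_up(lambda: page_is_up("http://cluster-name/"))
```

Polling the cluster's own name or IP address, rather than either node's, is what makes the check meaningful, since that address is the resource that moves during fail over.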
In another test, we simulated how a client might be affected by accessing a disk drive during fail over. To do so, we created a directory on a protected share on Node 1. The directory included a DOS-based loop batch file that repeatedly copied data from a client workstation to the protected share.
When we ran the DOS batch file from the client and turned off Node 1, the resources failed over to Node 2, and, as we expected, the familiar "Network error writing drive E: abort or retry?" error message was displayed. A retry failed to make the batch file continue, but sure enough, the share and data had moved from Node 1 to Node 2, which we determined by browsing Network Neighborhood.
Software vendors will need to tweak their applications to support clustering capabilities, but Wolfpack does offer some generic application-clustering features. To test these features, we took Notepad and its associated files and copied them to a protected share on the quorum disks (for more on quorums, see sidebar "A Crash Course in Clustering"). Using the Cluster Administrator, we created a new Generic Application resource for Notepad. Next, we placed the resource within a previously created group containing the proper dependencies for this test. Then we mapped a drive to Node 1 and launched Notepad from the shared directory. When we failed Node 1, the resources flew over to Node 2, and Notepad continued to save and open documents flawlessly.
The bottom line
Given its pre-beta state, Wolfpack still has a number of issues to resolve, particularly documentation and hardware support. Microsoft has only tested Wolfpack with Adaptec and BusLogic SCSI controllers, and disk drive support is limited to a few models.
Still, Wolfpack is extremely impressive. By the time you read this, Microsoft is expected to disclose shipping information for Wolfpack. That's great news for NT customers seeking bulletproof operations. With Wolfpack trotting by its side, NT for the first time should represent a true alternative to the midlevel UNIX environment. And once Microsoft delivers phase two of Wolfpack in 1998 (which will offer application load balancing and let you group several NT servers into a single cluster), NT may finally nip at the heels of the high-end UNIX market.