Many people view the ability to perform rolling upgrades as one of the main advantages of deploying Exchange Server clusters. However, if you're used to working with Exchange 2000 Server, you should be aware that changes in Exchange Server 2003 add some complexity to the procedures for upgrading Exchange 2000 clusters to Exchange 2003, upgrading Exchange 2003 clusters to Exchange 2003 Service Pack 1 (SP1) or later, or applying hotfixes that change the build number of the exsetdata.dll file. Understanding these changes, what sort of planning you should perform before the upgrade, the upgrade process itself, and a few troubleshooting ideas will send you well on your way to success. (This article requires a basic understanding of Microsoft clustering technology and how a clustered Exchange deployment differs from a standalone deployment. The sidebar "Rolling Upgrades" addresses a few of these points.)
Changes to Note
Microsoft made some changes to the rolling upgrade model with the release of Exchange 2003 and Exchange 2003 SP1. When you apply a service pack to an Exchange 2000 cluster, you only need to upgrade the binaries on each node. Upgrading an Exchange 2000 cluster to Exchange 2003, or an Exchange 2003 cluster to SP1 or later, requires an additional task. After you've applied the upgrade to one node, you must use Cluster Administrator to take your Exchange Virtual Server (EVS) offline and move the EVS to the newly upgraded node. Then right-click the System Attendant resource and select Upgrade Exchange Virtual Server. This process updates the metadata associated with the EVS in Active Directory (AD) to reflect the new Exchange version.
When you're running an Exchange cluster with more than one EVS, upgrade only one EVS at a time. This guideline applies to both active/active and active/passive clusters. The upgrade procedure uses a global variable (g_csCachedXSes) that can be modified by only one upgrade session at a time. The Microsoft article "Setup Stops Responding When You Upgrade Multiple Exchange Server Virtual Servers at the Same Time" (http://support.microsoft.com/?kbid=822582) describes what can happen if you try to run EVS upgrades in parallel.
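The following Python sketch shows why parallel upgrades are a problem. It uses a generic mutex pattern rather than anything taken from Exchange Setup. A lock stands in for the rule that only one session may modify g_csCachedXSes. The function names and the log format are hypothetical. Even if several upgrade threads start at once, the lock forces them to finish one EVS at a time.

```python
import threading


def upgrade_evs(name, lock, log):
    """Simulate one 'Upgrade Exchange Virtual Server' session for a single EVS (hypothetical)."""
    # The lock models the single-writer rule on g_csCachedXSes:
    # only one session may be inside this block at a time.
    with lock:
        log.append(("start", name))
        # Placeholder for the per-EVS work (e.g. updating the EVS metadata in AD).
        log.append(("end", name))


def run_upgrades(names):
    """Start one thread per EVS; the shared lock makes them run one after another."""
    lock = threading.Lock()
    log = []
    threads = [threading.Thread(target=upgrade_evs, args=(n, lock, log)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


def is_serialized(log):
    """True if every 'start' entry is immediately followed by the matching 'end'."""
    return len(log) % 2 == 0 and all(
        log[i][0] == "start" and log[i + 1] == ("end", log[i][1])
        for i in range(0, len(log), 2)
    )
```

On a real cluster, the equivalent is simpler. Start the Upgrade Exchange Virtual Server step for the next EVS only after the previous one reports success.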
Planning
Before you upgrade to Exchange 2003 SP1, certain elements need to be in place. (Most of the following requirements also apply to nonclustered Exchange servers.)
Schedule downtime with the user population. I've performed several SP1 upgrades and have been lucky enough that all the users were running Microsoft Office Outlook 2003 in Cached Exchange Mode. (Cached Exchange Mode reduces the visibility of downtime because users can continue to work offline against the cached copy of their mailboxes while the EVS is offline.) The upgrade procedure includes two phases during which you must take the EVS offline. Outlook users experience downtime from the moment the EVS goes offline on one node until it's brought online on another node. This process is known as failover. For tips on reducing failover time and the visibility of failovers, see "8 Ways to Improve Your Exchange Cluster, Part 2," May 2004, InstantDoc ID 41943. In my testing, the combined failover downtime for each EVS averaged between 3 and 6 minutes. If your service level agreement (SLA) allows it, try to allocate 30 minutes of downtime for a service pack upgrade. Exchange won't be offline for the entire duration, but it might go offline for periods of 1 to 2 minutes as you perform failover testing. If you have the luxury of planning an hour of downtime, take it! If you run into a problem during the upgrade, you'll have additional time to resolve it. You can also limit disruptions to users by using the downtime to apply any Windows security patches that have been released since you last performed maintenance. And while you're at it, be sure to download and install the Windows Server 2003 hotfix 831464 (http://www.microsoft.com/downloads/details.aspx?familyid=0bc9b5bc-a094-49bf-89a5-c8a2d32345a2), which is required before you can install Exchange 2003 SP1. This hotfix resolves problems rendering content to Outlook Web Access (OWA) clients.
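The downtime budget above is simple arithmetic, sketched here in Python. The sketch makes two assumptions. First, EVSs are upgraded one at a time, so their failover downtimes add up. Second, each EVS accrues the 3 to 6 minutes measured in my testing, which is an average rather than a guarantee. The function name and defaults are illustrative.

```python
def plan_window(num_evs, window_minutes=30, per_evs_minutes=(3, 6)):
    """Estimate combined failover downtime for a rolling service-pack upgrade.

    Assumes EVSs are upgraded sequentially, so per-EVS downtimes add up, and that
    each EVS accrues between per_evs_minutes[0] and per_evs_minutes[1] minutes.
    """
    low, high = per_evs_minutes
    best = num_evs * low
    worst = num_evs * high
    return {
        "best": best,
        "worst": worst,
        # Minutes left in the window for failover testing and troubleshooting;
        # a negative value means the window is too short even in the worst case.
        "slack": window_minutes - worst,
    }
```

Here is how the numbers work out:

- Three EVSs need 9 to 18 minutes of failover downtime, which leaves 12 minutes of a 30-minute window.
- Six EVSs could need 36 minutes, which overruns that window. At that point, asking for an hour becomes worthwhile.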
To upgrade an Exchange 2003 cluster, you must use an account that has Exchange Full Administrator permissions on the Administrative Group in which the EVS resides, and your account must be a member of the Local Administrators security group on each node. If the EVS is part of a routing group that's a member of multiple administrative groups, your account must have Exchange Full Administrator rights on all those administrative groups. For more information about these requirements, read the Microsoft white paper "Working with Active Directory Permissions in Exchange Server 2003" (http://www.microsoft.com/technet/prodtechnol/exchange/2003/library/ex2k3ad.mspx). You can simplify permissions management by creating a security group for your Exchange cluster administrators. For example, create an ExchangeClusterAdmins security group, then delegate Exchange Full Administrator rights to that group and add the group to the Local Administrators group on each node. If you need to add a node to the cluster, you need only add the ExchangeClusterAdmins security group to the Local Administrators group on that node, rather than having to add several individual accounts. If you want to revoke someone's permissions for managing the cluster, simply remove the user account from the ExchangeClusterAdmins group. Doing so saves you time: you don't need to remove an individual account from the Local Administrators group on each node to revoke the person's Exchange permissions.
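To see why a single group simplifies delegation, it helps to model the permission check. The Python sketch below is a deliberately simplified model, not how Windows or Exchange evaluates access tokens. Its simplifications are:

- Group membership is one level deep.
- Exchange Full Administrator rights are a single set of principals, even though real rights may span several administrative groups.
- All names are hypothetical.

A user qualifies only when one of the user's principals appears in every node's Local Administrators set and in the Exchange Full Administrator set.

```python
def effective_principals(user, groups):
    """The user plus every group that lists the user as a direct member (no nesting)."""
    return {user} | {g for g, members in groups.items() if user in members}


def can_upgrade_cluster(user, groups, node_local_admins, exchange_full_admins):
    """Check both requirements: local admin on every node and Exchange Full Administrator."""
    principals = effective_principals(user, groups)
    # Each node's Local Administrators set must share at least one principal with the user.
    local_admin_everywhere = all(
        principals & admins for admins in node_local_admins.values()
    )
    has_exchange_rights = bool(principals & exchange_full_admins)
    return local_admin_everywhere and has_exchange_rights
```

The model shows both administrative shortcuts described above. Adding a node requires only one new entry, the group, in that node's Local Administrators set. Removing a user from the group revokes access on every node at once.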
Make a full backup of the cluster before you begin the upgrade. This backup should include file-level backups of each node, system state backup on each node, and full Exchange database backups.
The Microsoft article "How to obtain the latest service packs for Exchange Server 2003" (http://support.microsoft.com/?kbid=836993) contains up-to-date information about obtaining the most recent Exchange 2003 service pack. As you get ready to begin the upgrade, keep in mind that if you've deployed an Exchange front-end/back-end server architecture in which your cluster is on a back-end server or servers, you need to upgrade the front-end servers first.
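If you keep your server inventory in a script or spreadsheet, the front-end-first rule reduces to a stable sort. The Python sketch below assumes a hypothetical inventory format of (name, role) tuples, where the role is either "front-end" or "back-end". The server names in the comment are placeholders.

```python
def upgrade_order(servers):
    """Return server names with front-end servers first.

    `servers` is a list of (name, role) tuples, e.g. [("FE1", "front-end"), ...].
    An unknown role raises KeyError instead of being silently misordered.
    """
    rank = {"front-end": 0, "back-end": 1}
    # sorted() is stable, so servers keep their relative order within a role.
    return [name for name, role in sorted(servers, key=lambda s: rank[s[1]])]
```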
Applying the Service Pack
The Microsoft article "How to install Exchange Server 2003 Service Pack 1 in a clustered Exchange environment" (http://support.microsoft.com/?kbid=867624) runs through the updated procedures for applying Exchange 2003 SP1 to a cluster. Let's look at the necessary steps in more detail, using a two-node (Node1 and Node2), active/passive Exchange 2003 cluster with one EVS (EVS1). Node1 is the active node and has current ownership of EVS1.
If you haven't already applied hotfix 831464 to Node2, do so, then reboot. When Node2 rejoins the cluster, apply Exchange 2003 SP1 to the node.
To complete the upgrade, take EVS1 offline but leave the Network Name, IP Address, and storage resources that are associated with the EVS online. To do so, right-click the System Attendant resource and select Take Offline, as Figure 1 shows. This action takes offline the Exchange resources that depend on the System Attendant (e.g., the Information Store (IS) resource and the IMAP and POP Protocol resources). Move the Exchange cluster group that's associated with EVS1 from Node1 to Node2.
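Taking System Attendant offline brings down everything that depends on it while leaving the resources it depends on running. The Python sketch below models that cascade with a breadth-first walk over reversed dependency edges. The dependency map is a hypothetical, simplified one built from the resources named in this article. A real EVS group contains additional Exchange resources that are not shown here.

```python
from collections import deque

# Hypothetical, simplified map of resource -> resources it depends on.
DEPENDS_ON = {
    "IP Address": [],
    "Disk": [],
    "Network Name": ["IP Address"],
    "System Attendant": ["Network Name", "Disk"],
    "Information Store": ["System Attendant"],
    "IMAP4": ["System Attendant"],
    "POP3": ["System Attendant"],
}


def resources_taken_offline(resource, depends_on):
    """Return `resource` plus everything that depends on it, directly or indirectly."""
    # Reverse the edges so we can walk from a resource to its dependents.
    dependents = {r: [] for r in depends_on}
    for r, deps in depends_on.items():
        for d in deps:
            dependents[d].append(r)
    affected = {resource}
    queue = deque([resource])
    while queue:
        for child in dependents[queue.popleft()]:
            if child not in affected:
                affected.add(child)
                queue.append(child)
    return affected
```

Under this map, taking System Attendant offline affects only System Attendant, Information Store, IMAP4, and POP3. Network Name, IP Address, and Disk stay online, as the procedure requires.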
Log on to Node2 and open Cluster Administrator. While the Exchange resources associated with the EVS are offline, right-click the System Attendant resource and select Upgrade Exchange Virtual Server. Note that you can't perform this process from a Cluster Administrator session running on Node1 because the files required for the upgrade procedure aren't yet installed on Node1. The requirement to run an additional Exchange service-pack upgrade procedure from Cluster Administrator is new in Exchange 2003 (it was first introduced as part of the cluster-upgrade procedure from Exchange 2000 to Exchange 2003). When the upgrade process is finished, you should see the message "The Exchange Virtual Server has been upgraded successfully."
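The preconditions for the Upgrade Exchange Virtual Server step can be written as a short checklist. The Python sketch below encodes the three conditions described in the steps above:

- The EVS must be owned by an upgraded node.
- Cluster Administrator must be running on an upgraded node.
- System Attendant must be offline.

The function and parameter names are hypothetical. The function does not query a real cluster; you supply the state yourself.

```python
def evs_upgrade_blockers(evs_owner, console_node, upgraded_nodes, system_attendant_online):
    """List reasons the 'Upgrade Exchange Virtual Server' step can't run yet.

    An empty list means the preconditions described in the text are met.
    """
    blockers = []
    if evs_owner not in upgraded_nodes:
        blockers.append(f"EVS is owned by {evs_owner}, which has not been upgraded")
    if console_node not in upgraded_nodes:
        blockers.append(
            f"Cluster Administrator is running on {console_node}, which lacks the upgrade files"
        )
    if system_attendant_online:
        blockers.append("System Attendant and its dependent resources must be offline")
    return blockers
```

In the two-node example, the only state that passes is EVS1 owned by Node2, the console running on Node2, and System Attendant offline.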
Bring the Exchange resources back online by right-clicking them and selecting Bring Online. Note that you must manually bring each Exchange resource back online: Bringing the System Attendant online still leaves the dependent resources offline. When all the resources are online, check the Application log for errors and take any necessary action.
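Because bringing System Attendant online leaves its dependents offline, you need a sensible order for bringing each resource back. A topological sort gives you one. The Python sketch below uses graphlib.TopologicalSorter, which requires Python 3.9 or later. It reuses a hypothetical, simplified dependency map built from the resources named in this article. The sketch yields an order in which every resource follows the offline resources it depends on.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical, simplified map of resource -> resources it depends on.
DEPENDS_ON = {
    "IP Address": [],
    "Disk": [],
    "Network Name": ["IP Address"],
    "System Attendant": ["Network Name", "Disk"],
    "Information Store": ["System Attendant"],
    "IMAP4": ["System Attendant"],
    "POP3": ["System Attendant"],
}


def bring_online_order(offline, depends_on):
    """Order offline resources so each comes after the offline resources it depends on."""
    # Only dependencies that are themselves offline constrain the order;
    # resources that stayed online (e.g. Network Name) are already satisfied.
    graph = {r: [d for d in depends_on[r] if d in offline] for r in offline}
    return list(TopologicalSorter(graph).static_order())
```

For the resources taken offline in this procedure, System Attendant comes first. Information Store, IMAP4, and POP3 follow in any order.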