Windows IT Pro is the authoritative and independent resource for Windows NT, Windows 2000, Windows 2003, and Windows XP. Features a collection of resources and magazines for Windows IT professionals.
  
  


September 1999

Recovering from NT Startup Failures, Part 1


Sidebar: Think Parallel

Tricks to prepare for and recover from NT meltdowns

What would you do if one of your core production servers crashed the next time you rebooted it? More important, how much time would you need to fix the problem? For most Windows NT administrators, the thought of a mission-critical production server experiencing STOP errors (aka the blue screen of death) or any other form of server outage makes them break out in a cold sweat.

A hosed NT system is never fun, but an unavailable critical server means lost productivity, lost time, lost money, and, of course, an angry boss. In this first installment of a two-part article, I discuss advanced tools and procedures that you can use to improve the availability of your network servers and to increase your chances of recovering from an NT boot failure. In addition, I delve into lesser-known techniques that you can employ right away to help you recover a downed NT system in the future. In this article, I don't address clustering solutions, and I assume that each system is a standalone, nonclustered NT system without system-level failover.

Common Calamities
Although various circumstances can cause an NT system to crash at startup, the result of these circumstances is usually the dreaded blue screen of death, which Screen 1, page 100, exemplifies. After NT halts the system, it displays this screen to protect the system against data corruption. In addition to being blue as its name implies, a blue screen displays important information about the system's state at the time of the STOP error. The screen lists the STOP code, the location in memory where the problem occurred, and the drivers loaded in memory when the STOP took place. However, pinning down the source of a STOP error isn't always easy. In my experience, a problem usually develops from one of the following scenarios:

  • You install software that corrupts the HKEY_LOCAL_MACHINE portion of the Registry—particularly, software that installs new services or drivers. This action usually results in a STOP error or blue screen, which indicates that the system Registry or a particular hive file failed.
  • You change a system's network configuration, which causes NT to rewrite network bindings and their related Registry entries (i.e., NT corrupts or overwrites critical OS files with invalid or incompatible versions while the system is in use).
  • You install a new service or driver on the system, which causes a system-level incompatibility problem that results in a STOP error when you reboot (i.e., underlying file corruption has occurred on a key system file that you loaded into memory before the corruption).

Each of these situations has a different set of underlying causes and solutions, so let's look at each scenario individually.

Registry Corruption
The system Registry is the heart of an NT installation. Thus, depending on the nature and extent of the damage, a corrupted Registry often results in a STOP error or blue screen of death at startup. Damage to the Registry can be physical or logical. Physical damage means that something (usually disk-related corruption) has scrambled the Registry hive files (e.g., the SOFTWARE or SYSTEM files in the \%systemroot%\system32\config folder). Logical damage means that a third-party application, a user, or NT has written invalid data to the Registry, which can trigger an NT startup failure if the logically damaged Registry entry is critical.

Unfortunately, you can't always tell whether a damaged Registry is the cause of your system's STOP error. The STOP error might identify a telltale sign such as a hard Registry error or a reference to a particular damaged hive file. However, in some cases, the STOP error doesn't indicate Registry damage.
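
As a rough illustration of the telltale signs mentioned above, the following Python sketch scans a STOP message that you've transcribed from a blue screen and flags a few codes that commonly point to Registry trouble. The function name and the short code table are illustrative assumptions, not an exhaustive list, and a STOP code that isn't in the table doesn't rule out Registry damage.

```python
import re

# Bug-check codes that commonly point to Registry trouble.
# Assumption: this short table is illustrative, not exhaustive.
REGISTRY_STOP_CODES = {
    0x00000051: "REGISTRY_ERROR",
    0x00000074: "BAD_SYSTEM_CONFIG_INFO",
    0xC0000218: "STATUS_CANNOT_LOAD_REGISTRY_FILE",
}

# "STOP: 0x0000000A" or "STOP: c0000218" (eight hex digits, optional 0x).
STOP_PATTERN = re.compile(r"STOP:\s*(?:0x)?([0-9A-Fa-f]{8})")
# A hive file named in the message, e.g. ...\Config\SOFTWARE.
HIVE_PATTERN = re.compile(r"\\config\\(\w+)", re.IGNORECASE)


def registry_hints(stop_text):
    """Return (stop_code, code_name, hive) for a transcribed STOP message.

    Each element is None when the text gives no such hint.
    """
    code_match = STOP_PATTERN.search(stop_text)
    code = int(code_match.group(1), 16) if code_match else None
    hive_match = HIVE_PATTERN.search(stop_text)
    hive = hive_match.group(1).upper() if hive_match else None
    return code, REGISTRY_STOP_CODES.get(code), hive
```

A code name or a hive name in the result suggests starting with the Registry restore methods described next; a result with neither simply means the message offers no direct Registry clue.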

If you suspect a Registry-related problem, the first line of defense is to restore a previous known-good Registry configuration. You can use several methods to do so.

The Last Known Good Configuration option. You access this option by pressing the space bar when the system prompts you during the NT boot process, and selecting the option to restore a previous configuration. This method is the quickest and easiest solution, if it works. Unfortunately, this solution's failures outweigh its successes in real-world applications because its scope is only a previously known-good incarnation of one portion of the Registry (i.e., a ControlSet00X Registry subtree of the HKEY_LOCAL_MACHINE\SYSTEM key). You have a better chance of success using the Last Known Good Configuration option if the problem is localized to this portion of the Registry and an event that immediately precedes the invocation of the Last Known Good Configuration option caused the problem. However, this procedure won't cure most of your Registry-corruption ills.
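
To see ahead of time which control set the Last Known Good Configuration option would fall back to, you can inspect the values under HKEY_LOCAL_MACHINE\SYSTEM\Select, where NT records the numbers of the Current, Default, Failed, and LastKnownGood control sets. The Python sketch below is only an illustration: the text above doesn't discuss the Select key, the helper names are hypothetical, and the Registry read works only on a running Windows system, so treat it as a preparation aid rather than a recovery tool.

```python
def control_set_name(number):
    """Map a Select value such as 2 to its subkey name, 'ControlSet002'."""
    return f"ControlSet{number:03d}"


def read_select_values():
    """Read the Select key on a running Windows system (hypothetical helper)."""
    import winreg  # Windows-only standard-library module

    names = ("Current", "Default", "Failed", "LastKnownGood")
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"SYSTEM\Select") as key:
        # QueryValueEx returns (value, type); keep only the value.
        return {name: winreg.QueryValueEx(key, name)[0] for name in names}


def describe(select_values):
    """Turn raw Select numbers into control-set names (0 means none)."""
    return {
        name: control_set_name(value) if value else None
        for name, value in select_values.items()
    }
```

On a Windows system, `describe(read_select_values())` returns a dictionary that maps each of the four names to a ControlSet00X subkey, or to None when the value is 0.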

NT Setup's Repair process and an Emergency Repair Disk (ERD). You can use NT Setup's Repair process to inspect and replace individual Registry hive files if the Last Known Good Configuration option fails to resolve the problem. After you insert your ERD, Setup lists the options you can select to specify which portions of the NT installation you want Setup to inspect, as Screen 2 shows. If you select Inspect registry files, Setup displays a list of Registry hive files and lets you select which files you want Setup to replace. Setup takes the replacement files from the ERD or, if you didn't provide an ERD, from the \%systemroot%\repair folder. The ERD and the \%systemroot%\repair folder store replacement files in compressed format, and each hive file has an underscore (_) extension (e.g., SYSTEM._, SOFTWARE._).
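
Because Setup can only restore what the repair folder or the ERD contains, it helps to know how old those compressed hive files are. The following Python sketch lists the files with an underscore extension in a folder along with their last-modified times. The function name is hypothetical, and the folder you pass in (for example, the repair folder on a working system or a mounted copy of the ERD) is an assumption about your environment.

```python
from datetime import datetime
from pathlib import Path


def list_repair_hives(repair_dir):
    """Return (file name, last-modified time) pairs for compressed hive files.

    Matches names that end in '._' (e.g., SYSTEM._, SOFTWARE._), oldest first.
    """
    entries = [
        (path.name, datetime.fromtimestamp(path.stat().st_mtime))
        for path in Path(repair_dir).iterdir()
        if path.is_file() and path.name.endswith("._")
    ]
    return sorted(entries, key=lambda entry: entry[1])
```

Calling the function on both the repair folder and the ERD lets you compare the two sets of timestamps before you decide which source Setup should use.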

Using the most recent replacement files is important so that you don't lose application and service configuration information. (For information about how to update your ERD, see Michael Reilly's "The Emergency Repair Disk," January 1997.) In addition, don't restore the SAM and SECURITY hives on an NT server domain controller, unless you used the rdisk /s (or /s-) option when you ran the ERD utility (i.e., rdisk.exe). Otherwise, Setup overwrites your SAM database with the database version Setup created during the original NT installation and creates a new set of problems. In addition, ensure that you created the replacement files under the same service pack level as the files you're replacing because Service Pack 3 (SP3) and later make security-related changes to the SAM and SECURITY hives. Otherwise, you might not be able to log on after the repair is complete. Restoring the SAM and SECURITY files usually won't resolve your Registry corruption problems anyway because the SYSTEM and SOFTWARE hives usually cause Registry boot problems. Thus, start restoring previous Registry files with the SYSTEM and SOFTWARE files, and replace the SYSTEM hive first because it contains references to important system components, including drivers and services.
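
The restore guidance above can be summarized as a small decision helper. This Python sketch is an illustration of the reasoning only; the function and parameter names are hypothetical, and the helper doesn't touch any files.

```python
def plan_hive_restore(is_domain_controller, made_with_rdisk_s, same_service_pack):
    """Return (restore order, SAM/SECURITY restore advisable?, notes)."""
    # Start with SYSTEM because it references drivers and services.
    order = ["SYSTEM", "SOFTWARE"]
    notes = []

    sam_copy_current = made_with_rdisk_s or not is_domain_controller
    if not sam_copy_current:
        notes.append(
            "Skip SAM and SECURITY: without rdisk /s, the repair copy holds "
            "the SAM database from the original NT installation."
        )
    if not same_service_pack:
        notes.append(
            "Replacement files come from a different service pack level; "
            "you might not be able to log on after the repair."
        )
    return order, sam_copy_current and same_service_pack, notes
```

For example, `plan_hive_restore(True, False, True)` describes a domain controller whose repair files were created without rdisk /s: the helper still recommends SYSTEM and then SOFTWARE but advises against restoring SAM and SECURITY.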



Reader Comments
I don't know how you do it, but just when I start to worry about something, I pick up an NTMagazine and there you are with the answer. It's amazing how you get right into the worry part of my brain and know just when and what I am worrying about.

Funny thing is this time - I was catching up on older issues I hadn't had a chance to read and bingo! The first one I picked up there was this article answering the very question I had been asking but hadn't had time to act on formulating a written plan - What should be my plan of action in case of a server failure.

Thanks NT Mag. You saved me hours - you generally do.

Suzanne Foubert
Systems Administrator
Baylor College of Medicine
Houston, Texas

Suzanne Foubert October 29, 1999


Excellent article. This answered a few questions that had been bothering me. Keep up the good work.

Ruben Rodriguez November 01, 1999


I just want to let you know how much I appreciate Sean Daily's "Recovering from NT Startup Failures, Part 1" (September 1999). As a beginner who is working toward completing my six exams for an MCSE, I found the article very practical. I'm looking forward to part 2.

Jackie Molen December 13, 1999


Part 2 appears in the November 1999 issue (page 83), and I hope you find the article equally helpful. Part 2 delves into several disaster preparation and recovery topics that I didn't have space for in part 1.

--Sean Daily

Sean Daily December 13, 1999


After reading "Recovering from NT Startup Failures, Part 1," I thought you might help me with a specific problem. In a stressful moment, one of our customers changed the permissions on the system share C$, setting System to No Access. Now she has an infinite boot on the server--it boots and boots and boots! My customer tried an Emergency Repair Disk (ERD) without any luck. The entire Microsoft BackOffice product line is installed on the server, and accessing SQL Server 7.0 and the databases is a major concern. Is there any way to help my customer? In the article, the author refers to ERD Commander and NTFSDOS--is it too late to use these tools?

Bo Heegaard December 13, 1999


You can recover from this situation in several ways, but the easiest way is to have your customer install a parallel installation of Windows NT. While your customer is booted under that installation, have her set whatever permissions are necessary to get the original installation back up and running. After she can boot back into the original installation, she can use Fixacls from the Microsoft Windows NT Server 4.0 Resource Kit to restore the original permissions on the \%systemroot% folder and its subdirectories.

--Sean Daily

Sean Daily December 13, 1999


Sean Daily's "Recovering from NT Startup Failures, Part 1" (September 1999) is very informative, but I'd like to know more about troubleshooting Windows NT's memory dump file. If I forward dump-file output to Microsoft, I usually get an answer, but it doesn't help me update my knowledge. Can you provide some guidelines for tracing problems in a memory dump file?

Prasanna Ghanekar December 13, 1999


Unfortunately, I can't claim to be an expert at interpreting memory dump files. In the 7 years that I've been working with NT, I've never encountered a situation in which examining a memory dump file proved useful. However, I've successfully recovered from dozens of STOP errors on various NT systems.
The average network administrator will find blue screen information more helpful than a memory dump. The blue screen information contains valuable information about the drivers and services present in memory and the specific STOP error that halted the system. Your best bet is to concentrate on what changed just before the blue screen and which error message you received. Researching (e.g., in the Microsoft Knowledge Base, Deja.com, newsgroups) other occurrences of the particular STOP error is often the most efficient way to resolve problems.
Microsoft provides the memory dump file primarily for software developers, rather than users or network administrators. The dump file provides to developers a real-world stack dump from a customer site that might yield clues about whether a service or driver participated in a particular problem. However, sending these files to the software developer is often impractical because the files are so large. I've found that most developers aren't interested in receiving or analyzing these files, and even Microsoft has recently changed its policies about sending in dump files. Luckily, Windows 2000 (Win2K) makes the memory dump file more useful: You can pare down the file to a minimal set of information that can be more helpful to Microsoft and third-party developers. Perhaps with this change, memory dump files will become more useful.

--Sean Daily

Sean Daily December 13, 1999


Thank you very much for the advice in Sean Daily's "Recovering from NT Startup Failures, Part 1" (September 1999). The author mentions that creating a parallel Windows NT installation provides you with a back door to your system when your primary installation is down. I have several questions about parallel installations. Why can't I just use the boot disk that I created instead of a parallel NT installation? From the NT Setup's Repair options, I have the opportunity to inspect the Registry files. If I choose the \%systemroot%\repair folder instead of the Emergency Repair Disk (ERD), which one has the most recent files?
--Thomas Leung

Thomas Leung February 16, 2000


Although you can use an NT startup disk in some cases (e.g., a file required to start NT, such as NTLDR, is corrupted or missing; the boot sector is damaged), this strategy wouldn't help in many other situations. For example, if files in the \winnt folder are damaged or the problem involves the Registry, using an NT boot disk won't help. In these cases, the parallel installation will let you access the NTFS boot volume and make the necessary repairs to files or the Registry.
In regard to your second question, the files in the \%systemroot%\repair folder and those on the ERD will usually be the same. Of course, that's assuming that you opted to update the ERD the last time you ran Rdisk (choosing to update is an option, not a requirement). NT first makes a compressed copy of the Registry into the repair folder on the hard disk, then copies those files to a 3.5" disk during the ERD creation process. However, if you have an out-of-date ERD or you didn't update the ERD during the last execution of Rdisk, the hard disk-based copy would be the most recent version.
--Sean Daily

Sean Daily February 16, 2000


Windows IT Pro is a Division of Penton Media Inc.
Copyright © 2008 Penton Media, Inc. All rights reserved.