Troubleshooting can account for as much as 90 percent of a network administrator's job. No one likes putting out fires, but you don't always have a choice. Good troubleshooting skills enable you to respond quickly in a crisis situation and keep your network running smoothly. When you face a troubleshooting challenge, start by asking yourself basic questions. What has changed? Has this problem occurred before, and if so, when? Is the problem reproducible? Did the user do anything differently? Are other users experiencing the same problem?
Next, try to isolate the problem, "cutting it in half" with each step you take to get closer to its source. For example, if a workstation can't connect to the network, try to determine whether you're facing a networkwide problem or a workstation-specific problem. If you can quickly determine that the problem applies to the workstation only, you've eliminated roughly half of the possible causes and are that much closer to the source. Even if you can't find a solution, isolating the problem will save a tremendous amount of time when you seek outside help.
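To make the "cutting it in half" idea concrete, here is a minimal Python sketch of a binary search along the path between a workstation and a server. The hop names and the `reachable` check are hypothetical placeholders, and the sketch assumes a single fault: every hop before the fault responds and every hop from the fault onward does not.

```python
from typing import Callable, Optional, Sequence


def first_failing_hop(
    path: Sequence[str],
    reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first hop on the path that does not respond, or None.

    Assumes a single fault, so hops before it pass the check and
    hops at or after it fail. Each test halves the remaining range.
    """
    lo, hi = 0, len(path)
    while lo < hi:
        mid = (lo + hi) // 2
        if reachable(path[mid]):
            lo = mid + 1  # fault lies beyond this hop
        else:
            hi = mid  # fault is at this hop or earlier
    return path[lo] if lo < len(path) else None


# Hypothetical path from the workstation to the server.
path = ["nic", "patch cable", "cable run", "hub", "switch", "server"]
```

With six hops, this approach needs at most three checks rather than six. In practice the check is whatever test you can run at that point, such as a ping or a link light.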
To give you an idea of how this process works, I've gathered several troubleshooting scenarios, ranging from common but simple problems to more difficult challenges. You might run into similar situations in which you can apply some of the basic questions that I use to isolate the problems in these examples. For more information about the tools I use in the following scenarios, see the sidebar "Basic Troubleshooting Tools."
Problem: No Domain Server Is Available to Validate Passwords
You've undoubtedly encountered this problem: You sit down at your workstation and try to log on to the network, but you receive the dreaded No domain server was available to validate your password error message.
To troubleshoot this problem, you must determine whether the problem relates to the workstation, the network, or the server. Start by asking the following questions:
- What has changed? Have you made any changes to your network that might have resulted in a problem? Did you add a new server, remove an existing server, make switch or hub changes, add or remove a domain controller (DC), or promote or demote a DC?
- Are other workstations experiencing the problem?
- Is the server up?
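One way to read the answers to these questions is as a simple triage rule. The Python sketch below is only an illustration. The function name and the order of the checks are my assumptions, and a real network can fail in more than one place at once.

```python
def classify_scope(server_up: bool, other_workstations_affected: bool) -> str:
    """Guess where to look first, based on the answers to the basic questions.

    Illustrative rule of thumb only: check the server first, then the
    shared network, and treat whatever is left as workstation-specific.
    """
    if not server_up:
        return "server"
    if other_workstations_affected:
        return "network"
    return "workstation"
```

The answer to "What has changed?" doesn't appear in the function because it tells you where to look within the scope, not which scope you're in.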
You discover that the workstation has been working as it should until now. No other workstations are experiencing the problem and the server is up, so you can safely presume that this problem is workstation-specific. Next, you need to determine where within the machine the problem lies. Your next questions are as follows:
- Can the workstation ping the server?
- Can the workstation obtain an IP address?
You can ping the server, but the ping times out on occasion, which indicates that you're experiencing intermittent communication between the server and workstation. From a command line, you type
ipconfig /renew
When you run this command multiple times, the workstation sometimes renews its IP address lease and sometimes does not. This behavior confirms what the ping results suggested: communication between the server and workstation is intermittent. You decide to swap out the workstation with another working workstation. The new workstation doesn't work in the original workstation's location, and the original workstation can connect to the network without problems from another location. Clearly, something is wrong with the original location's cable run or hub.
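If you'd rather measure an intermittent connection than eyeball it, a short script can repeat the test and report how often it fails. The following Python sketch is one way to do that. It calls the operating system's ping command, and the function names, the default of 20 attempts, and the host name in the usage comment are all assumptions for illustration.

```python
import platform
import subprocess
from typing import Callable


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one echo request with the system ping command.

    Returns True when ping exits with status 0. Flags differ by
    platform: Windows uses -n and -w (milliseconds), Linux uses -c
    and -W (seconds). Other systems may interpret -W differently, and
    Windows can report status 0 for some "unreachable" replies.
    """
    if platform.system() == "Windows":
        cmd = ["ping", "-n", "1", "-w", str(timeout_s * 1000), host]
    else:
        cmd = ["ping", "-c", "1", "-W", str(timeout_s), host]
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def loss_rate(probe: Callable[[], bool], attempts: int = 20) -> float:
    """Run the probe repeatedly and return the fraction of failures."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    failures = sum(1 for _ in range(attempts) if not probe())
    return failures / attempts


def classify(rate: float) -> str:
    """Label a loss rate as stable, intermittent, or down."""
    if rate == 0.0:
        return "stable"
    if rate == 1.0:
        return "down"
    return "intermittent"


# Usage, with a hypothetical server name:
# print(classify(loss_rate(lambda: ping_once("server01"))))
```

Because `loss_rate` accepts any probe that returns True or False, you can apply the same loop to other checks, such as a lease renewal.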
You try connecting the cable run to a different hub but still can't connect to the network. You now know that the cable run is the culprit. You've isolated the problem. Further investigation reveals that a cable tie in the server room has been cut and the drop run has a severed pin six.
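The swap tests in this scenario follow a small decision table, which the Python sketch below spells out. The function and parameter names are mine, not part of any tool, and the logic assumes a single faulty component.

```python
from typing import Optional


def locate_fault(
    original_works_elsewhere: bool,
    replacement_works_here: bool,
    works_on_other_hub: Optional[bool] = None,
) -> str:
    """Infer the faulty component from the results of the swap tests.

    works_on_other_hub is None until you have moved the cable run
    to a different hub and retested.
    """
    if replacement_works_here and not original_works_elsewhere:
        return "workstation"
    if original_works_elsewhere and not replacement_works_here:
        # The fault stays with the location, not the machine.
        if works_on_other_hub is None:
            return "cable run or hub"
        return "hub" if works_on_other_hub else "cable run"
    return "inconclusive"
```

In this scenario, the original workstation works elsewhere, the replacement fails at the original location, and moving to a different hub doesn't help, so the function returns "cable run".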
The client I wrote about in this article had a very large network and many OUs. They wanted to place this particular server in a different OU to make it "easier" to find. This approach was just a matter of style. The service that failed to start was VERITAS Software's VERITAS Backup Exec. Backup Exec must start with a valid user account, so that's the reason they couldn't use the Local System account. With the Exchange add-in for Backup Exec, you must have a unique mailbox account start the Backup Exec services in order to perform a mailbox backup. And yes, you should edit the GPO for the particular OU to which the server was moved, not the Domain Controllers OU. In this specific scenario, services that start under the Local System account are not affected by a change in the OU. For more information about service startup, see the Microsoft article "How to Troubleshoot Service Startup Permissions" (http://support.microsoft.com/?kbid=259733).
--Alan Sugano
Jacques Willemen January 12, 2004