Wednesday, December 4, 2013

The Beginning Network Admin's Network Troubleshooting Checklist

I've recently been asked to work with groups of non-network admins to help them understand the basics of computer networks. Each group's job includes installing proprietary systems that must integrate with existing computer networks. None of the people in these classes is ever going to use Fluke network testers or similar tools. They simply need to be able to install their software or hardware and do basic network troubleshooting. If that sounds like your situation, or if you're a new network administrator, here is a simple, ten-step checklist for network troubleshooting. It should be helpful for beginning network administrators and non-network admins alike.
  1. Do a quick check of the rest of the network. Before you start running all kinds of tests on your system, check to make sure the rest of the network is functioning properly. Are other people or systems on the network having the same or similar problems as you? The problem may have nothing to do with your system or configuration.
  2. Gather information. What has changed since things were working properly? Has any new software been installed? Has any new hardware been installed? What settings may have been changed, either by you, another administrator, a vendor, or an end user? Have there been any power outages? Have there been any maintenance crews doing work which involved moving network equipment? Ask end users what was being done at the time of the failure. (Be careful not to be accusatory when asking questions like this. You want to get honest, helpful answers. You don't want people to get defensive or try to hide things.)
  3. Start at the physical layer. Check link lights, power cables, circuit breakers, cables, heat and dust (too much of either will bring down any electronic device, including routers, switches, access points, firewalls, and servers). You can purchase an inexpensive cable tester for around $40.00. Such testers can be helpful, but if you don't have one, test a suspect cable by swapping it in for a known working cable. If the connection works with the known working cable, but not with the suspect one, you've identified the problem. Finally, check the network settings on each affected device to ensure that the network cards are enabled and active.
  4. Use ping, tracert (traceroute in Unix/Linux/Cisco), and pathping to test connectivity. Use ping within a single IP subnet and tracert or pathping when multiple IP subnets are involved. Here's how to use ping: Start by pinging your localhost (ping localhost or ping 127.0.0.1), then ping the IP address of the system you're working on, then ping another host on the same subnet, then ping the default gateway, and finally ping a remote system on another subnet. If all of the pings work except the remote system, try using tracert or pathping to see where connectivity fails. Try a ping by hostname instead of IP address. If a ping by hostname doesn't work, but a ping by IP address does work, the problem is most likely related to name resolution. See my post on DNS client troubleshooting.
  5. Check the routing table. If pings to systems on different networks fail, check the routing table for explicit entries, the correct default gateway, or duplicate default gateways (there should be only one). On a Windows computer, use the command route print to see the routing table. On a Cisco router, use the command show ip route.
  6. Check IP address settings. Use the Windows command line utility ipconfig to make sure the IP address is what you expect.
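  7. Check for problems with MTU size. Use the mturoute tool to check MTU sizes on the network. If you don't have mturoute, a Windows ping with the don't-fragment flag and an explicit payload size (ping -f -l 1472 followed by the target address) shows whether full-size Ethernet packets get through.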
  8. Check for DHCP server connectivity. If you see an address starting with 169.254, the device assigned itself a link-local (APIPA) address, which indicates that it could not reach a DHCP server to get an IP address.
  9. Is the IP address on the correct network? Check to ensure that the IP address of the device is on the same network or subnet as the rest of the devices. (If the subnet mask is 255.255.255.0 or /24, check to ensure that the first three octets of each connected device's IP address match.)
  10. Check for a firewall blocking traffic. If you can ping out from a device, but not to the device, a likely culprit is a firewall on the device. Check the security settings. If the Windows firewall is disabled, check to see if a third-party firewall is enabled such as ZoneAlarm or Norton Internet Security. I once was stymied in troubleshooting by a surprise firewall that was included with a VPN client, so check for any other security-oriented software that might include a firewall.
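
Several of the checks above are simple enough to sketch in a few lines of Python. What follows is only an illustration, not a finished tool: the function names (ping_sequence, looks_like_dhcp_failure, same_subnet) are made up for this example, and it relies on nothing but Python's standard ipaddress module. It lists the ping targets in the order given in step 4, flags a 169.254 address as described in step 8, and tests whether two addresses share a subnet as described in step 9.

```python
import ipaddress


def ping_sequence(own_ip, neighbor_ip, gateway_ip, remote_ip):
    """Return ping targets in the order step 4 suggests:
    loopback, this system, a same-subnet host, the gateway, a remote host."""
    return ["127.0.0.1", own_ip, neighbor_ip, gateway_ip, remote_ip]


def looks_like_dhcp_failure(ip):
    """Return True if the address falls in 169.254.0.0/16 (step 8)."""
    link_local = ipaddress.ip_network("169.254.0.0/16")
    return ipaddress.ip_address(ip) in link_local


def same_subnet(ip_a, ip_b, mask="255.255.255.0"):
    """Return True if both addresses are in the same subnet (step 9).

    The mask may be dotted (255.255.255.0) or a prefix length (24).
    """
    # strict=False lets us build the network from a host address.
    network = ipaddress.ip_network(f"{ip_a}/{mask}", strict=False)
    return ipaddress.ip_address(ip_b) in network
```

For example, looks_like_dhcp_failure("169.254.10.20") returns True, same_subnet("192.168.1.10", "192.168.1.77") returns True, and same_subnet("192.168.1.10", "192.168.2.77") returns False because the third octet differs under a /24 mask. The addresses here are invented for illustration; substitute the ones reported by ipconfig on your own system.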
Most network administrators have plenty of stories about forehead-slapping moments in troubleshooting when they missed something obvious. By following the above steps, hopefully you'll avoid flattening your forehead.

More Resources for I.T. Pros

You'll find books on Cisco and Linux technologies at my bookstore at soundtraining.net/bookstore. Also, check out my video channel at soundtraining.net/videos.

Please Leave a Comment

If you find this networking tutorial helpful or if you notice something that needs to be corrected, please leave a comment.

2 comments:

Unknown said...

Great to have this list. I'd quibble about the order of things based on my experience about typical symptoms that I've seen in my time, but of course, everybody's experience varies.

However, I do think this needs an expanded discussion of name resolution. Also, while it complicates things in terms of diagnosis, on Windows machines you often have to go into "Internet Settings" and muck with the LAN settings there, etc.

Unknown said...

Paul, great comment. I'll add that this weekend. Thanks!