- Do a quick check of the rest of the network. Before you start running all kinds of tests on your system, check to make sure the rest of the network is functioning properly. Are other people or systems on the network having the same or similar problems as you? The problem may have nothing to do with your system or configuration.
- Gather information. What has changed since things were working properly? Has any new software been installed? Has any new hardware been installed? What settings may have been changed, either by you, another administrator, a vendor, or an end user? Have there been any power outages? Have there been any maintenance crews doing work which involved moving network equipment? Ask end users what was being done at the time of the failure. (Be careful not to be accusatory when asking questions like this. You want to get honest, helpful answers. You don't want people to get defensive or try to hide things.)
- Start at the physical layer. Check link lights, power cables, circuit breakers, cables, heat and dust (too much of either will down any electronic device including routers, switches, access points, firewalls, and servers). You can purchase an inexpensive cable tester for around $40.00. Such testers can be helpful, but if you don't have one, test a cable by replacing a known working cable with the cable in question. If it works with the first cable, but not the second, you've identified the problem. Make sure network cards are enabled. Check in the network settings on each affected device to ensure that network cards are enabled and active.
- Use ping, tracert (traceroute in Unix/Linux/Cisco), and pathping to test connectivity. Use ping within a single IP subnet and tracert or pathping when multiple IP subnets are involved. Here's how to use ping: Start by pinging your localhost (ping localhost or ping 127.0.0.1), then ping the IP address of the system you're working on, then ping another host on the same subnet, then ping the default gateway, and finally ping a remote system on another subnet. If all of the pings work except the remote system, try using tracert or pathping to see where connectivity fails.
- Check the routing table. If pings to systems on different networks fail, check the routing table for explicit entries, the correct default gateway, or duplicate default gateways (there should be only one). On a Windows computer, use the command route print to see the routing table. On a Cisco router, use the command show ip route.
- Check IP address settings. Use the Windows command line utility ipconfig to make sure the IP address is what you expect.
- Check for problems with MTU size. Use the mturoute tool to check MTU sizes on the network.
- Check for DHCP server connectivity. If you see an address starting with 169.254, that's an indication that the device or system could not reach a DHCP server to get an IP address.
- Is the IP address on the correct network? Check to ensure that the IP address of the device is on the same network or subnet as the rest of the devices. (If the subnet mask is 255.255.255.0 or /24, check to ensure that the first three octets of each of the connected devices match.)
- Check for a firewall blocking traffic. If you can ping out from a device, but not to the device, a likely culprit is a firewall on the device. Check the security settings. If the Windows firewall is disabled, check to see if a third-party firewall is enabled such as ZoneAlarm or Norton Internet Security. I once was stymied in troubleshooting by a surprise firewall that was included with a VPN client, so check for any other security-oriented software that might include a firewall.
If I've missed anything, please leave a comment.
Most network administrators have plenty of stories about forehead-slapping moments in troubleshooting when they missed something obvious. By following the above steps, hopefully you'll avoid flattening your forehead.