At about 10 one morning, the entire network in our school went down. No internet access, no email, nothing but local traffic passing through our switches. Not knowing where to begin, I figured I would start with the outside world and contact our ISP. I unconsciously pulled up my browser to Google the phone number for our ISP, only to be reminded that the internet was still down. I knew I was in for a long haul.
Luckily, I had a Post-it note next to my desk with contact information for our E-Rate account rep at the ISP and was able to call in to see whether we had traffic. I was given what I needed to configure my laptop to get online directly from our modem and set off to the server room to test the internet. Just as I confirmed that traffic was reaching our school, the UPS in the room started going off...
I was already panicking that the entire school had been without network connectivity for 30 minutes. Distracted by the alarm, I walked over and pressed a button on the UPS, thinking it would just silence it. Instead, it cycled the power and took down everything plugged into it: PBX, PA, and bells. When the power came back, the PBX made a scary sound, like plugging in a toaster that's already switched on. And just like that, we had no phones.
The experience had now gone from a series of unfortunate events to a full-blown disaster. Still with no clue what was keeping us offline, I left the server room to tell the office that the phones were down, confirm that getting them back up was now the top priority, and ask permission to bring in outside help on our PBX. I was promptly approved.
It did not take long for the technician to determine that the power surge had fried one of the power supplies. I now know that when power-cycling the PBX, I need to turn each switch off before powering it back on, and then bring the power supplies up one at a time. With the phones back up, I was free to start troubleshooting why we weren't getting online.
The next link in the chain was our firewall, so I called in for support. The tech promptly asked why our network interface had a /24 on the end of its IP address. Not being a blinky-lights-and-wires guy, I had no idea why that was a problem. The support tech explained that a /24 means we were claiming an entire 256-address block rather than the addresses our ISP had actually assigned us. I was directed back to our ISP to get the correct IP settings, and then back into the firewall support queue to go over the changes.
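For anyone else who isn't a blinky-lights-and-wires person: the number after the slash says how big a block of addresses you are claiming. Here is a minimal sketch using Python's standard ipaddress module; the 203.0.113.x addresses are documentation placeholders, and the /30 is just an example of the kind of small block an ISP might hand out, not our real settings.

```python
# Comparing how many addresses different prefix lengths claim.
# The 203.0.113.x addresses are documentation examples, not real settings.
import ipaddress

as_slash24 = ipaddress.ip_network("203.0.113.0/24")  # what our firewall was claiming
as_slash30 = ipaddress.ip_network("203.0.113.8/30")  # a typical small ISP-assigned block

print(as_slash24.num_addresses)  # 256 addresses -- a whole block
print(as_slash30.num_addresses)  # 4 addresses -- network, two hosts, broadcast
```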
By now you can probably guess that this wasn't the end of it. Despite fixing several glaring mistakes in our firewall configuration, we still had no internet. I phoned a trusted network admin, Brian Norman at Lakes Country Service Cooperative, to walk me through the next steps, and that's when the KISS mantra started resonating in my head. Was our domain controller even online? The command prompt confirmed that it was not.
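The actual check was nothing fancier than a ping from the command prompt, but for future me, here is roughly the same thing as a small Python sketch; the 10.0.0.2 address is a placeholder, not our real domain controller.

```python
# Rough equivalent of the command-prompt check: is the domain controller answering pings?
# The address below is a placeholder -- substitute your DC's real IP.
import platform
import subprocess

def is_up(host: str) -> bool:
    """Send a single ping and report whether the host replied."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

print("DC is up" if is_up("10.0.0.2") else "DC is down")
```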
Unfortunately, turning on the domain controller is not as simple as flipping a switch on a physical server. No, we run VMware, and I needed the vSphere client to check the logs and restart the DC VM. Of course, I didn't have it installed locally on my machine, because, well, that would just be smart.
With Brian's help, I plugged a monitor into the ESXi host to read the instructions for getting vSphere: navigate to the IP address of the host and download the vSphere client from there. Well, it turns out the link on that page points to an .exe sitting on a VMware server out in the outside world. Can this get any more complicated?
The end is almost here, and if you're still reading at this point, bear with me, because this post has just turned into a schematic for how to troubleshoot problems like this in the future. I reconfigured my laptop to connect directly to our broadband modem, found my way to the vSphere download online, installed the client, reconfigured the laptop to get its IP settings from the DC again, logged into VMware as root, and spun the DC back up... deep breath... INTERNETS!!!!!!!
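Since I'm calling this a schematic, here is the kind of walk-the-chain script I wish I'd had that morning: check the modem, the firewall, the domain controller, and the outside world in order, and the first check that fails tells you where to start digging. Every name, address, and port below is a placeholder, not our actual network.

```python
# A sketch of an outside-in connectivity checklist. All hosts, ports, and
# addresses are placeholders -- fill in your own modem, firewall, and DC.
import socket

CHECKS = [
    ("Broadband modem",         "192.168.100.1", 80),   # modem admin page
    ("Firewall / LAN gateway",  "10.0.0.1",      443),  # firewall web UI
    ("Domain controller (DNS)", "10.0.0.2",      53),   # DC's DNS service
    ("Outside world",           "8.8.8.8",       53),   # a public DNS server
]

def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

for name, host, port in CHECKS:
    status = "OK" if reachable(host, port) else "DOWN"
    print(f"{name:25} {host}:{port:<5} {status}")
```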
I left that day at about 9 pm, my troubleshooting energies completely depleted. The next day the phones had another hiccup, as one of the power supplies still had an issue. We also discovered that one of our bell systems had been malfunctioning since the power surge. With no documentation or playbook, it took me 3-4 painstaking, frustrating calls to the bell system's tech support to get things reconfigured through its telephone programming system (don't worry, bored reader, I won't go into the details). I'm not even going to get into the problems with our access management server, or with the PoE switches running the door locks.
I'm still not sure I've seen the end of this experience. What I do know is that I have learned a lot in the process. I now keep phone numbers for all of the major vendors handy, along with the IP settings I need to test our internet connection straight from the modem.
It has been challenging to focus on the instructional side of technology these last few weeks with all of this bouncing around in my head. Hopefully writing it down will clear out some mental space and energy to move forward.