Tech Woes; Server With No Display!
2017-11-11 23:10 - General
I've got a story to tell about my broken computer. It's actually still somewhat broken, but I've just climbed up out of a valley so deep that it feels almost like it's fixed. Here's the story, but be warned: it's probably too much detail.
So, I keep a server at home. It's multi-purpose. It stores my important files on a redundant ZFS array. It plays some of those (media) files on my TV. It runs and exposes various small services. Given all the things I use it for, it's deeply important to me. In addition to that local disk array, I have a remote backup of the data on a similar machine at my Mom's house, which has an extra disk for her data. And I've got an extra disk here for a remote backup of that.
That fourth disk in my server is a relatively recent addition. Around when I put it in, I noticed that the drives, all stuffed next to each other, get a bit warm. I decided to install an extra fan to keep them cooler. This started its own sad story. There's an unused "case fan" header on my motherboard, yay. I have a spare compatible fan, yay. It doesn't support speed control though, and is far too loud to keep running in a studio apartment. So I found a bigger fan, which supports speed control. Got it all set up, figured out how to set the speed, and I can tell that even when running slowly enough to be effectively silent, it still moves plenty of air. Great! So I screw it into place, get ready to call it a day ... and discover the cable isn't long enough. Long story short, I probably crossed a wire and zapped something while trying to extend it to reach. It doesn't spin anymore. Not sure if the fan is busted, or the power connector. But I find yet another fan; this one is designed to be quiet, and uses the standard accessory power connector (i.e. the old IDE disk connector) rather than the fan header. Great!
But all this effort to get a working fan installed involved opening up the computer, moving things around, fiddling with them... And as I said, quiet is nice when you only have one room. I noticed a little noise; it seemed to be a fan (the new fan, the CPU fan, or the power supply fan?). I decided to use my standard technique of (briefly) stopping each running fan by jamming something into it, to narrow down where the noise was coming from. It wasn't the new fan. It wasn't the CPU fan either, but that one ran fast enough to hurt my finger a bit, so I stopped using my finger. Popped a screwdriver into the power supply fan and WHAM. Broke one of the blades off of it. Ugh. Had to open it up a bit to move the fan guard out of the way to un-jam it. Everything still worked, but I still hadn't figured out where the minor noise was coming from. Everything was still off, so I put the screwdriver back into place, intending to block that fan from spinning while powering it back on.
ZAP! I shouldn't have used a metal screwdriver. There was a spark and a pop. And I busted a fuse. And I was scheduled to leave on a flight at ten AM the following day (this was the night of October 4th). Stomach in knots. I managed to take the power supply out, take it apart, and find the fuse, confirming it was blown. And soldered in. I have an old power supply lying around that was supposed to be for a project that never came to be. Open that one up. Its fuse is soldered down too, but compatible. Remove it. Remove the blown one, replace it. Put it all back together. Plug it all in. It turns on! Everything shakes a tiny bit, as the fan with a missing blade spins, but it turns on.
But no matter what I try, it doesn't show anything on the TV anymore, like it used to. I unplug a monitor from my desktop to carry it over, and it won't show anything from any of the other video connectors, either. Stomach drops again. Before long, I figure out that everything but the display works. If I power it on, wait patiently until I know it's asking for a password, type it in blind and wait again: it boots. It responds on the network, and so everything I use it for still works, except playing things on the TV.
So I leave it be, fly out for my trip and eventually come back. I'm pretty patient here, but I know I've got to do something. The first thing I do is replace the power supply. Really I just need a fan, but exactly the right one, which won't be easy to find. Power supplies are not too expensive, so after $25 and a few days I have the replacement in. It doesn't have an off-balance fan, but otherwise it doesn't help: still no video. What to do? By lucky coincidence (this might have been earlier...) I have a spare identical video card, so I swap it in. Still no display. So I order a replacement motherboard, wait for it, and laboriously swap all the components over. Still no display. So I return the motherboard I don't seem to need. (And take a $20 hit in return shipping/restocking fees. Blech.)
So what can be left? I know the motherboard's built-in video is sometimes really handled by the CPU, so I order a replacement CPU. While waiting for that to arrive, I make a stupid mistake. Right now, the computer is functional, but I can't see its display. If something goes wrong, I can't fix it, because I can only use it in a working state, when it boots and I can then log in remotely. Somehow I forgot all this and, in an otherwise idle moment, started an update for all three of my similar machines (this one, the backup at Mom's, and the public hosted server I run). Of course, of course, this time something goes wildly wrong on this machine. Some shared library that everything links to is hosed, and I can't run any more commands at all.
That was three days ago, the 8th. Yesterday I got the replacement CPU, but it's no help, because I can't boot that machine right now. Today I traveled into Brooklyn for the Killer Queen Coronation competition, had a few spare hours afterwards, and was already close to Micro Center. I was at the point where I was ready to throw money at the problem: just buy enough replacement parts that it will surely end up working, and put this stressful mess behind me! Well, once I get there I realize I'll be over $300 in the hole for a new motherboard, CPU, and memory.
But I've already replaced the motherboard, and that didn't help. I've replaced the video card, which didn't help. What is going to help? I'm getting less confident that this expensive solution is going to be it. And it gets worse. Remember that ZFS array I mentioned at the top? It's encrypted, so I need a separate boot partition, on a separate drive. Right now, that's a CompactFlash card in an IDE adapter, which sounds crazy but works great. Except new motherboards don't come with IDE connectors anymore, so I'd need to buy even more stuff to make that work, and I'm starting to really doubt myself. And with no display, how will I set up that replacement? What now?
Here starts the silver lining. I came up with a temporary fix I was confident enough in that I gave up, left the store empty-handed, and headed home to try it. I took my main desktop computer mostly apart, plugged in the server's drives instead, and booted a USB rescue environment there. And it worked: I could mount the disks. I'm happy I put my root partition on the ZFS volume, because now that I could finally boot "the server" and see its display, it was trivial to do a snapshot rollback. I took everything apart again, reassembled the original server, booted it blind, and voila! It's running, sans display, again. I can remember not to update it and patiently figure out a reasonable long-term fix. Phew! I'm actually only right back where I was a month ago, but after the whole thing was broken for a few days, this feels quite relaxing in comparison.