
Gavin

root@parkfans.net:/# rm -r *▌

So...I screwed up.

As many of you are aware, The ParkFans Forum was down for over 36 hours. The reason? A single bad keystroke that set off a chain reaction; basically, I screwed up.
.
.
.
Like MAJORLY screwed up.
.
.
.
But it's all fixed now, I swear!

For a more technical answer, let's start with some background. Unbeknownst to many of you, the sites hosted by ParkFans run on two servers. Server A serves dynamic web pages (like this forum thread), while Server B handles database queries and serves static content (like images). There are various reasons why it's set up this way that we won't go into here; our focus is on Server B. I was performing some routine user maintenance on Server B, which involved editing /etc/passwd. While editing that file, I somehow pasted additional text onto the line that defines the root user for the server. Unfortunately for me, I did not notice my mistake until I had logged out and rebooted the server.
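
For anyone wondering how one stray paste can take out root: the whole account is defined by a single colon-delimited line in /etc/passwd. Roughly, it looks like the sketch below (the shell and home directory vary by distro), and the safer way to touch the file is with vipw.

# A typical root entry in /etc/passwd - seven colon-separated fields:
#   name : password placeholder : UID : GID : comment : home : shell
root:x:0:0:root:/root:/bin/bash
# Stray text pasted onto that line mangles the fields and can leave root unable to log in.
# vipw locks the file while you edit a working copy; pwck sanity-checks the result afterwards.
vipw
pwck /etc/passwd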

I basically broke root...but it gets worse.

Of course, like any sane admin, we do have backups. We back up the database and the server image automatically every night. The problem? The database backups were incomplete, which made them completely useless. Honestly, they would have been useless anyway, as they wouldn't have included any posts from the last day. Luckily, I was still able to pull a full, up-to-date dump of the database, and the server image backup was good. You would think that, with all of that, getting everything fixed would be a piece of cake.
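
For the curious, the nightly dump itself is nothing exotic; a rough sketch of that kind of job is below (the database name, user, and paths are placeholders, not our exact setup).

#!/bin/bash
# Hypothetical nightly backup sketch - names and paths are illustrative.
set -euo pipefail                          # fail loudly if any step in the pipeline dies
DUMP=/var/backups/parkfans-$(date +%F).sql.gz
mysqldump --single-transaction -u backup -p"$DB_PASS" parkfans_db | gzip > "$DUMP"
# A truncated file is exactly the "incomplete backup" trap above, so at least
# confirm the dump ends with mysqldump's completion marker.
zcat "$DUMP" | tail -n 1 | grep -q "Dump completed" || echo "backup looks incomplete!" >&2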

Fat chance.

From MySQL throwing errors and flat-out refusing to import my database dump because it was too big, to MyBB failing to sync up its cache, it just wouldn't work. Nothing I did could get the database to import properly so that MyBB could read it. So I did what any sane person would do in this predicament.
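
For reference, errors like that usually trace back to MySQL's max_allowed_packet limit choking on a single huge INSERT rather than the total size of the file; the textbook fix looks something like the lines below (the 1 GB value is illustrative, and the database and file names are placeholders).

# Raise the limit on the running server (1073741824 bytes = 1 GB; new connections pick it up)...
mysql -u root -p -e "SET GLOBAL max_allowed_packet = 1073741824;"
# ...and on the importing client as well, then feed the dump back in.
mysql --max_allowed_packet=1G -u root -p parkfans_db < dump.sql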

I rewrote the database dump...all 320 MB of it...by hand on a netbook.

To be fair, there was a lot of copying and pasting involved, and a lot of trial and error. After going at it nonstop for the past two days, though, I was able to recover everything right up to the minute it all went wrong. Or at least as far as I can tell.
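
If anyone ever lands in the same boat: a lot of that copying and pasting can be scripted by chopping the dump at mysqldump's per-table header comments and feeding the pieces in one at a time. A rough sketch, assuming the dump still has the standard "-- Table structure for table" markers (the database name and password variable are placeholders):

# Split the dump into one chunk per table at mysqldump's header comments
# (GNU csplit writes the pieces out as xx00, xx01, ...).
csplit -z dump.sql '/^-- Table structure for table/' '{*}'
# Import the chunks one at a time so a single bad table doesn't sink the whole run.
for f in xx*; do mysql -u root -p"$DB_PASS" parkfans_db < "$f" || echo "failed: $f"; done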

Did I learn anything from this mess?

Of course I did. I learned to make sure we have up-to-date, tested backups before I start any sort of work on the server. I also need to test the automatic backups on a regular basis. Proofreading any change before I commit it would help too, as would trying changes out on a test server before they hit production (I'd need to check on the feasibility of that first).
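
"Tested" is the key word there; the only test that really proves a backup is restoring it somewhere disposable and looking at what comes back. Something along these lines would do (same placeholder paths as above; the table name assumes MyBB's default mybb_ prefix):

# Restore last night's dump into a throwaway database...
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS restore_test;"
zcat /var/backups/parkfans-$(date +%F).sql.gz | mysql -u root -p restore_test
# ...then spot-check that the important tables actually came back before dropping it.
mysql -u root -p -e "SELECT COUNT(*) FROM restore_test.mybb_posts;"
mysql -u root -p -e "DROP DATABASE restore_test;"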

All in all, I deeply apologize for the inconvenience this may have caused any of you, and for the sheer incompetence on my part. I can't say for certain that it won't happen again, but I will do my best to mitigate any future issues.

--Gavin

P.S. As always, please report any issues to me directly or by posting in "Forum Software Bug Reports".
 
[reaction GIF]
 
My fear was that the site was permanently dead due to causes unknown.

As a mentor once noted: sometimes people love you more if you screw up and fix it than if your superstardom makes it look easy.
 
When I was in high school and ran my BBS (shut up, yes, I'm old), we used to say, "real men don't back up!" My co-sysop and I considered tee-shirts to that effect... then we reconsidered... Now, as a semi-rational adult who actually plans these sorts of things, I not only back up, I duplicate the backup off-site, store archival copies off-site in fireproof facilities for years on end, and, oh yeah, have a disaster recovery plan.

You would be shocked how many multi-million-dollar networks I see that have massively redundant and complex backup systems and hardware but never test the backups, the hardware, or, for that matter, the procedures for executing said backups.

Don't feel bad; you are one person running a complex database, while these are teams of people with massive budgets, authority, and time.

The other thing we always say: "You are not a real tech until you break something REALLY expensive. You are not a good tech until you fix (or hide) it before anyone knows!"
 
Gavin,

Two words for you. Naughty List.

It happens to the best of them. I know a very good DBA who once wrote an SQL script that reset everyone's password. No big deal. Mistakes happen. Sounds like you've learned your lesson and will put measures in place to prevent it from happening again.

However, if it happens again, you might be strapped into a Vekoma SLC with the restraints welded shut and forced to ride for an undetermined number of rides. ;-)
 