So...I screwed up.
As many of you are aware, The ParkFans Forum was down for over 36 hours. The reason? A single bad keystroke that set off a chain reaction; basically, I screwed up.
.
.
.
Like MAJORLY screwed up.
.
.
.
But it's all fixed now, I swear!
For a more technical answer, let's start with some background. Unbeknownst to many of you, the sites hosted by ParkFans run on two servers. Server A handles serving dynamic web pages (like this forum thread), while Server B handles database queries and serves static content (like images). There are various reasons why it's set up this way that we won't go into here, but our focus is on Server B. I was performing some routine user maintenance on Server B, which involved editing /etc/passwd. While editing that file, I somehow pasted extra text onto the line that defines the root user for the server. Unfortunately for me, I did not realize my error until I had logged out and rebooted the server.
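For the curious: each line in /etc/passwd defines one account as seven colon-separated fields. A healthy root entry looks something like this (the shell path varies by distro, and I honestly couldn't tell you exactly what ended up pasted into ours, so treat this as illustration only):

    root:x:0:0:root:/root:/bin/bash

Mangle that one line and the system can no longer resolve the root account properly, which becomes painfully obvious the next time you log out and reboot.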
I basically broke root...but it gets worse.
Of course, like any sane admin, we do have backups. We back up the database and the server image automatically every night. The problem? The database backups were incomplete, which made them completely useless. Honestly, they would have been useless anyway, since they wouldn't have included any of the last day's posts. Luckily, I was able to get a full, up-to-date dump of the database, and the server image backup was good. You would think that with all of that, getting everything fixed would be a piece of cake.
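For context, a nightly database backup like ours usually boils down to an automated mysqldump run from cron. A rough sketch of that kind of job (the database name, paths, and schedule are placeholders, not our exact setup, and credentials are assumed to come from ~/.my.cnf):

    # Run nightly from cron, e.g.: 0 3 * * * /usr/local/bin/backup-db.sh
    mysqldump --single-transaction parkfans_db | gzip > /backups/parkfans_db-$(date +%F).sql.gz
    # Quick integrity check so a truncated file doesn't sit there unnoticed
    gzip -t /backups/parkfans_db-$(date +%F).sql.gz

An integrity check like that last line (or even just comparing file sizes from night to night) is exactly the sort of thing that would have flagged the incomplete backups much sooner.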
Fat chance.
Between MySQL throwing errors and flat-out refusing to import my database dump for being too big, and MyBB failing to sync up its cache, it just wouldn't work. Nothing I did could make the database import cleanly in a form MyBB could read. So I did what any sane person would do in this predicament.
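For anyone who runs into the "too big" import error themselves: one common culprit is MySQL's max_allowed_packet limit choking on one of the huge INSERT statements in the dump. The usual first move is to raise that limit on both the server and the importing client; the 1 GB value below is just an example:

    # On the server (affects new connections):
    mysql -u root -p -e "SET GLOBAL max_allowed_packet = 1073741824;"
    # Then import with the client limit raised to match:
    mysql --max_allowed_packet=1G -u root -p parkfans_db < dump.sql

I won't pretend that alone would have saved me here, but it's the first thing worth trying before resorting to anything as drastic as what comes next.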
I rewrote the database dump...all 320 MB of it...by hand on a netbook.
To be fair, there was a lot of copy and paste involved. There was also a lot of trial and error. However, after going at it nonstop for the past two days, I was able to recover everything right up to the minute when it all went wrong. Or at least as far as I can tell.
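If you ever find yourself in the same spot: a 320 MB text file will bring most editors (especially on a netbook) to their knees, so it's far more survivable to break the dump into chunks, fix the broken pieces, and stitch it back together afterwards. The chunk size and filenames here are arbitrary:

    # Split the dump into ~50,000-line pieces named dump_aa, dump_ab, ...
    split -l 50000 dump.sql dump_
    # ...edit whichever pieces are broken...
    # Reassemble in order and import
    cat dump_* > dump_fixed.sql

Because split's generated suffixes sort alphabetically, cat puts the pieces back in their original order.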
Did I learn anything from this mess?
Of course I did. The big one: make sure we have up-to-date, tested backups before I start any sort of work on the server. Another is to test the automatic backups on a regular basis instead of assuming they work. Proofreading any changes before I commit them would also help, as would trying out changes on a test server before moving them to production (I would need to check on the feasibility of that first).
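"Tested" here means actually restoring a backup somewhere disposable and poking at it, not just checking that the file exists. A rough sketch of the kind of check I have in mind, assuming MyBB's default mybb_ table prefix and placeholder names and dates throughout:

    # Restore a night's dump into a throwaway database (credentials again from ~/.my.cnf)
    mysql -e "CREATE DATABASE backup_test;"
    gunzip -c /backups/parkfans_db-2015-06-01.sql.gz | mysql backup_test
    # Sanity-check that the important tables actually came back
    mysql -e "SELECT COUNT(*) FROM backup_test.mybb_posts;"
    mysql -e "DROP DATABASE backup_test;"

If that count looks sane next to the live forum, the backup is probably worth its disk space.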
All in all, I deeply apologize for the inconvenience this may have caused any of you, as well as for the sheer incompetence on my part. I can't say for certain that it won't happen again, but I can do my best to mitigate any future issues.
--Gavin
P.S. As always, please report any issues to me directly or by posting in "Forum Software Bug Reports".