The WarLight.net website is experiencing an outage (EDIT: It’s back up!). I’m working on getting it back up as fast as possible. More details will be posted soon. Sorry for the inconvenience.
UPDATE 10pm GMT-8: The main WarLight database has suffered a hardware failure. Rackspace engineers are currently working to recover the data onto a new server. As a backup plan, I am also restoring the latest database backup onto a fresh server, just in case Rackspace is unable to restore the old server. The latest backup I have was taken about 4 hours before the outage, so I’m hoping it does not need to be used.
UPDATE 5am GMT-8: I’m really mad at Rackspace. For the last ten hours, all they’ve given me is “We can’t give you an ETA,” over and over. Finally, they’ve told me the restore is 95% done and the server will be up within 2 hours.
UPDATE 6:49am GMT-8: The website is now functional again. Unfortunately, the database was rolled back by a few hours. Hopefully, players who committed turns during that period will be respectful and try to commit the same orders they played the first time. I’m going to take steps to ensure this can’t happen again.
Additionally, for the next several hours, booting will be disabled in all games (you’ll get an error message if you try to boot). This will let players who couldn’t take their turn before the boot timer expired get it in before being booted.
WarLight’s database has had hardware failures before; however, Rackspace typically moves the server automatically to a fresh machine with only a few minutes of downtime. This has actually happened several times, and it has never been a problem until now. This time it didn’t work for some reason, and to make matters worse, their support personnel were very unhelpful. I had grown complacent relying on Rackspace’s automated failover. Unfortunately, when the failover finally did happen after several hours, the database was corrupt beyond repair, and I had to use my own backup anyway.
To ensure this can’t happen again, I am going to set up a continuously streaming backup system that will back up the database several times per minute. This way, if there’s another failure, I won’t have to wait for Rackspace’s failover, and I can restore from my own backup without significant data loss. Previously, a backup was made only every few hours, so this is a significant jump in reliability.
I realize the problems that outages cause, and I’m probably more disappointed by this than any of you. I take this very seriously, as it’s my full-time job, and I am going to take several steps to prevent it from happening again.
UPDATE: As mentioned above, booting is temporarily disabled in all games. This gives players who are past the boot time in their games, and who couldn’t play because of the outage, a chance to take their turns. I realize this makes it difficult to play new real-time games, but please bear with me while we get everything working normally again.
UPDATE: Booting is now re-enabled.