
b.l.o.g.

(blogs let others gawk)

February 9, 2014

Single point of failure (or how important is your data?)

So, this is a story I don’t tell too often, but in light of some recent conversations about performing backups following the news of the Iron Mountain fire, I felt it would be insightful to share.

Back in 1997/1998 I learned a very hard lesson about data loss with the publication I co-edited, Game Zero magazine.

First, the back story to explain how this situation ended up the way it did.

We started our web presence near the end of 1994 with a user account at a local Arizona company named Primenet, which offered users the traditional array of features (WWW, POP mail, etc…). This worked out great except for a couple of problems. The first was that even though we had registered the domain gamezero.com for our site, Primenet’s server name resolution would sometimes flip a visitor’s browser to the primenet.com/team-0 URL while the person was traversing the site. This caused lots of people to create bookmarks and links to the site under the wrong URL (this comes into play later).

The second and later problem, although not a technical issue, was the cost associated with bandwidth for WWW visitors to the site. Towards the end of our time with Primenet we were hitting fees of a few hundred dollars a month in bandwidth from our 700,000+ monthly page views. Fortunately we had designed our site to be incredibly light, which helped keep costs down, but traffic and fees were climbing. Ultimately I set my sights on moving us to the new “discount” hosting services which were becoming a thing in 1997. It was obvious we could save a significant amount of money by moving the site.

For backups, we had our production computer, which housed all the original and in-development web content, including the active mirror of the website and remote publishing tools, as well as our POP e-mail client for all business e-mail. Additionally, we kept backups of web content and e-mails on a collection of Zip disks, along with some limited content on a random assortment of floppies.

In 1997 hard drives were expensive! We’re talking a few hundred dollars for a 1GB drive. Our production PC had something like a 120MB drive, as I recall, so we had lots of data offloaded onto the Zip disks.

Also around this time we received word that the provider which had been handling our FTP-based video repository was getting out of the hosting business. I decided it best to roll the video content into the new web hosting arrangement as the price would still be reasonable. We quickly migrated everything over, changed DNS entries, started sending out e-mails asking people who had the old primenet.com addresses to please update their links, etc… Following the migration we published only a few major updates on the new server, consisting of a couple of new videos and some articles which existed only on the website, our production system and our Zip drive backups.

Then problems started…

  1. Traffic tanked on the new server.
  2. My crawling the web looking for bad links suddenly made me aware of just how bad the linking issue was: a significant amount of traffic was still going to the old Primenet URL. Fortunately, right before we closed our Primenet account we set up a root page that linked to the proper URL along with a notice about the move, which Primenet was kind enough to leave up at no cost. But it wasn’t a full site-wide redirect, just the root pages.
  3. A few months into running on the new provider, their servers went dark. When I contacted them to find out what happened, I reached a voicemail that informed me they had filed for bankruptcy and closed the business. Done, gone… No contact and no way to recover any of the data from the web server.
  4. We now had a domain name that didn’t respond, our old provider’s server was pointing traffic to that very same dead URL, and since we had long since closed the Primenet account we had no ability to log in and change the redirect notice or make other modifications to send traffic someplace else.
  5. While scrambling to find new hosting, the hard drive on our production computer completely and utterly failed. 100% data loss.
  6. After getting a new hard drive I went to start rebuilding from our Zip disks, and to my horror none of them would read. We had become a victim of what came to be known as the “click of death”. We lost some 20-30 Zip disks in total. Almost everything was gone except for a mirror of the website from before the migration to the new hosting and other random items scattered around. We also had a limited number of hard copies of e-mails and other documents.
  7. Lastly, while the Internet Archive is now a great way to recover website content, at that point in time it was still just getting started, and their “Wayback Machine” had only taken a partial snapshot of our sites (in both the US and Italy). Par for this story, the lost content was pages that had not been crawled yet, except for the index pages for the missing videos. I could view the archive of the video pages… but the linked videos were too large at that time and were not mirrored.

Coming into this, I felt we had a pretty good data backup arrangement. But I learned the hard way that it wasn’t good enough. We lost all of the magazine’s e-mail archives including thousands of XBand correspondences as well as innumerable e-mails with publishers and developers. We lost two videos that had been produced and published. We lost a few articles and reviews. We also lost nearly all of the “in progress” content as well as a number of interviews.

At this point the staff agreed to stop spending money on the publication and formally end the magazine, especially since some of them were already making natural transitions into their careers and school. While we had stopped actively publishing at the end of 1996/start of 1997, if you were to ask me if there was a hard line for the true end of the magazine, this was it.

Ultimately I did get the site back up as an archive which you can still read today. But, that’s another story.

The lesson of this story is to remember that there is no fool-proof backup arrangement. Only you can be responsible for your (or your company’s) data, and you must always be aware that no matter what your best efforts are, data loss is always a possibility.

99.9% guarantees are great except for that 0.1% chance, which is still a chance! And if someone is selling you a 100% guarantee, let me know, because I’ve got the title for this bridge in Brooklyn I might consider selling you for a deal.

What could I have done differently?

  1. Spread out our backups across more than one media type and more than one location. Simply having a duplicate set of Zip disks and a second drive off site, where there was no cross-mixing, would have made a huge difference here.
  2. More frequent backups of critical business data such as e-mail.
  3. Retained the master account with the old service provider until we were sure traffic migration had been completed.
  4. Upon the first sign of the Click of Death, I should have isolated both the problematic media and the drive from use and looked for a second drive, since the damage propagated once it manifested. But nobody had enough information about the problem at the time, and the manufacturer kept denying it existed.

Granted, some of these would likely have added overhead cost, but the question is: would that cost have balanced against the value of the data lost? I don’t know. But since this happened I have been far more diligent in my data storage strategies. I now weigh the value and importance of the data against the breadth and depth of the backup plan and go with the best solution I can devise.
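
If I were sketching that “more than one copy, and verify it” idea today, it might look something like the little bit of Python below. This is only a rough sketch, not anything we actually used back then; the paths and layout are purely hypothetical. The idea is simply to copy the working data to two independent destinations and then confirm every copy against a checksum of the original.

    # A minimal sketch of the "more than one copy, and verify it" idea, using
    # only the Python standard library. All paths here are hypothetical; in
    # practice the two destinations should live on different media and,
    # ideally, in different locations (a second drive plus an offsite sync).
    import hashlib
    import shutil
    from pathlib import Path

    SOURCE = Path("/data/gamezero")             # hypothetical working data
    DESTINATIONS = [
        Path("/mnt/backup-drive/gamezero"),     # hypothetical second local drive
        Path("/mnt/offsite-sync/gamezero"),     # hypothetical offsite/remote mount
    ]

    def sha256(path):
        """Checksum a file in chunks so large files don't exhaust memory."""
        digest = hashlib.sha256()
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def backup_and_verify(source, destination):
        """Copy the tree, then re-read every copied file and compare checksums."""
        shutil.copytree(source, destination, dirs_exist_ok=True)
        mismatches = []
        for original in source.rglob("*"):
            if original.is_file():
                copy = destination / original.relative_to(source)
                if not copy.is_file() or sha256(copy) != sha256(original):
                    mismatches.append(str(copy))
        return mismatches

    if __name__ == "__main__":
        for dest in DESTINATIONS:
            bad = backup_and_verify(SOURCE, dest)
            print(dest, "OK" if not bad else "%d file(s) failed verification" % len(bad))

The verification step matters as much as the copy itself: our Zip disks “had” backups right up until the moment we actually tried to read them.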

I have had only one significant data loss in the years since this happened. It was just last fall, while I was doing some data re-organization as part of a desktop upgrade. A USB drive I was using for temporary storage fell over and became damaged in such a way that it would no longer read. I then discovered that the data on the drive hadn’t been synchronized with the backup repository for a couple of months for some reason. Fortunately it was non-critical, personal data (downloaded drivers and install packages that I was able to re-download from the Internet), so all in all the only loss was my time. But it was a reminder that even though I am way more careful than before, accidents can still happen.