Tuesday, January 1, 2013

Part 1: NAS and Backup Strategy

"Never let a good crisis go to waste."

People I work with have heard me utter that phrase many times. When a production server goes down, for instance, it's especially frustrating to know there was some action you could have taken to avoid the crisis altogether: unless we do X, Y might happen. X is usually a minor inconvenience or expense up front, almost always less than the potential cost of Y, but X is never mandatory. Businesses go through this quite frequently, and it's no different in our personal lives.

The reality is that, in many cases, a crisis is the only way to change behavior and improve things. It's funny how life works like that.

Consider the following:
  • A burglar breaks into your house and steals your valuables.
  • Your basement gets flooded, your house burns down, etc. 
  • Your hard drive crashes and you didn't have a recent backup.
All of the above are unpleasant situations (of varying degrees, certainly) that usually result in not-so-fun consequences. 
  • "I should have installed that alarm system" ($40 / month vs. priceless things gone forever)
  • "Why did I wait to replace that sump pump?" ($500 vs. $1,000s in cleanup costs)
  • "I wish I had been more faithful about backing up my system" ($xx vs. priceless things gone forever)
Sadly, I speak from personal experience with the above items (with the exception of being burglarized). By nature, I am a planner as well. I like to do as much as I can to avoid a crisis. "Be Prepared" as the old Boy Scout motto says.

I've decided that 2013 needs to be different, so this year I'm trying to address a few things on the consideration list. The rest of this post is a follow-up to a recent post, and covers my backup strategy for 2013. And yes, I am somewhat reacting to a crisis, but fortunately that crisis was averted.

Backup Strategy

My backup strategy is fairly simple. In concept, I am going with the "defense in depth" approach. Wikipedia does a good job explaining the concept: the purpose is to "... defend a system against any particular attack using several, varying methods. It is a layering tactic, conceived by the National Security Agency (NSA) as a comprehensive approach to information and electronic security"

In the context of data, the "attacker" is not necessarily a person, but the construct still applies. My assumption is that it's not a matter of "if" a hard drive will fail, but "when" it will fail. Since I've been using PCs with hard drives since the late 80s, I've witnessed more than 5 hard drives physically bite it. This has been for a variety of reasons: lightning strikes, power outages/brownouts, old drives eventually giving out (i.e., the "click of death"), and within the last few years, a few drives lost to plain old crappy manufacturing and poor quality.

My strategy is thus quite simple: create many backups (>3) of stuff that is important, and have at least one of those backups off site. 

On a related note: if you're looking for some good write-ups on backup strategies, I recommend reading Scott Hanselman. Scott always has a lot of good tips and I've been following him for years. His articles are straightforward, to the point, and make a lot of sense. Some of my tactics are derived from Scott's tips.

Tactical Approach

The strategy is pretty simple, but the tactics are where the rubber meets the road. There are SO MANY OPTIONS to choose from. Before I get into my specific plan, I'd like to give a little background on my thought process, which will be in the form of recommendations.

First, think about your overall current state.
  • Types of devices you want to back up (for computers, it usually helps to think by OS)
    • PCs
    • Macs
    • Linux
    • Phones
    • Tablets
  • Use of Data
    • Small but critical data that you work on day-to-day and that typically resides on your computer: source code (if you're a programmer), documents, spreadsheets, etc.
    • Large, critical local data (if you're a creative type): HD video WIP projects, Photoshop files, etc.
    • Stuff you want to share (videos, photos, etc.) with other devices, e.g. Apple TV, Media Center, other PCs.
    • Less important but sentimental things that just need to be archived somewhere (old emails, old work files)
    • Stuff that comes and goes (e.g. DVR recordings from Media Center)
  • Priority of Data / Data Equality
    • High Priority: Stuff that is super important that you can't replace: family photos, videos, etc. 
    • Medium Priority: Stuff that you could rebuild if you had to, but would prefer not to (your OS install, program installs, rips of DVDs or Blu-rays you own, iTunes/mp3 purchases)
    • Low Priority: Stuff that would be a minor headache if you lost it: ISO disc images of programs, your downloads folder, DVR recordings
  • Current Data
    • How much data have you amassed?
    • How much of this is "high priority" data that needs a defense in depth approach to backup?
A good tactical plan should cover growth for at least the next 2-3 years. You may not have all the answers to the above, but it helps to think through it. I spent a few hours sketching out the different types of data that I have. I spent a week pondering my options, and several days researching once I had narrowed it down. Don't make a hasty decision: your data is at stake!
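
To put rough numbers on "growth for the next 2-3 years", a short Python sketch can compound today's data size forward. The inventory sizes and the 30% yearly growth rate below are made-up placeholders, not figures from this post; plug in your own.

```python
def projected_size_tb(current_tb: float, annual_growth: float, years: int) -> float:
    """Compound the current data size forward by a fixed yearly growth rate."""
    return current_tb * (1 + annual_growth) ** years


# Hypothetical inventory in TB, grouped by the priority levels above.
inventory_tb = {"high": 1.5, "medium": 2.0, "low": 0.5}

# Assumed growth rate (30% per year); purely illustrative.
growth_rate = 0.30

for years in (1, 2, 3):
    total = sum(projected_size_tb(size, growth_rate, years)
                for size in inventory_tb.values())
    print(f"after {years} year(s): {total:.2f} TB")
```

If the three-year total is already close to the usable capacity you are considering, that is a sign to size up before buying.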

Now, on to the implementation.

As you might expect, one could spend a small fortune on implementing a backup strategy. Many corporations do this! However, most of us do not have the deep pockets of a corporation. I'm glad to say that there are still some good options out there for just about any budget.

I've already invested a lot into professional camera equipment. It is a passion, hobby, and much-needed creative outlet: I film weddings, church events, kids' sporting events, dances, and home videos, to name a few. If I had set aside even half the money I've spent on my camera equipment, I'd be able to rival the backup plans of most small businesses. However, I didn't do that (refer back to my opening remarks). :) Fortunately, my implementation will allow me to grow and adjust over time.

My overall implementation will look like this:
  • NAS (NOT a backup, but has redundant storage)
    • Choice of NAS: FreeNAS
      • Open source software built on FreeBSD, for my NAS.
      • FreeNAS will allow me to use ZFS, a combined file system and volume manager with built-in software RAID.
      • I will be using a RAID-Z1 setup with five 2 TB drives. RAID-Z1 is very similar to RAID 5, meaning I will have good performance and good reliability. If one drive goes bad, I won't lose my data. (RAID-Z2 is even better but comes with higher cost and lower performance: two drives can go bad without any data loss.)
  • The NAS will:
    • Store backups of all priority data across my network (with the source machines having my primary copy)
    • Store copies of backups for my Macs
    • Serve as a work-in-progress (WIP) location for my video editing
    • Be the primary storage used for video sharing
    • Ideally deliver steady performance of around 100 MB/sec (megabytes) read and write, with expansion up to 200-250 MB/sec using link aggregation
      • Side note: I need high performance because of my WIP video editing needs.
  • Mirrored 1 TB "poor man's backups"
    • I will continue to copy the high priority stuff, including photos, raw videos, and finished videos, to these drives.
    • Goal: discipline myself to copy my final videos to my dual backup until I automate my workflow to back up the high priority stuff from the NAS.
    • In 3-6 months I will replace my mirrored 1 TB drives with dual 3 TB drives and send one of the old drives to my parents (for off-site backup).
  • Time Machine on Time Capsule
    • I will continue to use Time Machine for backups of the Macs we have
    • It is a 1.5 TB drive. It keeps incremental history and I can go back quite a ways
    • Since there is no disk redundancy, I want to back this up to the NAS as per above in case I lose it.
  • Blu-Ray
    • For archiving the high priority stuff
  • CrashPlan Cloud Storage
    • Because fires and floods happen, off site backup is required.
    • For about $120 or so a year, I will back up my high priority files to the cloud (active backup)
    • ALL data is encrypted
    • There is an option to "seed" the backup with 1 TB of data for an additional cost; otherwise it may take several weeks or even months to back up all my data.
    • If I want to recover, I can pay a fee to get 1 TB of data sent to me.
    • If my family / friends sign up, I can store backups on their machines (based on what space they make available) if I so desire. I believe this is also encrypted. Another cool option for redundancy. I would gladly open up some space on my NAS for other friends / family willing to trade space.
  • SmugMug (also Cloud storage and paid site for photos/final videos)
    • I've been using this site for a little over a year. It's great for photo sharing and even videos, though there is a 20 minute / 8 GB limit on videos.
    • I don't put everything on this site, mostly stuff I want to share with others.  
    • Allows me to download all my photos in one zip if I ever need them.
    • Unlimited video / photo storage.
    • It's about $150 a year but price seems to go up frequently. 
  • A "Restore" Roadmap
    • On a related note, my wife and I are going to set up a Will / Trust this year. It's not fun to think about, but you need to consider that someone else may need to recover the data.
    • With all the trouble I'm going through to set this up, it would be a tragic waste if I didn't consider this.
  • APC UPS Power Supplies
    • In the event of a power outage:
      • I want to make sure my networking components, including the Time Capsule, stay available.
        • I will use an APC 450 VA for this. I should have about 30 minutes of availability, which should allow enough time for things to shut down properly.
      • My NAS needs to be on a UPS.
        • I will use an APC 750 for this. Since the NAS will be headless, I anticipate about 30 minutes of backup power. FreeNAS can be configured to shut down automatically if the UPS signals that the power has been out for X amount of time. I will be using this feature.
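
The CrashPlan bullet above mentions that an initial upload without seeding could take weeks or months. A small Python sketch shows where that range comes from. The upload speeds used here are assumed examples, not measurements of my connection, and real transfers add protocol overhead and throttling on top.

```python
def upload_days(data_tb: float, upload_mbit_per_s: float) -> float:
    """Days needed to upload data_tb terabytes at a sustained upload rate.

    Uses decimal units: 1 TB = 10**12 bytes, 1 Mbit = 10**6 bits.
    """
    bits = data_tb * 1e12 * 8
    seconds = bits / (upload_mbit_per_s * 1e6)
    return seconds / 86_400


# Assumed (hypothetical) home upload speeds in Mbit/s.
for speed in (1.0, 5.0, 20.0):
    print(f"1 TB at {speed:>4} Mbit/s: about {upload_days(1.0, speed):.1f} days")
```

At a sustained 5 Mbit/s, 1 TB takes roughly 18.5 days; at 1 Mbit/s it is about three months, which is why the seeding option exists.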
Why FreeNAS? Many reasons, but to summarize:
  • I looked at Synology, QNap, Buffalo and many other vendors. 
    • While the idea of just buying it and getting done with it appealed to me, the value proposition wasn't quite there. FreeNAS offers more features than most out-of-the-box. 
    • In my research, I also came across many forum posts (and even Amazon reviews) from people who had gone with a vendor device and ultimately switched to FreeNAS because they had issues.
    • Performance also seemed limited (less than 100 MB/sec) due to the crappy system specs in these units (Atom processors, 512 MB of memory). Some people upgraded them and saw improvements, but lost the warranty.
  • ZFS (software RAID) was designed for data integrity from the ground up. In most cases it's just plain better than hardware RAID. Much has been written on the subject.
  • RAID-Z1 allows me to avoid the "RAID 5 write hole" while having the performance and redundancy of traditional hardware RAID 5. (This basically means I should avoid corruption due to a power loss.)
  • Overall price vs. other vendors seems to be slightly better especially when considering future growth.
  • There will be some learning curve, but the folks on the FreeNAS forums seem very helpful so far.
  • I really enjoy tweaking and computing is still a hobby for me. While some may shy away, this is an exciting challenge for me and I look forward to tackling it.
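
For the RAID-Z1 pool of five 2 TB drives described earlier, a back-of-the-envelope Python sketch gives the approximate usable space. It assumes the simple rule "usable is roughly (drives minus parity drives) times drive size" and ignores ZFS metadata, padding, and recommended free-space headroom, so real numbers will come in lower.

```python
def raidz_usable_tb(drives: int, drive_tb: float, parity: int) -> float:
    """Rough usable capacity of a single RAID-Z vdev, in decimal TB.

    parity=1 approximates RAID-Z1, parity=2 approximates RAID-Z2.
    This is an estimate only; ZFS overhead is not modeled.
    """
    if drives <= parity:
        raise ValueError("need more drives than parity drives")
    return (drives - parity) * drive_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 1e12 / 2**40


for label, parity in (("RAID-Z1", 1), ("RAID-Z2", 2)):
    usable = raidz_usable_tb(drives=5, drive_tb=2.0, parity=parity)
    print(f"{label}: ~{usable:.0f} TB (~{tb_to_tib(usable):.2f} TiB) before overhead")
```

The TiB figure is what most operating systems will report, which explains why an "8 TB" pool shows up as a little over 7.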
In Part 2 I will cover the specific hardware and the installation of FreeNAS. Stay tuned!

Adventures with SSD's, Hard Drives and Backups (and Happy New Year!)

The holidays were quite busy in 2012. I took a bit of a break from my blog accordingly. Here we are in 2013 now, it's hard to believe! Happy New Year!

I last blogged about some key upgrades that I made to my 2011 17" MacBook Pro, including a 512 GB OCZ Vertex 4 SSD and 16 GB of RAM. While my system drastically improved in terms of performance, the reliability seemingly dropped. Prior to the upgrade I can remember running "uptime" (terminal command) and seeing a time of 70 days, meaning it had been that long since my system had been rebooted or shut down. That is some rock solid reliability! Post upgrade, I was getting occasional lockups and system freezes, primarily when browsing. I was lucky to get 3-4 days of uptime without a freeze. Not so cool. Disclaimer: I upgraded the components myself and did not take them to the "Mac Cafe" (local Apple screwdriver shop that is an authorized Apple dealer). I do have AppleCare, but from what I understand, you don't void your warranty if you do these simple upgrades yourself (so long as you can put the original stuff back in if you have any issues).

Anyway... the week before Christmas, everything came to a head. My SSD decided to bite the dust after apparently one crash too many. Timing couldn't have been much worse, as I had a wedding that I was going to be shooting the Saturday before Christmas (Dec 22).

Other than the occasional freezes, the symptoms of my machine leading up to the crash came on suddenly and without any real warning. I restarted my MBP and I couldn't get past the grey screen with a progress bar. In fact, the MBP would shut down after about 30 seconds. I quickly went to Apple's support site and found out that this issue indicated a bad hard drive. A lot was going through my mind at that point:

  • What all am I going to lose?
  • Will my backup really work?
  • What about all my videos and photos that only live on my MBP that I hadn't yet backed up...

Then I quickly got a little frustrated:

  • This drive is only 3 months old!
  • I'm never buying another OCZ product again...
  • WHY ME?? WHY NOW??

My mind was racing. I was subconsciously calculating the gigabytes of HD videos and photos that I had created since the switch over from my previous Seagate Momentus XT drive. In my panic, I had forgotten that the majority of my raw video footage (at least) was stored on the other HDD that I had put in my SuperDrive bay.

Then I remembered back to a similar crash a few years ago. My 500 GB Seagate Barracuda drive "gave up the ghost", which also happened to be on the exact day of our wedding anniversary (Sept. 4th). The drive itself was less than 6 months old. Apparently the batch of HDDs that mine came from had an issue with Teflon coating coming off the heads of the drive, which eventually rendered the drive useless. I had several hundred GB of home video footage on that drive. I lost footage from some really cool short movies that I made for a church youth group lock-in. Fortunately I still had all my MiniDV cassette tapes, so I was able to re-import those (which took about 3 full days!). I could have used an HDD recovery service but they wanted almost $1500... just not quite worth it for what I lost.

Fortunately, I (somewhat) learned from that mistake and purchased two 1 TB external Seagate drives and started a poor-man's backup with some custom-written batch files. I started backing up all my photos and videos to these drives and am still using this solution today. But that solution was mainly intended for my PC, and not my MBP...
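
The idea behind a "poor-man's backup" like this is simple enough to sketch. The version below is in Python rather than the batch files I actually use, and it is only an illustration of the approach: copy anything that is new or newer than the backup copy to each of the two drives. The function name and the example paths in the comments are hypothetical.

```python
import shutil
from pathlib import Path
from typing import Sequence


def mirror(source: Path, destinations: Sequence[Path]) -> int:
    """Copy new or updated files from source into every destination folder.

    A file is copied when the destination copy is missing or older than the
    source. Nothing is ever deleted from the destinations.
    Returns the number of file copies performed.
    """
    copied = 0
    for src_file in source.rglob("*"):
        if not src_file.is_file():
            continue
        relative = src_file.relative_to(source)
        for dest_root in destinations:
            dest_file = dest_root / relative
            if (dest_file.exists()
                    and dest_file.stat().st_mtime >= src_file.stat().st_mtime):
                continue  # backup copy is already up to date
            dest_file.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src_file, dest_file)  # copy2 keeps the timestamps
            copied += 1
    return copied


# Example call (hypothetical drive letters for the two external drives):
# mirror(Path("C:/Photos"), [Path("E:/Backup/Photos"), Path("F:/Backup/Photos")])
```

Because nothing is deleted on the destination side, an accidental delete on the source does not propagate to the backups. The trade-off is that the backup drives slowly accumulate files you no longer want.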

Fast forward back to my current dilemma. I decided to immediately pull the SSD from the MBP and test it externally to see if it was the issue, or if my MBP had another problem. Fortunately I still had my previous HDD lying around in an enclosure. I popped the old drive out, set it aside, and then placed the SSD into said enclosure. I connected the drive to my Windows 7 machine and no dice. I also tried connecting the SSD with a USB-to-SATA cable... nothing. The SSD was definitely kaput.

The truth settled in. My SSD was *toast*. I might have lost my videos and/or photos. I might have lost programs and applications that I had installed, along with Final Cut projects for some WIP videos. Ugh... I didn't know for sure that I was going to be able to get my data back. HOW COULD I LET THIS HAPPEN AGAIN? Well, maybe it wouldn't be so bad. After all, I was using backups, but I wasn't sure how reliable they would be. No time like the present to find out!

I quickly googled to see how to restore a system from my Time Machine backup, which was my best shot. My "plan B" would be to utilize my HDD from prior to upgrading. My "plan C" would be to utilize my backup prior to installing that drive, which would have been basically how I ordered the machine. I found an article on Apple's site that was very straightforward. However, there was a slim possibility that I wouldn't be able to restore my system if I had chosen to exclude system files. I didn't remember what I had excluded at that time.

I popped my previous Seagate Momentus XT drive into the main HDD slot. Voilà, success! No more grey screen with progress bar. That confirmed that the SSD was the issue, and I could deal with that later. Of primary importance was getting my system back.

Per the Apple instructions, I held down Command-R during boot. It found my Time Machine backup, which was stored on my Time Capsule. I quickly came to another dilemma because I only had this drive on hand (which also contained my system prior to upgrading to the SSD). I would need to "have faith" in the Time Machine backup or settle for my system state as it was when I upgraded several months prior. In retrospect, I made a pretty hasty decision. I probably should have gone to Best Buy and purchased another Seagate Momentus XT ($120 or so). Nevertheless, I decided to "have faith" and wiped my backup drive in order to restore.

Time Machine initially indicated it would take about 6 hours to restore over the network. Luckily, I have a 1 gigabit network. Overall there was 400+ GB that needed to be restored. I was a Nervous Nelly for the next few hours as the process went on. After about 2 hrs, the Time Machine restore was complete. From 6 hrs down to 2? Why? The skeptic in me immediately thought that not all my data was going to be there.
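
In hindsight, a quick sanity check on the numbers would have calmed my nerves. The Python sketch below assumes exactly 400 GB (decimal) and exactly 2 or 6 hours, which are rounded figures, and compares the implied throughput with the theoretical ceiling of a gigabit link.

```python
def throughput_mb_per_s(data_gb: float, hours: float) -> float:
    """Average transfer rate in MB/s for data_gb gigabytes moved in the given hours."""
    return data_gb * 1000 / (hours * 3600)


# Theoretical gigabit ceiling: 1000 Mbit/s divided by 8 bits per byte.
GIGABIT_CEILING_MB_S = 1000 / 8

for hours in (6, 2):
    rate = throughput_mb_per_s(400, hours)
    share = rate / GIGABIT_CEILING_MB_S
    print(f"400 GB in {hours} h: {rate:.1f} MB/s ({share:.0%} of gigabit)")
```

Finishing in 2 hours works out to roughly 56 MB/s, comfortably below the 125 MB/s a gigabit link can carry in theory. A complete restore in that time is entirely plausible; the 6 hour estimate simply assumed a much slower rate of about 18.5 MB/s.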

After Time Machine completed the restore, the MBP booted up. The first thing I noticed was the spinning wheel, which I hadn't seen since my SSD upgrade. I did a few checks and immediately saw an application that I had recently installed in my Dock, so I knew that my backup was current. I then went to check my video folder. Everything was there! That's when I remembered that my videos were on a separate disk, so there was really nothing much to worry about. Next was checking iPhoto. When I launched the application I was a bit concerned. Apparently, the thumbnails do not get backed up, because it had to generate thumbnails for all my "events". I cancelled out of that and went to browse my events... Fortunately, everything was there as well. The "last backup" time on my MBP was visible in the top menu bar, and it read about 1 hr before the crash. CRISIS AVERTED!

As you can see, there was a lot of drama in my SSD fiasco. I'm not a big fan of drama, especially when it comes to my own data. While I am very thankful for my Time Machine backup, I still feel a little exposed. I have countless hours of video footage and photographs that I do not want to lose.  After tweeting about some of my issues, I began exploring better backup options. @kpett pointed me to a nice article from @shanselman. I like his philosophy: make 3 copies of anything you care about, use 2 different formats, and have an offsite backup.  I am going to adopt a very similar strategy.
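
The "3 copies, 2 formats, 1 offsite" rule is easy to turn into a checklist. Here is a minimal Python sketch; the `Copy` class, the media labels, and the example copies are illustrative assumptions rather than an inventory of my actual setup.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Copy:
    name: str      # where the copy lives
    medium: str    # kind of storage, e.g. "hdd", "optical", "cloud"
    offsite: bool  # True if it is stored away from the house


def satisfies_3_2_1(copies: Sequence[Copy]) -> bool:
    """True if there are >= 3 copies, on >= 2 kinds of media, with >= 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.medium for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical "before" and "after" states for a set of family photos.
before = [Copy("laptop drive", "hdd", False),
          Copy("Time Capsule", "hdd", False)]
after = before + [Copy("cloud backup", "cloud", True)]

print("before:", satisfies_3_2_1(before))
print("after: ", satisfies_3_2_1(after))
```

Two local hard drive copies fail all three tests at once, which is pretty much the "exposed" feeling described above.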

An old but wise SysAdmin once told me: "Never let a good crisis go to waste". I don't enjoy invoking this mantra in most situations, as I'd rather avoid the crisis altogether if possible. But there is some truth to it: a crisis is often the catalyst that can lead to a change in behavior.

Consequently, I'd like to end on this note: I have decided to make a New Year's resolution. I'm going to have a solid backup strategy for 2013. In my next entry, I'll share my massive overhaul plan (currently in process) for my new backup strategy.

A few other comments to tie up some loose ends: I still experienced system freezes after replacing the SSD with the old HDD. I decided to run some memory tests. Everything checked out. I came across a few articles that suggested swapping the memory slots. Since I've done this, I'm going on 5 days with no freeze-ups, pretty much a new uptime record since my upgrades. That doesn't really make a whole lot of sense, but it seems to have corrected the issue.

As far as the SSD goes, I shipped the drive out on Dec 19th. I'm still waiting for the RMA. My machine is painfully slow again, but I'm still able to be productive. Cash flow is a little tight right now, but I may purchase another 512 GB SSD to have on hand (probably from another manufacturer) in case I have another situation.  The SSD has a 5 year warranty on it.