Reflecting on Netflix, Instagram, Pinterest Downtime
If you were staying at home last night trying to enjoy a wholesome show or movie on Netflix, or perhaps you were out snapping photos with Instagram that you might later share on Pinterest, then you would have quickly found out that all three services were down for a couple of hours. Electrical storms in Northern Virginia caused a few “Availability Zones” to go offline, though not the entire facility.
If you have a background in systems administration or networking, then you must be asking yourself how a major company like Netflix or Instagram (owned by Facebook) could have such an outage. In reality, these services should never have had an outage that lasted that long. Amazon Web Services (AWS) promotes steps for building High Availability deployments, and High Availability and fault-tolerant deployments are nothing new.
More importantly, why were people on the West Coast of the United States affected by an outage in Northern Virginia when it is known to some that Netflix maintains large deployments in Northern California and Oregon? I, for instance, did not have service. If Netflix had used Anycast to route traffic to the nearest data center, much like Cloudflare does with its service, then I would have had little or no disruption to my movie-watching experience.
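To make the routing idea concrete, here is a minimal sketch of "send the viewer to the nearest data center that is still up". It is only an application-level illustration, not BGP Anycast itself and not how Netflix or Cloudflare actually route traffic. The `Region` class, the `pick_region` function, and the distances are all assumptions made up for this example; the region names are the AWS identifiers for Northern Virginia, Northern California, and Oregon.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    healthy: bool
    distance_km: float  # hypothetical distance from the viewer


def pick_region(regions):
    """Return the nearest healthy region, or None if every region is down."""
    candidates = [r for r in regions if r.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.distance_km)


# Illustrative numbers for a viewer on the West Coast (not real measurements).
regions = [
    Region("us-east-1", healthy=False, distance_km=3800.0),  # Northern Virginia, offline
    Region("us-west-1", healthy=True, distance_km=900.0),    # Northern California
    Region("us-west-2", healthy=True, distance_km=50.0),     # Oregon
]

chosen = pick_region(regions)
print(chosen.name if chosen else "no healthy region")
```

With a rule like this, an outage in Northern Virginia never touches a West Coast viewer, and even an East Coast viewer would fall back to the next closest healthy region instead of seeing an error.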
I think these companies need to invest some time in working out what went wrong and how to make sure it does not occur again, or at least put something in place that makes such an outage so minimal that their stock prices are not affected come Monday morning. A good place to start would be looking at technologies like Quagga (BGP Anycast routing), Ifenslave, HAProxy (High Availability load balancing), and perhaps Heartbeat and Pacemaker. All of these are available on Ubuntu 12.04 Server LTS, which is on AWS.
In closing, I applaud Twilio and the other services out there that had competent people working to make sure their infrastructure and services were fault-tolerant and stayed up.
-
Stephan Adig
-
http://benjaminkerensa.com/ Benjamin Kerensa
-