Amazon Web Services power outage took hundreds of sites offline

Posted Mar 6, 2018 by James Walker
Amazon has confirmed that a datacentre power outage affected scores of online services including GitHub and Slack. The incident occurred almost exactly a year after a similar failure in the same region of Amazon's cloud. Service was quickly restored.
As reported by Computer Weekly, the unplanned downtime affected Amazon's US-East-1 region last Friday. The region hosts thousands of web services for consumers on the U.S. east coast, so prolonged outages have a knock-on impact across the Internet.
In a statement on its service status page, Amazon confirmed that customers encountered "Internet connectivity issues" while it worked to restore the power. Two separate power loss incidents occurred, with the second coming a couple of hours after the impact of the first was mitigated. Each power cut lasted for around 10 minutes.
Web applications known to be impacted include Atlassian, GitHub, MongoDB, Slack and Zillow. Users of these services, as well as many others hosted by Amazon, may have found them temporarily offline during the incident. Amazon's own Alexa digital assistant also went quiet, failing to respond to voice commands or fetch data from its online backend.
Companies that use Amazon's Direct Connect datacentre integration service were affected too. These firms rely on Direct Connect to obtain a direct link between their own datacentres and Amazon's public cloud. Network monitoring firm ThousandEyes estimated over 240 major services were impacted in total.
The incident serves as another reminder of the risks of cloud centralisation. Over the past few years, web service providers have flocked to public cloud platforms such as Amazon Web Services. While this can improve reliability and performance, any datacentre issue could knock hundreds of different services offline. ThousandEyes warned that outages can "quickly ripple over" to other cloud infrastructure.
"Outages and natural disasters in one part of the cloud can quickly ripple over into other areas," said ThousandEyes. "Cloud vendors offer several ways to directly connect into their infrastructure. However, they do not make you immune from the external dependencies of the Internet. While availability zones offer some level of redundancy, regional outages like these can quickly envelope entire clusters of data centers."
Amazon restored the service by 14:20 UTC on Friday. It told CNBC that analysis of the incident confirmed a power loss at one of the company's Virginia Internet connection points was responsible for the disruption. The company is working with an unnamed partner to prevent a similar incident occurring again.