Starting at around 1 pm ET today, Amazon’s S3 storage service began seeing high error rates in its US-EAST-1 Region. Users across the US ran into outages at sites both large and small. Affected services included Medium, Slack, Sprout Social, Adobe’s services, Flipboard, Quora, Business Insider, Netflix, Reddit and even the Securities and Exchange Commission.
With almost half of AWS’s roughly one million customers using the storage service, it’s not surprising that the outage was felt so widely. Some customers use S3 only for image storage, while others use it to host their entire websites. The service reportedly stores 3 to 4 trillion objects.
Amazon is working to resolve the problem, but because its own service dashboard relied on S3 to store its status images, for a while it was difficult to tell which services were up or down without digging into individual service updates.
Outages like today’s are rare, but because so many high-profile companies use AWS, problems become very visible when they do occur. Issues like this are a reality of any IT infrastructure, whether public or private. Expecting any single service to have a perfect uptime record is unrealistic.
With that in mind, companies running mission-critical applications that require high availability should consider replicating their applications or sites across multiple Regions.
AWS organizes its data centers into Regions, which are separate physical locations. Within each Region, AWS also provides Availability Zones: separately housed, discrete data centers located in the same Region. These data centers have redundant power, connectivity, and networking to make them as fault tolerant as possible.
For those who need additional fault tolerance, AWS also lets customers replicate their data across different geographic Regions. Customers retain control of their instances regardless of physical location, which allows companies with local compliance and data residency requirements to manage those aspects themselves.
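To make the multi-region idea more concrete, here is a minimal Python sketch of client-side failover. It assumes the data has already been replicated to a second Region; the replication step itself is not shown. The Region names `us-east-1` and `us-west-2` are real AWS Region identifiers, but `read_with_failover`, `RegionUnavailable`, and the fetcher functions are hypothetical stand-ins for whatever storage client an application actually uses. They are not part of any AWS SDK, and the outage is simulated.

```python
from typing import Callable, Dict, List


class RegionUnavailable(Exception):
    """Hypothetical error a region-specific client raises when its Region is failing."""


def read_with_failover(
    key: str,
    regions: List[str],
    fetchers: Dict[str, Callable[[str], bytes]],
) -> bytes:
    """Try each Region in priority order and return the first successful read.

    `fetchers` maps a Region name to a hypothetical function that reads `key`
    from the copy of the data stored in that Region.
    """
    errors: Dict[str, str] = {}
    for region in regions:
        try:
            return fetchers[region](key)
        except RegionUnavailable as exc:
            # Record the failure and fall through to the next replica.
            errors[region] = str(exc)
    raise RegionUnavailable(f"all regions failed for {key!r}: {errors}")


# Simulated setup: the primary Region is failing, the replica still serves data.
replica_store = {"logo.png": b"logo-bytes"}


def fetch_us_east_1(key: str) -> bytes:
    raise RegionUnavailable("us-east-1: elevated error rates")


def fetch_us_west_2(key: str) -> bytes:
    return replica_store[key]


fetchers = {"us-east-1": fetch_us_east_1, "us-west-2": fetch_us_west_2}
data = read_with_failover("logo.png", ["us-east-1", "us-west-2"], fetchers)
print(data)  # b'logo-bytes'
```

In this sketch, the order of the `regions` list decides which replica is preferred. A production setup would also need to handle write consistency between Regions and would usually move failover into DNS or a load-balancing layer rather than leaving it to every client.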
While an AWS outage is disruptive, it’s important to remember that Amazon has one of the best uptime records of any cloud provider. Downtime is a reality in any server environment, but strategies like multi-region architecture can help ensure more consistent uptime.