Route Leak Causes Amazon and AWS Outage

Posted on June 30, 2015

There was quite a bit of chaos on the Internet today, including major fiber cuts in California. Adding to the confusion, between 5:24pm and around 6:10pm Pacific on June 30th, social media and outage reports pointed to problems with Amazon, AWS and a variety of services that run on AWS. In our office, we noticed that HipChat (our internal messaging system) and Okta (our SSO provider) were not working, and neither was our corporate website, which is hosted on AWS EC2 and fronted by AWS CloudFront.

Given the known fiber cuts, some speculated that the Amazon issues were related to them, so we decided to run our own quick investigation while the disruption was still going on. We received internal alerts that many customers were being impacted by loss in two specific networks. After a few minutes of intense analysis, we found that the root cause was not the fiber cuts but a route leak from Axcelx (AS33083), a data center provider in Boston. All of Amazon’s prefixes originating in AS14618 were affected to some degree.

Figure 1 shows the routes, under normal conditions, from our cloud agents in Dallas and New York to Tinder, which is hosted in Amazon’s data center. During normal operations, the ISPs consistently seen in the path to Amazon are Level 3 and Zayo.

Figure 1: Under normal conditions, these paths traveled through Level 3 and Zayo directly to Amazon’s data centers.

During the outage, however, the network view (Figures 2 and 3) showed loss at Hibernia and Axcelx, two networks that had never been in the path before. That was definitely suspicious.

Figure 2: Routes destined for AWS terminating in Hibernia’s POP in New York.
Figure 3: From within an Axcelx data center, traffic never made it out.
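
The data-plane reasoning above, that networks absent from every baseline path suddenly carrying (and dropping) traffic are suspect, can be sketched in a few lines of Python. This is a hypothetical illustration, not ThousandEyes’ implementation: it assumes traceroute hops have already been mapped to network names, and the sample paths (including the placeholder networks "ISP-A" and "ISP-B") are invented rather than taken from the measurements in Figures 2 and 3.

```python
from collections import Counter

# One probe = (ordered networks its traceroute hops mapped to, reached destination?).
# Mapping hop IPs to network names is assumed to have happened already.
Probe = tuple[list[str], bool]


def new_networks(baseline: list[Probe], incident: list[Probe]) -> set[str]:
    """Networks that show up during the incident but never appeared at baseline."""
    seen_before = {net for hops, _ in baseline for net in hops}
    seen_now = {net for hops, _ in incident for net in hops}
    return seen_now - seen_before


def terminal_networks(probes: list[Probe]) -> Counter:
    """For probes that never arrived, count the last network they reached."""
    return Counter(hops[-1] for hops, reached in probes if hops and not reached)


# Hypothetical sample data: "ISP-A" and "ISP-B" stand in for the agents' local ISPs.
baseline = [
    (["ISP-A", "Level 3", "Amazon"], True),
    (["ISP-B", "Zayo", "Amazon"], True),
]
incident = [
    (["ISP-A", "Hibernia"], False),
    (["ISP-B", "Hibernia", "Axcelx"], False),
]

print(sorted(new_networks(baseline, incident)))  # ['Axcelx', 'Hibernia']
print(terminal_networks(incident))  # Counter({'Hibernia': 1, 'Axcelx': 1})
```

Comparing against a baseline rather than a fixed allow-list keeps the check independent of any particular provider; the terminal-network count approximates the "where did traffic die" view of Figure 2.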

So we looked at the BGP data to see whether anything had changed in the control plane. Not surprisingly, as seen in Figure 4, there was significant BGP activity, with Hibernia (AS5580) and Axcelx (AS33083) suddenly appearing in the BGP paths.

Figure 4: Routes destined for Amazon (AS14618) in green with routes via Hibernia (AS5580) and Axcelx (AS33083).
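
The control-plane check is the same idea applied to AS paths: collect every ASN that carries routes for prefixes originated by AS14618 and flag any that were absent before. The Python sketch below is a hypothetical illustration. It assumes AS paths arrive as space-separated strings without AS_SETs, and the sample paths are invented; the observing peers use documentation ASNs 64500 and 64501, while AS3356 (Level 3) and AS6461 (Zayo) stand in for the normal transit path.

```python
# Amazon's origin ASN, as identified in this post.
AMAZON_ORIGIN = 14618


def parse_as_path(text: str) -> list[int]:
    """Parse a space-separated AS path such as '3356 14618' (no AS_SETs assumed)."""
    return [int(asn) for asn in text.split()]


def transit_asns(paths: list[str], origin: int) -> set[int]:
    """Every non-origin ASN on paths whose last hop is the given origin."""
    asns: set[int] = set()
    for text in paths:
        path = parse_as_path(text)
        if path and path[-1] == origin:
            asns.update(asn for asn in path if asn != origin)
    return asns


def unexpected_transit(baseline: list[str], current: list[str],
                       origin: int = AMAZON_ORIGIN) -> set[int]:
    """ASNs carrying routes to the origin now that were absent from the baseline."""
    return transit_asns(current, origin) - transit_asns(baseline, origin)


# Hypothetical paths as seen from two collector peers (documentation ASNs 64500/64501).
baseline_paths = ["64500 3356 14618", "64501 6461 14618"]
current_paths = ["64500 5580 33083 14618", "64501 6461 14618"]

print(sorted(unexpected_transit(baseline_paths, current_paths)))  # [5580, 33083]
```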

The forwarding loss, combined with the sudden appearance of these two ASNs in the BGP paths, strongly suggested a BGP route leak by Axcelx. The raw BGP data showed the exact updates that resulted in this leak.
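
One common way to formalize a route leak is the valley-free (Gao-Rexford) property: walking outward from the origin, a legitimate path climbs zero or more customer-to-provider links, crosses at most one peer link, and then only descends provider-to-customer links. An AS that takes a route learned from a peer or provider and re-announces it to another peer or provider breaks that shape. The Python sketch below checks the property; it illustrates the concept rather than the analysis performed here, and the relationship table is entirely hypothetical (real tools rely on inferred relationship datasets).

```python
from typing import Optional

# Hypothetical business relationships, keyed (a, b):
#   "c2p" = a is a customer of b, "p2p" = a and b are settlement-free peers.
# These are illustrative only, not the real relationships among these networks.
RELS = {
    (33083, 5580): "c2p",   # Axcelx buys transit from Hibernia (assumed)
    (14618, 33083): "p2p",  # Amazon peers with Axcelx (assumed)
    (5580, 64500): "p2p",   # Hibernia peers with a collector peer (assumed)
    (14618, 3356): "p2p",   # Amazon peers with Level 3 (assumed)
    (64501, 3356): "c2p",   # collector peer buys transit from Level 3 (assumed)
}


def relation(a: int, b: int, rels: dict = RELS) -> Optional[str]:
    """Relationship on the link the route crossed from a to b: c2p, p2p, or p2c."""
    if (a, b) in rels:
        return rels[(a, b)]
    if (b, a) in rels:
        return {"c2p": "p2c", "p2p": "p2p"}[rels[(b, a)]]
    return None


def is_valley_free(as_path: list[int], rels: dict = RELS) -> Optional[bool]:
    """True if valley-free, False if the path contains a leak, None if unknown."""
    # Collapse prepending, then walk from the origin (last ASN) toward the observer.
    hops = [asn for i, asn in enumerate(as_path) if i == 0 or asn != as_path[i - 1]]
    hops.reverse()
    descending = False  # becomes True after the first peer or provider-to-customer link
    for a, b in zip(hops, hops[1:]):
        rel = relation(a, b, rels)
        if rel is None:
            return None
        if rel in ("c2p", "p2p") and descending:
            return False  # route went back up (or across) after coming down: a leak
        if rel in ("p2p", "p2c"):
            descending = True
    return True


print(is_valley_free([64500, 5580, 33083, 14618]))  # False under the assumed relationships
print(is_valley_free([64501, 3356, 14618]))         # True
```

Under these assumed relationships, the first path is flagged because AS33083 passes a route learned from a peer up to its provider, which is the classic route-leak shape.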


All in all, the route leak affected a wide range of services, including consumer internet sites like Yelp, Netflix and Match; SaaS services such as HipChat and Jobvite; and financial firms such as Experian and Zions Bank.

Check out how to discover route leaks or sign up for ThousandEyes to monitor your service for free.
