By now a lot of you have probably read about the Time Warner Cable (TWC) outage on August 27th. Yesterday morning I was greeted with a slew of alarms, names that wouldn’t resolve, websites that wouldn’t load and home office employees without any Internet access. It hadn’t hit the news yet, but I could sense that a major outage was occurring and quickly opened up the ThousandEyes platform to get a handle on the situation.
Time Warner Outage
The alerts started coming in a little before 9:30 UTC (5:30 a.m. Eastern). I observed several different issues, including inaccessible websites, DNS names failing to resolve, BGP reachability issues and agents losing access to the Internet. Companies that peer with Time Warner experienced degraded HTTP availability, affecting critical services such as supply chain portals (Figure 1).
I could see right away that users and networks that connect through Time Warner were unable to reach the supply chain portal, indicating the issue was in the Time Warner network. In a case like this, I’d expect to see only a brief service interruption while all the traffic re-routed through the portal’s other upstream ISP, AT&T. However, the availability issues continued for the entire duration of the outage. Surprised that the problems persisted, I took a look at the path visualization view to figure out exactly where traffic was getting dropped on the way to this site. Normally, two locations (Tokyo and Dallas) transit Road Runner (Time Warner) to reach this supply chain portal, while the rest go through AT&T (Figure 2).
Once the outage began, all the traffic was routed through AT&T as I’d expect, except that traffic from the Tokyo and Dallas locations never made it all the way through. It was being dropped inside the AT&T network (Figure 3). Without this type of technology you might not even be aware that many of your customers can’t reach your site because your backup configuration didn’t work correctly when a real failure scenario occurred.
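To make the idea of "where traffic is being dropped" concrete, here is a minimal sketch of how one might read a traceroute-style path and name the last network that answered before the trail went cold. This is not how ThousandEyes implements path visualization; the `Hop` type, the function name and the sample path (which uses documentation-range addresses) are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# A hop is (address, network name), or None when the hop did not respond.
# Both the type and the sample data below are hypothetical.
Hop = Optional[Tuple[str, str]]


def last_responding_network(path: Sequence[Hop], reached_target: bool) -> Optional[str]:
    """Return the network of the last hop that answered.

    Returns None if the target was reached (nothing was dropped)
    or if no hop answered at all.
    """
    if reached_target:
        return None
    for hop in reversed(path):
        if hop is not None:
            return hop[1]
    return None


# Illustrative path: two answering hops, then silence.
sample_path = [("192.0.2.1", "local"), ("198.51.100.7", "AT&T"), None, None]
print(last_responding_network(sample_path, reached_target=False))
```

The last responding network is only a hint: the drop may be happening at that network's edge or just beyond it, which is why looking at several vantage points side by side helps.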
I didn’t see just dropped packets and failed connection attempts. In many cases my tests couldn’t even resolve sites via DNS. Many online services that use Time Warner-hosted DNS servers also experienced outages. I took a look at alerts related to a DNS test for a SaaS vendor domain hosted by Time Warner and saw that every single server was failing from every location (Figure 4).
In fact, I wasn’t able to reach the DNS servers in the first place, as they’d lost connectivity from all networks (Figure 5).
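If you want a rough do-it-yourself version of this check, the sketch below tries to resolve a list of names and flags the case where every one of them fails. It is a simplified stand-in under stated assumptions: it uses whatever resolver the local machine is configured with rather than querying each authoritative server from multiple locations, and the function names `check_names` and `total_failure` are made up for this example.

```python
import socket
from typing import Callable, Dict, Iterable


def check_names(names: Iterable[str],
                resolver: Callable = socket.getaddrinfo) -> Dict[str, bool]:
    """Map each name to True if it resolved, False if resolution raised an error.

    The resolver is injectable so the logic can be exercised without a network.
    """
    results: Dict[str, bool] = {}
    for name in names:
        try:
            resolver(name, None)
            results[name] = True
        except OSError:  # socket.gaierror is a subclass of OSError
            results[name] = False
    return results


def total_failure(results: Dict[str, bool]) -> bool:
    """True when at least one name was checked and none of them resolved."""
    return bool(results) and not any(results.values())
```

Calling `total_failure(check_names([...]))` with the host names you care about gives a single yes/no answer. A total failure from one machine can still be a local problem, so it is worth confirming from a second network before concluding the servers themselves are unreachable.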
The Smoking Gun
I dug a little deeper and discovered that the packets weren’t just being dropped; the problem was that all of the routes that tell us how to get to these DNS servers had been withdrawn (Figure 6). Basically the Internet no longer knew how to reach large portions of the Time Warner network.
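The effect of a withdrawal can be sketched with a few lines of bookkeeping: replay a stream of announce and withdraw events and see which monitors still hold a route to a prefix. This is a deliberately simplified model, not a BGP implementation or the ThousandEyes data format; the `Update` tuple, the monitor names and the prefix (a documentation range) are assumptions for illustration.

```python
from typing import Iterable, Set, Tuple

# (monitor, action, prefix), where action is "announce" or "withdraw".
# This event shape is hypothetical and ignores AS paths and timestamps.
Update = Tuple[str, str, str]


def monitors_with_route(updates: Iterable[Update], prefix: str) -> Set[str]:
    """Replay updates in order and return the monitors that still have a route."""
    have_route: Set[str] = set()
    for monitor, action, pfx in updates:
        if pfx != prefix:
            continue
        if action == "announce":
            have_route.add(monitor)
        elif action == "withdraw":
            have_route.discard(monitor)
    return have_route


# Illustrative event stream: both monitors learn the route, then both lose it.
sample_updates = [
    ("monitor-a", "announce", "203.0.113.0/24"),
    ("monitor-b", "announce", "203.0.113.0/24"),
    ("monitor-a", "withdraw", "203.0.113.0/24"),
    ("monitor-b", "withdraw", "203.0.113.0/24"),
]
print(monitors_with_route(sample_updates, "203.0.113.0/24"))  # prints set()
```

An empty result is the situation described above: no vantage point has a route left, so packets toward that prefix have nowhere to go.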
Those Inside Time Warner Couldn’t Get Out
I also noticed that some of our Enterprise Agents couldn’t access any of their test targets. It turns out that several were hosted in locations that use Time Warner as their ISP and had lost their Internet access completely (Figure 7). Even in cases where sites didn’t depend on Time Warner directly, they still had customers and users that weren’t able to reach them.
Shining a Light on the Internet
The Time Warner outage, while lasting only a little over 90 minutes, had a huge impact and was a top news story of the day. The impact wasn’t limited to companies peering with or hosted by Time Warner; anyone who relied directly or indirectly on Time Warner services, or had users or customers who did, was affected. I encourage you to try out path and BGP visualization in ThousandEyes to give you accurate and deep visibility into the next major network outage.