Earlier today, beginning at 9:20 AM PT, Comcast suffered a network outage on its backbone that left millions of Internet users in the United States, including many outside its own subscriber base, unable to reach critical sites and services. The outage wasn’t contained to a particular region: Internet users from Seattle to New York were affected. It was significant not only for its massive scope but also for its length, which stretched to nearly three hours, effectively an eternity on Internet timescales. Any ISP peering with Comcast and sending traffic through its network would have broadened the scope of the outage.
While the root cause of the outage was contained to Comcast’s network, any Internet user whose traffic transited through the Comcast backbone would have been impacted. Comcast has peering relationships with many different ISPs, and traffic traversing the Internet may change hands simply based on the relationships providers have with one another. This complex chain of interconnections and dependencies means that any outage or performance degradation in one ISP’s network can have a ripple effect on the customers of other ISP networks.
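To make the ripple effect concrete, here is a minimal sketch of how one might check which traffic paths depend on a failing network. AS7922 is Comcast's AS number as cited in this article. Every other AS number (taken from the range reserved for documentation), the path labels, and the `affected_paths` helper are hypothetical and only there for illustration.

```python
# Sketch: find which AS paths transit a network that is having an outage.
# AS7922 (Comcast) comes from the article; all other ASNs and the path
# labels below are made-up illustration values, not measured data.

COMCAST_ASN = 7922


def affected_paths(paths, failed_asn):
    """Return the subset of paths whose AS path includes failed_asn."""
    return {name: hops for name, hops in paths.items() if failed_asn in hops}


# Each value is the ordered list of ASes the traffic passes through.
paths = {
    "user_a -> site_x": [64496, 7922, 64500],   # transits Comcast
    "user_b -> site_x": [64497, 64498, 64500],  # avoids Comcast entirely
    "user_c -> site_y": [7922, 64501],          # Comcast subscriber
}

affected = affected_paths(paths, COMCAST_ASN)
for name in sorted(affected):
    print(f"{name} depends on AS{COMCAST_ASN}")
```

In this sketch, user_a is not a Comcast subscriber, yet the path to site_x is still affected because it transits the failing network.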
In Figure 1, we can see many different ISPs peering with Comcast and just a few of the routes and services that were impacted as a result.
Comcast’s Xfinity media platform was also impacted. Many users attempting to connect to Xfinity from within and outside the US would have been unsuccessful, as we can see in Figure 2 below.
Comcast Announces Root Cause of the Outage
At 11:37 AM PT, Comcast announced that the root cause of the outage was a fiber cut, and very shortly after the announcement, network service appeared to be restored. By 12:20 PM PT, the Xfinity site was also fully available for most users.
Just before Comcast’s announcement of the fiber cut, ThousandEyes could already see steps being taken to remediate the problem and restore service for the Xfinity platform.
Comcast Reroutes Traffic Around the Outage
It’s generally good practice to announce a small number of aggregated IP prefixes to the Internet in order to keep the size of the global routing table manageable. In the case of Comcast’s Xfinity service, the data centers were reachable via two /13 subnets (184.108.40.206/13 and 220.127.116.11/13), which are originated by AS7922 and widely announced to all of Comcast’s peers. While this helps control the route table size, the challenge with large subnet blocks is that they limit an ISP’s ability to steer specific traffic flows: any routing policy change affects all of the traffic covered by the block.
An alternative approach to rerouting traffic is to break address blocks down into smaller subnets, which can be individually steered toward specific regions of the backbone. Because routers prefer the most specific prefix that matches a destination, traffic to the smaller subnet follows the new announcement while the rest of the block is untouched. This is the approach Xfinity took when, at approximately 11:00 AM PT, a more specific subnet (18.104.22.168/20) was announced from AS36733, which is also owned by Comcast, and the service was restored. At about 1:15 PM PT, this more specific route was withdrawn and service remained intact, which indicates that the original problem within the Comcast backbone had likely been resolved.
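The effect of announcing a more specific subnet can be sketched with longest-prefix matching, the rule routers use to choose among overlapping routes. This is a simplified illustration, not a model of Comcast's actual routing. The 10.x prefixes are placeholder private addresses standing in for the /13 and /20 described above, and the `best_route` helper is hypothetical. Only the AS numbers come from the article.

```python
import ipaddress

# Sketch of longest-prefix matching. The prefixes are placeholder private
# addresses (assumptions), standing in for the aggregate /13 and the more
# specific /20. The AS numbers are the ones named in the article.


def best_route(routes, destination):
    """Return the (network, origin) pair with the longest matching prefix."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, origin) for net, origin in routes if dest in net]
    if not matches:
        return None
    return max(matches, key=lambda route: route[0].prefixlen)


aggregate = (ipaddress.ip_network("10.0.0.0/13"), "AS7922")
specific = (ipaddress.ip_network("10.1.16.0/20"), "AS36733")

# While the /20 is announced, traffic to addresses inside it follows AS36733.
during = best_route([aggregate, specific], "10.1.17.5")

# Addresses outside the /20 keep following the aggregate route.
elsewhere = best_route([aggregate, specific], "10.2.0.1")

# Once the /20 is withdrawn, the same destination falls back to the /13.
after = best_route([aggregate], "10.1.17.5")

print(during[1], elsewhere[1], after[1])
```

The key design point is that the more specific route redirects only the addresses it covers, which is what makes it useful for steering a single service around a damaged part of the backbone.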
The Internet as a Web of Hidden Dependencies
To most users, the Internet is either a black box or a utility, where the provider a customer contracts with is the same one that handles their service end-to-end. That’s not how the Internet works. The Internet is made up of thousands of autonomous networks that depend on one another to deliver traffic from point to point across the globe.
The lesson to be learned from this outage is that the Internet is made up of many hidden dependencies, any of which can impact your ability to connect to sites and services—even if you don’t have a direct relationship with an affected ISP. Particularly for businesses, these hidden dependencies can pose a significant risk, which is why visibility into Internet connectivity and performance—and which networks your traffic is touching—is so critical.