The DDoS Attack on Dyn’s DNS Infrastructure

Posted on October 25, 2016

On Friday, October 21st, a series of large-scale DDoS attacks was launched against Dyn, a managed DNS provider. Over the course of 18 hours, these attacks disrupted many critical services, including Spotify, Amazon and HBO Now, all of which are Dyn customers. At the height of the attack, approximately 75% of our global vantage points sent queries that went unanswered by Dyn’s servers. In addition, because many of the affected services are themselves critical dependencies for other sites, the attack caused collateral damage: outages and performance problems on sites that are only tangentially related to Dyn (including this blog).

As far as we currently know, the Mirai botnet of hacked and poorly protected consumer devices was one of the sources of the DDoS attack on Dyn. Dyn reported that tens of millions of IP addresses were involved; this likely corresponds to a smaller number of devices, given that many devices probably had dynamic IPs on consumer Internet connections. We still do not know the specific target; Level 3 has reported that much of the attack traffic was a TCP SYN flood.

While this topic has already been extensively covered, we’re adding our view of the attack to help straighten out the facts and broaden the conversation. For reference, we’ve covered previous DNS DDoS attacks on NS1, UltraDNS and the DNS root.

The Importance of the DNS

First, a quick reminder of what the DNS is and why an attack on the DNS can be so devastating. The Domain Name System (DNS) translates domain names (like www.hbonow.com) to IP addresses and vice versa. This translation, or mapping, is not static over time. DNS is critical to the ability of large-scale Internet services to send user traffic to their nearest data center or to switch traffic to a different server on a whim. Dyn is popular especially because of its Traffic Director features which shape traffic flows for geographic load balancing.

The DNS is the first step in the process when a user accesses a website or API. If a fresh DNS record is not located in the user’s cache, a recursive search is set off (typically carried out by a recursive resolver on the user’s behalf) to find the IP address for the domain in question. The search ends at an authoritative server that provides the ‘authoritative’ answer to the user’s query. Dyn is a service that runs authoritative servers on behalf of its customers. Therefore, when Dyn is inaccessible, the DNS records of its customers are also inaccessible, and their sites become progressively unavailable as the time to live (TTL) of each cached DNS record expires.
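The "progressively unavailable" behavior can be illustrated with a toy caching resolver. This is a minimal sketch, not a real DNS implementation: the `ToyResolver` class, the 300-second TTL, the outage start time and the address 192.0.2.10 (a documentation address) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    ip: str
    expires_at: float  # seconds since the start of the simulation


class ToyResolver:
    """A toy caching resolver used only to illustrate TTL expiry."""

    def __init__(self, authoritative_up):
        # authoritative_up(now) -> bool is a hypothetical stand-in for
        # "did the authoritative server answer at time `now`?"
        self.authoritative_up = authoritative_up
        self.cache = {}

    def resolve(self, name, now, ttl=300, ip="192.0.2.10"):
        record = self.cache.get(name)
        if record is not None and now < record.expires_at:
            return record.ip  # fresh cache hit: no authoritative query needed
        if self.authoritative_up(now):
            self.cache[name] = CachedRecord(ip, now + ttl)
            return ip
        return None  # cache expired and the authoritative server is unreachable


# The authoritative server answers until t=100s, then stops responding.
resolver = ToyResolver(authoritative_up=lambda now: now < 100)
timeline = [resolver.resolve("www.example.com", now=t) for t in (0, 200, 299, 300, 600)]
print(timeline)
```

In this sketch the name keeps resolving from cache for a while after the authoritative server goes dark, and fails only once the cached record's TTL runs out. Users whose caches were filled at different times would therefore lose access at different times.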

Timing and Scope of the Attack

We’ve seen conflicting reports on the time scale and scope of the attack. All of our data at ThousandEyes, however, points to near identical impacts across a diverse set of roughly 1200 affected sites and services that our customers were monitoring at the time. We saw the impacts of the attack in three phases:

Phase 1: Initial impacts centered on the U.S.

4:15am-6:40am PDT (11:15-13:40 UTC)

A large majority of DNS queries to Dyn from the U.S. East Coast went unanswered, with the impact expanding to portions of the U.S. West Coast and Western Europe over the time period.

Phase 2: Global impacts

8:55am-1:40pm PDT (15:55-20:40 UTC)

Between 25% and 75% of queries to Dyn went unanswered, affecting users in all geographies sending queries to all Dyn data centers outside China (we did not find that the domains in our tests were served from China). Figure 1 shows the global network layer impact.

Figure 1: Packet loss from global locations to one of Dyn’s DNS servers at 1:10pm PDT.

Phase 3: Attack mitigated

2:25pm-1:10am PDT (21:25-08:10 UTC)

Most services were no longer impacted due to effective mitigation, though the attack continued on Dyn data centers across Western Europe and the U.S. Figure 2 shows the packet loss that continued to varying extents.

Figure 2: Packet loss to one of Dyn’s DNS servers shows a common pattern, with large-scale issues until mid-afternoon PDT
and lingering attacks and/or blackholing of traffic until the following morning.

The Attack Affected a Who’s-Who of the Internet

As noted earlier, Dyn is a popular and well-respected managed DNS provider with many high-profile customers. Dyn customers impacted included SaaS companies (AthenaHealth), social networks (Twitter), music (Spotify), media (CNN), gaming (Sony PSN), advertising (BlueKai) and consumer products (Red Bull).

During a one-hour period we found over 1200 impacted domains among the services our customers were monitoring. This is certainly an underestimate of the scope of the impact, but does reflect the scale of critical services involved. Figure 3 shows an example from AthenaHealth; note that DNS is the first critical phase of a web connection.

Figure 3: AthenaHealth, a SaaS application, shows widespread loss of availability during the height of the outage.

The following links show data from several of these services. In each, you’ll notice a similar timing of DNS failures.

Because Dyn is popular for supporting geographic load balancing (providing DNS answers that vary depending on the user’s location), many of Dyn’s customers use it exclusively to host their DNS. When you register your domain name with a registrar, you have the ability to specify the authoritative name servers you want to use. Often you have 4-6 name servers you can list. Most of Dyn’s customers were only using Dyn’s name servers, rather than diversifying across multiple providers. Because they didn’t have a backup DNS provider to fall back on during the DDoS attacks, these customers were the most vulnerable to complete service unavailability.

In contrast, Amazon.com has multiple DNS providers: UltraDNS and Dyn. While Amazon was affected by slow load times, it did not suffer the same unavailability issues as many of Dyn’s other customers. Figure 4 shows what name server (NS) records look like when there are multiple providers.

Figure 4: Amazon.com has multiple DNS providers.
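A quick way to check whether a domain's NS set is spread across more than one provider is to group the name server hostnames by their parent domain. The sketch below assumes you already have the NS hostnames as strings; the hostnames are illustrative rather than captured from live DNS, and `provider_of` is a deliberately crude heuristic.

```python
from collections import defaultdict


def provider_of(ns_hostname):
    """Crude provider key: the last two labels of the name server hostname.

    This is a simplification (it mishandles suffixes such as .co.uk); a more
    careful tool would consult the Public Suffix List.
    """
    labels = ns_hostname.rstrip(".").lower().split(".")
    return ".".join(labels[-2:])


def group_by_provider(ns_records):
    """Map each provider key to the name servers that belong to it."""
    groups = defaultdict(list)
    for ns in ns_records:
        groups[provider_of(ns)].append(ns)
    return dict(groups)


# Illustrative NS sets (assumptions, not live data).
single = ["ns1.p37.dynect.net.", "ns2.p37.dynect.net.",
          "ns3.p37.dynect.net.", "ns4.p37.dynect.net."]
mixed = single[:2] + ["pdns1.ultradns.net.", "pdns2.ultradns.net."]

for label, records in (("single", single), ("mixed", mixed)):
    providers = group_by_provider(records)
    print(label, sorted(providers), "diversified:", len(providers) > 1)
```

A domain whose name servers all fall under one provider key has no fallback if that provider becomes unreachable.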

How the DNS Infrastructure Was Impacted

Dyn runs 20 data centers around the world for a combination of both free and paid managed DNS services. We saw impacts in 17 of them, all but Warsaw, Beijing and Shanghai. Within these data centers, Dyn maintains two ‘constellations’ of name servers (NS1, NS3 in one group; NS2, NS4 in another) that are intended to fail independently of each other.

Dyn’s DNS service uses anycast, where a single IP address is simultaneously announced from multiple data centers and servers. Each of the constellations shares IP addresses and routing prefixes, meaning that they share peering connections and routes across the Internet. It also means they share congestion during a DDoS attack.

  • NS1: 208.78.70.0/24
  • NS2: 204.13.250.0/24
  • NS3: 208.78.71.0/24
  • NS4: 204.13.251.0/24
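To see which name server prefix and constellation a given address belongs to, the prefixes listed above can be checked with Python's standard `ipaddress` module. This is a sketch: the constellation labels "A" and "B" are our own naming for the two groups, and the sample addresses are arbitrary hosts inside the prefixes, not known server addresses.

```python
import ipaddress

# Prefixes as listed above. The grouping (NS1/NS3 and NS2/NS4) follows the
# description in the text; the labels "A" and "B" are hypothetical.
PREFIXES = {
    "NS1": ("208.78.70.0/24", "A"),
    "NS2": ("204.13.250.0/24", "B"),
    "NS3": ("208.78.71.0/24", "A"),
    "NS4": ("204.13.251.0/24", "B"),
}


def classify(ip):
    """Return (name_server, constellation) for an address, or None."""
    addr = ipaddress.ip_address(ip)
    for name, (prefix, constellation) in PREFIXES.items():
        if addr in ipaddress.ip_network(prefix):
            return name, constellation
    return None


print(classify("208.78.70.37"))   # an address inside the NS1 prefix
print(classify("204.13.251.37"))  # an address inside the NS4 prefix
print(classify("192.0.2.1"))      # documentation address, outside all prefixes
```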

In the examples above, name server hostnames such as ns1.p37.dynect.net reveal both the name server (ns1), and therefore its constellation, and the name server ‘group’ assigned to a domain (p37). We saw highly correlated performance for name servers in the same constellation and group. In addition, while performance was not perfectly correlated between name servers in different constellations, in this DDoS attack we did not see enough failure independence to ensure the availability of at least one of the name servers.
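The hostname structure can be unpacked with a short parser. The sketch below assumes the layout `ns<N>.p<GROUP>.dynect.net` inferred from the example hostname; the "odd"/"even" constellation labels are our own shorthand for the NS1/NS3 and NS2/NS4 groupings.

```python
import re

# Assumed hostname layout: ns<N>.p<GROUP>.dynect.net
NS_PATTERN = re.compile(r"^ns(\d+)\.p(\d+)\.dynect\.net\.?$", re.IGNORECASE)


def parse_dyn_ns(hostname):
    """Split a name server hostname into index, group and constellation."""
    match = NS_PATTERN.match(hostname)
    if match is None:
        return None
    index, group = int(match.group(1)), int(match.group(2))
    # NS1 and NS3 form one constellation, NS2 and NS4 the other.
    constellation = "odd" if index % 2 else "even"
    return {"index": index, "group": group, "constellation": constellation}


names = [f"ns{i}.p37.dynect.net" for i in range(1, 5)]
constellations = {parse_dyn_ns(n)["constellation"] for n in names}
print(len(names), "name servers,", len(constellations), "constellations")
```

Four NS records in this layout map onto only two constellations, so a domain has at most two independent failure domains within the provider, and fewer when failures are correlated across constellations.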

In most cases, DNS queries couldn’t make it through Dyn’s ISPs (primarily Level 3, Telia, NTT, Cogent and Telstra) or through the edge of Dyn’s network. In Figure 5, we can see how some traffic made it through to the DNS servers on the right (note that the anycast IP is actually multiple servers) while other traffic failed to reach the destination.

Figure 5: Vantage points on the left send DNS queries to Dyn’s DNS servers in Seattle (in green on the right).
At this time, traffic destined for Dyn’s San Jose, Chicago and Dallas data centers terminates in the Telia network (red circles on right).

This lack of network connectivity, in combination with the high load on the DNS servers, contributed to very low rates of DNS server availability. Figure 6 shows the DNS server availability of a major web service.

Figure 6: Availability of the 4 name servers for a given record from over 120 locations around the world.
Of those that were available, average minimum resolution time was 1 second (30x normal).
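For readers who want to reproduce this kind of metric from their own measurements, availability can be computed as the share of vantage points whose query was answered, with resolution time averaged over the answered queries only. The sample data below is hypothetical, and the function names are ours.

```python
def availability(results):
    """Percentage of queries that received an answer.

    `results` holds one resolution time in milliseconds per vantage point,
    or None when the query went unanswered.
    """
    if not results:
        return 0.0
    answered = sum(1 for r in results if r is not None)
    return 100.0 * answered / len(results)


def mean_answered_ms(results):
    """Mean resolution time over answered queries, or None if none answered."""
    answered = [r for r in results if r is not None]
    return sum(answered) / len(answered) if answered else None


# Hypothetical sample: 8 vantage points, 2 of which got an answer.
sample = [None, None, 1020.0, None, None, 980.0, None, None]
print(f"availability: {availability(sample):.0f}%")
print(f"mean resolution time: {mean_answered_ms(sample):.0f} ms")

# A resolution time of 1 second at 30x normal implies a baseline near 33 ms.
print(f"implied baseline: {1000 / 30:.1f} ms")
```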

Protecting Your DNS Assets

You can monitor your own DNS infrastructure and records using standard command line tools like dig. You can also use a product like ThousandEyes to use dig at large scale, across time and with alerts. In addition, check out our CISO’s take on Secure DNS Management Best Practices and an introduction on how you can effectively alert on DNS performance. Over the coming days, we will also post several more times on lessons to be learned from this event.
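As a starting point for scripted checks, dig can be wrapped in a few lines of Python. This is a sketch under some assumptions: it expects dig to be installed, uses the standard `+norecurse`, `+time`, `+tries` and `+short` query options, and treats any non-zero exit code as a failure. The server and domain names are placeholders, and the demonstration uses a stub runner that simulates a timeout so it runs without network access.

```python
import subprocess


def build_dig_command(name_server, domain, record_type="A", timeout_s=2):
    """Build a dig invocation that queries one authoritative server directly."""
    return [
        "dig",
        f"@{name_server}",
        domain,
        record_type,
        "+norecurse",          # ask the authoritative server itself
        f"+time={timeout_s}",  # per-try timeout in seconds
        "+tries=1",            # one attempt, so a timeout counts as a failure
        "+short",              # print only the answer data
    ]


def check(name_server, domain, runner=subprocess.run):
    """Return the answer lines, or None if the server gave no usable reply."""
    cmd = build_dig_command(name_server, domain)
    try:
        proc = runner(cmd, capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return None  # dig is missing or did not finish in time
    if proc.returncode != 0:
        return None
    lines = [line.strip() for line in proc.stdout.splitlines()]
    return [line for line in lines if line and not line.startswith(";")]


def fake_timeout(cmd, **kwargs):
    """Stub runner: mimics dig exiting with code 9 (no reply from server)."""
    return subprocess.CompletedProcess(
        cmd, returncode=9, stdout=";; connection timed out\n", stderr=""
    )


print(check("ns1.p37.dynect.net", "example.com", runner=fake_timeout))
```

Calling `check` without the `runner` argument runs the real dig binary. Looping it over each of a domain's name servers on a schedule gives a basic per-server availability signal.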

  • John Heyer

    Even with multiple DNS providers, the impact is quite severe. The problem, it seems to me, is that DNS servers acting as resolvers don’t perform down detection very quickly and/or intelligently, so the only way to recover in this case is by removing the stricken provider entirely, both with the registrar and in the zone’s NS records. Waiting for database synchronization and TTL expiration may take several hours or even days, assuming you have the access. Monitoring is always the first step, but having a mini DR plan for these scenarios is important too.