Measuring the Impact of BGP Changes

Posted on September 10th, 2014
March 18th, 2015

Many people think of BGP as something that only your most senior network engineers need to worry about, but the truth is BGP changes can have a direct impact on your customers’ user experience and even their ability to access your site.

What happens when I change my provider?

Who you peer with largely determines the path your customers take to reach you over the Internet, and how long it takes them to get there. Changing providers is like deciding to take an entirely different set of roads when you go back and forth to work. Let’s take a look at a major financial website that made a BGP change.

The financial website normally advertises one prefix through Qwest and the other through Verizon. They then make a change to advertise both of their prefixes via Verizon (Figure 1).

Figure 1: BGP change from Qwest (AS 209) to Verizon (AS 701).
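As a rough illustration of what detecting this kind of change involves, here is a minimal Python sketch that compares the AS paths seen for each prefix before and after a change and reports any prefix whose upstream provider differs. Only the Qwest (AS 209) and Verizon (AS 701) numbers come from the example above. The prefixes, the origin AS, the transit AS and the function names are placeholders invented for illustration, and this is not how the ThousandEyes platform is implemented.

```python
# AS numbers from the article.
QWEST, VERIZON = 209, 701

# Assumptions for illustration only: a stand-in origin AS and transit AS
# from the documentation range, and documentation prefixes.
ORIGIN = 64500
TRANSIT = 64496

before = {
    "192.0.2.0/24": [TRANSIT, QWEST, ORIGIN],
    "198.51.100.0/24": [TRANSIT, VERIZON, ORIGIN],
}
after = {
    "192.0.2.0/24": [TRANSIT, VERIZON, ORIGIN],
    "198.51.100.0/24": [TRANSIT, VERIZON, ORIGIN],
}


def upstream(as_path):
    """Return the AS adjacent to the origin, i.e. the site's provider."""
    return as_path[-2]


def upstream_changes(before, after):
    """Map each prefix whose upstream changed to (old_upstream, new_upstream)."""
    changes = {}
    for prefix, old_path in before.items():
        new_path = after.get(prefix)
        if new_path is None:
            continue  # prefix withdrawn entirely; not handled in this sketch
        if upstream(old_path) != upstream(new_path):
            changes[prefix] = (upstream(old_path), upstream(new_path))
    return changes


print(upstream_changes(before, after))
```

In this made-up data only the first prefix moves from AS 209 to AS 701, which mirrors the change shown in Figure 1.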

BGP monitoring allows us to directly correlate changes to Internet routing with the performance of the application being tested. So did we see any change in response time? In fact, after the BGP change the financial site saw a very large increase in response time for several hours (Figure 2).

Figure 2: Large increase in response time after a BGP change rerouted all traffic via Verizon.

In addition to BGP monitoring, the ThousandEyes platform correlates changes in network performance with BGP changes. A large increase in packet loss directly lines up with the observed increase in response time (Figure 3).

Figure 3: High levels of packet loss match up with the increase in response time.
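To make the idea of "lining up" two metrics concrete, the sketch below computes a Pearson correlation coefficient between a packet loss series and a response time series sampled at the same intervals. The sample values are invented purely for illustration and are not the measurements behind Figures 2 and 3. The `pearson` helper is a hypothetical name, not part of any product API.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Invented samples, one per test interval (assumption, not real data).
packet_loss_pct = [0, 0, 1, 35, 40, 38, 2, 0]
response_time_ms = [310, 305, 320, 2400, 2900, 2700, 340, 300]

r = pearson(packet_loss_pct, response_time_ms)
print(f"correlation between loss and response time: {r:.2f}")
```

A coefficient close to 1 says the two series rise and fall together. It does not prove that loss caused the slowdown, which is why the path-level view that follows still matters.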

Not only can we see that packet loss is occurring, but we can also use path visualization to determine where it is occurring. The path visualization shows that a specific customer gateway node in the Verizon network is experiencing loss. It also shows that the only locations having issues are those that reach the website through the prefix involved in the BGP changes (Figure 4).

Figure 4: Packet loss occurring at a specific Verizon node.

Now that we know what is behind this issue, let’s see if we can figure out why response time went back to normal after a few hours. It turns out the answer is quite simple: another set of BGP changes put things back the way they were in the first place, with one prefix advertised via Qwest and one advertised via Verizon (Figure 5).

Figure 5: BGP change back to Qwest (AS 209) from Verizon (AS 701).

Not Every Impact is Obvious

In the first example we saw a BGP change that had a dramatic impact on end user experience and resulted in significant network performance issues. However, BGP changes often impact websites and end users in important ways that aren’t quite so obvious. Let’s look at an example of a major retailer making a BGP change to route all their traffic via Comcast; they previously routed most of their traffic via AT&T and a smaller portion via Comcast (Figure 6).

Figure 6: BGP change, withdrawing advertisements through AT&T (AS 7018) in favor of routing all traffic via Comcast (AS 33668).

Path visualization isn’t just for solving problems. We can also use it for other purposes, including verifying that BGP changes did what we expected. We can see how traffic transited prior to the change (Figure 7) and compare that to how it transited after the change (Figure 8).

Figure 7: Prior to the change 6 locations connect via AT&T and 2 locations connect via Comcast.
Figure 8: After the change all 8 locations connect via Comcast.
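The same verification can be expressed as a simple check: for every monitoring location, confirm that the expected provider's AS appears in the path to the site. The sketch below assumes per-location AS paths are available as plain lists. The AT&T (AS 7018) and Comcast (AS 33668) numbers and the 6/2 split come from the figures above. The location names, the origin AS and the function name are hypothetical.

```python
# AS numbers from the article.
ATT, COMCAST = 7018, 33668

# Assumption: a stand-in origin AS from the documentation range.
ORIGIN = 64500

# Hypothetical location names; 6 via AT&T and 2 via Comcast before the change.
before = {f"location-{i}": [ATT, ORIGIN] for i in range(1, 7)}
before.update({f"location-{i}": [COMCAST, ORIGIN] for i in (7, 8)})

# After the change, all 8 locations are expected to route via Comcast.
after = {f"location-{i}": [COMCAST, ORIGIN] for i in range(1, 9)}


def locations_not_via(paths_by_location, expected_asn):
    """Return the locations whose AS path does not include expected_asn."""
    return sorted(
        location
        for location, as_path in paths_by_location.items()
        if expected_asn not in as_path
    )


print("not via Comcast before:", locations_not_via(before, COMCAST))
print("not via Comcast after: ", locations_not_via(after, COMCAST))
```

An empty list after the change is the programmatic equivalent of Figure 8: no location is still reaching the site through the old provider.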

Now that we’ve verified that the changes did indeed result in all traffic being routed via Comcast, let’s see if this had any impact on network performance. Since BGP changes affect the path a customer or user takes to access your site, they can also affect network latency. In this retailer’s case, we don’t see a change in the overall average latency. However, we do see significant changes for some locations, such as Dallas, TX, which had been routed via AT&T before 23:00 and via Comcast afterwards (Figure 9).

Figure 9: The Dallas, TX location experiences a 30% increase in latency due to being routed via Comcast instead of AT&T.
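The arithmetic behind this is worth spelling out, because an unchanged average can hide a large change at a single location. The sketch below uses invented latency values chosen so that Dallas rises by 30% while the overall mean stays flat. Only the Dallas, TX name and the 30% figure come from the example above. The other locations, every millisecond value, the 20% threshold and the function names are assumptions.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def flagged_locations(before_ms, after_ms, threshold_pct=20.0):
    """Locations whose latency moved by at least threshold_pct in either direction."""
    flagged = {}
    for location, old in before_ms.items():
        change = percent_change(old, after_ms[location])
        if abs(change) >= threshold_pct:
            flagged[location] = change
    return flagged


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Invented latencies in milliseconds (assumption, not measured data).
latency_before_ms = {"Dallas, TX": 40.0, "location-2": 60.0, "location-3": 55.0, "location-4": 70.0}
latency_after_ms = {"Dallas, TX": 52.0, "location-2": 56.0, "location-3": 51.0, "location-4": 66.0}

overall = percent_change(mean(latency_before_ms.values()), mean(latency_after_ms.values()))
print(f"overall average change: {overall:.1f}%")
for location, change in flagged_locations(latency_before_ms, latency_after_ms).items():
    print(f"{location}: {change:+.1f}%")
```

Alerting on per-location change rather than on the aggregate is the design choice that surfaces a case like Dallas.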

We can now see if this increase in latency has any impact on application performance. Our Dallas location also sees a significant increase in response time, even though the overall average response time doesn’t appreciably change (Figure 10).

Figure 10: The Dallas, TX location also experiences a 30% increase in response time due to the higher latency after the change.

Relating BGP Changes to Service Delivery

Internet routing and BGP are definitely interesting and complex topics, but what most IT departments primarily care about is the effect they have on service delivery. As these two examples show, BGP changes can have large repercussions for service delivery: sometimes they set off alarms, and sometimes they are quite subtle. If you’d like to see how Internet routing and BGP impact the services you deliver, I invite you to try out ThousandEyes.
