Proactive BGP Alerting

Posted by on November 4, 2014

In two previous posts, 4 Real BGP Troubleshooting Scenarios and Solving BGP AS Path Prepending Errors, we’ve highlighted the ThousandEyes platform’s ability to visualize and analyze BGP issues and their impact. Today we’d like to discuss how BGP alerts within the platform can proactively inform you of these critical issues and significantly reduce MTTR.

Using BGP Alert Rules, we’ll illustrate how ThousandEyes can proactively alert on the following events:

  • Prefix Hijacking
  • Peering Changes
  • Route Flapping
  • DDoS Mitigation
  • AS Path Prepending Errors

Basics of Alerts Setup

First let’s cover the basics of alerting in the platform. ThousandEyes is built upon X-Layer and its ability to correlate different layers in the TCP/IP stack and their data sets. When a test is created to monitor a target URL or IP address of interest, multiple views are generated from the collected data sets based on the test type. Pre-defined Alert Rules are then attached to the test corresponding to the views or layers enabled for the test.

When an HTTP Server test is used to analyze the HTTP Response time (DNS, Connect Time, SSL and Wait Time) of a URL, multiple network views including Path Visualization and BGP Route Visualization are also enabled by default (Figure 1). Alert Rules for Network, HTTP and BGP can then be assigned to the test matching those views. This modular approach allows for a tiered alerting system that can cover complex network and application issues layer by layer.

Figure 1: Pre-defined alert rules are attached to tests corresponding to enabled views.

BGP Alert Rules

Since BGP Route Visualization utilizes public and private monitors to visualize the AS (Autonomous System) paths toward the prefixes of interest, BGP Alert Rules within the platform are triggered based on the minimum number of those monitors in the defined violation state using a variety of metrics. Let’s use the Default BGP Alert Rule in Figure 2 as an example. Tests or prefixes assigned with this default Alert Rule will trigger under conditions where five or more monitors have less than 100% reachability during an interval or five or more monitors observe at least one AS path change upstream.

Figure 2: ThousandEyes triggers BGP alerts based on thresholds across BGP monitors.
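
To make the threshold logic concrete, here is a minimal sketch of how a rule like the default one could be evaluated. It is not the platform's implementation: the `MonitorObservation` type, its field names, and the `default_rule_triggers` function are hypothetical names chosen for illustration, and the sample data is invented.

```python
from dataclasses import dataclass


@dataclass
class MonitorObservation:
    """One BGP monitor's view of a prefix during a single interval (hypothetical shape)."""
    monitor: str
    reachability_pct: float  # share of the interval in which the prefix was reachable
    path_changes: int        # number of AS path changes seen during the interval


def default_rule_triggers(observations, min_monitors=5):
    """Return True if enough monitors are in either violation state."""
    unreachable = sum(1 for o in observations if o.reachability_pct < 100.0)
    changed = sum(1 for o in observations if o.path_changes >= 1)
    return unreachable >= min_monitors or changed >= min_monitors


# Invented sample: eight monitors, five of which saw one path change.
observations = [
    MonitorObservation(f"monitor-{i}", 100.0, 1 if i < 5 else 0)
    for i in range(8)
]
print(default_rule_triggers(observations))
```

The two conditions are counted separately, so four monitors with reduced reachability plus four with path changes would not trigger this sketch; that matches the "five or more ... or five or more" wording of the default rule described above.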

BGP Alerts for Prefix Hijacking

Let’s walk through how to create BGP Alert Rules and apply combinations of BGP metrics for proactive alerting on these critical BGP issues.

In my colleague’s post 4 Real BGP Troubleshooting Scenarios, we discussed a rather serious incident where Indosat (AS 4761) incorrectly advertised a large number of prefixes in the global routing table that did not belong to them and effectively hijacked routes that Akamai (AS 16625) and many others own. In this event, a BGP Alert Rule with the Origin ASN metric set to Akamai’s ASN would have proactively identified the problem, without requiring responsible parties to analyze and troubleshoot the issue through public looking glasses (Figure 3).

Figure 3: A BGP Alert Rule using Origin ASN for prefix hijacking notifications.

Origin ASN defines the AS number associated with the monitored prefix and therefore refers to the organization that owns the prefix (e.g., AS 16625 refers to Akamai). This Alert Rule would have immediately triggered an alert when the monitors observed a change in Origin ASN for the prefix in question. Additional ASNs may be added if your organization owns several AS numbers.
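
As a rough illustration of the Origin ASN check, the sketch below treats the last ASN in each monitor's AS path as the origin and flags monitors that see an unexpected one. The function names, monitor names, and the transit ASNs in the sample paths are assumptions made up for this example; only AS 16625 and AS 4761 come from the incident described above.

```python
EXPECTED_ORIGINS = {16625}  # add more ASNs here if your organization owns several


def origin_asn(as_path):
    """The origin is the rightmost ASN in the AS path."""
    return as_path[-1]


def hijack_suspects(paths_by_monitor, expected_origins):
    """Return the monitors whose observed origin is not an expected ASN."""
    return sorted(
        monitor
        for monitor, path in paths_by_monitor.items()
        if path and origin_asn(path) not in expected_origins
    )


# Invented sample: two of three monitors see the prefix originated by AS 4761.
paths = {
    "monitor-a": [3356, 16625],
    "monitor-b": [1299, 4761],
    "monitor-c": [174, 4761],
}
suspects = hijack_suspects(paths, EXPECTED_ORIGINS)
print(suspects)  # ['monitor-b', 'monitor-c']
```

Comparing `len(suspects)` against a minimum-monitor threshold, as in the default rule, would keep a single noisy monitor from raising the alert.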

BGP Alerts for Peering Changes and Route Flapping

Peering changes and route flaps affect reachability to the prefix of interest. Either issue generally shows up as AS path changes observed by ThousandEyes public BGP Monitors. If you have more than one provider and your secondary ISP acts only as a failover for the announced prefixes, you can use the Next-Hop ASN metric to alert when inbound traffic goes through the secondary provider.

Next-Hop ASN refers to the AS number immediately before the destination AS in the AS path for the monitored prefix. It is generally the ASN of an ISP that peers directly with the destination AS.
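
The failover case can be sketched as follows, assuming each monitor reports a plain list of ASNs with the destination AS last. All ASNs below are taken from the range reserved for documentation (64496 to 64511) and stand in for your own AS and providers; the function and monitor names are likewise hypothetical.

```python
ORIGIN_AS = 64500      # hypothetical: your own AS
PRIMARY_ISP = 64501    # hypothetical: primary provider
SECONDARY_ISP = 64502  # hypothetical: failover-only provider


def next_hop_asn(as_path):
    """Return the ASN just before the destination AS, skipping repeats of the destination."""
    origin = as_path[-1]
    for asn in reversed(as_path):
        if asn != origin:
            return asn
    return None  # the path contains only the destination AS


def monitors_via(paths_by_monitor, provider_asn):
    """Return the monitors whose inbound path enters through the given provider."""
    return sorted(
        monitor
        for monitor, path in paths_by_monitor.items()
        if path and next_hop_asn(path) == provider_asn
    )


# Invented sample: two monitors reach the prefix through the secondary provider.
paths = {
    "monitor-a": [64510, PRIMARY_ISP, ORIGIN_AS],
    "monitor-b": [64511, SECONDARY_ISP, ORIGIN_AS, ORIGIN_AS],
    "monitor-c": [SECONDARY_ISP, ORIGIN_AS],
}
via_secondary = monitors_via(paths, SECONDARY_ISP)
print(via_secondary)  # ['monitor-b', 'monitor-c']
```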

The simple Alert Rule in Figure 4 notifies you of path changes observed downstream for the prefixes you want to monitor and helps you keep an eye on your transit providers. The higher the minimum number of monitors you set for the Alert Rule, the fewer false positives you will observe and the more likely it is that an active alert indicates a problem close to the infrastructure where the prefix is announced.

Figure 4: BGP Alert Rule utilizing Path Changes to detect upstream peering changes and route flaps.

BGP Alert Rule for DDoS Mitigation Activation

For companies that use DDoS mitigation providers such as Prolexic or Verisign, there are multiple techniques to shift traffic toward the provider during an attack. With the BGP-based methods, either the DDoS mitigation provider announces the customer’s prefix or a more specific prefix on behalf of the customer (therefore changing the Origin ASN), or the provider establishes a BGP peering session with the customer and acts as the Next-Hop ASN for inbound traffic (therefore creating asymmetric routing). The Origin ASN, Next-Hop ASN, and Prefix/Subprefix metrics can be used to notify you when a DDoS mitigation service kicks in.

For example, let’s say you are using Verisign as your DDoS mitigation provider (AS 26415 and AS 30060 belong to them). You normally announce prefix 199.168.1.0/23, while Verisign announces the more specific 199.168.1.0/24 during an attack to redirect traffic toward its scrubbing centers. You can attach the following Alert Rule to the test to confirm that the handoff to the DDoS mitigation service is advertised correctly (Figure 5).

Figure 5: BGP Alert Rule utilizing Origin ASN and Prefix metrics for DDoS mitigation.
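
The combination of the Prefix and Origin ASN metrics can be sketched with Python's standard `ipaddress` module. This is only an illustration of the logic, not how the platform evaluates the rule: the `mitigation_active` function, the announcement tuples, and the customer ASN 64500 (a documentation-range ASN) are assumptions. Note that 199.168.1.0/23 as written has host bits set, so the sketch parses it with `strict=False`, which normalizes it to 199.168.0.0/23.

```python
import ipaddress

MITIGATION_ASNS = {26415, 30060}  # Verisign ASNs named in the example above
CUSTOMER_ASN = 64500              # hypothetical customer ASN (documentation range)

# strict=False accepts the prefix as written and normalizes it to 199.168.0.0/23.
COVERING_PREFIX = ipaddress.ip_network("199.168.1.0/23", strict=False)


def mitigation_active(announcements, covering, mitigation_asns):
    """Return True if a mitigation ASN originates the prefix or a more specific one.

    `announcements` is an iterable of (prefix, origin_asn) pairs.
    """
    for prefix_text, origin in announcements:
        prefix = ipaddress.ip_network(prefix_text, strict=False)
        # subnet_of() is also True when the two prefixes are identical.
        if prefix.subnet_of(covering) and origin in mitigation_asns:
            return True
    return False


normal = [("199.168.1.0/23", CUSTOMER_ASN)]
under_attack = normal + [("199.168.1.0/24", 26415)]

print(mitigation_active(normal, COVERING_PREFIX, MITIGATION_ASNS))        # False
print(mitigation_active(under_attack, COVERING_PREFIX, MITIGATION_ASNS))  # True
```

Because `subnet_of()` accepts an identical prefix, the same check covers both BGP handoff styles described earlier: the provider announcing the customer's own prefix, or announcing a more specific one.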

BGP Alert Rule for Detection of AS Prepending Errors

In a post last month, Doug went through an example of how ThousandEyes can visualize AS Path Prepending Errors. In this case, a bank mistakenly typed 15011 instead of 10511 in the AS paths that it was prepending to its route advertisements. Since ThousandEyes condenses consecutive repeating ASNs in the AS path, we can use the Next-Hop ASN to detect the error. With the ISPs being AS209 and AS40948, the Alert Rule in Figure 6 could have been used to detect the error.

Figure 6: Next-Hop ASN metric for AS-prepending error detection.
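
Here is a small sketch of why condensing repeated ASNs exposes the typo. The two sample paths are invented for illustration and are not the paths observed in the actual incident; the function names are also hypothetical. Only the ASNs 10511, 15011, 209 and 40948 come from the example above.

```python
from itertools import groupby

EXPECTED_NEXT_HOPS = {209, 40948}  # the bank's two ISPs


def condense(as_path):
    """Collapse consecutive repeats, e.g. [209, 10511, 10511] -> [209, 10511]."""
    return [asn for asn, _ in groupby(as_path)]


def next_hop_asn(as_path):
    """Return the ASN just before the origin in the condensed path, if any."""
    condensed = condense(as_path)
    return condensed[-2] if len(condensed) >= 2 else None


def prepend_error(as_path, expected_next_hops):
    """Return True when the Next-Hop ASN is not one of the expected ISPs."""
    return next_hop_asn(as_path) not in expected_next_hops


correct_path = [3356, 209, 10511, 10511, 10511]  # invented: prepended correctly
mistyped_path = [3356, 209, 15011, 15011, 10511]  # invented: 15011 typed for 10511

print(next_hop_asn(correct_path), prepend_error(correct_path, EXPECTED_NEXT_HOPS))    # 209 False
print(next_hop_asn(mistyped_path), prepend_error(mistyped_path, EXPECTED_NEXT_HOPS))  # 15011 True
```

With correct prepending, the repeated 10511 entries collapse into the origin and the Next-Hop ASN stays at the ISP. A mistyped ASN does not collapse, so it appears as an unexpected Next-Hop ASN.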

We’ve covered four BGP monitoring use cases and issues that you can detect using custom Alert Rules in ThousandEyes:

  • Prefix hijacking = Origin ASN
  • Peering changes and route flaps = Reachability, Path Changes
  • DDoS mitigation = Origin ASN, Prefix
  • AS path prepending = Next-Hop ASN

Make sure to customize and apply Alert Rules in your tests to get the most use out of your ThousandEyes account.

  • Alex NG

    Ken, great post. I just made changes to my BGP alert rules. Thanks AlexNg