Internet Reliability: Can We Do Better than Best Effort?

Posted by on June 9th, 2015
January 21st, 2016

In our second report from ThousandEyes Connect, we dive into Aldrin Isaac’s talk on how the core of the Internet could be improved so that enterprises wouldn’t have to rely solely on private networks or CDNs for performance and reliability. Aldrin Isaac, from the CTO Office at Bloomberg, works on network architecture projects; for nearly 20 years he built and operated Bloomberg’s global private network.

There Are Private Networks Too

Aldrin kicked off his talk with some of his ongoing work. As he describes it, “Bloomberg is in the business of business.” It’s a data and news provider for capital markets; anyone working at an investment bank or in the securities market would recognize the Bloomberg Professional service available on Bloomberg Terminals. This sort of information service requires a very high level of reliability. Aldrin sets up the problem: “our customers don’t want to be down, not even for a couple minutes. And this is hard to do over the Internet.”

A large majority of Bloomberg’s traffic is real-time market data being delivered over a global private MPLS network. This network includes 3 primary data centers, 16 regional hubs and over 100 edge sites that reach over 90 countries. To reach their capital markets customers, Bloomberg leases over 20,000 lines from 300 providers, aggregating these lines at the edge and hub sites, which also provide caching and replication.

Figure 1: Bloomberg’s private MPLS-based WAN.

This architecture exists because private networks are relatively reliable; in fact, much of the traffic supplying capital markets today flows over private networks. However, Internet costs are falling much faster than private-line costs. Aldrin describes the tradeoff this reliability imposes: “when Bloomberg goes in with a private line, it directly increases the cost of our service.”

Can Mission Critical Services Run over the Internet?

So Aldrin has been working to provide Internet-based connections that can serve more cost-conscious customers while retaining an acceptable service level. On the public Internet, however, there are concerns such as DDoS attacks, BGP route leaks, DNS hijacks, routing instability, MTU misconfigurations and congestion.

So what are some options beyond best effort over the public Internet? Anycast can be used to draw users to the closest POP, but it’s a bit of a shot in the dark: “closest” is judged by AS path length, and some ASes span far greater distances than others, so a shorter AS path is not necessarily a faster one. The trouble with the access model is that BGP has no way to steer traffic based on path quality, so remote peering links become a bottleneck.
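To make the anycast problem concrete, here is a minimal Python sketch that reduces BGP best-path selection to its AS-path-length step. Real BGP evaluates other attributes, such as local preference, before AS path length and applies further tie-breakers after it. The POP names, the ASNs (from the documentation range) and the RTT values are hypothetical; the point is only that the route BGP prefers can differ from the lowest-latency one.

```python
from dataclasses import dataclass


@dataclass
class Route:
    pop: str             # anycast POP originating the service prefix
    as_path: list[int]   # AS path as seen from the client's network
    rtt_ms: float        # measured latency, which BGP never sees


def bgp_preferred(routes: list[Route]) -> Route:
    """Simplified BGP choice: shortest AS path wins (other attributes ignored)."""
    return min(routes, key=lambda r: len(r.as_path))


def lowest_latency(routes: list[Route]) -> Route:
    """What a performance-aware choice would pick instead."""
    return min(routes, key=lambda r: r.rtt_ms)


# Hypothetical: POP-A is one AS hop "closer", but that AS spans an ocean; POP-B is nearby.
routes = [
    Route("POP-A", [64500, 64510], rtt_ms=180.0),
    Route("POP-B", [64501, 64502, 64510], rtt_ms=25.0),
]
print(bgp_preferred(routes).pop)   # POP-A
print(lowest_latency(routes).pop)  # POP-B
```

Because latency is not an input to the AS-path comparison, no amount of tuning inside this selection step can fix the mismatch; that is the gap the ideas below try to close.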

Content Delivery Networks (CDNs) are also an option; as Aldrin reads the room, “everyone is doing CDN.” Evaluating this option, however, he notes: “for web and consumer traffic, CDNs work well. But for business to business communications, it’s not necessarily as good. And you pay for it.”

He asks, “can we run a reliable capital markets service over the Internet? Are there things we can do from a protocol standpoint to improve Internet performance?” And in order to keep the cost advantages, “rather than over the top tricks, how do you improve the core of the Internet itself?”

Four Internet Performance Ideas to Explore

Aldrin discussed four ideas with the audience: IXPs for business Internet, smart DNS, MPTCP and multi-site multi-pathing.

IXPs for Business Internet. Aldrin explains that “if a customer wants Reuters, Bloomberg or Dow Jones, they get private lines for each of these. However, if we all met at an IXP, we could in the long run achieve much better business-to-business connectivity.” For B2B service providers like Bloomberg, “the peering model advantage is that we create more direct access to our business partners and customer base at a better price point than buying access lines directly with numerous ISPs.”
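The economics of the IXP idea come down largely to a connection count. As a rough illustration with hypothetical numbers (ignoring redundancy, port speeds and pricing), the sketch below compares leasing a private line from every customer to every provider against each party buying a single port at a shared exchange.

```python
def access_lines(customers: int, providers: int) -> int:
    """Access model: each customer leases a private line to each provider."""
    return customers * providers


def ixp_ports(customers: int, providers: int) -> int:
    """Peering model: each party buys one port at a shared exchange."""
    return customers + providers


def b2b_full_mesh(parties: int) -> int:
    """Private lines needed if every pair of business partners connects directly."""
    return parties * (parties - 1) // 2


print(access_lines(100, 3), ixp_ports(100, 3))  # 300 103
print(b2b_full_mesh(50))                         # 1225 (versus 50 IXP ports)
```

The access model grows multiplicatively (or quadratically for a full B2B mesh) while the exchange grows linearly, which is why the advantage Aldrin describes shows up “in the long run” as more partners join.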

Figure 2: The trouble with the access model, peering with each business partner or customer.

Smart DNS. Rather than relying on Anycast, which routes via the shortest AS path, how can you route customers to the highest-performing location? Aldrin’s second idea is “performance-aware, distributed DNS.” He explains that Facebook does this by rerouting user traffic in controlled experiments to understand how alternate paths and endpoints perform, and uses this data to refine its DNS.
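A performance-aware DNS answer can be sketched as a lookup from the querying resolver’s prefix to measured RTTs per edge. The Python below is a minimal sketch, not Facebook’s or Bloomberg’s system: the edge names, addresses, RTT table and the `explore` rate are hypothetical, it assumes the resolver’s address is a good proxy for the user’s location, and the occasional random answer loosely models the “controlled experiments” that keep measurements of alternative edges fresh.

```python
import ipaddress
import random
from typing import Optional

# Hypothetical edge sites and their service addresses (documentation ranges).
EDGES = {"edge-1": "192.0.2.10", "edge-2": "192.0.2.20", "edge-3": "192.0.2.30"}

# Hypothetical median RTTs (ms) from each resolver prefix to each edge,
# as gathered by measurement experiments.
RTT_MS = {
    ipaddress.ip_network("198.51.100.0/24"): {"edge-1": 40.0, "edge-2": 12.0, "edge-3": 90.0},
    ipaddress.ip_network("203.0.113.0/24"): {"edge-1": 15.0, "edge-2": 70.0},
}
DEFAULT_EDGE = "edge-1"


def answer(resolver_ip: str, explore: float = 0.05,
           rng: Optional[random.Random] = None) -> str:
    """Return the address to hand back to this resolver."""
    rng = rng or random.Random()
    addr = ipaddress.ip_address(resolver_ip)
    matches = [net for net in RTT_MS if addr in net]
    if not matches:
        return EDGES[DEFAULT_EDGE]  # no measurements yet for this client
    rtts = RTT_MS[max(matches, key=lambda net: net.prefixlen)]  # longest-prefix match
    if rng.random() < explore:
        # Controlled experiment: occasionally answer with a random measured edge
        # so the RTT table keeps learning how the alternatives perform.
        return EDGES[rng.choice(sorted(rtts))]
    return EDGES[min(rtts, key=rtts.get)]  # best-performing edge


print(answer("198.51.100.7", explore=0))  # 192.0.2.20
```

Unlike anycast, the choice here is driven by measured performance, so an edge that is more AS hops away can still win if it is faster.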

Figure 3: Smart DNS that routes traffic to the nearest, best-performing edge.

Multi-Path TCP (MPTCP). MPTCP is a protocol that takes advantage of ECMP opportunities on the Internet. It splits a single connection into multiple subflows (a form of inverse multiplexing), and because each subflow has its own 5-tuple, ECMP hashing can spread the subflows across multiple paths. MPTCP can detect impairment on one path and shift traffic to another. Aldrin also notes that MPTCP “doesn’t have to be end to end, but can also be gateway to gateway,” so in this B2B scenario you don’t have to rely on MPTCP support in the client or server.
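The Python sketch below illustrates why subflows can land on different paths: a router hashes each packet’s 5-tuple to choose among equal-cost next hops, and each subflow uses a different source port. SHA-256 stands in for whatever hash a real router uses, and the path names, addresses and the lowest-RTT failover rule are illustrative assumptions rather than a description of any specific MPTCP implementation.

```python
import hashlib

# Hypothetical equal-cost next hops on a router between the two gateways.
NEXT_HOPS = ["path-A", "path-B", "path-C", "path-D"]


def ecmp_next_hop(src_ip: str, dst_ip: str, proto: int,
                  src_port: int, dst_port: int) -> str:
    """Pick a next hop from a hash of the 5-tuple (SHA-256 as a stand-in)."""
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:4], "big")
    return NEXT_HOPS[bucket % len(NEXT_HOPS)]


def subflow_paths(src_ip: str, dst_ip: str, dst_port: int,
                  src_ports: list[int]) -> dict[int, str]:
    """Map each TCP subflow (distinguished here by source port) to its ECMP path."""
    return {p: ecmp_next_hop(src_ip, dst_ip, 6, p, dst_port) for p in src_ports}


def pick_subflow(rtt_ms: dict[int, float], impaired: set[int]) -> int:
    """Send on the lowest-RTT subflow that is not currently marked impaired."""
    healthy = {p: rtt for p, rtt in rtt_ms.items() if p not in impaired}
    if not healthy:
        raise RuntimeError("all subflows impaired")
    return min(healthy, key=healthy.get)


print(subflow_paths("192.0.2.1", "198.51.100.2", 443, [50000, 50001, 50002, 50003]))
# Subflow 50001 is fastest but impaired, so traffic shifts to the next best.
print(pick_subflow({50000: 30.0, 50001: 10.0, 50002: 20.0}, impaired={50001}))  # 50002
```

A single TCP connection always hashes to one path, whereas several subflows give the network several chances to spread load, which is what lets MPTCP route around an impaired path.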

Figure 4: MPTCP using multiple paths rather than a single path defined by the 5-tuple.

Multi-Site Multi-Pathing. Aldrin then talked about a project he’s working on to improve the performance of the Financial Information Exchange (FIX) protocol. Counterintuitively for a financial data protocol, FIX has no built-in encryption, and although it requires high availability, it does not provide it natively. In the multi-site, multi-pathing model, a “customer connects to a service via multiple edge sites. Protection against failures is provided between the origin and edge or at the edge.” With multiple paths, a failure or degradation on one path causes only a minor disruption, small enough not to affect a latency- and loss-sensitive protocol like FIX.
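One way a gateway could shield a FIX session from a single-path failure is to send each message via several edge sites and keep only the first copy of each sequence number at the receiving side. The talk does not specify the mechanism, so the Python below is a hypothetical sketch of that duplicate-and-merge design: it uses FIX’s MsgSeqNum (tag 34) to drop duplicates and restore order, and its simplified messages omit required fields such as BodyLength (9) and CheckSum (10).

```python
SOH = "\x01"  # FIX field delimiter


def msg_seq_num(raw: str) -> int:
    """FIX fields are tag=value pairs separated by SOH; tag 34 is MsgSeqNum."""
    fields = dict(f.split("=", 1) for f in raw.strip(SOH).split(SOH))
    return int(fields["34"])


class MultiPathReceiver:
    """Hypothetical gateway merging copies of one FIX stream from several edges."""

    def __init__(self, first_seq: int = 1):
        self.next_seq = first_seq
        self.pending: dict[int, str] = {}

    def on_message(self, raw: str) -> list[str]:
        """Accept a copy from any path; return messages now deliverable in order."""
        seq = msg_seq_num(raw)
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate copy from a slower path: drop it
        self.pending[seq] = raw
        delivered = []
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


def fix(seq: int, text: str) -> str:
    """Build a simplified (not wire-valid) FIX News message for the example."""
    return SOH.join(["8=FIX.4.4", "35=B", f"34={seq}", f"58={text}"]) + SOH


rx = MultiPathReceiver()
print(len(rx.on_message(fix(2, "b"))))  # 0: arrived early via one edge, held back
print(len(rx.on_message(fix(1, "a"))))  # 2: gap filled, both delivered in order
print(len(rx.on_message(fix(1, "a"))))  # 0: same message via another edge, dropped
```

With this design, losing one edge path only means the copy from another edge is the one delivered, which matches the idea that a single-path failure should cause at most a minor disruption.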

Figure 5: Multi-site Multi-Pathing with application-level gateways at multiple edges.

More on ThousandEyes Connect

For more ThousandEyes Connect presentations, check out Steve Lerner’s talk on eBay’s approach to CDN and web performance; plus, there’s more to come over the next week.
