Optimizing CDN and Web Performance at eBay

Posted on June 3rd, 2015
January 21st, 2016

We recently held our first ThousandEyes Connect event, with network engineers and architects from around the Bay Area. We heard from four ThousandEyes customers over the course of the morning, on topics ranging from Internet reliability to deploying Enterprise Agents at scale. We’ll share key takeaways from each one of these presentations over the next few weeks. First up, Steve Lerner’s talk on Optimizing CDN and Web Performance at eBay.

Optimizing Performance at eBay

Steve Lerner, Senior Member of Technical Staff at eBay, kicked off the morning with how his team thinks about website performance. Steve is responsible for Content Delivery Networks (CDN) at eBay Marketplace, the e-commerce and auction platform matching more than 24M sellers with 157M buyers. It’s a big task “that means engineering, operations, vendor management strategy.” In addition to the team managing CDNs, Steve works alongside Speed Teams that include members from “platform, measurement, media services that optimize images and video, and mobile which owns the entire mobile stack.”

Figure 1: Steve Lerner presenting about eBay’s infrastructure at ThousandEyes Connect.

Steve describes the importance of network performance to eBay as more than just quantitative measurements. “It’s not just about packet loss and congestion or that a whale ate the transatlantic fiber cable. It’s about speed and the things that are impacting speed itself. We are a retail company and degradation of speed means a degradation of dollars.”

Using CDNs to Get eBay’s Products to Market

As Steve describes it, “everything is big and massive at eBay.” eBay operates a massive multi-tenant backbone with several data centers, built from the ground up, and additional colocation facilities. They are also heavy users of CDNs, which help serve the tens of billions of hits per day to their edge.

CDNs provide eBay with a number of benefits beyond caching, including edge TCP and SSL termination. Steve describes how eBay “caches everything that is static—javascript, CSS. We tune CDN caching based on the type of object it is. Is it a search result? An item view picture? We tune the cache hierarchy to give different treatment in terms of bandwidth and performance for each of these.”
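To make the idea of tuning caching by object type concrete, here is a minimal Python sketch. It is not eBay's configuration: the object classes, URL patterns and TTL values are all hypothetical, chosen only to show how different kinds of objects might map to different cache treatment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachePolicy:
    ttl_seconds: int       # how long the edge may keep the object
    cacheable: bool = True


# Hypothetical object classes and TTLs, for illustration only.
POLICIES = {
    "static_asset": CachePolicy(ttl_seconds=7 * 24 * 3600),  # JavaScript, CSS
    "item_image": CachePolicy(ttl_seconds=24 * 3600),
    "search_result": CachePolicy(ttl_seconds=60),
}
# Anything unrecognized is treated as dynamic and not cached.
DEFAULT_POLICY = CachePolicy(ttl_seconds=0, cacheable=False)


def classify(path: str) -> str:
    """Map a request path to an object class (the URL patterns are assumptions)."""
    if path.endswith((".js", ".css")):
        return "static_asset"
    if path.startswith("/images/"):
        return "item_image"
    if path.startswith("/search"):
        return "search_result"
    return "dynamic"


def cache_control(path: str) -> str:
    """Return the Cache-Control header value for a request path."""
    policy = POLICIES.get(classify(path), DEFAULT_POLICY)
    if not policy.cacheable:
        return "no-store"
    return f"public, max-age={policy.ttl_seconds}"
```

For example, `cache_control("/static/app.js")` yields a week-long `max-age`, while a personalized page falls through to `no-store`.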

eBay also does what Steve calls “dynamic acceleration: we proxy dynamic and personalized experiences through CDNs as well, to get the benefit of edge TCP connectivity and better routes.”

When asked whether eBay was building their own CDN, Steve made the point that “our peers typically don’t build their own edge CDN, but some will build their own origin CDN. There are diminishing returns to focusing equipment further and further out into thousands of networks. The reason you build an origin CDN is because you want to reduce the penalty of a cache miss, add redundancy, save costs and improve performance for certain types of traffic.”

Monitoring eBay’s Key Network Components with ThousandEyes

Steve continued the talk with a look into how his team uses ThousandEyes to monitor key infrastructure components including DNS, load balancers, routing and CDNs. “I like to call ThousandEyes a NOC in a box, a virtualized NOC of things you used to be doing manually.”

His first example is DNS monitoring. At eBay they use Dyn global DNS. Showing an event from several years ago, he describes “turning on global DNS for an $80B per year e-commerce site. It’s not often you get to do something like that.” Looking at the performance data, he describes how, for key pages, eBay went from one DNS cluster to dozens around the world. As they ramped up over a 24-hour period, increasing the share of queries and giving more authority to Dyn, DNS resolution time improved by 50%.
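A rough way to reason about a number like that 50% figure is to time lookups before and after a change and compare the medians. The sketch below is only an approximation under stated assumptions: it times the operating system's resolver via the standard `socket.getaddrinfo` call, so local caching will affect results, and it is not how ThousandEyes measures DNS. The function names are hypothetical.

```python
import socket
import statistics
import time


def resolve_ms(hostname: str) -> float:
    """Time one lookup through the OS resolver, in milliseconds."""
    start = time.perf_counter()
    socket.getaddrinfo(hostname, 443)
    return (time.perf_counter() - start) * 1000.0


def median_resolve_ms(hostname: str, samples: int = 5) -> float:
    """Median of several lookups, to dampen the effect of one slow sample."""
    return statistics.median(resolve_ms(hostname) for _ in range(samples))


def percent_improvement(before_ms: float, after_ms: float) -> float:
    """Relative reduction in resolution time, as a percentage of the old value."""
    return (before_ms - after_ms) / before_ms * 100.0
```

With these definitions, a drop from 120 ms to 60 ms is a 50% improvement: `percent_improvement(120.0, 60.0)`.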

Figure 2: The performance impact of using a global DNS service.

In another example, Steve pulls up metrics for the page load time of an eBay page in Europe. In this event, Steve’s team enables CDN proxying, providing edge SSL and edge TCP connectivity. Page load time improves by 2-3X as they make the change. Playing devil’s advocate, Steve poses the question: “so why not use Real User Monitoring (RUM)? Why use synthetic at all?” His perspective: RUM, even with the new Resource Timing, does not include every factor associated with that page. It doesn’t tell the full story. Synthetic monitoring includes it all, presenting a “neutral palette” of indisputable data.

Three Tips for Web Performance

Steve wrapped up with three tips for the network engineers in the audience.

First, triangulate. According to Steve, “don’t trust any single source. It’s important to triangulate between synthetic testing, RUM and logs. To understand an event, you need to look at all different angles.”
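One simple way to put triangulation into practice is to compare the same metric across sources and flag any source that disagrees with the others. The Python sketch below assumes you already have page load samples (in seconds) from each source. The function name, the 25% tolerance and the sample data are hypothetical.

```python
from statistics import median
from typing import Dict, List


def triangulate(samples_by_source: Dict[str, List[float]],
                tolerance: float = 0.25) -> dict:
    """Compare per-source medians and list sources that stray from the rest."""
    medians = {name: median(values) for name, values in samples_by_source.items()}
    # Use the median of the medians as the reference point.
    baseline = median(medians.values())
    outliers = sorted(
        name for name, value in medians.items()
        if abs(value - baseline) > tolerance * baseline
    )
    return {"medians": medians, "baseline": baseline, "outliers": outliers}


# Made-up numbers: the log-derived timings disagree with the other two sources.
report = triangulate({
    "synthetic": [1.0, 1.2, 1.1],
    "rum": [1.1, 1.3, 1.2],
    "logs": [2.4, 2.6, 2.5],
})
```

A flagged source is not necessarily wrong. It is a prompt to look at why that vantage point sees the event differently.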

Second, be aware. It’s important to be aware of how your payload changes, such as objects on a web page. Use this insight to help put network performance in the context of the applications that you care about.
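As an illustration of tracking payload changes, the sketch below compares two snapshots of a page's objects, each a mapping from URL to size in bytes. How you collect the snapshots (for example from a waterfall export) is left open, and the function name and sample data are hypothetical.

```python
from typing import Dict


def payload_diff(before: Dict[str, int], after: Dict[str, int]) -> dict:
    """Summarize how a page's objects changed between two snapshots."""
    return {
        "added": sorted(set(after) - set(before)),
        "removed": sorted(set(before) - set(after)),
        "object_delta": len(after) - len(before),
        "byte_delta": sum(after.values()) - sum(before.values()),
    }


# Made-up snapshots: a script grew and a large image was added.
before = {"/app.js": 120_000, "/site.css": 30_000}
after = {"/app.js": 150_000, "/site.css": 30_000, "/hero.jpg": 400_000}
diff = payload_diff(before, after)
```

If page load time regresses on the same day the payload grows by several hundred kilobytes, the network may not be the first place to look.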

Third, build in measurement hooks so you can gather the data you need. This is especially important for mobile apps, as they tend to be built by entirely different teams.
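A measurement hook can be as small as a timer wrapped around the code path you care about. The following Python sketch is one possible shape, not something described in the talk: the `measure` helper and the in-memory `TIMINGS` store are hypothetical, and a real application would ship these values to its metrics pipeline.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# In-memory store of durations in seconds, keyed by hook name.
TIMINGS = defaultdict(list)


@contextmanager
def measure(name, clock=time.perf_counter):
    """Record how long the wrapped block takes, even if it raises."""
    start = clock()
    try:
        yield
    finally:
        TIMINGS[name].append(clock() - start)


# Usage: wrap any step you want visibility into.
with measure("render_item_page"):
    sum(range(1000))  # stand-in for real work
```

The `clock` parameter exists so the hook can be driven by a fake clock in tests.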

If you’d like to see more of Steve’s presentations about web performance, check out his talk from Velocity New York 2014. And stay tuned for more great content from ThousandEyes Connect.
