Determining Where to Build a New Data Center at Shutterstock

Posted on November 9, 2015

In this update from ThousandEyes Connect New York, we’ll go into the presentation by Gene Yaacobi, Infrastructure Manager at Shutterstock. Gene leads the traffic team, which oversees network operations and management, and the assets team, which handles data center operations.

Figure 1: Gene Yaacobi presenting at ThousandEyes Connect New York.

Gene kicked off the morning with his talk on how his team used ThousandEyes data to determine where to build a new data center at Shutterstock.

A Herculean Task: Delivering High-Quality Experiences to Customers

Many people have heard of Shutterstock—the stock photography, stock footage and stock music provider—but few realize just how gigantic their business is. With assets including 60 million images and counting (with over 50,000 images added every day), Shutterstock has had 500 million paid downloads to date and now sells four images per second. They have over 1.3 million active customers in more than 150 countries, with 70% of traffic coming from outside the U.S. As Gene summarizes it, “Our focus is on volume and to give folks one of the best search experiences when it comes to finding vectors, photography and audio. We have a lot of people that really rely on us, and we want to make sure that we’re delivering the best experience to our customers.”

What about Shutterstock’s infrastructure? “Everything that’s built and sold is sitting on the infrastructure that we build, maintain and scale.” Shutterstock has three data centers and over 4,000 servers running CentOS, most of which are virtualized, along with roughly 500 network devices. The network is Juniper-based, and they use Brocade load balancers, though they are in the process of moving to Equal-Cost Multi-Path (ECMP) routing and Anycast.

As the “first responders,” the Infrastructure team has engineers on call on a rotating basis to handle emergencies. The team is made up of five smaller groups:

  1. SRE (Site Reliability Engineering): Architects and designs new services to deploy and handles internal education and training.
  2. Tools: Builds out virtualization platform and automation tools.
  3. Storage: Ensures storage architecture is always available, resilient, fast and scalable.
  4. Assets: Handles data center operations and hardware procurement.
  5. Traffic: Similar to a traditional network team that handles routing, DNS and network monitoring.

Where Do We Build Our Data Center?

One common issue that network engineering teams encounter is deciding where to deploy a new site. Gene and his team developed a methodology to answer this question using data from ThousandEyes. Why was this decision so important for Shutterstock? Gene explains, “Every second counts. 40% of people abandon a website that takes more than 3 seconds to load. To put that into perspective, if an e-commerce site is making $100,000 per day, a 1 second page delay could potentially cost you $2.5 million in lost sales every year.”
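The quoted figure lines up with the commonly cited assumption (not stated explicitly in the talk) that a 1-second page delay costs roughly 7% of conversions. A quick back-of-the-envelope check:

```python
# Sanity check of the "$2.5 million per year" figure, assuming the
# widely quoted 7% conversion loss per 1 second of page delay.
daily_revenue = 100_000        # dollars per day
conversion_loss = 0.07         # assumed 7% drop for a 1-second delay
annual_loss = round(daily_revenue * conversion_loss * 365)
print(f"${annual_loss:,}")     # roughly $2.5M per year
```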

Gene outlines four steps to using data to answer the important question of choosing where to build a new data center.

1. Decide which regions are important to you

To determine the locations most important to the business, Gene suggests using Google Analytics to see where the majority of traffic is coming from. You can then deploy ThousandEyes agents and tests in those regions.

2. Create tests

At Shutterstock, Gene sets up ThousandEyes tests to check pages every 15 minutes and HTTP servers every 5 minutes. Bandwidth measurements, which are available through Enterprise Agents, are very helpful for Shutterstock because throughput is an important metric for media downloads.

3. Let simmer

Wait for your tests to collect data.

4. Review and analyze

Time to review and analyze the data. Start by looking at page load times—ThousandEyes makes this simple by allowing users to build reports. Gene recommends looking at the breakdown of response times: “You can actually see how much time your client has spent looking up DNS, trying to connect to the server, trying to negotiate SSL and ultimately waiting for the application to respond. Sometimes the cause of performance issues is not the network (surprise!)—it could be a bad application, and something like ThousandEyes might be able to help you pinpoint that.”
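The breakdown Gene describes can be reasoned about with a toy example. The timings below are made up purely for illustration; the point is that whichever phase dominates tells you where to look:

```python
# Illustrative response-time breakdown (all values invented, in ms).
phases = {
    "DNS": 35,       # name lookup
    "Connect": 60,   # TCP handshake
    "SSL": 90,       # TLS negotiation
    "Wait": 480,     # waiting for the application to respond
    "Receive": 45,   # downloading the response
}
total = sum(phases.values())
worst = max(phases, key=phases.get)
print(f"total {total} ms; biggest contributor: {worst}")
# A dominant Wait phase points at the application, not the network.
```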

Figure 2: Looking at the breakdown of response times helps pinpoint exactly where the issues are occurring.

When reviewing reports, Gene recommends, “Start with whatever your worst performer is and then drill down into your data.”
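That drill-down order can be sketched as sorting regions by page load time, worst first. The regions and timings here are hypothetical examples, not Shutterstock data:

```python
# Hypothetical per-region median page load times, in seconds.
page_load = {
    "Singapore": 6.8,
    "Frankfurt": 2.1,
    "São Paulo": 4.9,
    "Ashburn": 1.4,
}
# Worst performer first: that's where the drill-down starts.
ranked = sorted(page_load.items(), key=lambda kv: kv[1], reverse=True)
for region, secs in ranked:
    print(f"{region:10s} {secs:4.1f} s")
```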

Figure 3: Start with the longest page load times and drill down into the data from there.

Gene also likes to look at page load waterfalls, which show every asset that a given page is trying to pull down. For each asset, you can see how much time a client has spent trying to fetch it and what state it was in at a given point in time. Gene explains that waterfalls are “really helpful to figure out where exactly a problem might exist. A lot of times you’ll have a page that’s pulling assets from a source that’s not you, and you don’t always have control over that. Ultimately, in a lot of cases, being closer to the end user will drive a lot of these numbers down.”
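In the same spirit, a waterfall can be mined for its slowest asset and for third-party hosts you don't control. The entries below are invented for illustration:

```python
# Toy waterfall: every asset a page fetched, with fetch time in ms
# (all URLs and timings are illustrative).
waterfall = [
    {"url": "https://www.example-cdn.com/hero.jpg", "ms": 820},
    {"url": "https://www.shutterstock.com/app.js", "ms": 310},
    {"url": "https://thirdparty.example.net/widget.js", "ms": 1450},
]
slowest = max(waterfall, key=lambda a: a["ms"])
print(slowest["url"], slowest["ms"], "ms")
# If the slowest asset comes from a host that isn't yours, you may not
# control it; being closer to the end user still helps the rest.
```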

Figure 4: Waterfalls show how much time a client has spent trying to fetch each asset.

What Else Can the Data Be Used For?

ThousandEyes tests and agents can also be used for many other purposes, including ad-hoc troubleshooting and monitoring overall site performance. As Gene puts it, ThousandEyes has been “really helpful to figure out if issues are global, regional or locked down to a specific user.”

Feel free to check out the video of the full talk below, and many thanks to Gene for sharing valuable insights into overseeing Shutterstock’s massive network infrastructure.
