How to Find and Test Hosting Providers Around the World

Posted on June 23, 2015

Cloud Agents are one of the most important features of ThousandEyes. Our customers use them to monitor their networks from all around the world. We recently surpassed 100 Cloud Agent locations, so I thought it would be a good time to explain how we deploy these monitoring points and how you can use similar techniques to test the performance of data centers around the world.

We are constantly adding Cloud Agent locations, most recently Jakarta, Indore and Albuquerque. We’re rigorous in our approach to acquiring new locations: we need to be confident that the network performs well and has the right peering. We use ThousandEyes to perform these tests because it gives us good visibility into every network-related detail we need to know.

Scouting New Locations

Everything starts with a search for dedicated servers in a location that customers have requested or that we think is interesting based on population or geography. As you can imagine, we deal with a large number of server providers who speak different languages, and in some countries with government rules or restrictions that limit DNS, IP addresses and so on. When we contact a provider, we explain our server requirements, which are normally easy to fulfill, and then our network requirements, which are not. We want our agents’ connectivity to be consistent, with no packet loss and stable peerings, and we verify that by testing.

Testing the Data Center Network

We ask each provider for an IP address in the data center where our server will be hosted; the IP must have an open TCP port, ideally 80. Testing ICMP alone is not enough to measure latencies; we want a deeper test. You might be thinking of traceroute, mtr and similar tools, but we don’t use them; instead we check our Path Visualization and HTTP Server dashboards.

We create a network or HTTP test to the given IP and TCP port from a large set of other Cloud Agents covering most of the world. We include every Cloud Agent in the same country as the new location, so we can test local peerings.

We let the test run for several days and then review the results. We either discard the server or go ahead depending on these checks:
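As a rough illustration of what probing a TCP port (rather than ICMP) looks like outside the product, here is a minimal Python sketch that times TCP connections to a host and port. It is not how ThousandEyes tests are implemented; the function names `tcp_connect_latency` and `probe`, the timeout and the number of rounds are all assumptions made for the example.

```python
import socket
import time
from typing import Optional


def tcp_connect_latency(host: str, port: int = 80, timeout: float = 3.0) -> Optional[float]:
    """Return the TCP connect time in milliseconds, or None if the connection fails."""
    start = time.perf_counter()
    try:
        # create_connection performs the full TCP handshake before returning.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return None


def probe(host: str, port: int = 80, rounds: int = 5) -> dict:
    """Run several connection attempts and summarize failures and average latency."""
    samples = [tcp_connect_latency(host, port) for _ in range(rounds)]
    ok = [s for s in samples if s is not None]
    return {
        "sent": rounds,
        "failed": rounds - len(ok),
        "avg_ms": sum(ok) / len(ok) if ok else None,
    }
```

Running `probe` from machines in several countries against the IP a provider gives you yields a crude version of the same picture: which vantage points connect reliably and how long the handshake takes.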

Packet loss: This is the most obvious check. If there is packet loss, we discard the server. There have been cases where we detected packet loss, shared a snapshot with the service provider showing the issue, and they managed to fix the problem just by looking at it.

Figure 1: Example of a discarded server in Montevideo, Uruguay due to packet loss.
Figure 2: This provider had some sporadic packet loss peaks. At the peaks, there is end-to-end packet loss for most of the locations (yellow dots) and forwarding packet loss inside their nodes (green and red dot).
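The packet loss check above can be sketched in a few lines of Python. This assumes you have, for each testing agent, a list of (sent, received) packet counts per test round; that data shape, the function names and the default threshold of 0% are assumptions for the example, not the product's data model.

```python
from typing import Dict, List, Tuple


def loss_percent(sent: int, received: int) -> float:
    """Percentage of packets that were sent but never received."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * (sent - received) / sent


def agents_with_loss(
    rounds: Dict[str, List[Tuple[int, int]]],
    threshold_pct: float = 0.0,
) -> Dict[str, float]:
    """Map each agent whose overall loss exceeds the threshold to its loss percentage.

    `rounds` maps an agent name to a list of (sent, received) counts,
    one entry per test round collected over the observation period.
    """
    flagged = {}
    for agent, samples in rounds.items():
        sent = sum(s for s, _ in samples)
        received = sum(r for _, r in samples)
        pct = loss_percent(sent, received)
        if pct > threshold_pct:
            flagged[agent] = pct
    return flagged
```

For example, an agent that lost 5 of 100 packets over two rounds is reported with 5.0% loss, while an agent with no loss is left out of the result.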

Peerings and BGP routes: We check the peerings and routes to ensure traffic follows a normal, logical path. For example, if a packet sent between two locations in the same country is routed through another country, the server fails this test. Below is an example of a bad route to a server located in Barcelona, as tested from our Madrid agent.

Figure 3: Packets from Madrid to Barcelona are routed through the United Kingdom and Paris; this link between two cities in the same country uses Level 3 instead of a local peer.
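The domestic routing check can be expressed as a small Python function. The sketch below assumes you already have a country code for each hop on the path (for example from a geolocation lookup you trust); the function name and input format are hypothetical, and hops with an unknown location are passed as `None`.

```python
from typing import List, Optional, Sequence


def foreign_detours(
    src_country: str,
    dst_country: str,
    hop_countries: Sequence[Optional[str]],
) -> List[str]:
    """Return the foreign countries a domestic path passes through, in order of first appearance.

    An empty list means the path stayed inside the country, or that the
    test does not apply because source and destination countries differ.
    """
    if src_country != dst_country:
        return []
    detours: List[str] = []
    for country in hop_countries:
        # Skip hops with unknown location and hops inside the home country.
        if country is None or country == src_country:
            continue
        if country not in detours:
            detours.append(country)
    return detours
```

For a Madrid to Barcelona path whose hops resolve to `["ES", "GB", "FR", "ES"]`, the function returns `["GB", "FR"]`, which would fail the check described above.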

Backhauling: Some hosting companies are global and own the network that connects their data centers. That means they route traffic within their own network instead of through transit and peering networks. This does not give a realistic measurement for our customers, so we also discard these providers.

Figure 4: Example of a server in Maidenhead, UK. This provider uses the IOMART network to reach foreign locations instead of the usual UK peerings.
Figure 5: An example of a promising agent location in Rio de Janeiro, with no packet loss and routes to Paraiba and Sao Paulo that go through Global Village Telecom, a Brazilian ISP.
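One way to approximate the backhauling check in code is to look at how much of a path from a foreign vantage point stays inside the hosting provider's own autonomous system. The Python sketch below is only a heuristic: the function names, the per-hop ASN input and the 50% cutoff are assumptions for illustration, and the ASNs in the usage note come from the range reserved for documentation.

```python
from typing import Optional, Sequence


def backhaul_ratio(hop_asns: Sequence[Optional[int]], provider_asn: int) -> float:
    """Fraction of hops with a known ASN that belong to the provider's network."""
    known = [asn for asn in hop_asns if asn is not None]
    if not known:
        return 0.0
    return sum(1 for asn in known if asn == provider_asn) / len(known)


def looks_backhauled(
    hop_asns: Sequence[Optional[int]],
    provider_asn: int,
    max_ratio: float = 0.5,  # assumed cutoff, tune for your own paths
) -> bool:
    """True when most of the path is carried on the provider's own network."""
    return backhaul_ratio(hop_asns, provider_asn) > max_ratio
```

With a hypothetical provider ASN of 64500, a path of `[64501, None, 64500, 64500, 64500]` has a ratio of 0.75 and is flagged, while `[64501, 64502, 64503, 64500]` has a ratio of 0.25 and is not.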

If a server passes these tests, we move ahead and get it.

Testing the New Server

Besides some performance tests for the server itself, we also do Page Load tests, again using our own product.

The new location is a real agent, but it stays in testing mode for some days, hidden from our customers. During this time it runs page load tests against multiple popular websites such as Google, MSN, LinkedIn and Baidu. We have set an average load time target; if the new agent performs better than that average, it is good to go, and we take it out of testing mode so our customers can officially use it.

Given the rigorous set of tests, a large proportion of tested service providers don’t pass. For the last 36 locations we added, only 40% of the service providers passed the tests; the other 60% were discarded.

Figure 6: Example of a server in Karaganda, Kazakhstan that passed the previous tests but was discarded by this one. The load time for Karaganda (dark blue) is higher than the average (light blue) when testing Google, MSN and LinkedIn.
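The page load comparison can be sketched as follows. This Python example assumes per-site load time samples for the candidate agent and a per-site average to compare against; comparing site by site, the dictionary shapes and the function names are assumptions for the example, since the post only says that the agent must beat an average load time target.

```python
from statistics import mean
from typing import Dict, List


def slow_sites(
    candidate_ms: Dict[str, List[float]],
    average_ms: Dict[str, float],
) -> List[str]:
    """Return the sites where the candidate's mean load time is above the average."""
    return sorted(
        site
        for site, samples in candidate_ms.items()
        if mean(samples) > average_ms[site]
    )


def passes_page_load(
    candidate_ms: Dict[str, List[float]],
    average_ms: Dict[str, float],
) -> bool:
    """The candidate passes when it is not slower than the average on any site."""
    return not slow_sites(candidate_ms, average_ms)
```

A candidate that averages 1000 ms on one site against a 1200 ms target but 3200 ms on another against 2500 ms would be reported as slow on the second site and would not pass.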

Ongoing Validation

Once agents reach production and customers start using them, we want to keep the quality we established with the previous tests. To do that, we run an ongoing validation in which all our agents perform network tests among themselves. With these tests we can detect issues in any agent, such as higher latency, peering changes or increased packet loss.
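The idea of agents testing each other and watching for regressions can be illustrated with a short Python sketch. It builds every ordered pair of agents and compares current measurements against a stored baseline. The data shapes, function names and thresholds (a 1.5x latency increase, one percentage point of extra loss) are assumptions for the example, not the values used in production.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]
Metrics = Tuple[float, float]  # (latency_ms, loss_pct)


def mesh_pairs(agents: Sequence[str]) -> List[Pair]:
    """Every ordered (source, target) pair, since routes can differ per direction."""
    return list(permutations(agents, 2))


def regressions(
    baseline: Dict[Pair, Metrics],
    current: Dict[Pair, Metrics],
    latency_factor: float = 1.5,   # assumed tolerance for latency growth
    loss_margin_pct: float = 1.0,  # assumed tolerance for extra packet loss
) -> List[Pair]:
    """Return the pairs whose latency or loss got noticeably worse than the baseline."""
    flagged = []
    for pair, (base_latency, base_loss) in baseline.items():
        if pair not in current:
            continue  # no fresh measurement for this pair
        latency, loss = current[pair]
        if latency > base_latency * latency_factor or loss > base_loss + loss_margin_pct:
            flagged.append(pair)
    return flagged
```

Three agents produce six directed pairs. A pair whose latency moves from 20 ms to 45 ms is flagged, while one that moves from 250 ms to 255 ms with 0.5% loss stays within the assumed tolerances.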

Looking for hosting providers or co-location space in far-flung markets around the world is a time-consuming process. But having to move providers after the fact, because of poor performance, is much more disruptive. If you’d like to test new or existing hosting environments using the tactics we do, sign up for ThousandEyes Lite and baseline performance from our Cloud Agents to the service provider.

  • Gordon Freeman

    Did you find an alternative server for Uruguay? I can imagine that a lot of poorly connected areas won’t meet the requirement for 0% packet loss.

    • Victor Garcia

      Hello Gordon, the answer is yes, we managed to find a good one in Uruguay with good peerings, latencies and almost no packet loss. As you say, for some locations we won’t require 0% packet loss, but we won’t accept high packet loss either just because it is a poorly connected area. We prefer not to get a server at all rather than have a low-quality agent.