During my career I’ve seen all kinds of network performance problems: packet loss, latency, TCP issues, you name it. There are a lot of tools that can be used to track down these problems when they impact traditional enterprise applications. However, it has always been very difficult to find the root cause of issues like loss and latency in Internet-facing applications. I’ve often felt like I was trying to find a needle in the world’s biggest haystack.
Customers are having trouble accessing my site!
Availability is always a top concern for people who manage Internet-facing websites and services. It can impact your customer base, your users, your partners, your vendors, and the list goes on. There are a lot of tools out there that can tell you that your site is fully or partially down, but very few of them will help you figure out why. This is especially true of partial or intermittent outages, which are much more difficult to diagnose. Figure 1 shows a screen that no site owner wants to see, with availability hovering around the 50% mark.
We can see right away that the site isn’t fully down, but it’s also clearly not in an optimal state. As shown in Figure 2, we can leverage the end-to-end metrics in the ThousandEyes platform to see that there’s been a substantial increase in packet loss that matches up with the drop in availability we observe:
However, there’s something else interesting going on in Figure 2: three of our test locations are bright green, indicating that there’s no packet loss. We can see those locations listed in Figure 3.
So what’s going on here? Why are five locations consistently showing loss and having availability issues while the other three are fine? ThousandEyes path visualization technology is uniquely suited to answer this question for us. In Figure 4 we can see that the five locations having issues connect to the site we are monitoring via a different path and provider than the three that are working fine.
Not only are we able to quickly identify that the availability and packet loss issues are isolated to the Road Runner network, we can even see the specific nodes where loss is occurring. We can also verify that locations connecting via AT&T are doing just fine. With ThousandEyes, you could even generate an interactive share with all this information and send it to your provider with a couple of mouse clicks, as shown in Figure 5.
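The reasoning behind that diagnosis is simple enough to sketch in a few lines of Python: group the test locations by the provider they traverse and compare average loss per group. This is only an illustration of the idea, not how the ThousandEyes platform works internally. The location names, the loss percentages, and the 5% threshold are all made up for the example; only the five-versus-three split and the two provider names come from the scenario above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements: (test location, upstream provider, packet loss %).
# The values are illustrative only, not taken from the figures in this post.
SAMPLES = [
    ("loc-1", "Road Runner", 40.0),
    ("loc-2", "Road Runner", 55.0),
    ("loc-3", "Road Runner", 35.0),
    ("loc-4", "Road Runner", 60.0),
    ("loc-5", "Road Runner", 50.0),
    ("loc-6", "AT&T", 0.0),
    ("loc-7", "AT&T", 0.0),
    ("loc-8", "AT&T", 0.0),
]


def loss_by_provider(samples):
    """Return the average packet loss (%) observed through each provider."""
    grouped = defaultdict(list)
    for _location, provider, loss in samples:
        grouped[provider].append(loss)
    return {provider: mean(values) for provider, values in grouped.items()}


def suspect_providers(samples, threshold=5.0):
    """Return providers whose average loss exceeds the (assumed) threshold."""
    averages = loss_by_provider(samples)
    return sorted(p for p, avg in averages.items() if avg > threshold)


if __name__ == "__main__":
    for provider, avg in loss_by_provider(SAMPLES).items():
        print(f"{provider}: {avg:.1f}% average loss")
    print("Suspects:", suspect_providers(SAMPLES))
```

When every affected location shares one provider and every healthy location uses another, the provider becomes the obvious place to look next.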
Some users are complaining that our site is slow!
Slow websites aren’t much more fun than sites that you can’t get to at all; no one likes to watch the loading icon spin around and around. When the response time issue is intermittent or doesn’t impact every user, the problem becomes that much harder to solve. Figure 6 shows an employee/partner portal for a large manufacturer experiencing a sharp increase in response time for several hours.
Taking a look at the ThousandEyes end-to-end metrics, we can again see that packet loss is a major symptom of our problem. However, as shown in Figure 7, once again only some locations are impacted.
In this case the site owner said that they only use one provider. Can path visualization save the day again? Absolutely. Even though all the locations access this site via AT&T, we can see in Figure 8 that the loss is occurring in only one portion of AT&T’s network, and we can even identify the specific nodes that are dropping packets.
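When every location uses the same provider, the same kind of comparison can be made hop by hop. As a rough sketch (again, not the platform's actual method), one way to narrow things down is to look for hops that appear on every lossy path but on none of the clean ones. The hop names below are hypothetical placeholders.

```python
def suspect_hops(lossy_paths, clean_paths):
    """Return hops present on every lossy path and absent from all clean paths.

    Each path is a list of hop identifiers (for example, router names or IPs).
    """
    if not lossy_paths:
        return set()
    # Hops shared by all paths that show packet loss.
    common = set(lossy_paths[0])
    for path in lossy_paths[1:]:
        common &= set(path)
    # Hops seen on at least one healthy path are unlikely culprits.
    seen_clean = set()
    for path in clean_paths:
        seen_clean |= set(path)
    return common - seen_clean


if __name__ == "__main__":
    # Hypothetical hop names, for illustration only.
    lossy = [
        ["edge-a", "core-1", "core-2", "site"],
        ["edge-b", "core-1", "core-2", "site"],
    ]
    clean = [
        ["edge-c", "core-3", "site"],
    ]
    print(sorted(suspect_hops(lossy, clean)))
```

This only narrows the search to a shared segment; pinpointing the exact node still needs per-hop loss measurements like those shown in the path visualization.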
Making the black box transparent
The Internet has often been referred to as a “black box” when it comes to troubleshooting. Very few tools provide any useful visibility into what happens inside the Internet, and historically, troubleshooting has involved a lot of finger-pointing, with everyone offering a theory in which the problem is not their fault. ThousandEyes is revolutionizing how we troubleshoot these issues by bringing transparency to the Internet and giving all parties a common view to work from when trying to find the root cause of problems. I encourage you to try out ThousandEyes today and see how path visualization can decipher the Internet for you.