Enterprise Agent Deployment and China Performance at Zendesk

Posted on March 24th, 2016

In this post from ThousandEyes Connect San Francisco, we’ll discuss the presentation by Steve Loyd, Vice President of Engineering Operations at Zendesk. Steve has worked in operations at a number of different companies, including Yahoo, Intraware and Sun Microsystems. He’s been at SaaS provider Zendesk since 2012, where the networking team has been using ThousandEyes for more than three years.

Figure 1: Steve Loyd presenting at ThousandEyes Connect San Francisco.

During his talk, Steve describes two initiatives his team has undertaken: deploying Enterprise Agents in Zendesk’s data centers to gain network visibility, and exploring strategies to improve network performance in China.

Deploying Enterprise Agents at Zendesk

Based in San Francisco, Zendesk provides a cloud-based customer service platform that includes ticketing, self-service and customer support features. 60,000 companies use Zendesk to develop better relationships with their partners and customers. Kicking off the talk, Steve discusses how his team gets visibility into the Zendesk network. ThousandEyes Cloud Agents make up a global network of monitoring points, and Enterprise Agents complement them by providing a view from inside your own network.

Steve’s team has deployed 13 Enterprise Agents in Zendesk’s data centers: two US data centers and two European data centers, each with full stacks of the Zendesk application set. Zendesk has also deployed Enterprise Agents in Amazon Web Services (AWS) in Tokyo. As Steve says, “What that’s allowed us to do is to have full mesh intelligence around how things are going in between those data centers, including across non-public links.” Zendesk has a virtual private LAN service (VPLS) implementation between all four of its data centers, and with ThousandEyes the team has been able to monitor across that mesh as well.
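To make the “full mesh” idea concrete, here is a minimal Python sketch that enumerates the directed test pairs between agent locations. The site labels are hypothetical placeholders, not Zendesk’s actual data center names, and the sketch makes no ThousandEyes API calls; it only illustrates how the number of agent-to-agent tests grows with the number of sites.

```python
from itertools import permutations

# Hypothetical site labels: four data centers plus one cloud region.
SITES = ["us-dc-1", "us-dc-2", "eu-dc-1", "eu-dc-2", "aws-tokyo"]


def full_mesh_pairs(sites):
    """Return every ordered (source, target) pair of distinct sites."""
    return list(permutations(sites, 2))


pairs = full_mesh_pairs(SITES)
# n sites produce n * (n - 1) directed pairs.
print(len(pairs))
```

The pairs are directed because the forward and reverse paths between two sites can differ, so each direction is treated as its own test in this sketch.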

Steve gives an example: “If you have a partner that you’re dependent on, it’s useful to be able to get information from the ThousandEyes Frankfurt Cloud Agent showing packet loss going to a particular provider that’s important to you. But it’s a different, more important perspective to have a view from within our Frankfurt data center on the providers that we use in that data center. The Enterprise Agents have been important in that way for us.”

Improving Performance with Service Providers

Steve goes on to talk about a few examples where having ThousandEyes was key to solving problems. ThousandEyes helped the Zendesk team detect issues that they would also have seen with other monitoring tools (like an interface flapping on a router), often with clearer visualization of the issues. In the case of the VPLS network, it also helped the team troubleshoot a problem they didn’t know they had. The Zendesk team saw “quite a bit of packet loss occurring at different times during the day over a long period of time, which got lost in other tools that were just showing state but weren’t getting into the details of packet loss.” After deploying Enterprise Agents, the team made a carrier change in their VPLS setup, which solved the problem.

ThousandEyes has also helped address upstream provider problems. Zendesk has had issues with one of their cloud storage providers: “When we have accessibility issues storing objects there, it can cause a backing effect onto our application stack and some relatively serious problems for attachments and large objects within our service.” Having Enterprise Agents in Zendesk’s data centers testing to critical third-party providers allowed the team to get visibility into issues in the intermediate network between Zendesk and their cloud provider.

Getting Familiar with Customers’ Networks

As a final example, ThousandEyes has also helped improve the performance of the Zendesk voice product, which is very sensitive to metrics like latency, jitter and packet loss. The voice product has seen rapid adoption, but their customers vary widely in their experience with network troubleshooting: “We have really sophisticated customers in their use and knowledge of what is needed in an office for voice, and others that aren’t so sophisticated. They might have a global call center where they pack as many people as they can into a room to do support, but they haven’t thought about how robust the network needs to be to support a voice application.” In a couple of cases, the Zendesk team deployed an Enterprise Agent onto a customer site to acquire network intelligence over a couple of weeks to a month. According to Steve, this “moved the conversation with the customer to another level and helped the team solve the problems they had locally. Otherwise we wouldn’t have been able to get that visibility without sending a systems engineer (SE) out on site, which is a very costly thing to do.”
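As a rough illustration of the three metrics mentioned above, the following Python sketch summarizes a series of probe round-trip times into packet loss, average latency and jitter. The function name and sample values are invented for illustration. Jitter here is simplified to the mean absolute difference between consecutive received samples, which is not necessarily how ThousandEyes or any particular voice stack computes it.

```python
from statistics import mean


def summarize_probes(rtts_ms):
    """Summarize probe round-trip times in ms; None marks a lost probe."""
    if not rtts_ms:
        raise ValueError("at least one probe result is required")
    received = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms)
    latency_ms = mean(received) if received else None
    # Simplified jitter: mean absolute change between consecutive received RTTs.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(deltas) if deltas else None
    return {"loss_pct": loss_pct, "latency_ms": latency_ms, "jitter_ms": jitter_ms}


# Invented sample: four probes, one of them lost.
print(summarize_probes([20.0, 30.0, None, 25.0]))
```

Collecting a summary like this over weeks rather than minutes is what makes intermittent problems visible, which matches the multi-week deployments Steve describes.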

Improving Performance in China

Over the past few months, one of Zendesk’s project teams has been working on improving performance in China. Founded in Denmark, Zendesk was a global company from the start, now with nine global engineering offices and more revenue coming from outside the US than inside. The majority of the 51 members of the engineering operations team are located outside the US, and the team isn’t new to dealing with performance issues and business constructs in other countries. In the past couple of years, issues related to performance in China have come up quite often.

Steve’s team has focused primarily on enabling Zendesk customers that have remote staff or end users in China. While China represents a huge opportunity with one billion people (and growing) regularly going online, there are also some huge obstacles, including the Great Firewall. As Steve summarizes, “Because of the combination of the way that the overall network has grown and been implemented in China, there are some very natural choke points for ingress and egress within China that can cause pure performance problems. Layer on top of that the government’s political interests and implementation of various filtering, blackholing, DNS control mechanisms and you have a really spotty experience, or an experience that doesn’t work at all.”

The Reality on the Ground

Steve’s team has used the ThousandEyes Cloud Agents in China to get a view on how Zendesk’s services are performing there.

Figure 2: DNS availability over time from a China agent to Dyn.

Figure 2 shows DNS availability over time from a China agent to one of Zendesk’s DNS providers, Dyn. The red along the bottom marks periods of unavailability, when DNS queries failed.
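An availability timeline like this one can be thought of as the share of successful queries in each time bucket. The Python sketch below shows that calculation under simple assumptions: the function name, the hourly bucketing and the sample results are all hypothetical, and this is not a description of how ThousandEyes computes the metric.

```python
from collections import defaultdict


def availability_by_hour(results):
    """Map each hour to the percentage of DNS queries that succeeded.

    results is an iterable of (hour, succeeded) tuples.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for hour, succeeded in results:
        totals[hour] += 1
        if succeeded:
            successes[hour] += 1
    return {hour: 100.0 * successes[hour] / totals[hour] for hour in totals}


# Invented sample: two queries in hour 0 (one failed), two in hour 1.
sample = [(0, True), (0, False), (1, True), (1, True)]
print(availability_by_hour(sample))
```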

Figure 3: Packet loss over time from a China agent to Zendesk’s service. Packet loss averaged over 50%.

Figure 3 shows packet loss over time from a China agent to Zendesk’s service; packet loss averaged over 50%. Steve quipped, “It’s not true that just some packets are okay, you kinda want to have most of them.”

Steve noted that Zendesk fortunately got help with these issues from the ThousandEyes team, who had experience with customers with similar problems and proved to be a good resource. We also recently published the blog posts “Deconstructing the Great Firewall of China” and “Internet Censorship Around the World,” which Steve recommended to get a better understanding of networking in China.

Options to Better Serve Chinese Users

The Zendesk team is still working to figure out how to further improve service in China, from DNS issues to reachability outside China. From a technical standpoint, the easy answer to resolving performance issues is to build an entire stack in China, but this is far from an easy answer from a business standpoint. Though Zendesk isn’t considering building their own platform in China, Steve and his team have discovered a few possible solutions for improving performance that aren’t quite as involved.

Some companies find a partner company in China to build out and operate their entire stack for them. Zendesk is exploring options including using vendors like ChinaCache, a CDN provider, and using a local DNS proxy service.

Apart from partnerships, there are also some workarounds that may see success—for example, the Zendesk team has worked with their CDN provider to change the names of CDN endpoints. “Whatever your CDN provider is, your CDN CNAME records don’t have to include their name, and this may allow you to fly under the radar of some of the filtering mechanisms. That’s a solution that can work today but it might not work tomorrow, and you’ve got to have monitoring in place to make sure that assets for your application can all be reached within China.”
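One small piece of that monitoring could be a check that flags CNAME targets which still expose a CDN vendor’s name. The Python sketch below assumes the CNAME records have already been collected into a dictionary; the hostnames, the vendor marker and the function name are all hypothetical, and in practice the records would come from live DNS lookups rather than a hard-coded mapping.

```python
def cnames_exposing_vendor(cname_records, vendor_markers):
    """Return hostnames whose CNAME target contains any vendor marker.

    cname_records maps a hostname to its CNAME target.
    vendor_markers lists substrings that identify a vendor.
    """
    flagged = {}
    for host, target in cname_records.items():
        lowered = target.lower().rstrip(".")
        hits = [m for m in vendor_markers if m.lower() in lowered]
        if hits:
            flagged[host] = hits
    return flagged


# Hypothetical records: one target reveals the vendor, one does not.
records = {
    "assets.example.com": "example.cdnvendor.example.net.",
    "static.example.com": "edge.example.org.",
}
print(cnames_exposing_vendor(records, ["cdnvendor"]))
```

A name check like this only covers the naming side. As the quote points out, whether the assets can actually be reached still has to be tested from inside China.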

Other companies have taken the approach of using a “giant VPN” service with endpoints posted in Hong Kong. From China, you can generally get pretty good availability and access to resources in Hong Kong, and many companies have essentially remotely extended their endpoints out into Hong Kong with a big backhaul network that goes somewhere else, perhaps to the US or Europe.

Zendesk has also seen some success with efforts related to email provider routing. The team has worked toward getting better support relationships with some of the biggest email providers, including corporate hosted providers like Tencent QQ and also public providers similar to Gmail. Hiring team members in Hong Kong facilitated the relationship building, and Steve notes, “It’s amazing how much more you can get done when you hire some engineers in Hong Kong, and again that comes back to how important relationships between companies are.”

For more details from Steve’s talk, check out the video of the full presentation below.

Stay tuned for more posts on the other ThousandEyes Connect SF tech talks from Cisco and RichRelevance.
