Monitoring Office 365

Posted on April 4th, 2018
April 16th, 2018

Is my network ready for Microsoft Office 365? How do I monitor network performance and ensure a flawless user experience across my entire user base, regardless of whether they are on my network or out on the Internet? These are the questions plaguing many adopters of the Office 365 suite, which includes Exchange Online and SharePoint Online.

Monitoring SaaS applications presents a challenge because you depend on networks and data centers you do not own. All of the traditional network monitoring tools and techniques, like SNMP and flow, flatline outside your network boundaries. You cannot poll Microsoft’s routers, get flow records from their data center, or install your own monitoring agents on Exchange Online servers. So how do you manage the outcome when you don’t own the underlying infrastructure and networks?

Network operators have traditionally regarded the Internet and external services as a black box, because they have had zero visibility outside their enterprise boundaries. This is not a sustainable approach, as more and more critical IT apps move outside your data centers. With Active Monitoring techniques and the combination of application layer data with network layer data, ThousandEyes can provide you the Network Intelligence you need to not only survive in the world of SaaS but thrive. With a few simple best practice recommendations, you can adequately prepare for your Office 365 rollout, establish performance baselines, navigate through outage resolution and ensure a great end-user experience.

Looking Under the Hood

Microsoft Office 365 is a bouquet of applications aggregated under a common portal, with a common look and feel. However, they are all distinct applications that share some common elements but may be delivered very differently.

Figure 1: Microsoft Office 365 Architecture.

Microsoft has a global network of data centers and an IP backbone network with peering connections with almost every ISP on the planet. The Office 365 suite of apps is optimized to be delivered over the Internet, and Microsoft aims to get as close to the users as possible. If you’re an enterprise, one of the first decisions you will have to make about Office 365 is whether to backhaul all your Office 365 traffic through your WAN, through your perimeter security stack and then out to the Internet, or to split-tunnel it directly from the branches out to the nearest Microsoft point of presence.

Microsoft uses a two-tier CDN architecture to deliver most of its applications. User sessions typically terminate at an edge data center, also known as the Service Front Door, as determined by Microsoft’s Anycast DNS load-balancing “brain”. The goal here is to minimize network latency between the user and the Microsoft edge, so that time-consuming steps like the SSL handshake happen in a reasonable time window. From here, the edge server proxies the connection out to yet another data center where the actual application lives. For Exchange Online, the mailbox server location is typically dynamic, subject to data boundaries, whereas for SharePoint Online, there is always one permanent data center region where all the client data lives.

This leads to some interesting differences in behavior. For example, in the ThousandEyes Service Health Dashboard shown in Figure 2, we can see the difference in application response times between Exchange Online and SharePoint Online. Exchange Online response times are more evenly distributed across the globe, whereas SharePoint Online response times get worse the farther we move away from the primary data center in Ashburn, VA. Our recent Monitoring Office 365 Best Practices webinar dives deeper into this; check out the video for a demo.

ThousandEyes dashboard with Office 365 tests
Figure 2: Office 365 Monitoring Dashboard.

What Do I Need To Monitor?

Office 365 apps have their own distinct URLs and delivery architectures. Hence, it is important to monitor each application separately. Some can be logically grouped; for example, the Live apps are all hosted on a common platform and share a hostname. However, SharePoint Online, Exchange Online, OneNote and Azure Active Directory (AD) all have distinct URLs.

Also, don’t forget the impact of Azure AD on the overall experience. Every test iteration within ThousandEyes starts from a clean state with no previously cached data, so when the application receives the request, there is no auth token sent along with it. The application therefore redirects the request to Azure AD (or any other configured Identity Provider). By default, ThousandEyes HTTP Server tests are configured to follow redirects, but this can be easily disabled. So if you’re primarily interested in understanding network performance and application performance, you can use a simple HTTP Server test. However, if you would like to measure the entire transaction, you can set up Transaction Tests that perform the login using a service account and access one or more applications within the Office 365 suite.

Recommended tests for Office 365
Figure 3: Office 365 recommended tests.
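
To make the redirect behavior concrete, here is a minimal Python sketch of a clean-state request with redirect following switched on or off. It is not how ThousandEyes implements its HTTP Server test; the names NoRedirect and timed_get are hypothetical, and only the Python standard library is used.

```python
import time
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow the redirect; the 3xx
        # response is then raised as an HTTPError by the default handler.
        return None


def timed_get(url, follow_redirects=True, timeout=10.0):
    """Fetch url with no cookies and no auth token (a clean state).

    Returns (status, final_url_or_location, elapsed_seconds).
    """
    handlers = [] if follow_redirects else [NoRedirect()]
    opener = urllib.request.build_opener(*handlers)
    start = time.perf_counter()
    try:
        with opener.open(url, timeout=timeout) as resp:
            resp.read()
            status, where = resp.status, resp.geturl()
    except urllib.error.HTTPError as err:
        # With redirects disabled, the redirect to the Identity Provider
        # surfaces here; the Location header holds the login URL.
        status, where = err.code, err.headers.get("Location", url)
    return status, where, time.perf_counter() - start
```

Calling timed_get with follow_redirects=False against an application URL measures only the application’s first response, while the default also includes the hop to the Identity Provider’s login page.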

How Do You Benchmark Office 365 Performance?

There’s no such thing as steady state in the cloud. So you cannot really define a static benchmark against which to measure performance. Instead, you want to compare yourself against your peers, or an average user in your city.

You can accomplish this by placing ThousandEyes Enterprise Agents inside your key office locations, and comparing their performance against that of a Cloud Agent in the same city. Figure 4 captures this comparison between an Enterprise Agent (SF-Office-1) located in our San Francisco headquarters and a Cloud Agent in San Francisco.

Benchmark graphs for Exchange Online
Figure 4: Benchmarking Exchange Online performance.

You want to see these two trendlines more or less correlate. If you’re lucky you may see better performance than your peers in the same city, but if you start to see these trendlines diverge, and your performance starts to get worse, you need to understand the contributing factors.
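
As a rough illustration of what “diverge” can mean in practice, the sketch below flags the points where an office’s response times stay well above the city baseline for several samples in a row. The function name, the 1.5x ratio, the three-sample window and the sample values are all assumptions chosen for illustration, not ThousandEyes defaults or measured data.

```python
def diverging(enterprise_ms, cloud_ms, ratio=1.5, window=3):
    """Return sample indices where the Enterprise Agent has been slower than
    the Cloud Agent by more than `ratio` for `window` consecutive samples."""
    if len(enterprise_ms) != len(cloud_ms):
        raise ValueError("both series must cover the same test rounds")
    flagged, streak = [], 0
    for i, (ent, cloud) in enumerate(zip(enterprise_ms, cloud_ms)):
        # Extend the streak while the office is clearly worse than its peers,
        # and reset it as soon as the two trendlines come back together.
        streak = streak + 1 if ent > ratio * cloud else 0
        if streak >= window:
            flagged.append(i)
    return flagged


# Made-up response times in milliseconds, one value per test round.
office = [210, 220, 215, 400, 420, 450, 230]
city = [200, 210, 205, 210, 215, 205, 210]
print(diverging(office, city))  # [5]
```

Requiring several consecutive bad samples keeps a single slow round from triggering an investigation, which matters when the baseline itself keeps shifting.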

Slow response times can be caused by a number of factors, ranging from DNS resolution problems to network delays to application wait times. It’s important to understand which one is at fault, because they’re all potentially managed by different groups and vendors. A Microsoft support ticket won’t help you much if the problem is a network delay caused by your ISP. Conversely, your ISP cannot really help with your Office 365 performance problem if the issue is a long application wait time. Visibility is key to developing effective operational processes around Office 365, cutting down your MTTT and MTTR, and minimizing helpdesk costs.

Figure 5: HTTP Server Response Time graphs for Outlook.
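
The idea of routing a slow response to the right owner can be sketched as follows. The Timing fields, the grouping of connect and SSL time under “network”, and the ESCALATE_TO mapping are illustrative assumptions; in particular, the DNS owner is a guess, since it depends on who runs your resolvers.

```python
from dataclasses import dataclass


@dataclass
class Timing:
    dns_ms: float      # name resolution
    connect_ms: float  # TCP connect, dominated by network round trips
    ssl_ms: float      # SSL/TLS handshake, also network-bound
    wait_ms: float     # time the application takes to start responding


# Hypothetical ownership map; adjust it to your own teams and vendors.
ESCALATE_TO = {
    "dns": "DNS provider or internal DNS team",
    "network": "ISP or internal network team",
    "wait": "Microsoft support",
}


def slowest_phase(timing):
    """Return (phase, owner) for the phase that contributes the most time."""
    phases = {
        "dns": timing.dns_ms,
        "network": timing.connect_ms + timing.ssl_ms,
        "wait": timing.wait_ms,
    }
    name = max(phases, key=phases.get)
    return name, ESCALATE_TO[name]


print(slowest_phase(Timing(dns_ms=20, connect_ms=35, ssl_ms=60, wait_ms=900)))
# ('wait', 'Microsoft support')
```

Picking only the single largest phase is a deliberate simplification; a real investigation would also compare each phase against its own baseline.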

How Do You Troubleshoot Service Outages?

The combination of application layer and network layer data helps you pinpoint the root cause of Office 365 performance issues. You don’t have to guess, or rely on incomplete subjective data from the field to determine the scope of problems. For example, in Figure 6, we see a short-lived HTTP Server outage, which has been correlated with 100% network packet loss from one location (Berlin, Germany). ThousandEyes Path Visualization pinpoints the source of this packet loss as a transient error in the Microsoft network. If this error were to persist, you could generate a Snapshot Share Link and attach it to your Microsoft support ticket to help achieve quicker resolution. This is a win-win for all parties, since it provides Microsoft with actionable data about the incident and cuts down on finger-pointing.

Path Visualization with packet loss
Figure 6: Pinpoint root cause of application outages.
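
The correlation described above can be approximated with a few lines of Python. This is a simplified sketch under stated assumptions: the tuple layout, the 50% loss threshold and every location other than Berlin are hypothetical, and real path data carries far more detail than a single loss figure.

```python
def triage(samples, loss_threshold=50.0):
    """Classify each failing location as a network or an application issue.

    samples: iterable of (location, http_ok, loss_pct) tuples from one round.
    Returns a dict of the failing locations only, which also shows the scope.
    """
    report = {}
    for location, http_ok, loss_pct in samples:
        if http_ok:
            continue  # healthy locations stay out of the report
        # Heavy packet loss alongside an HTTP failure points at the network;
        # an HTTP failure over a clean path points at the application.
        report[location] = "network" if loss_pct >= loss_threshold else "application"
    return report


round_results = [
    ("Berlin, Germany", False, 100.0),
    ("London, England", True, 0.0),
    ("San Francisco, CA", True, 0.3),
]
print(triage(round_results))  # {'Berlin, Germany': 'network'}
```

A report with one location suggests a localized network problem, whereas failures across many locations with clean paths would suggest the application itself.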

Cloud Readiness Lifecycle

There’s no such thing as steady state in the cloud. Thriving in an Office 365 environment requires a new approach to monitoring: one that factors in rapidly shifting baselines, and networks and applications that you don’t control. ThousandEyes recommends a continuous lifecycle approach to monitoring. Ensure readiness by monitoring your Office 365 applications before you begin rolling out to pilot users. Establish clear success criteria for your deployment, and develop new escalation processes among all your teams and vendors. Continue to monitor and benchmark your app performance, and monitor end-user metrics, in order to get ahead of issues and ensure a great user experience.

Figure 7: Cloud Readiness Lifecycle.

Start monitoring your Office 365 applications today with the ThousandEyes 15-day free trial. As part of the trial, you will have access to Cloud Agents in 150+ cities around the globe that can give you immediate answers to your performance questions. You can also deploy Enterprise Agents inside your enterprise network to get a true inside-out perspective, and Endpoint Agents to understand metrics directly from your end-users.

For additional information, watch the video recording of our Monitoring Office 365 Best Practices webinar, reach out to us at @thousandeyes or @naik_ameet, or request a demo.
