At ThousandEyes, we talk about the importance of adopting a readiness lifecycle when dealing with cloud and Internet adoption. This is especially true of SD-WAN deployments because the goal of SD-WAN is to achieve better cloud application performance and cost savings by heavily augmenting or replacing the use of MPLS within enterprise network infrastructure with Internet access and connectivity. Adopting more Internet means that you’re exposing enterprise networking to a highly unpredictable environment. The only way you can ensure success is to have a data-driven process. ThousandEyes visibility is at the center of this process for many Fortune 500 and Global 2000 companies, who rely on our multi-layered insights to ensure successful SD-WAN deployments.
Know Before You Roll Out
To recap the obvious, WAN connectivity in most enterprises means that branch offices are connected via MPLS circuits to centralized data centers, which also serve as hubs for public Internet breakout and the accompanying network security infrastructure (firewalls, etc.). MPLS comes with edge-to-edge SLAs, so you know what you’re supposed to get on a contractual basis. The problem with this model is that as organizations shift applications to public cloud providers, or stop building their own applications and start using SaaS, backhauling traffic to central hubs makes less sense, because cloud and SaaS providers build their Points of Presence (PoPs) to be reachable from many points on the Internet with low latency. Backhauling therefore tends to degrade user experience while you keep paying for more costly MPLS transport.
Moving to Direct Internet Access (DIA) from branch offices is an appropriate way to respond, and is a central tenet of SD-WAN architecture for most organizations. But this point in network design is where you start to need data. Before you choose your SD-WAN solution, before you settle on service providers, before you finalize your architecture and deployment plans, and certainly long before you cut over SD-WAN technology to production use, you need to measure for success.
Why do you need to gather data so early? Well, if, like most enterprises, you’ve been using Internet services mostly as a rarely used backup for your MPLS WAN, then you’ve probably never put much weight on them. How do you know if your ISP will perform? For that matter, how do you know if your on-premises branch LAN, Wi-Fi, or older-model Cisco or other branch router is ready to handle that much Internet traffic? After all, SaaS applications are much more web UI heavy than most in-house hosted applications. You need to gather performance data before you roll out to production.
What performance data should you gather? Start by measuring user experience for all the end user use cases you’re anticipating, including top-priority in-house applications, your most important apps running in AWS, Azure, GCP, etc., and critical SaaS apps like Office 365, Salesforce and Workday. Don’t forget VoIP and collaboration tools like Webex and Zoom. Measure how these apps and services perform over your current MPLS network first, as a baseline. For most applications, run layered Page Load, HTTP, end-to-end network and BGP routing monitoring so that you can interpret what you collect at each layer. For mission-critical use cases, run Internet-aware synthetic transaction monitoring tests. Get transaction performance, response time and availability, along with network latency, packet loss and jitter statistics, in hand for MPLS and you’re ready to rock to the next stage of measurement.
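As a rough illustration of the network-layer numbers mentioned above, the following Python sketch reduces one round of probe results to latency, packet loss and jitter. It is not tied to any ThousandEyes API; the function name `summarize_probes` and the input format (a list of round-trip times in milliseconds, with `None` for a lost probe) are assumptions made for the example, and jitter is simplified to the mean absolute difference between consecutive replies.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_probes(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one round of probes (hypothetical input format).

    Each entry is a round-trip time in milliseconds, or None if the
    probe was lost.
    """
    received = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)

    # Packet loss: share of probes that never came back.
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0

    # Latency: average round-trip time of the probes that did come back.
    avg_latency_ms = mean(received) if received else None

    # Jitter (simplified): mean absolute difference between consecutive replies.
    deltas = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(deltas) if deltas else 0.0

    return {
        "loss_pct": loss_pct,
        "avg_latency_ms": avg_latency_ms,
        "jitter_ms": jitter_ms,
    }


if __name__ == "__main__":
    # Four probes sent over the MPLS path, one lost.
    print(summarize_probes([20.0, 22.0, None, 21.0]))
```

Running the same summary for each site and application over MPLS gives you the baseline figures to compare against later.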
Pilot Testing, Baseline Comparisons and Provider Remediation
A really important readiness step that we’ve seen many organizations take is to run readiness audits of branch offices, key data centers and local ISPs, covering both branch DIA to cloud services and SD-WAN backhaul to internal applications. What this means is that before you place a single real user on it, you set up your SD-WAN architecture at pilot sites and data centers and measure how all those apps and services will work from a branch and end user device point of view. You can use ThousandEyes Enterprise Agents and Endpoint Agents to do this measurement. If you’re contemplating using a cloud-based security service like Zscaler, fold that into the mix and measure it as part of your pilot too.
With ThousandEyes, you can not only get the app and overlay performance to compare with MPLS, but crucially, you can see the end-to-end and hop-by-hop performance of the underlay Internet transport, as well as BGP routing paths, along with changes and stability issues.
A good way to start is to define a set of audit activities and goals, as seen in Figure 1.
As you collect this data, set up side-by-side dashboards for key apps and services and see how they run on SD-WAN compared with MPLS. As you can see in Figure 2, one of our customers found that in most cases, Internet connectivity worked as well as or better than MPLS and could deliver high performance for critical applications from remote locations.
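A side-by-side comparison like this can be reduced to a simple per-application check. The sketch below is an assumption-laden example rather than a ThousandEyes feature: the function `compare_to_baseline`, the application names and the 10% tolerance are all hypothetical, and the inputs are page load times in milliseconds that you have already collected for each path.

```python
from typing import Dict, Tuple


def compare_to_baseline(
    baseline_ms: Dict[str, float],
    pilot_ms: Dict[str, float],
    tolerance_pct: float = 10.0,
) -> Dict[str, Tuple[float, str]]:
    """Compare SD-WAN pilot page load times against the MPLS baseline.

    Returns, per application, the percentage change and a verdict.
    A change within +/- tolerance_pct counts as "comparable".
    """
    results = {}
    for app, base in baseline_ms.items():
        if app not in pilot_ms:
            continue  # no pilot measurement for this app yet
        change_pct = 100.0 * (pilot_ms[app] - base) / base
        if change_pct <= -tolerance_pct:
            verdict = "better"
        elif change_pct >= tolerance_pct:
            verdict = "worse"
        else:
            verdict = "comparable"
        results[app] = (round(change_pct, 1), verdict)
    return results


if __name__ == "__main__":
    mpls = {"crm": 2000.0, "mail": 1000.0}   # hypothetical baseline values
    sdwan = {"crm": 1500.0, "mail": 1050.0}  # hypothetical pilot values
    for app, (change, verdict) in compare_to_baseline(mpls, sdwan).items():
        print(f"{app}: {change:+.1f}% ({verdict})")
```

The tolerance band keeps small measurement noise from being reported as a win or a regression; pick a value that matches how variable your own baseline is.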
During this same phase, start to collect data on how ISPs are performing. We’ve seen customers catch dramatic performance, stability and availability issues over a 30-day testing period. For example, Figure 3 shows two ISPs that were serving a branch office in Asia. One was performing significantly worse than the other. Examination of ThousandEyes path and BGP routing data showed that the poorly performing ISP was routinely routing traffic in a very circuitous manner. The customer was able to get that issue remediated by sharing the data with the ISP through interactive ThousandEyes sharelinks.
Set up monitoring tests to critical SaaS applications that go through your cloud-based security provider’s PoP, as well as tests that bypass the PoP, so you can see how cloud-based proxies will affect performance and make informed architecture and security decisions. Figure 4 shows such a comparison of Veeva going through Zscaler versus going straight from the branch to the Veeva service.
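One way to put a number on the proxied-versus-direct comparison is to compare median response times for the two paths. This is only a sketch under stated assumptions: `proxy_overhead` is a hypothetical helper, and the inputs are response times in milliseconds that you have exported from the two sets of tests yourself.

```python
from statistics import median
from typing import Dict, Sequence


def proxy_overhead(
    direct_ms: Sequence[float], proxied_ms: Sequence[float]
) -> Dict[str, float]:
    """Estimate the response-time cost of routing through a security PoP.

    Medians are used so that a few outlier samples do not dominate.
    """
    direct = median(direct_ms)
    proxied = median(proxied_ms)
    return {
        "direct_median_ms": direct,
        "proxied_median_ms": proxied,
        "overhead_ms": proxied - direct,
        "overhead_pct": 100.0 * (proxied - direct) / direct,
    }


if __name__ == "__main__":
    # Hypothetical samples from the same branch to the same SaaS app.
    direct = [100.0, 110.0, 120.0]
    via_proxy = [130.0, 132.0, 150.0]
    print(proxy_overhead(direct, via_proxy))
```

A negative overhead is possible and worth noting: it would mean the path through the PoP was faster than the direct path for those samples.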
Prepare a readiness audit report for your steering committee or other stakeholders so they can see how SD-WAN will perform for the business and support digital transformation initiatives. Showcase baseline and comparative performance from remote sites via SD-WAN to prove the viability and promise of the new WAN edge infrastructure investment you’re planning. Share key findings and results such as those in Figure 5.
Run Smarter SD-WAN Solution and SD-WAN Services PoCs
In many cases, you will evaluate multiple SD-WAN vendors or providers. The same visibility you gain for readiness can be used to ensure a clean and fair proof-of-concept comparison of solutions and services. Having a third-party view of how user experience, overlay and underlay are all working together helps you make the most data-driven decision you can. We’ve seen more and more customers use ThousandEyes visibility to help with these sorts of evaluations.
Establish Operational Visibility Before Roll Out
Once you’ve settled on your architecture, multi-cloud attachments, and SD-WAN vendor or SD-WAN services provider, you’re almost ready to deploy. But before you do, make sure you have operational visibility in place. Use your baseline and comparison findings to set internal KPIs and alerting rules. Use our out-of-the-box integrations with ServiceNow, Slack and PagerDuty, or work with our professional services team on other integrations, to operationalize alerting and API data from the ThousandEyes platform. Be ready to use ThousandEyes sharelinks for effective escalations to ISPs, cloud security providers and cloud or SaaS providers. Create self-service dashboards for internal stakeholders so they can see how their critical applications are performing during the transition. Now you’re ready for a smooth and well-executed deployment and efficient operations in production.
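Turning baseline findings into alerting rules can be as simple as deriving a threshold from the baseline distribution. The sketch below shows one common approach (mean plus a multiple of the standard deviation, with an alert only after several consecutive breaches); it is an illustrative assumption, not how ThousandEyes alert rules are defined, and the names `alert_threshold` and `should_alert` are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence


def alert_threshold(baseline_ms: Sequence[float], k: float = 3.0) -> float:
    """Threshold = baseline mean plus k standard deviations."""
    return mean(baseline_ms) + k * pstdev(baseline_ms)


def should_alert(
    recent_ms: Sequence[float], threshold_ms: float, consecutive: int = 3
) -> bool:
    """Alert only if the last `consecutive` samples all exceed the threshold.

    Requiring several breaches in a row avoids paging on a single spike.
    """
    if len(recent_ms) < consecutive:
        return False
    return all(v > threshold_ms for v in recent_ms[-consecutive:])


if __name__ == "__main__":
    baseline = [90.0, 100.0, 110.0]  # hypothetical latency baseline (ms)
    threshold = alert_threshold(baseline)
    print(f"threshold: {threshold:.1f} ms")
    print(should_alert([100.0, 130.0, 131.0, 140.0], threshold))
```

Both `k` and `consecutive` are tuning knobs: a lower `k` or a smaller run length alerts sooner but produces more noise.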
Don’t Put SD-WAN Deployments at Risk
SD-WAN is one of the most consequential network architecture changes you will make. Don’t proceed without a data-driven process to reduce performance risk. ThousandEyes visibility provides the foundation for a full readiness lifecycle. And remember, in the cloud there’s no such thing as a steady state. Once you finish this change, more changes will be knocking at the door, so use that same data-driven readiness lifecycle model to ensure success in every cloud, SaaS, Internet and WAN move you make.