Most enterprises are, in one way or another, on a cloud and digital transformation journey. The challenges that come with these highly visible, cost-intensive priority projects are numerous, from picking the right technology stack to cloud architecture, service orchestration and network automation. An often overlooked aspect, however, is the change needed in how the cloud is operationalized, especially from the perspective of monitoring, troubleshooting and fault handling. It's important to understand the radically different connectivity environment you rely on for cloud service delivery, develop realistic expectations around that environment, gain access to a monitoring data set that helps you operationally manage that service delivery, and adopt lifecycle processes that fit the dynamic nature of the cloud and the Internet. In other words, in the rush to the cloud, how do you ensure that you maintain operational visibility?
Challenges of Digital Transformation and Cloud Adoption
Nearly every business today is a digital business, and if not, there is most likely a clear path to becoming one. Digital transformation initiatives range from providing a seamless, omnichannel experience to retail customers, to modernizing the workforce through heavy use of SaaS, to adopting the public cloud as an infrastructure platform. The common thread that binds these initiatives, beyond increasing revenue, reducing operating costs and providing an exceptional user experience, is the set of challenges that surround them. Let's review a few of them.
A recent survey by EMA Research highlighted that roughly 60% of enterprises moving to the cloud are still struggling with performance management, network planning and security. While it is critical to choose the right cloud or the best SaaS platform for collaboration and communication, it is equally critical to make sure these services are delivering on their promises and SLAs and, most importantly, delivering a superior user experience. So why is that a challenge?
Well, because with digital transformation and cloud adoption you are exponentially increasing the number of dependencies you rely on, from DNS to CDNs to third-party APIs to public cloud providers. When you start relying on entities you don't control, understanding how they perform can be, well, challenging. In the cloud, the Internet has become the central nervous system for communication. When you rely on a network that was not built for enterprise communication and arguably has questionable security defenses in place, you are susceptible to its vulnerabilities. Having an online footprint carries security risks such as DDoS attacks and BGP hijacks. Managing an interconnected web of unknowns is a struggle, and one that enterprises moving to the cloud grapple with.
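To make the dependency problem concrete: even before traffic reaches a cloud provider, a request depends on DNS resolution you don't control. The sketch below (a minimal illustration using only the Python standard library; the hostnames are placeholders, not real monitored endpoints) times DNS lookups, one of the first external dependencies in any request path:

```python
import socket
import time

def timed_resolve(hostname):
    """Time a DNS lookup; return elapsed milliseconds, or None on failure."""
    start = time.perf_counter()
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return None  # resolution failed -- itself a signal worth alerting on
    return (time.perf_counter() - start) * 1000.0

# Probe a few external dependencies you rely on but do not control.
# (Hostnames here are illustrative placeholders.)
for host in ("example.com", "cdn.example.net"):
    elapsed = timed_resolve(host)
    status = f"{elapsed:.1f} ms" if elapsed is not None else "FAILED"
    print(f"{host}: {status}")
```

A real monitoring practice would run probes like this from many vantage points and trend the results over time, since a single measurement from one location says little about what your users experience.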
Why Traditional Monitoring Flatlines in the Cloud
Enterprise IT teams are proficient at and well accustomed to handling issues within their own network and the domains they control. With the cloud, however, as boundary lines blur and reliance on multiple third-party networks and services increases, they find it hard to predict, understand and troubleshoot performance issues.
Within a traditional enterprise environment, where you own the application and the network and host everything in your own data center, the monitoring stack shown in Figure 1 is sufficient to detect and identify problems. For instance, SNMP polling can detect device failures, and flow data can be analyzed to understand bandwidth overload issues. When you own the application, APM techniques such as synthetic performance and availability monitoring, along with profiling techniques that detail internal function calls via code injection, can help you understand end-user performance.
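As a concrete illustration of the synthetic availability monitoring mentioned above, here is a minimal sketch using only the Python standard library (the URL, timeout and field names are illustrative assumptions, not part of any particular monitoring product):

```python
import time
import urllib.error
import urllib.request

def check_endpoint(url, timeout=5.0):
    """Synthetic availability probe: fetch a URL, record status and latency."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
    except (urllib.error.URLError, OSError) as exc:
        return {"ok": False, "status": None, "error": str(exc),
                "latency_ms": (time.perf_counter() - start) * 1000.0}
    return {"ok": 200 <= status < 400, "status": status, "error": None,
            "latency_ms": (time.perf_counter() - start) * 1000.0}

# A scheduler (cron, or a monitoring agent) would run this periodically
# and alert when ok is False or latency drifts above its baseline.
result = check_endpoint("http://localhost:8080/health", timeout=2.0)
print(result)
```

The value of a probe like this is that it measures what a user would actually see, regardless of whether the cause of a slowdown lies in your code, the provider's infrastructure, or the network in between.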
All these techniques have their place, but they aren't enough when you move to the cloud. Consider what happens when you move applications to a public cloud environment. You might own the application code hosted within a public cloud provider like AWS, but you have no control over the infrastructure and networking scheme. Because apps and microservices are built to be agnostic to the underlying infrastructure, APM techniques like code injection continue to work well within the public cloud. However, techniques like packet capture and SNMP lose much of their utility, since the network and infrastructure are so highly abstracted. What about SaaS, where you own neither the infrastructure nor the software? In that case, neither APM code injection nor SNMP, packet capture, or flow data is relevant.
Mind the Gap in Traditional Monitoring Stacks
Monitoring for the cloud era needs to evolve to take into account the impact of external dependencies, the Internet, cloud environments and SaaS apps, in order to provide a holistic view of how services are being delivered to the end user. Unfortunately, if you're relying on the monitoring stack you built for the pre-cloud world, you have a huge gap in visibility into the external components of delivering digital experiences to customers and employees. Public cloud vendors address part of the problem by providing access to flow logs and infrastructure health within your VPC environment. However, that still does not address the performance of external service providers such as DNS, CDNs, cloud-based security and SaaS, or Internet transport for SD-WANs.
What’s At Stake?
It may be somewhat obvious that a loss of visibility is not a good thing, but it's important to explore the impact of this new visibility gap not just technically but from a business angle. From a technical point of view, losing visibility into your cloud and Internet ecosystem leads to issues like poor baselining, messy or misconfigured alerting, and long, sometimes effectively infinite, MTTR. The problem is that when you lose control over IT assets, the burden of proof ironically falls on the IT team simply to perform fault isolation. Without sufficient visibility, how will you figure out whether the source of a cloud issue is internal, an ISP, a SaaS vendor, or something else? In other words, which provider needs an escalation? Further, without solid diagnostic data, you'll be hard pressed to get a provider to act effectively on your escalation, since they may not be convinced it's their problem.
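Fault isolation in this environment amounts to testing each layer of the delivery path in turn and escalating to whichever provider owns the first layer that fails. A minimal sketch of that triage logic, using only the Python standard library (the hostname, port and layer labels are illustrative; a real escalation would attach far richer diagnostics, such as path traces):

```python
import socket

def isolate(hostname, port=443, timeout=3.0):
    """Walk the layers in order; return the first one that fails, or 'ok'."""
    # Layer 1: DNS -- can we resolve the name at all?
    try:
        addr = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)[0][4]
    except socket.gaierror:
        return "dns"          # escalate to the DNS provider / resolver
    # Layer 2: network reachability -- can we open a TCP connection?
    try:
        with socket.create_connection(addr[:2], timeout=timeout):
            pass
    except OSError:
        return "network"      # escalate to the ISP / transit provider
    # Further layers would test TLS, then the application itself (HTTP status).
    return "ok"               # connectivity is fine; suspect the app or SaaS tier

print(isolate("example.com"))
```

Each return value points the escalation at a different party, which is exactly the diagnostic evidence needed to convince a provider that a problem sits in their domain.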
From a business point of view, the loss of control and visibility means exposure to significant risk of damage to revenue, brand reputation, and employee productivity and engagement. In a sense, when you move to the cloud you've opened up a whole new world of possibilities, but also of performance and security risks (like BGP and DNS hijacking). If you're flying blind, you're headed for trouble, and that's no way to run a business.
Operational Visibility to De-Risk Your Cloud
ThousandEyes, with its fleet of global vantage points and network intelligence principles, bridges this newly formed visibility gap across your Internet and cloud ecosystem. Redesigning the IT monitoring stack, as shown in Figure 3, is not so much about replacing traditional monitoring techniques as about refactoring your investments to address the significant new challenges and risks of the cloud, so that you can effectively capitalize on everything the cloud has to offer your organization, both technically and from a business agility perspective.
Want to see how our customers use ThousandEyes as part of their modern ITOps stack? Watch David Mann, Senior Director of Global Network Services, discuss the company's digital transformation journey and how a revised IT operations stack and visibility framework powered by ThousandEyes was critical to delivering superior learning experiences to its customers. If you are interested in learning more about how ThousandEyes fits your cloud environment, request a demo.