Network Monitoring Guide: Choosing Tests and Metrics

Posted by on February 5, 2015

At ThousandEyes we often get asked which network performance metrics are best. There’s no one correct answer, but here is a quick guide to navigating network performance metrics and their associated tests.

There are several performance characteristics that you may be interested in:

  • Web page: Are elements of a page or set of pages slow?
  • Web server: Can users reach my service in a reasonable time?
  • Voice: Is call quality meeting SLAs? Is DiffServ handled correctly?
  • Network: What network issues exist? Where are issues located?
  • Routing: Can traffic reach the correct network? Is the path optimal?
  • DNS: Can my URL quickly resolve? Are the DNS records correct?

Figure 1: Decision tree for choosing ThousandEyes tests and metrics.

Web Page

Web page performance is useful for a number of purposes. Set a baseline over time to determine the effect of page changes. Benchmark across source locations or against competitors to reveal opportunities for improvement. A Page Load test measures the objects on a page and displays them in a waterfall. A Transaction test follows a specific set of page views and actions, recording the total time required.

Page load metrics:

  • DOM load, page load and total response times
  • Wire size and throughput
  • Timing per object by provider, network and connection stage
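As a rough illustration of how waterfall data turns into these metrics, the Python sketch below derives a load window, total wire size, throughput and per-provider time from a list of object timings. The `PageObject` record and its fields are hypothetical, and the sketch assumes the load window runs from the first object's start to the last object's end. Browser events such as DOM load are not modeled.

```python
from dataclasses import dataclass

@dataclass
class PageObject:
    """One row of a (hypothetical) page load waterfall."""
    url: str
    provider: str      # e.g. origin, CDN or third-party domain
    start_ms: float    # fetch start, relative to navigation start
    end_ms: float      # arrival of the object's last byte
    wire_bytes: int    # bytes actually transferred (after compression)

def summarize(objects):
    """Load window, wire size and throughput for a list of PageObjects."""
    start = min(o.start_ms for o in objects)
    end = max(o.end_ms for o in objects)
    total_ms = end - start
    wire = sum(o.wire_bytes for o in objects)
    # Bytes per millisecond equals kilobytes per second (1 kB = 1000 bytes).
    throughput = wire / total_ms if total_ms > 0 else 0.0
    return {"total_ms": total_ms, "wire_bytes": wire, "throughput_kB_per_s": throughput}

def time_by_provider(objects):
    """Sum of per-object fetch time, grouped by provider."""
    totals = {}
    for o in objects:
        totals[o.provider] = totals.get(o.provider, 0.0) + (o.end_ms - o.start_ms)
    return totals

page = [
    PageObject("https://www.example.com/", "origin", 0, 200, 50_000),
    PageObject("https://cdn.example.com/app.js", "cdn", 150, 400, 150_000),
]
print(summarize(page))         # {'total_ms': 400, 'wire_bytes': 200000, 'throughput_kB_per_s': 500.0}
print(time_by_provider(page))  # {'origin': 200.0, 'cdn': 250.0}
```

Grouping by provider is what makes it easy to spot, for example, a slow third-party host hiding among otherwise fast objects.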

Transaction metrics:

  • Completion rate
  • Transaction time per step
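Transaction metrics can be sketched the same way. This hypothetical example assumes each run is recorded as a dict with a `completed` flag and a list of `(step, milliseconds)` pairs. The data layout is an illustration, not ThousandEyes' export format.

```python
from statistics import mean

def completion_rate(runs):
    """Percentage of runs in which every step of the transaction finished."""
    completed = sum(1 for run in runs if run["completed"])
    return 100.0 * completed / len(runs)

def mean_step_times(runs):
    """Average duration of each step over the runs that reached it."""
    per_step = {}
    for run in runs:
        for step, ms in run["steps"]:
            per_step.setdefault(step, []).append(ms)
    return {step: mean(times) for step, times in per_step.items()}

# Hypothetical results: the third run timed out after logging in.
runs = [
    {"completed": True, "steps": [("login", 800), ("search", 400), ("checkout", 1200)]},
    {"completed": True, "steps": [("login", 1000), ("search", 600), ("checkout", 1000)]},
    {"completed": False, "steps": [("login", 900)]},
    {"completed": True, "steps": [("login", 700), ("search", 500), ("checkout", 1100)]},
]
print(completion_rate(runs))   # 75.0
print(mean_step_times(runs))
```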

Figure 2: Page load metrics per object by provider.

Web Server

Monitoring service availability tells you whether an application is up or down, how long it takes to respond and what type of error is behind any failure. An HTTP Server test targets a web service and attempts, in order, a DNS lookup, a TCP connection, an SSL connection and an HTTP request. The HTTP layer is also included in Page Load tests.

Service availability metrics:

  • Service availability
  • Response time (time to first byte) and fetch time (time to last byte)
  • Errors and timing by connection stage (DNS, Connect, SSL, Wait, Receive)
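The following Python sketch shows one way per-stage timings could roll up into the response time and fetch time above. It assumes stages are measured in sequence, that sending the request is folded into the Wait stage, and that SSL time is 0 for plain HTTP. `summarize_http` and `availability` are hypothetical helpers.

```python
# Connection stages in the order an HTTP Server test attempts them.
STAGES = ("dns", "connect", "ssl", "wait", "receive")

def summarize_http(stage_ms, failed_stage=None):
    """Roll per-stage durations (ms) up into availability and timing metrics.

    stage_ms: duration of each completed stage; use 0 for "ssl" on plain HTTP.
    failed_stage: name of the stage that raised an error, or None on success.
    """
    if failed_stage is not None:
        # The round counts as unavailable; keep the stage for error breakdowns.
        return {"available": False, "error_stage": failed_stage}
    missing = [s for s in STAGES if s not in stage_ms]
    if missing:
        raise ValueError(f"missing timings for stages: {missing}")
    # Response time: start of the DNS lookup to the first byte of the response.
    response_ms = sum(stage_ms[s] for s in ("dns", "connect", "ssl", "wait"))
    # Fetch time: response time plus receiving the rest of the body (last byte).
    fetch_ms = response_ms + stage_ms["receive"]
    return {"available": True, "response_ms": response_ms, "fetch_ms": fetch_ms}

def availability(rounds):
    """Percentage of test rounds that completed without an error."""
    return 100.0 * sum(1 for r in rounds if r["available"]) / len(rounds)

ok = summarize_http({"dns": 20, "connect": 30, "ssl": 60, "wait": 90, "receive": 40})
failed = summarize_http({"dns": 20}, failed_stage="connect")
print(ok)                                   # response_ms 200, fetch_ms 240
print(availability([ok, ok, ok, failed]))   # 75.0
```

Keeping the failing stage, rather than just a pass/fail flag, is what lets errors be broken down by connection stage.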

Figure 3: Errors and timing by connection stage.

Voice

Voice quality is highly sensitive to high latency and to large variation in latency. Voice tests measure call quality with the Mean Opinion Score (MOS) and its constituent metrics: loss, discards, latency and Packet Delay Variation (PDV). You can configure the Differentiated Services Code Point (DSCP) value and the voice codec of the test to match your environment.

Voice metrics:

  • Mean opinion score (MOS)
  • Loss
  • Discards
  • Latency
  • Packet Delay Variation (PDV)
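To show how loss, latency and PDV can combine into a single score, here is a sketch based on a widely used simplification of the ITU-T G.107 E-model. It is not ThousandEyes' scoring method: codec impairment is ignored, and `loss_pct` is assumed to include packets discarded by the jitter buffer. Only the final R-to-MOS mapping follows the G.107 formula.

```python
def r_factor(latency_ms, pdv_ms, loss_pct):
    """Simplified E-model R factor (an approximation, not full ITU-T G.107).

    loss_pct is assumed to include jitter-buffer discards as well as
    packets lost in the network.
    """
    # Delay variation is weighted double; 10 ms is a rough codec allowance.
    effective_ms = latency_ms + 2 * pdv_ms + 10
    if effective_ms < 160:
        r = 93.2 - effective_ms / 40
    else:
        # Beyond ~160 ms, delay degrades conversation much faster.
        r = 93.2 - (effective_ms - 120) / 10
    # Each percent of loss costs about 2.5 R points in this simplification.
    return r - 2.5 * loss_pct

def mos_from_r(r):
    """Map an R factor to a MOS between 1.0 and 4.5 (ITU-T G.107 mapping)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + 7e-6 * r * (r - 60) * (100 - r)

print(round(mos_from_r(r_factor(20, 2, 0)), 2))    # clean path
print(round(mos_from_r(r_factor(300, 40, 5)), 2))  # long, jittery, lossy path
```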

Figure 4: VoIP metrics.

Network

Network performance is typically measured by the success of IP forwarding, the time packets take to reach the destination and how much that time varies, and the theoretical capacity and actual bandwidth available. Network tests measure these key performance indicators and reveal where in the network packet loss, latency and jitter occur. The Network layer can be included in Page Load, HTTP, Voice and DNS Server tests.
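A minimal Python sketch of the end-to-end calculation, assuming a hypothetical list of round-trip times with `None` for probes that got no reply. Jitter is taken here as the mean absolute difference between consecutive replies, which is one common definition among several. Applying the same calculation to replies from each hop could help show where along the path the numbers start to change.

```python
from statistics import mean

def network_metrics(rtts_ms):
    """Loss, latency and jitter from one round of probes.

    rtts_ms: round-trip time per probe in ms, or None if no reply came back.
    """
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(replies)) / len(rtts_ms)
    if not replies:
        return {"loss_pct": loss_pct, "latency_ms": None, "jitter_ms": None}
    # Jitter: mean absolute difference between consecutive replies
    # (one common definition; tools differ in how they compute it).
    diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
    return {
        "loss_pct": loss_pct,
        "latency_ms": mean(replies),
        "jitter_ms": mean(diffs) if diffs else 0.0,
    }

print(network_metrics([10.0, 12.0, None, 11.0, 15.0]))
```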

Network End-to-End metrics:

  • Packet loss
  • Latency and jitter
  • Bandwidth
  • Undersized Path MTU and oversized TCP MSS
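The MTU and MSS check comes down to header arithmetic: a full TCP segment must fit in the path MTU once the IP and TCP headers are subtracted. This sketch assumes option-free headers (20 bytes for IPv4, 40 for IPv6, 20 for TCP) and treats anything below a standard 1500-byte Ethernet MTU as undersized; the `expected_mtu` threshold is a hypothetical parameter.

```python
IPV4_HEADER = 20  # bytes, no IP options
IPV6_HEADER = 40  # bytes, fixed base header
TCP_HEADER = 20   # bytes, no TCP options

def max_mss(path_mtu, ipv6=False):
    """Largest TCP payload per segment that fits in the path MTU."""
    return path_mtu - (IPV6_HEADER if ipv6 else IPV4_HEADER) - TCP_HEADER

def check_path(path_mtu, advertised_mss, ipv6=False, expected_mtu=1500):
    """Flag an undersized path MTU and an MSS larger than the path can carry."""
    findings = []
    if path_mtu < expected_mtu:
        findings.append(f"undersized path MTU: {path_mtu} < {expected_mtu}")
    limit = max_mss(path_mtu, ipv6)
    if advertised_mss > limit:
        # Full-size segments may need fragmentation or be dropped on this path.
        findings.append(f"oversized TCP MSS: {advertised_mss} > {limit}")
    return findings

print(check_path(1500, 1460))  # []
print(check_path(1400, 1460))  # e.g. a tunnel on the path reduces the MTU
```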

Figure 5: Network end-to-end metrics.

Routing

Monitoring routing helps you understand reachability and the network paths your traffic takes; routing problems can degrade network performance, leading to high packet loss or latency. Use a BGP test to measure BGP updates, path changes and reachability from networks around the world, or monitor reachability from your own network using a BGP Private Peer. The BGP layer can be included in Page Load, HTTP, Voice, Network and DNS Server tests.

BGP metrics:

  • Path changes and updates
  • Reachability
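A sketch of how reachability and path changes could be derived from route snapshots. The input shape is hypothetical: each round maps a monitor name to its AS path toward the monitored prefix, or `None` when the monitor has no route. The example uses documentation AS numbers, and counting raw BGP updates is not modeled.

```python
def bgp_summary(rounds):
    """Reachability per round and total path changes across rounds.

    rounds: list of dicts mapping monitor name -> AS path (tuple of ASNs),
    or None when that monitor has no route to the prefix.
    """
    reachability = [
        100.0 * sum(1 for path in r.values() if path is not None) / len(r)
        for r in rounds
    ]
    changes = 0
    for prev, cur in zip(rounds, rounds[1:]):
        for monitor, path in cur.items():
            # A new path or a withdrawal (path becomes None) both count as a change.
            if monitor in prev and prev[monitor] != path:
                changes += 1
    return {"reachability_pct": reachability, "path_changes": changes}

# Documentation AS numbers (RFC 5398); monitor names are illustrative.
rounds = [
    {"london": (64496, 64497), "tokyo": (64498, 64497), "nyc": (64499, 64497)},
    {"london": (64496, 64498, 64497), "tokyo": (64498, 64497), "nyc": None},
]
print(bgp_summary(rounds))
```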

Figure 6: Reachability metrics to Netflix from London.

DNS

DNS monitoring is useful to understand the availability and accuracy of DNS records. Availability is captured by whether a trace from the top level domain (TLD) down to the authoritative servers succeeds and how long it takes. Accuracy is reflected by whether the correct mappings exist for each record.

DNS Trace tests measure the availability of DNS servers from the top level domain (TLD) down to the authoritative server. DNS Server tests measure the availability and accuracy of authoritative servers or caching servers (by deselecting the recursive option). DNSSEC tests measure whether DNSSEC validates correctly for a domain.

DNS metrics:

  • Queries and query time
  • Availability and resolution time
  • Mappings
  • Errors
  • DNSSEC validity
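A DNS trace can be sketched as following referrals until a server returns an answer, adding up query times along the way, and then comparing the answer with the expected mapping. Everything here is simulated: the zone table, the server names under the reserved `.example` domain and the query times are hypothetical, no real DNS traffic is sent, and DNSSEC is not modeled.

```python
# Hypothetical zone data: server -> {query name: response}, where a response is
# ("referral", next_server) or ("answer", [addresses]).
ZONES = {
    "tld-ns.example": {"www.example.com": ("referral", "ns1.example.com")},
    "ns1.example.com": {"www.example.com": ("answer", ["192.0.2.10"])},
}
QUERY_MS = {"tld-ns.example": 25.0, "ns1.example.com": 40.0}

def trace(name, server, zones, query_ms, max_hops=10):
    """Follow referrals from a TLD server down to an authoritative answer."""
    path, elapsed = [], 0.0
    for _ in range(max_hops):
        if server not in zones:
            return {"ok": False, "error": f"{server} unreachable", "path": path}
        path.append(server)
        elapsed += query_ms[server]
        kind, value = zones[server].get(name, ("nodata", None))
        if kind == "answer":
            return {"ok": True, "answer": value, "path": path, "time_ms": elapsed}
        if kind != "referral":
            return {"ok": False, "error": f"no record for {name} at {server}", "path": path}
        server = value
    return {"ok": False, "error": "too many referrals", "path": path}

def mapping_correct(answer, expected):
    """Accuracy check: the record maps to exactly the expected addresses."""
    return sorted(answer) == sorted(expected)

result = trace("www.example.com", "tld-ns.example", ZONES, QUERY_MS)
print(result["path"], result["time_ms"])                  # both servers, 65.0
print(mapping_correct(result["answer"], ["192.0.2.10"]))  # True
```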

Figure 7: DNS Trace record availability and query time.

Selecting the Right Tests in ThousandEyes

Choosing the right test in ThousandEyes depends on your monitoring and troubleshooting objectives, and the Test Settings view lets you select exactly what you want to monitor. As Figure 8 shows, you choose a test using the Layers and Test Types buttons.

Figure 8: Use the Test Settings view to choose new tests and data views.

On the right you’ll find the data views that are available for the selected test; read more about how tests and data views are correlated in our blog post on X-Layer. Below the test settings are additional configuration options for testing intervals, agent locations and alerts.

Start monitoring your network performance within minutes using one of the above tests by signing up for our free version of ThousandEyes.
