At ThousandEyes we are building the Endpoint Agent to measure end-user digital experience when accessing networked services from inside or outside the corporate network. The Endpoint Agent is a cross-platform system service, running on macOS and Windows, that conducts network tests and collects performance data from web browser sessions in real time. It subsequently processes, augments and uploads the obtained results to our backend for further processing and visualization. When designing the low-level network tests, we chose the Futures abstraction to deliver the final results to the client code. Futures provide an architecturally clean and robust mechanism for returning and consuming asynchronous results. The standard library’s “std::future” type, however, due to its limited API, proved to be an impediment to effectively parallelizing all our data operations end-to-end. After looking into alternative Future implementations and finding them unsuitable for our project, we decided to implement a library for attaching continuations to the standard “std::future” type. By evaluating our library, named “thousandeyes::futures”, against other more direct approaches for consuming results from Futures asynchronously, we confirmed that the proposed solution performs equally well in real-world use-cases while requiring an order of magnitude fewer threads.

Asynchronous Programming in Endpoint Agent

In general, apart from using non-blocking sockets, there are two approaches for performing multiple network operations in parallel and processing their results:

  • 1. creating a new thread for each end-to-end operation and blocking while waiting for the results, or
  • 2. dispatching multiple network operations concurrently from one or a few threads and getting notified by the system when a result is ready.

The first approach does not work very well and clearly does not scale. It does not work well because, eventually, in any non-trivial program, results from different operations need to be combined and fed into subsequent operations. Sharing state between threads is expensive, error-prone and requires complex synchronization logic. Even if that were not the case, creating a new thread for each operation cannot scale, since operating systems limit the number of threads per process. Not only is thread creation expensive, but threads themselves, especially on machines under heavy load, do not perform well.

On the other hand, modern operating systems provide adequate support for performing network operations concurrently using one or only a few threads. The de-facto standard “boost::asio” library implements a performant, robust, cross-platform layer on top of the aforementioned system facilities that provides an asynchronous model for dispatching and handling concurrent network operations.

The “boost::asio” library is ideal for dispatching and timing the network operations performed by the tests, since the callback mechanism it utilizes for notifying client code incurs very little latency. Nonetheless, building higher-level behaviors by depending on, combining and reusing intermediate results becomes increasingly complex, error-prone and difficult to debug. This is mainly due to the separation in time and space between operation initiation and completion, and the inverted control flow, which inevitably complicate the program’s logic.

Therefore, when designing our low-level network tests, we decided (a) to use callbacks for implementing the low-level test’s logic and obtain all the required measurements and (b) to use Futures for reporting the final test results back to the client code. This can be condensed to the following rule of thumb: Callbacks for latency-sensitive operations and Futures for operations tolerant to higher latencies. For example, the interface of our ping test resembles the one below:
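As an illustration, such an interface could look roughly like the following. The type names, method names and result fields below are assumptions for this sketch, not the actual Endpoint Agent code, and the “FakePingTestService” stub stands in for a real implementation:

```cpp
#include <future>
#include <string>

// Illustrative sketch only: the actual names and result fields in the
// Endpoint Agent may differ.
struct PingTestResult {
    std::string target;
    double avgRttMs;
    int packetsSent;
    int packetsReceived;
};

// The service dispatches the measurement asynchronously and returns a
// std::future that becomes ready once the final result is available.
class PingTestService {
public:
    virtual ~PingTestService() = default;
    virtual std::future<PingTestResult> ping(const std::string& target) = 0;
};

// A trivial stand-in implementation (not the real "AsioPingTestService"):
// it fabricates a result on a background thread via std::async.
class FakePingTestService : public PingTestService {
public:
    std::future<PingTestResult> ping(const std::string& target) override {
        return std::async(std::launch::async, [target] {
            return PingTestResult{ target, 1.0, 4, 4 };
        });
    }
};
```

The key design point is that the interface exposes only a “std::future” of the final result, keeping the callback-driven measurement logic hidden inside the concrete implementation.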

Then, the concrete implementation of the interface above, e.g., the “AsioPingTestService”, uses callbacks to dispatch and measure all the required network operations for producing the final “PingTestResult” asynchronously.

After obtaining a Future, the client code can subsequently store it, check whether it is “ready” or even block and wait until it becomes ready. From that point on we can forward the Future to higher-level components and it is no longer necessary to invert our application’s control flow. Moreover, “std::future” has been part of the standard library since C++11, so we can use this better-suited abstraction without adding any external dependencies to our project.

If that were the whole story, the standard library’s Future would have been perfect for all our needs. Unfortunately, the current implementation of “std::future” comes with its own set of limitations.

Using Standard and Third-party Futures

The C++11/14/17 standard library includes the “std::future” type for returning results asynchronously to client code. The API of “std::future”, however, is very limited and does not provide any support for attaching continuations. A continuation, in this context, is code (function) that gets associated with a specific “std::future” object and gets executed as soon as the latter becomes ready. This limitation makes it very difficult to use “std::future” extensively in projects, since it becomes increasingly difficult, tedious and error-prone to effectively parallelize and reuse components that need to consume, transform and combine multiple asynchronous results.

Let’s try, for example, to implement a compound network test that uses a “DnsTestService” that resolves a hostname to an IP address and the “PingTestService” of Figure 1. Then, we can proceed with the ping test only if the given hostname successfully resolves to one or more IP addresses. Ideally, we want the “runAllTests()” function to return the final result asynchronously. However, as it becomes obvious from the example in Figure 2, this is not possible:
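To make the problem concrete, here is a self-contained sketch in the spirit of that example, with simplified, illustrative stand-ins for the two services (the names, method signatures and fake results are assumptions). Without continuations, “runAllTests()” must call “get()” on the DNS Future before it can even dispatch the ping test, so it has to block and cannot itself return a Future:

```cpp
#include <future>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified stand-ins for the services discussed in the text.
struct PingTestResult { std::string target; double avgRttMs; };

struct DnsTestService {
    std::future<std::vector<std::string>> resolve(const std::string& host) {
        return std::async(std::launch::async, [host] {
            return std::vector<std::string>{ "93.184.216.34" }; // fake answer
        });
    }
};

struct PingTestService {
    std::future<PingTestResult> ping(const std::string& ip) {
        return std::async(std::launch::async, [ip] {
            return PingTestResult{ ip, 1.0 }; // fake measurement
        });
    }
};

// Without continuations, combining the two asynchronous results forces the
// combining function itself to block: get() must be called on the DNS future
// before the ping test can even be dispatched.
PingTestResult runAllTests(const std::string& host) {
    DnsTestService dns;
    PingTestService ping;

    auto ips = dns.resolve(host).get(); // blocks the calling thread
    if (ips.empty()) {
        throw std::runtime_error("could not resolve: " + host);
    }
    return ping.ping(ips.front()).get(); // blocks again
}
```

Even though both services run asynchronously internally, the caller of “runAllTests()” observes a fully synchronous, blocking function.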

So, while we succeeded in implementing the low-level network tests (“DnsTestService” and “PingTestService”) asynchronously, it seems practically impossible to combine their results and produce the above function’s compound result asynchronously. Worse, we have effectively reversed all the benefits of executing the low-level tests asynchronously: for the clients of “runAllTests()” it makes absolutely no difference whether the underlying tests executed asynchronously or not.

Prior to implementing our own solution for handling the aforementioned issue, we evaluated a few existing libraries as potential alternatives to the “std::future” type. Despite their maturity, flexibility and feature-completeness they were not deemed as a good fit for the Endpoint Agent project.

First of all, the obvious candidate, the “boost::future” type was not a good fit since it relies on “boost::thread” library’s components to deliver its functionality. That, in turn, meant that our project, to use “boost::future” effectively, would have to replace many of the standard types such as “std::thread”, “std::mutex”, and “std::chrono” with their boost equivalents. The “folly::futures” library, created by Facebook, was also deemed not a good fit since it relied on many of the other folly sub-libraries and we did not want to add the whole folly project as a dependency to the Endpoint Agent.

Finally, other cross-platform, open-source Future implementations on GitHub were deemed not a good fit since they implemented their own, non-standard Future type and did not provide support for using custom executors for monitoring Futures and dispatching continuations. Moreover, many of those libraries were not very mature or well-tested.

Due to the above, we decided to implement our own solution that enables client code to attach continuations to the standard “std::future” type. The resulting library, named “thousandeyes::futures”, manages to achieve this goal using only one thread for polling all the active “std::future” objects and another one for dispatching the continuations attached to the ready “std::future” objects.

Despite being unsuitable for our project, all the aforementioned Future libraries would have achieved similar performance and syntactic clarity to our own library. With support for continuations, the compound network test above can provide a fully asynchronous implementation of the “runAllTests()” function by attaching continuations to the Futures returned by the individual low-level network tests. Specifically, the fully asynchronous implementation of a compound network test service using the “thousandeyes::futures” library would be the following:
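The following self-contained sketch shows that shape. Note that “then()” below is a simplified, thread-per-continuation stand-in written for this sketch (the library’s real “then()” uses a polling executor instead), and the service types are illustrative stubs rather than the actual Endpoint Agent code:

```cpp
#include <future>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a continuation-attaching then(): it burns one
// thread per continuation, whereas the real library polls all futures from
// a single thread. The client-code shape, however, is the same.
template <class T, class F>
auto then(std::future<T> f, F cont) -> std::future<decltype(cont(std::move(f)))>
{
    return std::async(std::launch::async,
                      [f = std::move(f), cont = std::move(cont)]() mutable {
                          f.wait();
                          return cont(std::move(f));
                      });
}

// Illustrative stubs for the services discussed in the text.
struct PingTestResult { std::string target; double avgRttMs; };

struct DnsTestService {
    std::future<std::vector<std::string>> resolve(const std::string& host) {
        return std::async(std::launch::async, [host] {
            return std::vector<std::string>{ "93.184.216.34" }; // fake answer
        });
    }
};

struct PingTestService {
    std::future<PingTestResult> ping(const std::string& ip) {
        return std::async(std::launch::async, [ip] {
            return PingTestResult{ ip, 1.0 }; // fake measurement
        });
    }
};

// With continuations, runAllTests() itself returns a Future and never blocks
// the caller: the ping test is dispatched from the continuation attached to
// the DNS result.
std::future<PingTestResult> runAllTests(const std::string& host) {
    DnsTestService dns; // kept local for brevity; real services would outlive the call
    return then(dns.resolve(host), [](std::future<std::vector<std::string>> f) {
        auto ips = f.get(); // never blocks: the future is already ready here
        if (ips.empty()) {
            throw std::runtime_error("could not resolve hostname");
        }
        PingTestService ping;
        return ping.ping(ips.front()).get();
    });
}
```

Unlike the blocking version, an exception thrown in the continuation (e.g., on resolution failure) is stored in the returned Future and surfaces only when the client eventually calls “get()”.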

The ThousandEyes-Futures Library

The “thousandeyes::futures” library is a small, self-contained, cross-platform, header-only library with the following features:

  • Uses and operates on the standard, “std::future” type
  • Does not have any external dependencies
  • Does not require building; existing code just needs to include its headers
  • Is efficient and well-tested
  • Is easy to extend to support many different use-cases
  • Achieves good trade-offs between implementation simplicity, efficient use of resources and continuation dispatch latency

The following program attaches a continuation to the Future returned by the “getRandomNumber()” function. The attached continuation gets called only when the input Future is ready (i.e., the “future::get()” call is guaranteed to not block). Note that the algorithm for the asynchronous random number generator is based on this publication.
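Since the program is easiest to follow in code, here is a self-contained approximation of it. The “then()” and Executor components below are deliberately simplified stand-ins written for this sketch (they spawn a thread per continuation), whereas the real library’s “thousandeyes::futures::then()” and “DefaultExecutor” poll all futures from shared threads; “getRandomNumber()” is likewise a hypothetical placeholder:

```cpp
#include <chrono>
#include <future>
#include <memory>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Minimal stand-ins written for this sketch. The real library provides
// then(), Executor and DefaultExecutor, and its DefaultExecutor polls all
// active futures from one thread instead of spawning a thread per
// continuation as done below.
class Executor {
public:
    virtual ~Executor() = default;
    virtual void stop() = 0;
};

class SimpleExecutor : public Executor {
public:
    ~SimpleExecutor() override { stop(); }
    void track(std::thread t) { workers_.push_back(std::move(t)); }
    void stop() override {
        for (auto& t : workers_) {
            if (t.joinable()) {
                t.join();
            }
        }
        workers_.clear();
    }
private:
    std::vector<std::thread> workers_;
};

// Stand-in for setting the process-wide default executor instance.
std::shared_ptr<SimpleExecutor> g_defaultExecutor;

template <class T, class F>
auto then(std::future<T> f, F cont) -> std::future<decltype(cont(std::move(f)))>
{
    using R = decltype(cont(std::move(f)));
    std::packaged_task<R()> task([f = std::move(f), cont = std::move(cont)]() mutable {
        f.wait();                  // the continuation runs only once the future is ready...
        return cont(std::move(f)); // ...so get() inside it never blocks
    });
    auto result = task.get_future();
    g_defaultExecutor->track(std::thread(std::move(task)));
    return result;
}

// Hypothetical asynchronous "random" number generator for the sketch.
std::future<int> getRandomNumber() {
    return std::async(std::launch::async, [] {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        return 4; // placeholder value for the sketch
    });
}

std::string runExample() {
    auto executor = std::make_shared<SimpleExecutor>(); // 1. create an Executor
    g_defaultExecutor = executor;                       // 2. set it as the default
    auto result = then(getRandomNumber(),               // 3. attach a continuation
                       [](std::future<int> f) {
                           return "number: " + std::to_string(f.get());
                       });
    auto value = result.get();                          // 4. extract the final value
                                                        //    (the only blocking call)
    executor->stop();                                   // 5. stop the Executor
    return value;
}
```

The numbered comments mark the five concepts discussed next; in a real program “main()” would perform these steps directly and print the extracted value.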

There are five concepts/aspects of the “thousandeyes::futures” library that can be seen in the above example:

  • 1. Creating an Executor
  • 2. Setting a concrete implementation of the Executor as the default executor instance
  • 3. Attaching continuations using the “thousandeyes::futures::then()” function
  • 4. Extracting the final value from the top-most “std::future” object (this is the only operation in the above example that blocks)
  • 5. Stopping an Executor

The Executor is the component responsible for waiting on Futures and dispatching continuations. The library provides a simple, default implementation of the Executor interface called “DefaultExecutor”. Clients of the library, however, may choose to implement and use their own executor(s) if the provided “DefaultExecutor” is not adequate for their use-cases. The “DefaultExecutor” uses two “std::thread” threads: one that polls all the active “std::future” objects that have continuations attached to them, and one that invokes the continuations once their futures become ready. It polls the active “std::future” objects with a timeout, “q”, given in the component’s constructor.

Attaching continuations to “std::future” objects is achieved via the “thousandeyes::futures::then()” function. The main arguments the function accepts are the following:

  • The input Future object
  • The continuation function that accepts the input Future as its argument and can either return a value or a Future of a value

Inside the continuation function, the “std::future::get()” method never blocks: it either returns the stored result immediately or throws the stored exception, depending on whether the original input Future object became ready with a value or with an exception.

The return value of a “then()” expression is always a Future that becomes ready only after the continuation produces a final result. Specifically, if the continuation function’s return type is a value, then the return value of the “then()” function is a Future of that value. On the other hand, if the continuation function’s return type is a Future of a value, “then()” returns a new Future of the same type that becomes ready only when the first Future (returned by the continuation) becomes ready.

When the executor is explicitly stopped via its “stop()” method, it does the following: (a) makes all the pending “std::future” objects ready with an exception, (b) dispatches the continuations associated with those Futures, and (c) joins the threads that are used to poll and dispatch the Future objects.

Performance

After testing and benchmarking the default implementation (a) against other, more direct approaches for detecting when the set of active Futures becomes ready and (b) against other implementations of the Executor component, it appears that the “DefaultExecutor” with a “q” value of 10 ms achieves a good balance between efficient use of resources and raw performance.

The proposed, default implementation of “then()”, using the “DefaultExecutor”, was benchmarked against the following alternative implementations:

  • A “blocking_then()” implementation that eagerly calls “future::get()” and blocks to get the result before moving to the next future (serves as the baseline)
  • An “unbounded_then()” implementation that creates a new thread per-invocation that waits for the result via “future::wait()”
  • An “asio_then()” implementation that dispatches a function via “boost::asio::io_context::post()” per-invocation, which, in turn, waits for the result via “future::wait()” and uses 50 threads to run “boost::asio::io_context::run()”
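For illustration, the thread-per-invocation alternative can be sketched in a few lines of standard C++. This is an assumption about its general shape, not the actual benchmark code:

```cpp
#include <future>
#include <utility>

// Sketch of the thread-per-invocation alternative: every attached
// continuation gets its own thread that blocks in future::wait() until the
// input future becomes ready. Simple, but it needs as many threads as there
// are outstanding futures.
template <class T, class F>
auto unbounded_then(std::future<T> f, F cont)
    -> std::future<decltype(cont(std::move(f)))>
{
    return std::async(std::launch::async,
                      [f = std::move(f), cont = std::move(cont)]() mutable {
                          f.wait(); // one blocked thread per continuation
                          return cont(std::move(f));
                      });
}
```

The per-thread cost is what the polling “DefaultExecutor” avoids: instead of one blocked thread per outstanding future, it multiplexes all waits onto a single polling thread.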

Whereas the other approaches use many threads, the “thousandeyes::futures” library with the “DefaultExecutor” uses at most two: one thread for polling all the active Futures for completion status and one thread for dispatching the continuations. Despite using far fewer resources, all the above approaches, apart from the “blocking_then()” baseline, perform identically. More detailed information about the performance of the library can be found in the project’s README.

Conclusion

In this post we talked about the Futures abstraction and how it allows for cleaner, more structured asynchronous code. We saw, however, that the Future type provided by the C++ standard library is limited. Moreover, we found that alternative Future implementations carried many dependencies and were not a good fit for our project. Therefore, to alleviate the limitations of the standard Future type and to enable its more extensive use in asynchronous C++ code, we built and open sourced the “thousandeyes::futures” library. The “thousandeyes::futures” library allows for attaching continuations to the “std::future” objects and offers an extensible mechanism for monitoring Futures and dispatching continuations attached to them. At the core of this extensibility mechanism is the “Executor” interface that projects can implement to fully adapt the library to their own, unique use-cases.

Regardless, the existing “DefaultExecutor” with a “q” value of 10 ms appears to be a very good compromise between raw, real-world performance and resource utilization. In typical usage scenarios, with a few hundred, mostly independent, “std::future” instances active at any given time, the worst possible latency is only a few seconds. For the way the “thousandeyes::futures” library is currently used in the Endpoint Agent, a few seconds of latency is perfectly acceptable, since the main goal is to increase the parallelization potential of the underlying system, not to take measurements. The proposed library achieves that goal with very modest CPU and memory requirements.

In any case, “thousandeyes::futures” is open source and Pull Requests for improving the library or extending its scope are always welcome.
