Publication | Publications - Systems Group, ETH Zurich

Systems Group Master's Thesis, no. 154; Department of Computer Science, September 2016
Supervised by: Prof. Timothy Roscoe

Identifying performance bottlenecks of modern, distributed stream processing engines is a serious challenge. At the same time, such engines are widely used to perform data-parallel tasks such as machine learning, graph processing or sophisticated streaming data analysis. Oftentimes, these tasks are required to produce low-latency results as well as to achieve a high throughput. As computations often consist of complex, iterative dataflows and are distributed over multiple physical machines — with hundreds of worker threads in total — finding the source of a performance problem is a difficult task. While profiling can be used to quantify the time spent in the various steps of a parallel computation, it does not take into account the dependencies between the steps. As a result, optimization efforts are often wasted on components that have little to no influence on query latency or the overall runtime of a program. In this work, we offer a more effective alternative by applying critical path analysis, a dependency-aware technique. The critical path is defined as the longest sequence of dependent steps in a parallel program’s execution. Any increase in the execution time of a step on the critical path will therefore result in an equal increase in the total runtime of the computation. We refine existing critical path-based models and apply them to data-parallel systems, which often share common low-level principles. We provide guidelines on the instrumentation necessary to apply our model, as well as a set of trace properties that help verify the correctness of that instrumentation. Furthermore, we develop a novel method to identify phases in a worker thread’s execution during which it is waiting — e.g. for a message from a different worker—even in the absence of blocking system calls. Through critical path analysis, we can then identify performance bottlenecks in system components, dataflow operators as well as in network communication. To demonstrate our ideas, we implemented a prototype system capable of performing a critical path analysis of the Timely Dataflow stream processing engine. We show that our system can effectively identify the factors limiting a data-parallel computation’s overall performance. Furthermore, we demonstrate that our analysis is both efficient and scalable, and can even be performed in real-time in certain configurations.

@mastersthesis{abc,
abstract = {Identifying performance bottlenecks of modern, distributed stream processing engines is a serious challenge. At the same time, such engines are widely used to perform data-parallel tasks such as machine learning, graph processing or sophisticated streaming data analysis. Oftentimes, these tasks are required to produce low-latency results as well as to achieve a high throughput. As computations often consist of complex, iterative dataflows and are distributed over multiple physical machines {\textemdash} with hundreds of worker threads in total {\textemdash} finding the source of a performance problem is a difficult task. While profiling can be used to quantify the time spent in the various steps of a parallel computation, it does not take into account the dependencies between the steps. As a result, optimization efforts are often wasted on components that have little to no influence on query latency or the overall runtime of a program.
In this work, we offer a more effective alternative by applying critical path analysis, a dependency-aware technique. The critical path is defined as the longest sequence of dependent steps in a parallel program{\textquoteright}s execution. Any increase in the execution time of a step on the critical path will therefore result in an equal increase in the total runtime of the computation. We refine existing critical path-based models and apply them to data-parallel systems, which often share common low-level principles. We provide guidelines on the instrumentation necessary to apply our model, as well as a set of trace properties that help verify the correctness of that instrumentation. Furthermore, we develop a novel method to identify phases in a worker thread{\textquoteright}s execution during which it is waiting {\textemdash} e.g. for a message from a different worker{\textemdash}even in the absence of blocking system calls. Through critical path analysis, we can then identify performance bottlenecks in system components, dataflow operators as well as in network communication.
To demonstrate our ideas, we implemented a prototype system capable of performing a critical path analysis of the Timely Dataflow stream processing engine. We show that our system can effectively identify the factors limiting a data-parallel computation{\textquoteright}s overall performance. Furthermore, we demonstrate that our analysis is both efficient and scalable, and can even be performed in real-time in certain configurations.},
author = {Ralf Sager},
school = {154},
title = {Real-Time Performance Analysis of a Modern Data-Parallel Stream Processing Engine},
year = {2016}
}