A systematic mapping of performance in distributed stream processing systems

Euromicro Conference on Software Engineering and Advanced Applications, 2023

Several software systems are built upon stream processing architectures to process large amounts of data in near real-time. Today's distributed stream processing systems (DSPSs) spread the processing across multiple machines to provide scalable performance. However, high performance and Quality of Service (QoS) in distributed stream processing are challenging to predict, achieve, and maintain. While many studies focus on evaluating or improving the performance of stream processing, it remains difficult to get a comprehensive view of the current state of DSPSs and their performance in real-world deployments. In this paper, we present a systematic mapping study of the literature on DSPS performance. We discuss existing challenges, the most used DSPSs, achieved performance, and future trends. Our results show that performance is still one of the major concerns in stream processing, with several solutions available and outcomes that differ depending on the metrics, execution environments, and use cases considered. Moreover, there is a need for better benchmarks and workloads, as well as for performance improvements through increased efficiency and the use of modern hardware. Our study intends to help software engineering practitioners and researchers understand how to choose the most suitable DSPS for building efficient data-intensive architectures.