A Framework for Preprocessing Multivariate, Topology-Aware Time Series and Event Data in a Multi-System Environment

Andreas Schörgenhumer, Mario Kahlhofer, Peter Chalupar, Paul Grünbacher, Hanspeter Mössenböck

2019 IEEE 19th International Symposium on High Assurance Systems Engineering (HASE), 2019

Monitoring and predicting quality properties of complex systems relies on collecting and analyzing huge amounts of data at run time. Machine learning is frequently adopted to analyze time series and event data, often coming from multiple systems. In such a context, extracting and preprocessing data is an essential but also highly tedious task. In this paper, we thus present an offline preprocessing framework that handles multivariate time series and event data in a multi-system environment and takes the system topology into account. After a discussion of the key requirements, we present the architecture and implementation of our highly configurable and easy-to-use framework. We demonstrate how the framework allows users to extract data and generate output files for machine learning via configuration settings. In a two-step evaluation, we investigate the framework's usefulness and scalability. We demonstrate the usefulness in an event prediction case study on real-world multi-system time series data. Our results show the significant impact of different data preprocessing settings on machine learning. Our experiments further demonstrate that processing performance scales linearly with respect to the number of systems and time series.