An Approach for Ranking Feature-based Clustering Methods and its Application in Multi-System Infrastructure Monitoring

Andreas Schörgenhumer; Thomas Natschläger; Paul Grünbacher; Mario Kahlhofer; Peter Chalupar; Hanspeter Mössenböck

2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2021

Companies need to collect and analyze time series data to continuously monitor the behavior of software systems during operation; this data can in turn be used for performance monitoring, anomaly detection, or identifying problems after system crashes. However, gaining insights into common data patterns in time series is challenging, particularly when the data concerns different properties and comes from multiple systems. Despite their possible benefits, clustering approaches have hardly been studied in the context of monitoring data. In this paper, we present a feature-based approach to identify clusters in unlabeled infrastructure monitoring data collected from multiple independent software systems. We introduce time series properties, group them into feature sets, and combine these feature sets with various unsupervised machine learning models to find the methods best suited for our clustering goal. We thoroughly evaluate our approach using two large-scale industrial monitoring datasets. Finally, we apply one of the top-ranked methods to thousands of time series from hundreds of software systems, thereby demonstrating the usefulness of our approach.
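The abstract does not name the concrete properties, models, or ranking criterion, so the following is only a minimal sketch of the general idea (extract features per time series, cluster them, rank each feature-set/model combination) under our own assumptions. The four features (mean, standard deviation, lag-1 autocorrelation, linear trend slope), the use of plain k-means as the only model, the mean silhouette coefficient as the ranking score, and all function names (`extract_features`, `kmeans`, `silhouette`, `rank_methods`) are hypothetical illustrations, not the paper's method. It uses only the Python standard library (3.8+).

```python
import math
import statistics


def extract_features(series, feature_set):
    """Return the requested (hypothetical) feature values for one time series."""
    n = len(series)
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    # Lag-1 autocorrelation; defined as 0 for a constant series.
    acf1 = 0.0
    if std > 0:
        acf1 = sum((series[i] - mean) * (series[i + 1] - mean)
                   for i in range(n - 1)) / (n * std * std)
    # Least-squares slope of the values against the time index (trend).
    t_mean = (n - 1) / 2
    slope = (sum((t - t_mean) * (x - mean) for t, x in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    available = {"mean": mean, "std": std, "acf1": acf1, "slope": slope}
    return [available[name] for name in feature_set]


def standardize(rows):
    """Z-score each feature column so no single feature dominates distances."""
    stats = [(statistics.fmean(col), statistics.pstdev(col) or 1.0)
             for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def kmeans(points, k, iterations=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous center if a cluster becomes empty.
            new_centers.append([statistics.fmean(d) for d in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # assumption: a degenerate clustering ranks last
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        a = statistics.fmean(own)
        b = min(statistics.fmean([math.dist(p, q) for j, q in enumerate(points)
                                  if labels[j] == c])
                for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return statistics.fmean(scores)


def rank_methods(series_list, feature_sets, ks):
    """Score every (feature set, k) combination and sort best-first."""
    results = []
    for fs in feature_sets:
        points = standardize([extract_features(s, fs) for s in series_list])
        for k in ks:
            results.append((silhouette(points, kmeans(points, k)), fs, k))
    return sorted(results, key=lambda r: r[0], reverse=True)


# Synthetic example: four flat, oscillating series and four rising series.
flat = [[5.0 + 0.1 * ((t + i) % 2) for t in range(20)] for i in range(4)]
rising = [[float(t * (1 + 0.1 * i)) for t in range(20)] for i in range(4)]
ranking = rank_methods(flat + rising, [("mean", "std"), ("slope", "acf1")], [2, 3])
for score, fs, k in ranking:
    print(f"{score:.3f}  features={fs}  k={k}")
```

On this synthetic data, k = 2 combinations rank above k = 3, because the data contains two well-separated groups. Note that comparing silhouette scores computed in different feature spaces is only a rough heuristic; a real evaluation would more likely rely on additional criteria or labeled reference data.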