Using Crash Frequency Analysis to Identify Error-Prone Software Technologies in Multi-System Monitoring

Andreas Schörgenhumer, Mario Kahlhofer, Hanspeter Mössenböck, Paul Grünbacher

2018 IEEE International Conference on Software Quality, Reliability and Security (QRS), 2018

Faults are common in large software systems and must be analyzed to prevent future failures such as system outages. Due to their sheer number, the observed failures cannot be inspected individually but must be automatically grouped and prioritized. An open challenge is to find similarities in failures across different systems. We propose a novel approach for identifying error-prone software technologies via a cross-system analysis based on monitoring and crash data. Our approach ranks the error-prone software technologies and analyzes the exceptions that occurred, thus making it easier for developers to investigate cross-system failures. Finding such failures is highly advantageous because fixing a single fault may benefit many affected systems. A preliminary case study on monitoring data from hundreds of different systems demonstrates the feasibility of our approach.
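
To make the idea of a cross-system crash frequency ranking more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical record format of (system_id, technology, exception_or_None), ranks technologies by the fraction of systems using them that reported at least one crash, and reports the most frequent exception per technology. The function name, the min_systems threshold, and the sample data are all illustrative assumptions; the abstract does not specify the actual ranking metric.

```python
from collections import Counter, defaultdict


def rank_technologies(records, min_systems=2):
    """Rank technologies by the share of systems that crashed while using them.

    records: iterable of (system_id, technology, exception_or_None) tuples.
    This record format is an assumption made for illustration only.
    """
    systems_using = defaultdict(set)     # technology -> systems that use it
    systems_crashed = defaultdict(set)   # technology -> systems with a crash
    exceptions = defaultdict(Counter)    # technology -> exception type counts

    for system_id, tech, exc in records:
        systems_using[tech].add(system_id)
        if exc is not None:
            systems_crashed[tech].add(system_id)
            exceptions[tech][exc] += 1

    ranking = []
    for tech, users in systems_using.items():
        # Skip technologies seen on too few systems to compare across systems.
        if len(users) < min_systems:
            continue
        crash_ratio = len(systems_crashed[tech]) / len(users)
        ranking.append((tech, crash_ratio, exceptions[tech].most_common(1)))

    # Highest crash ratio first; ties broken alphabetically by technology name.
    ranking.sort(key=lambda entry: (-entry[1], entry[0]))
    return ranking


if __name__ == "__main__":
    # Made-up sample data; technology and system names are placeholders.
    sample = [
        ("s1", "techA", "NullPointerException"),
        ("s2", "techA", "NullPointerException"),
        ("s3", "techA", None),
        ("s1", "techB", None),
        ("s2", "techB", "TimeoutException"),
        ("s3", "techB", None),
        ("s4", "techB", None),
        ("s1", "techC", "OutOfMemoryError"),
    ]
    for tech, ratio, top_exception in rank_technologies(sample):
        print(tech, round(ratio, 2), top_exception)
```

Counting affected systems rather than raw crash events is one possible design choice: it keeps a single very noisy system from dominating the ranking, which fits the stated goal of finding failures shared across systems.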