The complexity of systems is considered an obstacle to the progress of the IT industry. Autonomic computing is presented as the alternative to cope with the growing complexity. It is a holistic approach, in which the systems are able to configure, heal, optimize, and protect by themselves. Web-based applications are an example of systems where the complexity is high. The number of components, their interoperability, and workload variations are factors that may lead to performance failures or unavailability scenarios. The occurrence of these scenarios affects the revenue and reputation of businesses that rely on these types of applications. In this article, we present a self-healing framework for Web-based applications (SHõWA). SHõWA is composed by several modules, which monitor the application, analyze the data to detect and pinpoint anomalies, and execute recovery actions autonomously. The monitoring is done by a small aspect-oriented programming agent. This agent does not require changes to the application source code and includes adaptive and selective algorithms to regulate the level of monitoring. The anomalies are detected and pinpointed by means of statistical correlation. The data analysis detects changes in the server response time and analyzes if those changes are correlated with the workload or are due to a performance anomaly. In the presence of per- formance anomalies, the data analysis pinpoints the anomaly. Upon the pinpointing of anomalies, SHõWA executes a recovery procedure. We also present a study about the detection and localization of anomalies, the accuracy of the data analysis, and the performance impact induced by SHõWA. Two benchmarking applications, exercised through dynamic workloads, and different types of anomaly were considered in the study. The results reveal that (1) the capacity of SHõWA to detect and pinpoint anomalies while the number of end users affected is low; (2) SHõWA was able to detect anomalies without raising any false alarm; and (3) SHõWA does not induce a significant performance overhead (throughput was affected in less than 1%, and the response time delay was no more than 2 milliseconds)
To submit an update or takedown request for this paper, please submit an Update/Correction/Removal Request.