Cross-Layer Cloud Performance Monitoring, Analysis and Recovery

Abstract

The basic idea of Cloud computing is to offer software and hardware resources as services. These services are provided at different layers: Software (Software as a Service: SaaS), Platform (Platform as a Service: PaaS) and Infrastructure (Infrastructure as a Service: IaaS). In such a complex environment, performance issues are quite likely and rather the norm than the exception. Consequently, performance-related problems may frequently occur at all layers. Thus, it is necessary to monitor all Cloud layers and analyze their performance parameters to detect and rectify related problems. This thesis presents a novel cross-layer reactive performance monitoring approach for Cloud computing environments, based on the methodology of Complex Event Processing (CEP). The proposed approach is called CEP4Cloud. It analyzes monitored events to detect performance-related problems and performs actions to fix them. The proposal is based on the use of (1) a novel multi-layer monitoring approach, (2) a new cross-layer analysis approach and (3) a novel recovery approach. The proposed monitoring approach operates at all Cloud layers, while collecting related parameters. It makes use of existing monitoring tools and a new monitoring approach for Cloud services at the SaaS layer. The proposed SaaS monitoring approach is called AOP4CSM. It is based on aspect-oriented programming and monitors quality-of-service parameters of the SaaS layer in a non-invasive manner. AOP4CSM neither modifies the server implementation nor the client implementation. The defined cross-layer analysis approach is called D-CEP4CMA. It is based on the methodology of Complex Event Processing (CEP). Instead of having to manually specify continuous queries on monitored event streams, CEP queries are derived from analyzing the correlations between monitored metrics across multiple Cloud layers. The results of the correlation analysis allow us to reduce the number of monitored parameters and enable us to perform a root cause analysis to identify the causes of performance-related problems. The derived analysis rules are implemented as queries in a CEP engine. D-CEP4CMA is designed to dynamically switch between different centralized and distributed CEP architectures depending on the load/memory of the CEP machine and network traffic conditions in the observed Cloud environment. The proposed recovery approach is based on a novel action manager framework. It applies recovery actions at all Cloud layers. The novel action manager framework assigns a set of repair actions to each performance-related problem and checks the success of the applied action. The results of several experiments illustrate the merits of the reactive performance monitoring approach and its main components (i.e., monitoring, analysis and recovery). First, experimental results show the efficiency of AOP4CSM (very low overhead). Second, obtained results demonstrate the benefits of the analysis approach in terms of precision and recall compared to threshold-based methods. They also show the accuracy of the analysis approach in identifying the causes of performance-related problems. Furthermore, experiments illustrate the efficiency of D-CEP4CMA and its performance in terms of precision and recall compared to centralized and distributed CEP architectures. Moreover, experimental results indicate that the time needed to fix a performance-related problem is reasonably short. They also show that the CPU overhead of using CEP4Cloud is negligible. Finally, experimental results demonstrate the merits of CEP4Cloud in terms of speeding up the repair and reducing the number of triggered alarms compared to baseline methods

    Similar works