2 research outputs found

    Automated Anomaly Detection and Localization System for a Microservices Based Cloud System

    Context: With an increasing number of applications running on microservices-based cloud systems (such as AWS, GCP, and IBM Cloud), it is challenging for cloud providers to offer uninterrupted services with guaranteed Quality of Service (QoS) factors. Problem Statement: Existing monitoring frameworks often fail to detect critical defects among the large volume of issues generated, which lengthens recovery response times and increases the demand on maintenance staff. Manually tracing the root causes of the issues also requires a significant amount of time. Objective: The objective of this work is to: (i) detect performance anomalies in real time by monitoring KPIs (Key Performance Indicators) using distributed tracing events, and (ii) identify their root causes. Proposed Solution: This thesis proposes an automated prediction-based anomaly detection and localization system, capable of detecting performance anomalies of a microservice using machine learning techniques and determining their root causes using a localization process. Novelty: The originality of this work lies in the detection process, which uses a novel ensemble of a time-series forecasting model and three different unsupervised learning techniques that avoid defining static error thresholds to detect an anomaly and instead follow a dynamic approach. Experimental Results: The proposed detection system was tested with different variants of ensembles and evaluated on a real-world production dataset. Two of the proposed ensembles outperformed the existing static rule-based approach, with average F1-scores of 86% and 84%, average precision scores of 82% and 77%, and average recall scores of 91% and 93%, respectively, across 6 experiments. The proposed detection ensembles were also evaluated on the Numenta Anomaly Benchmark (NAB) datasets, and the results show that the proposed method performs better than Numenta's standard HTM model. Research Methodology: We adopted an agile methodology to conduct our research in an incremental and iterative fashion. Conclusion: The two proposed ensembles for anomaly detection perform better than the existing static rule-based approach.
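To make the contrast between a static and a dynamic error threshold concrete, here is a minimal sketch in Python. It is not the thesis's ensemble: it assumes forecasts are already available from some time-series model and replaces the three unsupervised techniques with a simple rolling median and median absolute deviation (MAD) rule. The function name and the parameters `window`, `k`, and `floor` are hypothetical and chosen only for illustration.

```python
from statistics import median


def residual_anomalies(actual, predicted, window=5, k=3.0, floor=0.5):
    """Flag points whose forecast error exceeds a threshold derived
    from the most recent `window` errors (hypothetical helper).

    The threshold is median + k * max(MAD, floor), so it adapts to the
    recent error level instead of using one fixed cut-off.
    """
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    flags = []
    for i, err in enumerate(errors):
        history = errors[max(0, i - window):i]
        if len(history) < window:
            # Not enough history yet to estimate a threshold.
            flags.append(False)
            continue
        med = median(history)
        mad = median([abs(e - med) for e in history])
        # `floor` keeps the threshold from collapsing when errors are flat.
        threshold = med + k * max(mad, floor)
        flags.append(err > threshold)
    return flags


# Illustrative KPI values and forecasts (made-up numbers, not thesis data).
actual = [10, 11, 10, 12, 11, 10, 30, 11]
predicted = [10, 10, 10, 11, 11, 10, 11, 11]
print(residual_anomalies(actual, predicted))  # only index 6 (the spike to 30) is flagged
```

Because the threshold is recomputed from recent errors at every step, a noisier KPI would tolerate larger deviations before raising a flag, while a stable one would be flagged sooner.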

    Service level agreement specification for IoT application workflow activity deployment, configuration and monitoring

    PhD Thesis. Currently, we see the use of the Internet of Things (IoT) within various domains such as healthcare, smart homes, smart cars, smart-x applications, and smart cities. The number of applications based on IoT and cloud computing is projected to increase rapidly over the next few years. IoT-based services must meet the guaranteed levels of quality of service (QoS) to match users' expectations. Ensuring QoS by specifying the QoS constraints using service level agreements (SLAs) is crucial. Also, because of the potentially highly complex nature of multi-layered IoT applications, lifecycle management (deployment, dynamic reconfiguration, and monitoring) needs to be automated. To achieve this, it is essential to be able to specify SLAs in a machine-readable format. Currently available SLA specification languages are unable to accommodate the unique characteristics (the interdependency of its multiple layers) of the IoT domain. Therefore, in this research, we propose a grammar for the syntactical structure of an SLA specification for IoT. The grammar is based on a proposed conceptual model that considers the main concepts that can be used to express the requirements for the most common hardware and software components of an IoT application on an end-to-end basis. We follow the Goal Question Metric (GQM) approach to evaluate the generality and expressiveness of the proposed grammar by reviewing its concepts and their predefined lists of vocabularies against two use-cases with a number of participants whose research interests are mainly related to IoT. The results of the analysis show that the proposed grammar achieved 91.70% of its generality goal and 93.43% of its expressiveness goal. To enhance the process of specifying SLA terms, we then developed a toolkit for creating SLA specifications for IoT applications. The toolkit is used to simplify the process of capturing the requirements of IoT applications. We demonstrate the effectiveness of the toolkit using a remote health monitoring service (RHMS) use-case, and we evaluate the tool with a user experience measure based on a questionnaire-oriented approach. We discuss the applicability of our tool by including it as a core component of two different applications: 1) a context-aware recommender system for IoT configuration across layers; and 2) a tool for automatically translating an SLA from JSON to a smart contract and deploying it on different peer nodes that represent the contractual parties. The smart contract is able to monitor the created SLA using Blockchain technology. These two applications are utilized within our proposed SLA management framework for IoT. Furthermore, we propose a greedy heuristic algorithm to decentralize workflow activities of an IoT application across Edge and Cloud resources to improve response time, cost, energy consumption, and network usage. We evaluated the efficiency of our proposed approach using the iFogSim simulator. The performance analysis shows that the proposed algorithm reduced cost, execution time, network usage, and Cloud energy consumption compared to Cloud-only and edge-ward placement approaches.