2 research outputs found
Automated Anomaly Detection and Localization System for a Microservices Based Cloud System
Context: With an increasing number of applications running on a microservices-based cloud system (such as AWS, GCP, IBM Cloud), it is challenging for the cloud providers to offer uninterrupted services with guaranteed Quality of Service (QoS) factors. Problem Statement: Existing monitoring frameworks often do not detect critical defects among a large volume of issues generated, thus affecting recovery response times and usage of maintenance human resource. Also, manually tracing the root causes of the issues requires a significant amount of time. Objective: The objective of this work is to: (i) detect performance anomalies, in real-time, through monitoring KPIs (Key Performance Indicators) using distributed tracing events, and (ii) identify their root causes. Proposed Solution: This thesis proposes an automated prediction-based anomaly detection and localization system, capable of detecting performance anomalies of a microservice using machine learning techniques, and determine their root-causes using a localization process. Novelty: The originality of this work lies in the detection process that uses a novel ensemble of a time-series forecasting model and three different unsupervised learning techniques that avoid defining static error thresholds to detect an anomaly and, instead follow a dynamic approach. Experimental Results: The proposed detection system was experimented using different variants of ensembles, evaluated on a real-world production dataset out of which two proposed ensembles outperformed the existing static rule-based approach with average F1-scores of 86% and 84%, average precision scores of 82% and 77% and average recall scores of 91% and 93% respectively across 6 experiments. The proposed detection ensembles were also evaluated on the Numenta Anomaly Benchmark (NAB) datasets and results show that the proposed method performs better than the Numenta’s standard HTM model score. Research Methodology: We adopted an agile methodology to conduct our research in an incremental and iterative fashion. Conclusion: The two proposed ensembles for anomaly detection perform better than the existing static rule-based approach
Service level agreement specification for IoT application workflow activity deployment, configuration and monitoring
PhD ThesisCurrently, we see the use of the Internet of Things (IoT) within various domains
such as healthcare, smart homes, smart cars, smart-x applications, and smart
cities. The number of applications based on IoT and cloud computing is projected
to increase rapidly over the next few years. IoT-based services must meet
the guaranteed levels of quality of service (QoS) to match users’ expectations.
Ensuring QoS through specifying the QoS constraints using service level agreements
(SLAs) is crucial. Also because of the potentially highly complex nature
of multi-layered IoT applications, lifecycle management (deployment, dynamic
reconfiguration, and monitoring) needs to be automated. To achieve this it is
essential to be able to specify SLAs in a machine-readable format.
currently available SLA specification languages are unable to accommodate
the unique characteristics (interdependency of its multi-layers) of the IoT domain.
Therefore, in this research, we propose a grammar for a syntactical structure
of an SLA specification for IoT. The grammar is based on a proposed conceptual
model that considers the main concepts that can be used to express the requirements
for most common hardware and software components of an IoT application
on an end-to-end basis. We follow the Goal Question Metric (GQM) approach to
evaluate the generality and expressiveness of the proposed grammar by reviewing
its concepts and their predefined lists of vocabularies against two use-cases
with a number of participants whose research interests are mainly related to IoT.
The results of the analysis show that the proposed grammar achieved 91.70% of
its generality goal and 93.43% of its expressiveness goal.
To enhance the process of specifying SLA terms, We then developed a toolkit
for creating SLA specifications for IoT applications. The toolkit is used to simplify
the process of capturing the requirements of IoT applications. We demonstrate
the effectiveness of the toolkit using a remote health monitoring service (RHMS)
use-case as well as applying a user experience measure to evaluate the tool by
applying a questionnaire-oriented approach. We discussed the applicability of our
tool by including it as a core component of two different applications: 1) a contextaware
recommender system for IoT configuration across layers; and 2) a tool for
automatically translating an SLA from JSON to a smart contract, deploying it
on different peer nodes that represent the contractual parties. The smart contract
is able to monitor the created SLA using Blockchain technology. These two
applications are utilized within our proposed SLA management framework for IoT.
Furthermore, we propose a greedy heuristic algorithm to decentralize workflow
activities of an IoT application across Edge and Cloud resources to enhance
response time, cost, energy consumption and network usage. We evaluated the
efficiency of our proposed approach using iFogSim simulator. The performance
analysis shows that the proposed algorithm minimized cost, execution time, networking,
and Cloud energy consumption compared to Cloud-only and edge-ward
placement approaches