4 research outputs found

    Anomaly Detection in Cloud-Native systems

    In recent years, microservices have gained popularity due to benefits such as increased maintainability and scalability of the system. The microservice architectural pattern was adopted for the development of a large-scale system which is commonly deployed on public and private clouds, and therefore the aim is to ensure that it always maintains an optimal level of performance. Consequently, the system is monitored by collecting different metrics, including performance-related metrics. The first part of this thesis focuses on the creation of a dataset of realistic time series with anomalies at deterministic locations. This dataset addresses the lack of labeled data for training supervised models and the absence of publicly available data; such data are not usually shared due to privacy concerns. The second part consists of an empirical study on the detection of anomalies occurring in the different services that compose the system. Specifically, the aim is to understand whether it is possible to predict the anomalies in order to act before system failures or performance degradation occur. To that end, eight different classification-based Machine Learning algorithms were compared by collecting accuracy, training time, and testing time, to determine which technique might be most suitable for reducing system overload. The results showed that there are strong correlations between metrics and that it is possible to predict the anomalies in the system with approximately 90% accuracy. The most important outcome is that performance-related anomalies can be detected by monitoring a limited number of metrics collected at runtime with a short training time. Future work includes the adoption of prediction-based approaches and the development of tools for the prediction of anomalies in cloud-native environments.

    Fault Tolerant Multitenant Database Server Consolidation

    Server consolidation is important in situations where a sequence of database tenants needs to be allocated (hosted) dynamically on a minimum number of cloud server machines. Given a tenant’s load, defined by the amount of resources that the tenant requires, and a service-level agreement (SLA) between the tenant customer and the cloud service provider, resource cost savings can be achieved by consolidating multiple database tenants on server machines. Additionally, in realistic settings, server machines might fail, causing their tenants to become unavailable. To address this, service providers place multiple replicas of each tenant on different servers and reserve extra capacity to ensure that tenant failover will not result in overload on any remaining server. The focus of this thesis is on providing effective strategies for placing tenants on server machines so that the SLA requirements are met in the presence of failure of one or more servers. We propose the Cube-Fit (CUBEFIT) algorithm for multitenant database server consolidation, which saves resource costs by utilizing fewer servers than existing approaches for analytical workloads. Additionally, unlike existing consolidation algorithms, CUBEFIT can tolerate multiple server failures while ensuring that no server becomes overloaded. We provide extensive theoretical analysis and experimental evaluation of CUBEFIT. We show that, compared to existing algorithms, the average-case and worst-case behavior of CUBEFIT is superior and that CUBEFIT produces near-optimal tenant allocation when the number of tenants is large. Through evaluation and deployment on a cluster of up to 73 machines, as well as through simulation studies, we experimentally demonstrate the efficacy of CUBEFIT in practical settings.

    Feature-based Time Series Analytics

    Time series analytics is a fundamental prerequisite for decision-making as well as automation and occurs in several applications such as energy load control, weather research, and consumer behavior analysis. It encompasses time series engineering, i.e., the representation of time series exhibiting important characteristics, and data mining, i.e., the application of the representation to a specific task. Due to the exhaustive data gathering that results from the "Industry 4.0" vision and its shift towards automation and digitalization, time series analytics is undergoing a revolution. Big datasets with very long time series are gathered, which is challenging for engineering techniques. Traditionally, one focus has been on raw-data-based or shape-based engineering. These techniques assess the time series' similarity in shape, which is only suitable for short time series. Another focus has been on model-based engineering. It assesses the time series' similarity in structure, which is suitable for long time series but requires larger models or time-consuming modeling. Feature-based engineering tackles these challenges by efficiently representing time series and comparing their similarity in structure. However, current feature-based techniques are unsatisfactory as they are designed for specific data-mining tasks. In this work, we introduce a novel feature-based engineering technique. It efficiently provides a short representation of time series, focusing on their structural similarity. Based on a design rationale, we derive important time series characteristics such as the long-term and cyclically repeated characteristics as well as distribution and correlation characteristics. Moreover, we define a feature-based distance measure for their comparison. Both the representation technique and the distance measure provide desirable properties regarding storage and runtime.
    Subsequently, we introduce techniques based on our feature-based engineering and apply them to important data-mining tasks such as time series generation, time series matching, time series classification, and time series clustering. First, our feature-based generation technique outperforms state-of-the-art techniques regarding the accuracy of evolved datasets. Second, with our features, a matching method retrieves a match for a time series query much faster than with current representations. Third, our features provide discriminative characteristics to classify datasets as accurately as state-of-the-art techniques, but orders of magnitude faster. Finally, our features recommend an appropriate clustering of time series, which is crucial for subsequent data-mining tasks. All these techniques are assessed on datasets from the energy, weather, and economic domains and thus demonstrate the applicability to real-world use cases. The findings demonstrate the versatility of our feature-based engineering and suggest several courses of action for designing and improving analytical systems for the paradigm shift of Industry 4.0.