
    Anomaly Detection in Cloud-Native Systems

    In recent years, microservices have gained popularity due to benefits such as increased maintainability and scalability. The microservice architectural pattern was adopted for the development of a large-scale system that is commonly deployed on public and private clouds, and the aim is therefore to ensure that it always maintains an optimal level of performance. Consequently, the system is monitored by collecting different metrics, including performance-related metrics. The first part of this thesis focuses on the creation of a dataset of realistic time series with anomalies at deterministic locations. This dataset addresses the lack of labeled data for training supervised models and the absence of publicly available data, since such data are usually not shared due to privacy concerns. The second part consists of an empirical study on the detection of anomalies occurring in the different services that compose the system. Specifically, the aim is to understand whether anomalies can be predicted so that actions can be taken before system failures or performance degradation. To this end, eight different classification-based Machine Learning algorithms were compared on accuracy, training time and testing time, to determine which technique might be most suitable for reducing system overload. The results showed that there are strong correlations between metrics and that anomalies in the system can be predicted with approximately 90% accuracy. The most important outcome is that performance-related anomalies can be detected by monitoring a limited number of metrics collected at runtime with a short training time. Future work includes the adoption of prediction-based approaches and the development of tools for the prediction of anomalies in cloud-native environments.
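
    As a rough illustration of the comparison described above, the sketch below trains a few representative scikit-learn classifiers on synthetic runtime metrics and records accuracy, training time and testing time. The features, labels and the particular classifiers are illustrative assumptions, not the eight algorithms or the dataset used in the thesis.

```python
# Hedged sketch: comparing classification-based anomaly detectors on labelled
# metric samples, recording accuracy, training time and testing time.
# Synthetic data and classifier choices are illustrative only.
import time
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
# Synthetic runtime metrics (CPU, memory, latency, ...) with a binary anomaly label.
X = rng.normal(size=(5000, 6))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=5000) > 1.5).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

for name, clf in classifiers.items():
    t0 = time.perf_counter()
    clf.fit(X_train, y_train)          # training time
    t1 = time.perf_counter()
    y_pred = clf.predict(X_test)       # testing time
    t2 = time.perf_counter()
    print(f"{name}: accuracy={accuracy_score(y_test, y_pred):.3f} "
          f"train={t1 - t0:.3f}s test={t2 - t1:.3f}s")
```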

    Monitoring and analysis system for performance troubleshooting in data centers

    It was not long ago. On Christmas Eve 2012, a war of troubleshooting began in Amazon data centers. It started at 12:24 PM with a mistaken deletion of the state data of the Amazon Elastic Load Balancing Service (ELB for short), which was not noticed at the time. The mistake first led to a local issue in which a small number of ELB service APIs were affected. In about six minutes it evolved into a critical one in which EC2 customers were significantly affected. One example was Netflix, which was using hundreds of Amazon ELB services and experienced an extensive streaming outage in which many customers could not watch TV shows or movies on Christmas Eve. It took Amazon engineers 5 hours and 42 minutes to find the root cause, the mistaken deletion, and another 15 hours and 32 minutes to fully recover the ELB service. The war ended at 8:15 AM the next day and brought performance troubleshooting in data centers to the world's attention. As this Amazon ELB case shows, troubleshooting runtime performance issues is crucial in time-sensitive multi-tier cloud services because of their stringent end-to-end timing requirements, but it is also notoriously difficult and time consuming. To address this challenge, this dissertation proposes VScope, a flexible monitoring and analysis system for online troubleshooting in data centers. VScope provides primitive operations that data center operators can use to troubleshoot various performance issues. Each operation is essentially a series of monitoring and analysis functions executed on an overlay network. We design a novel software architecture for VScope so that the overlay networks can be generated, executed and terminated automatically, on demand. On the troubleshooting side, we design novel anomaly detection algorithms and implement them in VScope. By running these algorithms in VScope, data center operators are notified when performance anomalies occur. We also design a graph-based guidance approach, called VFocus, which tracks the interactions among hardware and software components in data centers. VFocus provides primitive operations with which operators can analyze these interactions to find out which components are relevant to a performance issue. VScope's capabilities and performance are evaluated on a testbed with over 1000 virtual machines (VMs). Experimental results show that the VScope runtime negligibly perturbs system and application performance, and requires mere seconds to deploy monitoring and analytics functions on over 1000 nodes. This demonstrates VScope's ability to support fast operation and online queries against a comprehensive set of application- to system/platform-level metrics, and a variety of representative analytics functions. When supporting algorithms with high computational complexity, VScope serves as a 'thin layer' that accounts for no more than 5% of their total latency. Further, by using VFocus, VScope can locate problematic VMs that cannot be found via application-level monitoring alone, and in one of the use cases explored in the dissertation it operates with levels of perturbation over 400% lower than those seen for brute-force and most sampling-based approaches. We also validate VFocus with real-world data center traces. The experimental results show that VFocus has a troubleshooting accuracy of 83% on average.
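
    The toy sketch below illustrates the general idea behind a graph-based guidance step such as VFocus, walking a hypothetical component-interaction graph outward from the component where an anomaly was observed to shortlist related components. It is not VScope's actual code, and the component names are invented.

```python
# Illustrative sketch (not VScope/VFocus code): breadth-first walk over a
# hypothetical interaction graph, ranking components by hop distance from the
# component showing the anomaly.
from collections import deque

# Hypothetical interaction graph: component -> components it interacts with.
interactions = {
    "web-vm-1": ["app-vm-3", "lb-1"],
    "app-vm-3": ["db-vm-2", "cache-vm-1"],
    "lb-1": ["web-vm-1", "web-vm-2"],
    "db-vm-2": [],
    "cache-vm-1": [],
    "web-vm-2": ["app-vm-3"],
}

def related_components(graph, anomalous, max_hops=2):
    """Breadth-first walk bounded by max_hops; nearer components rank first."""
    seen, order = {anomalous}, []
    queue = deque([(anomalous, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                order.append((nbr, hops + 1))
                queue.append((nbr, hops + 1))
    return order

print(related_components(interactions, "web-vm-1"))
```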

    Mechanisms for the control and management of 5G networks: operator networks

    In 5G networks, time-series data will be omnipresent in the monitoring of network metrics. With the increase in the number of Internet of Things (IoT) devices over the next years, the number of real-time time-series data streams is expected to grow at a fast pace. To be able to monitor those streams, and to test and correlate different algorithms and metrics simultaneously and seamlessly, time-series forecasting is becoming essential for the successful proactive management of the network. The objective of this dissertation is to design, implement and test a prediction system for a communication network that allows various networks, such as a vehicular network and a 4G operator network, to be integrated in order to improve network reliability and Quality of Service (QoS). To do that, the dissertation has three main goals: (1) the analysis of different network datasets and the implementation of different approaches to forecasting network metrics, to test different techniques; (2) the design and implementation of a real-time distributed time-series forecasting architecture, to enable the network operator to make predictions about network metrics; and lastly, (3) the use of the forecasting models built previously to improve network performance through resource management policies. Tests with two different datasets, addressing the use cases of congestion management and resource splitting in a network with a limited number of resources, show that network performance can be improved by proactive management carried out by a real-time system able to predict network metrics and act on the network accordingly. A study is also presented on which network metrics can cause reduced accessibility in 4G networks, so that the network operator can act more efficiently and proactively to avoid such events.
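
    As a minimal sketch of the forecasting building block, assuming a univariate network metric sampled at a fixed interval, the example below uses simple exponential smoothing as a stand-in for the forecasting models evaluated in the dissertation; the metric values and parameters are illustrative.

```python
# Minimal forecasting sketch: simple exponential smoothing over a hypothetical
# per-minute throughput series, producing a flat h-step-ahead forecast.
import numpy as np

def ses_forecast(series, alpha=0.3, horizon=5):
    """Simple exponential smoothing: update the level, then forecast it forward."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return np.full(horizon, level)

# Hypothetical per-minute throughput samples (Mbps).
throughput = np.array([110, 118, 125, 122, 131, 140, 138, 150, 155, 149], dtype=float)
predicted = ses_forecast(throughput, alpha=0.4, horizon=3)
print(predicted)  # e.g. feed these predictions into a congestion-management policy
```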

    A Business Intelligence Solution, based on a Big Data Architecture, for processing and analyzing the World Bank data

    The rapid growth in data volume and complexity has required the adoption of advanced technologies to extract valuable insights for decision-making. This project aims to address this need by developing a comprehensive framework that combines Big Data processing, analytics, and visualization techniques to enable effective analysis of World Bank data. The problem addressed in this study is the need for a scalable and efficient Business Intelligence solution that can handle the vast amounts of data generated by the World Bank. Therefore, a Big Data architecture is implemented for a real use case for the International Bank for Reconstruction and Development. The findings of this project demonstrate the effectiveness of the proposed solution. Through the integration of Apache Spark and Apache Hive, data is processed using Extract, Transform and Load (ETL) techniques, allowing for efficient data preparation. The use of Apache Kylin enables the construction of a multidimensional model, facilitating fast and interactive queries on the data. Moreover, data visualization techniques are employed to create intuitive and informative visual representations of the analyzed data. The key conclusions drawn from this project highlight the advantages of a Big Data-driven Business Intelligence solution for processing and analyzing World Bank data. The implemented framework shows improved scalability, performance, and flexibility compared to traditional approaches. In conclusion, this bachelor thesis presents a Business Intelligence solution based on a Big Data architecture for processing and analyzing the World Bank data. The project findings emphasize the importance of scalable and efficient data processing techniques, multidimensional modeling, and data visualization for deriving valuable insights. The application of these techniques contributes to the field by demonstrating the potential of Big Data Business Intelligence solutions in addressing the challenges associated with large-scale data analysis.
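
    A hedged sketch of the Extract, Transform and Load step is shown below, assuming a Spark cluster with Hive support and a hypothetical CSV export of World Bank indicators; the paths, table and column names are illustrative, not the project's actual schema.

```python
# Sketch of an ETL step on Spark with Hive support; names and paths are invented.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("worldbank-etl")
         .enableHiveSupport()
         .getOrCreate())

# Extract: raw indicator file (country, indicator, year, value).
raw = spark.read.csv("hdfs:///staging/worldbank_indicators.csv",
                     header=True, inferSchema=True)

# Transform: drop rows without values and keep one row per country/indicator/year.
clean = (raw.dropna(subset=["value"])
            .groupBy("country", "indicator", "year")
            .agg(F.avg("value").alias("value")))

# Load: persist as a Hive table, which a Kylin cube definition could later reference.
clean.write.mode("overwrite").saveAsTable("dw.worldbank_indicators")
```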

    Classification algorithms for Big Data with applications in the urban security domain

    A classification algorithm is a versatile tool that can serve as a predictor of the future or as an analytical tool for understanding the past. Several obstacles prevent classification from scaling to large Volume, Velocity, Variety or Value. The aim of this thesis is to scale distributed classification algorithms beyond current limits, assess the state of practice of Big Data machine learning frameworks and validate the effectiveness of a data science process in improving urban safety. We found that massive datasets with a number of large-domain categorical features pose a difficult challenge for existing classification algorithms. We propose associative classification as a possible answer, and develop several novel techniques to distribute the training of an associative classifier among parallel workers and improve the final quality of the model. The experiments, run on a real large-scale dataset with more than 4 billion records, confirmed the quality of the approach. To assess the state of practice of Big Data machine learning frameworks and streamline the process of integrating and fine-tuning the building blocks, we developed a generic, self-tuning tool to extract knowledge from network traffic measurements. The result is a system that offers human-readable models of the data with minimal user intervention, validated by experiments on large collections of real-world passive network measurements. A good portion of this dissertation is dedicated to the study of a data science process to improve urban safety. First, we shed some light on the feasibility of a system that monitors social messages from a city for emergency relief. We then propose a methodology to mine temporal patterns in social issues, such as crimes. Finally, we propose a system that integrates the findings of Data Science on the citizenry's perception of safety and communicates its results to decision makers in a timely manner. We applied and tested the system in a real Smart City scenario, set in Turin, Italy.
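
    The toy sketch below illustrates the associative-classification idea in miniature: mining high-confidence class association rules from (feature, value) itemsets and classifying a record with the best matching rule. A real distributed trainer would partition this counting across parallel workers; the records and thresholds are invented for illustration.

```python
# Toy associative classification: mine class association rules, then classify
# with the highest-confidence matching rule. Data and thresholds are invented.
from collections import Counter
from itertools import combinations

records = [
    ({"zone": "north", "hour": "night"}, "theft"),
    ({"zone": "north", "hour": "night"}, "theft"),
    ({"zone": "south", "hour": "day"}, "vandalism"),
    ({"zone": "north", "hour": "day"}, "vandalism"),
]

def mine_rules(data, min_support=2, min_conf=0.6):
    item_counts, rule_counts = Counter(), Counter()
    for features, label in data:
        items = frozenset(features.items())
        for r in range(1, len(items) + 1):
            for subset in combinations(sorted(items), r):
                item_counts[subset] += 1
                rule_counts[(subset, label)] += 1
    rules = []
    for (subset, label), cnt in rule_counts.items():
        conf = cnt / item_counts[subset]
        if cnt >= min_support and conf >= min_conf:
            rules.append((subset, label, conf))
    return sorted(rules, key=lambda r: -r[2])

def classify(rules, features, default="unknown"):
    items = set(features.items())
    for antecedent, label, _ in rules:
        if set(antecedent) <= items:
            return label
    return default

rules = mine_rules(records)
print(classify(rules, {"zone": "north", "hour": "night"}))
```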

    Continuous Monitoring and Automated Fault Detection and Diagnosis of Large Air-Handling Units


    Design and Implementation of Information Warehouse for Manufacturing Facility Supporting Holistic Energy Management

    Get PDF
    Energy management is one of the most critical tasks that need to be performed in a manufacturing facility, since manufacturing consumes one third of the world's energy. At the same time, manufacturing facilities are equipped with large numbers of field devices, which generate vast amounts of information every second. With such a huge amount of real-time data, which has the potential to provide insight for energy management needs, capturing, storing and processing it becomes a challenge. In this thesis, an information warehouse system supporting holistic energy management is designed and implemented. The main goal is to provide a system that can capture, store and provide the information relevant for energy management purposes in a manufacturing facility. The thesis consists of three main parts. In the first part, the concepts and technologies currently most relevant to energy management, including Big Data, NoSQL, Service Oriented Architecture and Complex Event Processing, are explored, analyzed and compared. In the second part, an architectural design of the information warehouse is presented, and a set of tools and technologies is selected for the implementation. In the third part, the information warehouse system is implemented and tested on a manufacturing line test-bed. The implemented information warehouse is based on a multi-layered architectural pattern in which the layers communicate with each other via services. The most important advantage of this modular architecture is the ability to use the implemented solution in any manufacturing facility, as the modules can easily be reconfigured to adjust to different contexts. The designed information warehouse system was tested on a manufacturing line located on the premises of Tampere University of Technology. The results of this thesis demonstrate that the developed information warehouse system is capable of collecting and processing the information crucial for energy management, and of providing access to it.
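
    As a small illustration of the Complex Event Processing concept mentioned above, the sketch below raises an "energy spike" event when a device's power draw stays above a threshold for several consecutive readings; the device names and thresholds are hypothetical.

```python
# Tiny complex-event-processing rule over a stream of hypothetical field-device
# readings: emit an event after N consecutive readings above a power threshold.
from collections import defaultdict

THRESHOLD_KW = 5.0
CONSECUTIVE = 3

def detect_spikes(readings):
    """readings: iterable of (device_id, power_kw); yields spike events."""
    streak = defaultdict(int)
    for device, power in readings:
        streak[device] = streak[device] + 1 if power > THRESHOLD_KW else 0
        if streak[device] == CONSECUTIVE:
            yield {"event": "energy_spike", "device": device, "power_kw": power}

stream = [("press-1", 4.2), ("press-1", 5.4), ("press-1", 5.9),
          ("press-1", 6.1), ("robot-2", 3.0)]
for event in detect_spikes(stream):
    print(event)
```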

    AI-native Interconnect Framework for Integration of Large Language Model Technologies in 6G Systems

    The evolution towards the 6G architecture promises a transformative shift in communication networks, with artificial intelligence (AI) playing a pivotal role. This paper delves deep into the seamless integration of Large Language Models (LLMs) and Generative Pre-trained Transformers (GPT) within 6G systems. Their ability to grasp intent, strategize, and execute intricate commands will be pivotal in redefining network functionalities and interactions. Central to this is the AI Interconnect framework, intricately woven to facilitate AI-centric operations within the network. Building on the continuously evolving current state of the art, we present a new architectural perspective for the upcoming generation of mobile networks, in which LLMs and GPTs collaboratively take center stage alongside traditional pre-generative AI and machine learning (ML) algorithms. This union promises a novel confluence of the old and the new, melding tried-and-tested methods with transformative AI technologies. Along with providing a conceptual overview of this evolution, we delve into the nuances of the practical applications arising from such an integration. Through this paper, we envisage a symbiotic integration in which AI becomes the cornerstone of the next-generation communication paradigm, offering insights into the structural and functional facets of an AI-native 6G network.
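
    Purely as a hypothetical sketch of how an AI Interconnect layer might translate a natural-language operator intent into a structured network action, the example below wires a prompt to a placeholder LLM callable; the action schema, slice names and the llm_complete interface are invented for illustration and are not part of the paper.

```python
# Hypothetical intent-to-action sketch; llm_complete stands in for whatever LLM
# endpoint such a framework would expose, and the JSON schema is invented.
import json

def plan_network_action(intent: str, llm_complete) -> dict:
    prompt = (
        "Translate the operator intent into JSON with keys "
        "'action', 'target_slice' and 'parameters'.\n"
        f"Intent: {intent}\nJSON:"
    )
    return json.loads(llm_complete(prompt))

# Stubbed model so the sketch runs without any external service.
def fake_llm(prompt: str) -> str:
    return '{"action": "scale_up", "target_slice": "embb-1", "parameters": {"capacity": "+20%"}}'

print(plan_network_action("Add 20% capacity to the eMBB slice in the city centre", fake_llm))
```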

    Anomaly Detection in Time Series: Current Focus and Future Challenges

    Anomaly detection in time series has become an increasingly vital task, with applications such as fraud detection and intrusion monitoring. Tackling this problem requires an array of approaches, including statistical analysis, machine learning, and deep learning. Various techniques have been proposed to cater to the complexity of this problem. However, there are still numerous challenges in the field concerning how best to process high-dimensional and complex data streams in real time. This chapter offers insight into the cutting-edge models for anomaly detection in time series. Several of these models are discussed and their advantages and disadvantages explored. We also look at new areas of research being explored today, and at how new models and techniques are applied in them to solve the unique problems posed by complex data, high-volume data streams, and the need for real-time processing. These research areas provide concrete examples of the applications of the discussed models. Lastly, we identify some of the current issues and suggest future directions for research concerning anomaly detection systems. We aim to provide readers with a comprehensive picture of what is already out there so they can better understand the space, preparing them for further development within this growing field.
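
    One classical statistical technique covered by such surveys is flagging points whose rolling z-score exceeds a threshold; the short sketch below, with arbitrary window and threshold values, illustrates it on a synthetic series.

```python
# Rolling z-score anomaly detection on a synthetic series; window size and
# threshold are arbitrary example values.
import numpy as np

def rolling_zscore_anomalies(series, window=20, threshold=3.0):
    series = np.asarray(series, dtype=float)
    flags = np.zeros(len(series), dtype=bool)
    for i in range(window, len(series)):
        past = series[i - window:i]
        std = past.std()
        if std > 0 and abs(series[i] - past.mean()) / std > threshold:
            flags[i] = True
    return flags

rng = np.random.default_rng(1)
signal = rng.normal(0, 1, 200)
signal[120] += 8  # injected anomaly
print(np.flatnonzero(rolling_zscore_anomalies(signal)))
```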