
    Artificial intelligence driven anomaly detection for big data systems

    The main goal of this thesis is to contribute to the research on automated performance anomaly detection and interference prediction by implementing Artificial Intelligence (AI) solutions for complex distributed systems, especially Big Data platforms within cloud computing environments. The late detection and manual resolution of performance anomalies and system interference in Big Data systems may lead to performance violations and financial penalties. Motivated by this issue, we propose AI-based methodologies for anomaly detection and interference prediction tailored to Big Data and containerized batch platforms to better analyze system performance and effectively utilize computing resources within cloud environments. New, precise, and efficient performance management methods are therefore key to handling performance anomalies and interference impacts and to improving the efficiency of data center resources.
    The first part of this thesis contributes to performance anomaly detection for in-memory Big Data platforms. We examine the performance of Big Data platforms and justify our choice of the in-memory Apache Spark platform. An artificial neural network-driven methodology is proposed to detect and classify performance anomalies for batch workloads based on RDD characteristics and operating system monitoring metrics. Our method is evaluated against other popular machine learning (ML) algorithms, as well as on four different monitoring datasets. The results show that our proposed method outperforms the other ML methods, typically achieving 98–99% F-scores. Moreover, we show that a random start instant, a random duration, and overlapped anomalies do not significantly impact the performance of our proposed methodology.
    The second contribution addresses the challenge of anomaly identification within an in-memory streaming Big Data platform by investigating agile hybrid learning techniques. We develop TRACK (neural neTwoRk Anomaly deteCtion in sparK) and TRACK-Plus, two methods to efficiently train a class of machine learning models for performance anomaly detection using a fixed number of experiments. Our model revolves around using artificial neural networks with Bayesian Optimization (BO) to find the optimal training dataset size and configuration parameters to efficiently train the anomaly detection model to achieve high accuracy. The objective is to accelerate the search for the training dataset size, optimize neural network configurations, and improve the performance of anomaly classification. A validation based on several datasets from a real Apache Spark Streaming system demonstrates that the proposed methodology can efficiently identify performance anomalies, near-optimal configuration parameters, and a near-optimal training dataset size while reducing the number of experiments by up to 75% compared with naïve anomaly detection training.
    The last contribution overcomes the challenges of predicting the completion time of containerized batch jobs and proactively avoiding performance interference by introducing an automated prediction solution that estimates interference among co-located batch jobs within the same computing environment. An AI-driven model is implemented to predict interference among batch jobs before it occurs within the system. Our interference detection model can estimate, and thereby help alleviate, the task slowdown caused by interference. This model assists system operators in making accurate decisions to optimize job placement. Our model is agnostic to the business logic internal to each job. Instead, it is learned from system performance data by applying artificial neural networks to predict the completion time of batch jobs within cloud environments. We compare our model with three baseline models (a queueing-theoretic model, operational analysis, and an empirical method) on historical measurements of job completion time and CPU run-queue size (i.e., the number of active threads in the system). The proposed model captures multithreading, operating system scheduling, sleeping time, and job priorities. A validation based on 4,500 experiments with the DaCapo benchmarking suite confirms the predictive efficiency and capabilities of the proposed model, which achieves up to 10% MAPE, outperforming the other models.
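    As a rough illustration of the first contribution's pipeline (an artificial neural network classifying performance anomalies from operating system monitoring metrics), the following Python sketch uses scikit-learn on synthetic data; the feature set, class balance, and network shape are assumptions for demonstration, not the thesis's actual dataset or model.

    # Minimal sketch: ANN-based anomaly classification over OS monitoring
    # metrics, scored with the F-score. All data here is synthetic.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.neural_network import MLPClassifier
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)

    # Stand-in features per task: CPU, memory, disk I/O, network utilization.
    X_normal = rng.normal(0.4, 0.10, size=(900, 4))
    X_anomalous = rng.normal(0.8, 0.15, size=(100, 4))  # e.g., contention
    X = np.vstack([X_normal, X_anomalous])
    y = np.concatenate([np.zeros(900), np.ones(100)])   # 1 = anomaly

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)

    scaler = StandardScaler().fit(X_train)
    clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                        random_state=0)
    clf.fit(scaler.transform(X_train), y_train)
    print("F-score:", f1_score(y_test, clf.predict(scaler.transform(X_test))))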

    Designing the next generation intelligent transportation sensor system using big data driven machine learning techniques

    Accurate traffic data collection is essential for supporting advanced traffic management system operations. This study investigated a large-scale, data-driven, sequential traffic sensor health monitoring (TSHM) module that can be used to monitor sensor health conditions over large traffic networks. Our proposed module consists of three sequential steps for detecting different types of abnormal sensor issues. The first step detects sensors with abnormally high missing data rates, while the second step uses clustering anomaly detection to detect sensors reporting abnormal records. The final step introduces a novel Bayesian changepoint modeling technique to detect sensors reporting abnormal traffic data fluctuations by assuming a constant vehicle length distribution based on average effective vehicle length (AEVL). Our proposed method is then compared with two benchmark algorithms to show its efficacy. Results obtained by applying our method to the statewide traffic sensor data of Iowa show it can successfully detect different classes of sensor issues. This demonstrates that sequential TSHM modules can help transportation agencies determine traffic sensors' exact problems, thereby enabling them to take the required corrective steps.
    The second research objective focuses on traffic data imputation after the anomalous and missing data collected from failed traffic sensors are discarded. Sufficient high-quality traffic data are a crucial component of various Intelligent Transportation System (ITS) applications and research related to congestion prediction, speed prediction, incident detection, and other traffic operation tasks. Nonetheless, missing traffic data are a common and unavoidable issue in sensor data, arising from causes such as malfunctions, poor maintenance or calibration, and intermittent communications. Such missing data issues often make data analysis and decision-making complicated and challenging. In this study, we have developed a generative adversarial network (GAN) based traffic sensor data imputation framework (TSDIGAN) to efficiently reconstruct missing data by generating realistic synthetic data. In recent years, GANs have shown impressive success in image data generation. However, generating traffic data with GAN-based modeling is a challenging task, since traffic data have strong time dependency. To address this problem, we propose a novel time-dependent encoding method called the Gramian Angular Summation Field (GASF) that converts the problem of traffic time-series data generation into one of image generation. We have evaluated and tested our proposed model using the benchmark dataset provided by Caltrans Performance Management Systems (PeMS). This study shows that the proposed model can significantly improve traffic data imputation accuracy in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) compared to state-of-the-art models on the benchmark dataset. Further, the model achieves reasonably high accuracy in imputation tasks even under a very high missing data rate (>50%), which shows the robustness and efficiency of the proposed model.
    Besides loop and radar sensors, traffic cameras have shown great ability to provide insightful traffic information through image and video processing techniques. Therefore, the third and final part of this work introduces an end-to-end, real-time, cloud-enabled intelligent video analysis (IVA) framework to support the development of the future smart city. As Artificial Intelligence (AI) grows rapidly, computer vision (CV) techniques are expected to significantly improve the development of intelligent transportation systems (ITS), which are anticipated to be a key component of future Smart City (SC) frameworks. Powered by computer vision techniques, converting existing traffic cameras into connected "smart sensors", known as IVA systems, has shown great capability for producing insightful data to support ITS applications. However, developing such IVA systems for large-scale, real-time application deserves further study, as current research efforts focus more on model effectiveness than on model efficiency. Therefore, we have introduced a real-time, large-scale, cloud-enabled traffic video analysis framework using NVIDIA DeepStream, a streaming analysis toolkit for AI-based video and image analysis. In this study, we have evaluated the technical and economic feasibility of our proposed framework to help traffic agencies build IVA systems more efficiently. Our study shows that the daily operating cost of our proposed framework on the Google Cloud Platform (GCP) is less than $0.14 per camera and that, compared with manual inspections, our framework achieves an average vehicle-counting accuracy of 83.7% on sunny days.
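    Since the GASF encoding is the technical hinge of the imputation framework, a minimal sketch of it may help; the transform below (rescale to [-1, 1], take the arccosine, then form pairwise cosine sums) follows the standard GASF definition, and the speed series is a made-up example rather than PeMS data.

    # Gramian Angular Summation Field (GASF): turn a 1-D traffic series
    # into an image-like matrix that a GAN can learn to generate.
    import numpy as np

    def gasf(series):
        lo, hi = series.min(), series.max()
        x = (2 * series - hi - lo) / (hi - lo)      # rescale to [-1, 1]
        x = np.clip(x, -1.0, 1.0)                   # guard against rounding
        phi = np.arccos(x)                          # angle in polar coords
        return np.cos(phi[:, None] + phi[None, :])  # pairwise angular sums

    speeds = np.array([62.0, 60.5, 58.0, 40.2, 35.7, 52.3, 61.1, 63.4])
    image = gasf(speeds)
    print(image.shape)  # (8, 8) matrix, ready to be treated as an image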

    Detecting Anomalies From Big Data System Logs

    Nowadays, big data systems (e.g., Hadoop and Spark) are being widely adopted in many domains, such as manufacturing, healthcare, education, and media, for offering effective data solutions. A common problem in big data systems is anomalies, i.e., states that deviate from normal execution and degrade computation performance or kill running programs. Detecting anomalies and analyzing their causes is becoming a necessity. An effective and economical approach is to analyze system logs. Big data systems produce numerous unstructured logs that contain buried valuable information. However, manually detecting anomalies from system logs is a tedious and daunting task. This dissertation proposes four approaches that can accurately and automatically analyze anomalies from big data system logs without extra monitoring overhead. Moreover, to detect abnormal tasks in Spark logs and analyze root causes, we design a utility to conduct fault injection and collect logs from multiple compute nodes.
    (1) Our first method is a statistics-based approach that can locate abnormal tasks and calculate the weights of factors for analyzing the root causes. In the experiment, four potential root causes are considered: CPU, memory, network, and disk I/O. The experimental results show that the proposed approach is accurate in detecting abnormal tasks as well as finding the root causes. (2) To give a more reasonable probability result and avoid ad-hoc calculation of factor weights, we propose a neural network approach to analyze the root causes of abnormal tasks. We leverage a General Regression Neural Network (GRNN) to identify root causes for abnormal tasks; the likelihood of the reported root causes is presented to users according to the factor weights produced by the GRNN. (3) To further improve anomaly detection by avoiding feature extraction, we propose a novel approach leveraging Convolutional Neural Networks (CNN). Our proposed model can automatically learn event relationships in system logs and detect anomalies with high accuracy. Our deep neural network consists of logkey2vec embeddings, three 1D convolutional layers, a dropout layer, and max pooling. In our experiments, this CNN-based approach achieved better accuracy than approaches using Long Short-Term Memory (LSTM) and Multilayer Perceptron (MLP) networks at detecting anomalies in Hadoop Distributed File System (HDFS) logs. (4) To analyze system logs more accurately, we extend our CNN-based approach with two attention schemes, which focus on different features of the CNN's output, to detect anomalies in system logs. We evaluate our approaches on several benchmarks, and the attention-based CNN model shows the best performance among all state-of-the-art methods.
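    To make approach (3) concrete, here is a hedged Keras sketch of a CNN log-anomaly detector with log-key embeddings, three 1D convolutional layers, dropout, and max pooling; the vocabulary size, window length, layer widths, and the sequential stacking of the convolutions are assumptions for illustration and may differ from the dissertation's exact topology.

    # CNN over sequences of log-key (template) ids, in the spirit of the
    # logkey2vec + CNN approach. All sizes and training data are synthetic.
    import numpy as np
    from tensorflow.keras import Sequential, layers

    VOCAB = 50    # assumed number of distinct log keys
    WINDOW = 20   # assumed log keys per session window

    model = Sequential([
        layers.Embedding(VOCAB, 16),             # "logkey2vec" embeddings
        layers.Conv1D(32, 3, activation="relu"), # three 1D convolutions,
        layers.Conv1D(32, 4, activation="relu"), # stacked here for
        layers.Conv1D(32, 5, activation="relu"), # simplicity
        layers.Dropout(0.5),                     # dropout layer
        layers.GlobalMaxPooling1D(),             # max pooling
        layers.Dense(1, activation="sigmoid"),   # anomalous vs. normal
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Synthetic stand-in for parsed HDFS sessions.
    X = np.random.randint(0, VOCAB, size=(256, WINDOW))
    y = np.random.randint(0, 2, size=(256,))
    model.fit(X, y, epochs=1, verbose=0)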

    Anomaly detection on data streams from vehicular networks

    Vehicular networks are characterized by high-mobility nodes that are only active when the vehicle is moving, making the network unpredictable and in constant change. In such a dynamic scenario, detecting anomalies in the network is a challenging but crucial task. Veniam operates a vehicular network that ensures reliable connectivity through heterogeneous networks such as LTE, Wi-Fi, and DSRC, connecting the vehicles to the Internet and to other devices spread throughout the city. Over time, nodes send data to the cloud either by real-time technologies or by delay-tolerant ones, increasing the network's dynamics. The aim of this dissertation is to propose and implement a method for detecting anomalies in a real-world vehicular network through an online analysis of the data streams that come from the vehicles to the cloud. The network's streams were explored in order to characterize the available data and select target use cases. The chosen datasets were submitted to different anomaly detection techniques, such as time series forecasting and density-based outlier detection, followed by an analysis of the trade-offs to select the algorithms that best model the data characteristics. The proposed solution comprises two stages: a lightweight screening step, followed by a Nearest Neighbor classification. The developed system was implemented on Veniam's distributed cluster running Apache Spark, allowing a fast and scalable solution that classifies the data as soon as it reaches the Cloud. The performance of the method was evaluated by its precision, i.e., the percentage of true anomalies among the detected outliers, when it was submitted to datasets containing artificial anomalies from different data sources, received either by real-time or delay-tolerant technologies.
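    As a loose illustration of the two-stage design (cheap screening, then a neighbourhood-based check), the following Python sketch flags points far from the historical statistics and confirms them against their nearest historical neighbours; the thresholds and data are invented for the example and do not reflect Veniam's actual pipeline.

    # Stage 1: lightweight screening; Stage 2: nearest-neighbour confirmation.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(1)
    history = rng.normal(100.0, 5.0, size=(500, 1))  # past per-interval metric
    incoming = np.array([[101.2], [99.0], [180.5]])  # new stream samples

    # Stage 1: flag values far from the historical mean (cheap filter).
    mu, sigma = history.mean(), history.std()
    flagged = incoming[np.abs(incoming[:, 0] - mu) > 3 * sigma]

    # Stage 2: confirm flagged points by their mean distance to the k
    # nearest historical neighbours (a density-style check).
    if len(flagged):
        nn = NearestNeighbors(n_neighbors=5).fit(history)
        dist, _ = nn.kneighbors(flagged)
        anomalies = flagged[dist.mean(axis=1) > 3 * sigma]
        print(anomalies)  # -> [[180.5]]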

    A Survey on Big Data for Network Traffic Monitoring and Analysis

    Network Traffic Monitoring and Analysis (NTMA) represents a key component of network management, especially to guarantee the correct operation of large-scale networks such as the Internet. As the complexity of Internet services and the volume of traffic continue to increase, it becomes difficult to design scalable NTMA applications. Applications such as traffic classification and policing require real-time and scalable approaches. Anomaly detection and security mechanisms need to quickly identify and react to unpredictable events while processing millions of heterogeneous events. Finally, the system has to collect, store, and process massive sets of historical data for post-mortem analysis. These are precisely the challenges addressed by general big data approaches: Volume, Velocity, Variety, and Veracity. This survey brings together NTMA and big data. We catalog previous work on NTMA that adopts big data approaches, to understand to what extent the potential of big data is being explored in NTMA. The survey mainly focuses on approaches and technologies for managing big NTMA data, and additionally briefly discusses big data analytics (e.g., machine learning) for the sake of NTMA. Finally, we provide guidelines for future work, discussing lessons learned and research directions.

    Big data analytics: a predictive analysis applied to cybersecurity in a financial organization

    Project Work presented as partial requirement for obtaining the Master's degree in Information Management, with a specialization in Knowledge Management and Business Intelligence.
    With the generalization of internet access, cyber attacks have registered alarming growth in frequency and severity of damages, alongside a growing awareness among organizations that invest heavily in cybersecurity, such as those in the financial sector. This work focuses on an organization's financial service that operates in the international markets of the payment systems industry. The objective was to develop a predictive framework for threat detection that supports the security team in opening investigations on intrusive server requests, over the exponentially growing log events collected by the SIEM from the Apache web servers of the financial service. A big data framework, using Hadoop and Spark, was developed to perform classification tasks over the financial service requests using Neural Networks, Logistic Regression, SVM, and Random Forests algorithms, while handling the training on the imbalanced dataset through BEV. The analysis registered the best scoring performance for the Random Forests classifier using all the available preprocessed features. Using all the available worker nodes with a balanced configuration of the Spark executors, the best elapsed times for loading and preprocessing the data were achieved with the column-oriented ORC native format, while the row-oriented CSV format performed best for training the classifiers.
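    A hedged PySpark sketch of the kind of pipeline this describes (load preprocessed request features from ORC, assemble them, and train a Random Forests classifier) follows; the path, column names, and parameters are illustrative assumptions, and the BEV-style balanced training is omitted for brevity.

    # Spark ML classification of server requests from ORC-stored features.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("threat-detection").getOrCreate()

    # Hypothetical path; assumes a numeric feature table with a 0/1 "label"
    # column marking intrusive requests.
    df = spark.read.orc("hdfs:///siem/apache_requests.orc")
    feature_cols = [c for c in df.columns if c != "label"]

    assembler = VectorAssembler(inputCols=feature_cols, outputCol="features")
    train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

    rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                                numTrees=100)
    model = rf.fit(train)

    auc = BinaryClassificationEvaluator(labelCol="label").evaluate(
        model.transform(test))
    print(f"Test AUC: {auc:.3f}")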

    An attribute oriented induction based methodology to aid in predictive maintenance: anomaly detection, root cause analysis and remaining useful life

    Predictive Maintenance is the maintenance methodology that provides the best performance to industrial organisations in terms of time, equipment effectiveness, and economic savings. Thanks to recent advances in technology, capturing process data from machines and the sensors attached to them is no longer a challenging task, and these data can be used to perform complex analyses that help meet maintenance requirements. On the other hand, the knowledge of domain experts can be combined with the information extracted from the machines' assets to provide a better understanding of the underlying phenomena. This thesis proposes a methodology to address the different requirements of Predictive Maintenance: (i) Anomaly Detection (AD), (ii) Root Cause Analysis (RCA), and (iii) estimation of Remaining Useful Life (RUL). Multiple machine learning techniques and algorithms can be found in the literature to compute these requirements. In this thesis, the Attribute Oriented Induction (AOI) algorithm has been adopted and adapted to the needs of the Predictive Maintenance methodology. AOI is capable of performing RCA, but it can also be used as an AD system. For the purpose of Predictive Maintenance, a variant, Repetitive Weighted Attribute Oriented Induction (ReWAOI), has been proposed. ReWAOI combines information extracted from the machine with the knowledge of domain experts to describe the machine's behaviour and derive the Predictive Maintenance requirements. Through the use of ReWAOI, a one-dimensional quantification function can be obtained from multidimensional data. This function is correlated with the evolution of the machine's wear over time, and thus the estimation of AD and RUL can be accomplished. In addition, ReWAOI helps describe failure root causes. The proposed contributions of the thesis have been validated in different scenarios, covering both emulated and real industrial case studies.
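    For readers unfamiliar with the base algorithm, here is a toy Python sketch of classic Attribute-Oriented Induction, on which ReWAOI builds: attribute values are climbed up a concept hierarchy until an attribute has few enough distinct values, and identical generalized tuples are then merged with counts. The hierarchy, threshold, and records are invented for illustration; ReWAOI's repetition and expert-weighting steps are not shown.

    # Classic AOI generalization step over a single attribute.
    from collections import Counter

    # Assumed concept hierarchy: reading band -> higher-level concept.
    HIERARCHY = {"very_low": "low", "low": "low", "medium": "normal",
                 "high": "high", "very_high": "high"}
    THRESHOLD = 3  # max distinct values allowed per attribute

    def generalize(records, attr):
        """Climb the hierarchy until the attribute's distinct-value count
        drops to THRESHOLD or below (or the hierarchy is exhausted)."""
        while len({r[attr] for r in records}) > THRESHOLD:
            climbed = [r[:attr] + (HIERARCHY.get(r[attr], r[attr]),)
                       + r[attr + 1:] for r in records]
            if climbed == records:   # no parent concepts left; stop
                break
            records = climbed
        return records

    readings = [("very_low",), ("low",), ("medium",), ("high",), ("very_high",)]
    print(Counter(generalize(readings, 0)))
    # -> counts over the generalized tuples ('low',), ('normal',), ('high',)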