1,579 research outputs found

    A Review on Outlier/Anomaly Detection in Time Series Data

    Get PDF
    Recent advances in technology have brought major breakthroughs in data collection, enabling a large amount of data to be gathered over time and thus generating time series. Mining this data has become an important task for researchers and practitioners in the past few years, including the detection of outliers or anomalies that may represent errors or events of interest. This review aims to provide a structured and comprehensive state-of-the-art on outlier detection techniques in the context of time series. To this end, a taxonomy is presented based on the main aspects that characterize an outlier detection technique.KK/2019-00095 IT1244-19 TIN2016-78365-R PID2019-104966GB-I0

    Temporally adaptive monitoring procedures with applications in enterprise cyber-security

    Get PDF
    Due to the perpetual threat of cyber-attacks, enterprises must employ and develop new methods of detection as attack vectors evolve and advance. Enterprise computer networks produce a large volume and variety of data including univariate data streams, time series and network graph streams. Motivated by cyber-security, this thesis develops adaptive monitoring tools for univariate and network graph data streams, however, they are not limited to this domain. In all domains, real data streams present several challenges for monitoring including trend, periodicity and change points. Streams often also have high volume and frequency. To deal with the non-stationarity in the data, the methods applied must be adaptive. Adaptability in the proposed procedures throughout the thesis is introduced using forgetting factors, weighting the data accordingly to recency. Secondly, methods applied must be computationally fast with a small or fixed computation burden and fixed storage requirements for timely processing. Throughout this thesis, sequential or sliding window approaches are employed to achieve this. The first part of the thesis is centred around univariate monitoring procedures. A sequential adaptive parameter estimator is proposed using a Bayesian framework. This procedure is then extended for multiple change point detection, where, unlike existing change point procedures, the proposed method is capable of detecting abrupt changes in the presence of trend. We additionally present a time series model which combines short-term and long-term behaviours of a series for improved anomaly detection. Unlike existing methods which primarily focus on point anomalies detection (extreme outliers), our method is capable of also detecting contextual anomalies, when the data deviates from persistent patterns of the series such as seasonality. Finally, a novel multi-type relational clustering methodology is proposed. As multiple relations exist between the different entities within a network (computers, users and ports), multiple network graphs can be generated. We propose simultaneously clustering over all graphs to produce a single clustering for each entity using Non-Negative Matrix Tri-Factorisation. Through simplifications, the proposed procedure is fast and scalable for large network graphs. Additionally, this methodology is extended for graph streams. This thesis provides an assortment of tools for enterprise network monitoring with a focus on adaptability and scalability making them suitable for intrusion detection and situational awareness.Open Acces

    Spatiotemporal anomaly detection: streaming architecture and algorithms

    Get PDF
    Includes bibliographical references.2020 Summer.Anomaly detection is the science of identifying one or more rare or unexplainable samples or events in a dataset or data stream. The field of anomaly detection has been extensively studied by mathematicians, statisticians, economists, engineers, and computer scientists. One open research question remains the design of distributed cloud-based architectures and algorithms that can accurately identify anomalies in previously unseen, unlabeled streaming, multivariate spatiotemporal data. With streaming data, time is of the essence, and insights are perishable. Real-world streaming spatiotemporal data originate from many sources, including mobile phones, supervisory control and data acquisition enabled (SCADA) devices, the internet-of-things (IoT), distributed sensor networks, and social media. Baseline experiments are performed on four (4) non-streaming, static anomaly detection multivariate datasets using unsupervised offline traditional machine learning (TML), and unsupervised neural network techniques. Multiple architectures, including autoencoders, generative adversarial networks, convolutional networks, and recurrent networks, are adapted for experimentation. Extensive experimentation demonstrates that neural networks produce superior detection accuracy over TML techniques. These same neural network architectures can be extended to process unlabeled spatiotemporal streaming using online learning. Space and time relationships are further exploited to provide additional insights and increased anomaly detection accuracy. A novel domain-independent architecture and set of algorithms called the Spatiotemporal Anomaly Detection Environment (STADE) is formulated. STADE is based on federated learning architecture. STADE streaming algorithms are based on a geographically unique, persistently executing neural networks using online stochastic gradient descent (SGD). STADE is designed to be pluggable, meaning that alternative algorithms may be substituted or combined to form an ensemble. STADE incorporates a Stream Anomaly Detector (SAD) and a Federated Anomaly Detector (FAD). The SAD executes at multiple locations on streaming data, while the FAD executes at a single server and identifies global patterns and relationships among the site anomalies. Each STADE site streams anomaly scores to the centralized FAD server for further spatiotemporal dependency analysis and logging. The FAD is based on recent advances in DNN-based federated learning. A STADE testbed is implemented to facilitate globally distributed experimentation using low-cost, commercial cloud infrastructure provided by Microsoft™. STADE testbed sites are situated in the cloud within each continent: Africa, Asia, Australia, Europe, North America, and South America. Communication occurs over the commercial internet. Three STADE case studies are investigated. The first case study processes commercial air traffic flows, the second case study processes global earthquake measurements, and the third case study processes social media (i.e., Twitter™) feeds. These case studies confirm that STADE is a viable architecture for the near real-time identification of anomalies in streaming data originating from (possibly) computationally disadvantaged, geographically dispersed sites. Moreover, the addition of the FAD provides enhanced anomaly detection capability. Since STADE is domain-independent, these findings can be easily extended to additional application domains and use cases
    • …
    corecore