29 research outputs found

    Changepoint detection for data intensive settings

    Detecting a point in a data sequence where the behaviour alters abruptly, otherwise known as a changepoint, has been an active area of interest for decades. More recently, with the advent of the data intensive era, the need for automated and computationally efficient changepoint methods has grown. Here we introduce several new techniques which address many of the issues inherent in detecting changes in a streaming setting. In short, these new methods, which may be viewed as non-trivial extensions of existing classical procedures, are intended to be useful in as wide a set of situations as possible, while retaining important theoretical guarantees and ease of implementation. The first novel contribution concerns two methods for parallelising existing dynamic programming based approaches to changepoint detection in the univariate setting. We demonstrate that these methods can yield near quadratic computational gains, while retaining important theoretical guarantees. Our next area of focus is the multivariate setting. We introduce two new methods for data intensive scenarios with a fixed, but possibly large, number of dimensions. The first of these is an offline method which detects one change at a time using a new test statistic. We demonstrate that this test statistic has competitive power across a variety of possible settings for a given changepoint, while allowing the method to be versatile across a range of possible modelling assumptions. The other method we introduce for multivariate data is also suitable in the streaming setting and, in addition, relaxes many standard modelling assumptions. We discuss the empirical properties of the procedure, especially insofar as they relate to a desired false alarm error rate.
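
    As a generic illustration of the kind of problem tackled above, the sketch below locates a single change in mean by minimising a squared-error cost over candidate split points. It is a textbook baseline under assumed Gaussian-style costs, not the parallelised dynamic-programming or multivariate methods contributed by the thesis.

```python
# A minimal sketch of offline single-changepoint detection for a change in mean,
# using cumulative sums to evaluate a squared-error cost at every candidate split.
import numpy as np

def single_changepoint(x):
    """Return the split point that maximises the reduction in squared-error cost."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    csum = np.cumsum(x)
    csum2 = np.cumsum(x ** 2)
    total_cost = csum2[-1] - csum[-1] ** 2 / n          # cost with no change
    best_tau, best_gain = None, 0.0
    for tau in range(1, n):                             # candidate changepoint locations
        left = csum2[tau - 1] - csum[tau - 1] ** 2 / tau
        right = (csum2[-1] - csum2[tau - 1]) - (csum[-1] - csum[tau - 1]) ** 2 / (n - tau)
        gain = total_cost - (left + right)
        if gain > best_gain:
            best_tau, best_gain = tau, gain
    return best_tau, best_gain

rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0, 1, 200), rng.normal(2, 1, 200)])
print(single_changepoint(data))   # expected changepoint near index 200
```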

    Adaptive estimation and change detection of correlation and quantiles for evolving data streams

    Streaming data processing is increasingly playing a central role in enterprise data architectures due to an abundance of available measurement data from a wide variety of sources and advances in data capture and infrastructure technology. Data streams arrive, with high frequency, as never-ending sequences of events, where the underlying data generating process always has the potential to evolve. Business operations often demand real-time processing of data streams for keeping models up-to-date and timely decision-making. For example, in cybersecurity contexts, analysing streams of network data can aid the detection of potentially malicious behaviour. Many tools for statistical inference cannot meet the challenging demands of streaming data, where the computational cost of updates to models must be constant to ensure continuous processing as data scales. Moreover, these tools are often not capable of adapting to changes, or drift, in the data. Thus, new tools for modelling data streams with efficient data processing and model updating capabilities, referred to as streaming analytics, are required. Regular intervention to configure control parameters is incompatible with the truly continuous processing constraints of streaming data, yet there is a notable absence of tools designed with both temporal adaptivity to accommodate drift and the autonomy to not rely on control parameter tuning. Streaming analytics with these properties can be developed using an Adaptive Forgetting (AF) framework, with roots in adaptive filtering. The fundamental contributions of this thesis are to extend the streaming toolkit by using the AF framework to develop autonomous and temporally-adaptive streaming analytics. The first contribution uses the AF framework to develop a model, and validation procedure, for estimating time-varying parameters of bivariate data streams from cyber-physical systems. This is accompanied by a novel continuous monitoring change detection system that compares adaptive and non-adaptive estimates. The second contribution is the development of a streaming analytic for the correlation coefficient and an associated change detector to monitor changes to correlation structures across streams; this is demonstrated on cybersecurity network data. The third contribution is a procedure for estimating time-varying binomial data, with a thorough exploration of the nuanced behaviour of this estimator. The final contribution is a framework to enhance extant streaming quantile estimators with autonomous, temporally-adaptive properties. In addition, a novel streaming quantile procedure is developed and shown, in an extensive simulation study, to have appealing performance.
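
    The forgetting idea behind the AF framework can be illustrated with a simplified sketch: an exponentially weighted, Welford-style estimator of a time-varying correlation between two streams. Here the forgetting factor is a fixed constant chosen by the user, whereas the thesis's AF framework tunes it adaptively and autonomously; the class below is only an assumed, minimal illustration with constant-cost updates.

```python
# A minimal sketch of a fixed-forgetting-factor estimator of a time-varying
# correlation between two data streams (exponentially weighted Welford updates).
class ForgettingCorrelation:
    def __init__(self, lambda_=0.99):
        self.l = lambda_                      # fixed forgetting factor in (0, 1]
        self.w = 0.0                          # effective (discounted) sample size
        self.mx = self.my = 0.0               # running means
        self.sxx = self.syy = self.sxy = 0.0  # discounted sums of squared deviations

    def update(self, x, y):
        self.w = self.l * self.w + 1.0
        a = 1.0 / self.w
        dx, dy = x - self.mx, y - self.my
        self.mx += a * dx
        self.my += a * dy
        self.sxx = self.l * self.sxx + dx * (x - self.mx)
        self.syy = self.l * self.syy + dy * (y - self.my)
        self.sxy = self.l * self.sxy + dx * (y - self.my)

    def correlation(self):
        denom = (self.sxx * self.syy) ** 0.5
        return self.sxy / denom if denom > 0 else 0.0

# Example usage on a stream of (x, y) pairs:
# est = ForgettingCorrelation(lambda_=0.95)
# for x, y in stream:
#     est.update(x, y)
#     print(est.correlation())
```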

    Lightweight IPv6 network probing detection framework


    Operating system auditing and monitoring


    Sequential statistical analysis in changepoint detection methods for discrete random processes

    The thesis comprises 59 pages, 5 figures, 11 tables, 3 appendices, and 12 literature sources. Object of study: discrete random processes in which, at some moment in time, the probabilistic characteristics change, i.e. a changepoint occurs. Subject of study: mathematical models and algorithms for changepoint detection. Methods: criteria for testing simple hypotheses; the framework was implemented in Python; the constructed algorithm was evaluated using the Monte Carlo method; experiments were carried out on both real and synthetically generated data. A discrete model is constructed and a parametric changepoint detection algorithm is implemented. The algorithm is evaluated on synthetic and real data, both with and without a changepoint present. Scientific novelty of the obtained results: the chosen algorithm was modified and adapted to search for changepoints in discrete stochastic processes. Practical application: the constructed model can detect changepoints in discrete stochastic processes in fields such as cryptography and the analysis of anomalies in network traffic, as well as in discrete processes with integer-valued observed characteristics; it allows continuous monitoring of the process with reporting of its state on the fly.
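
    As a rough illustration of sequential changepoint detection in a discrete process, the sketch below applies a standard Bernoulli CUSUM driven by the log-likelihood ratio between pre- and post-change success probabilities. This is a classical procedure, not the specific modified algorithm developed in the work; the values of p0, p1 and the threshold are illustrative assumptions.

```python
# A minimal sketch of a parametric CUSUM detector for a change in the success
# probability of a Bernoulli stream (p0 -> p1).
import math
import random

def bernoulli_cusum(stream, p0, p1, threshold):
    """Return the index at which a change from p0 to p1 is flagged, or None."""
    llr_one = math.log(p1 / p0)                 # log-likelihood ratio contribution for x = 1
    llr_zero = math.log((1 - p1) / (1 - p0))    # log-likelihood ratio contribution for x = 0
    s = 0.0
    for t, x in enumerate(stream):
        s = max(0.0, s + (llr_one if x else llr_zero))
        if s > threshold:
            return t
    return None

random.seed(0)
data = [random.random() < 0.1 for _ in range(500)] + [random.random() < 0.4 for _ in range(200)]
print(bernoulli_cusum(data, p0=0.1, p1=0.4, threshold=5.0))  # typically flags shortly after index 500
```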

    Microservices-Based Autonomous Anomaly Detection for Mobile Network Observability

    In modern telecommunication networks, network observability entails the use of diverse data sources to understand the state and behavior of the network and its ability to provide the required service and user experience. Because of the vast amounts of data collection and transmission involved in this process, the network's performance is negatively impacted, and it can become difficult for network operators to identify problematic behavior before it is too late. To enable a more efficient form of data collection and aid in diagnostic operations, this thesis aims to develop an autonomous anomaly detection system for time series data. The system is developed as a microservices-based solution, to be integrated with a software-defined networking controller platform developed at Ericsson. This thesis describes the extensive experimentation process conducted during the development of this system, including various methods of data processing, time series clustering, and anomaly detection. The resulting system is a highly customizable and scalable product, supported by modern and reliable anomaly detection models. The system is capable of detecting several different kinds of anomalies in an arbitrary number of mobile network monitoring metrics and can be easily configured to fit the specific needs of each customer.
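
    For a flavour of time series anomaly flagging on a single monitoring metric, the sketch below uses a rolling mean and standard deviation with a z-score threshold. The thesis's system combines time series clustering and several detection models inside a microservices architecture; this standalone snippet is only an assumed, generic baseline.

```python
# A minimal sketch of threshold-based anomaly flagging on one monitoring metric,
# using a rolling mean/standard deviation and a z-score cutoff.
import pandas as pd

def flag_anomalies(series: pd.Series, window: int = 60, z_thresh: float = 3.0) -> pd.Series:
    """Return a boolean series marking points far from the recent rolling mean."""
    rolling = series.rolling(window, min_periods=window)
    z = (series - rolling.mean()) / rolling.std()
    return z.abs() > z_thresh

# Example: a flat metric with one injected spike.
metric = pd.Series([100.0] * 200)
metric.iloc[150] = 500.0
print(flag_anomalies(metric).sum())  # 1 anomalous point flagged
```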

    Detection of Anomalous Behavior of IoT/CPS Devices Using Their Power Signals

    Embedded computing devices, in the Internet of Things (IoT) or Cyber-Physical Systems (CPS), are becoming pervasive in many domains around the world. Their wide deployment in simple applications (e.g., smart buildings, fleet management, and smart agriculture) or in more critical operations (e.g., industrial control, smart power grids, and self-driving cars) creates significant market potential ($4–11 trillion in annual revenue expected by 2025). A main requirement for the success of such systems and applications is the capacity to ensure the performance of these devices. This task includes equipping them to be resilient against security threats and failures. Globally, several critical infrastructure applications have been the target of cyber attacks. These recent incidents, as well as the rich applicable literature, confirm that more research is needed to overcome such challenges. Consequently, the need for robust approaches that detect anomalously behaving devices in security- and safety-critical applications has become paramount. Solving such a problem minimizes different kinds of losses (e.g., confidential data theft, financial loss, service access restriction, or even casualties). In light of the aforementioned motivation and discussion, this thesis focuses on the problem of detecting the anomalous behavior of IoT/CPS devices by considering their side-channel information. Solving such a problem is extremely important in maintaining the security and dependability of critical systems and applications. Although several side-channel based approaches are found in the literature, there are still important research gaps that need to be addressed. First, the intrusive nature of the monitoring in some of the proposed techniques results in resource overhead and requires instrumentation of the internal components of a device, which makes them impractical; it also raises a data integrity flag. Second, the lack of realistic experimental power consumption datasets that reflect the normal and anomalous behaviors of IoT and CPS devices has prevented fair and coherent comparisons with the state of the art in this domain. Finally, most of the research to date has concentrated on the accuracy of detection and not the novelty of detecting new anomalies. Such a direction relies on: (i) the availability of labeled datasets; (ii) the complexity of the extracted features; and (iii) the available compute resources. These assumptions and requirements are usually unrealistic and unrepresentative. This research aims to bridge these gaps as follows. First, this study extends the state of the art that adopts the idea of leveraging the power consumption of devices as a signal and the concept of decoupling the monitoring system from the devices to be monitored in order to detect and classify the "operational health" of the devices. Second, this thesis provides and builds power consumption-based datasets that can be utilized by the AI and security research communities to validate newly developed detection techniques. The collected datasets cover a wide range of anomalous device behaviors arising from the main aspects of device security (i.e., confidentiality, integrity, and availability) and from partial system failures.
The extensive experiments include: a wide spectrum of emulated malware scenarios; five real malware applications taken from the well-known Drebin dataset; distributed denial of service (DDoS) attacks where an IoT device is treated as (1) the victim of a DDoS attack and (2) the source of a DDoS attack; cryptomining malware, where the resources of an IoT device are hijacked for the attacker's benefit; and faulty CPU cores. This level of extensive validation has not yet been reported in any study in the literature. Third, this research presents a novel supervised technique to detect anomalous device behavior by transforming the problem into an image classification problem. The main aim of this methodology is to improve detection performance. To achieve the goals of this study, the methodology combines two powerful computer vision tools, namely Histograms of Oriented Gradients (HOG) and a Convolutional Neural Network (CNN). Such a detection technique is not only useful in the present case but can contribute to most time-series classification (TSC) problems. Finally, this thesis proposes a novel unsupervised detection technique that requires only the normal behavior of a device in the training phase and therefore aims at detecting new/unseen anomalous behavior. The methodology leverages the power consumption of a device and Restricted Boltzmann Machine (RBM) AutoEncoders (AE) to build a model of normal behavior that is robust to the presence of security threats. It makes use of stacked RBM AEs and Principal Component Analysis (PCA) to extract a feature vector based on the AE's reconstruction errors. A One-Class Support Vector Machine (OC-SVM) classifier is then trained to perform the detection task. Across 18 different datasets, both of the proposed detection techniques demonstrated high detection performance, with at least ~88% accuracy and 85% F-score on average. The empirical results indicate the effectiveness of the proposed techniques and demonstrate a detection performance gain of 9%–17% over results reported for other methods.
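
    The final stage of the unsupervised pipeline described above, training a one-class classifier on normal behaviour only, can be sketched as follows. The feature extraction here is a simple stand-in (per-window summary statistics) rather than the stacked RBM autoencoder and PCA features used in the thesis, and the simulated power traces are purely illustrative.

```python
# A minimal sketch: fit a One-Class SVM on features from normal power traces only,
# then flag windows of a suspicious trace as anomalous.
import numpy as np
from sklearn.svm import OneClassSVM

def window_features(signal, width=128):
    """Split a 1-D power trace into windows and summarise each with basic statistics."""
    n = len(signal) // width
    windows = np.asarray(signal[: n * width]).reshape(n, width)
    return np.column_stack([windows.mean(axis=1), windows.std(axis=1),
                            windows.max(axis=1), windows.min(axis=1)])

rng = np.random.default_rng(0)
normal_trace = rng.normal(1.0, 0.05, 50_000)        # idle-like power draw (simulated)
anomalous_trace = rng.normal(1.6, 0.20, 5_000)      # heavier, cryptomining-like load (simulated)

clf = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale")
clf.fit(window_features(normal_trace))              # training uses normal behaviour only

print(clf.predict(window_features(anomalous_trace)))  # mostly -1, i.e. flagged as anomalous
```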

    Quality of service analysis of internet links with minimal information

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, July 201

    Anomaly Detection in Cloud-Native systems

    In recent years, microservices have gained popularity due to benefits such as increased maintainability and scalability of the system. The microservice architectural pattern was adopted for the development of a large-scale system which is commonly deployed on public and private clouds, and the aim is therefore to ensure that it always maintains an optimal level of performance. Consequently, the system is monitored by collecting different metrics, including performance-related metrics. The first part of this thesis focuses on the creation of a dataset of realistic time series with anomalies at deterministic locations. This dataset addresses the lack of labeled data for training supervised models and the absence of publicly available data, which are not usually shared due to privacy concerns. The second part consists of an empirical study on the detection of anomalies occurring in the different services that compose the system. Specifically, the aim is to understand whether it is possible to predict anomalies in order to act before system failures or performance degradation occur. Consequently, eight different classification-based machine learning algorithms were compared by collecting accuracy, training time and testing time, to determine which technique might be most suitable for reducing system overload. The results showed that there are strong correlations between metrics and that it is possible to predict anomalies in the system with approximately 90% accuracy. The most important outcome is that performance-related anomalies can be detected by monitoring a limited number of metrics collected at runtime with a short training time. Future work includes the adoption of prediction-based approaches and the development of tools for the prediction of anomalies in cloud-native environments.
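
    A minimal sketch of the kind of classifier comparison described above: fitting several supervised models on labelled data and recording accuracy and training time. The synthetic dataset and the two models shown are assumptions for illustration; the thesis compares eight algorithms on metrics collected from the real system.

```python
# A minimal sketch comparing classifiers on accuracy and training time.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for labelled metric windows (anomaly = minority class).
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [("logistic_regression", LogisticRegression(max_iter=1000)),
                    ("random_forest", RandomForestClassifier(n_estimators=100))]:
    start = time.perf_counter()
    model.fit(X_tr, y_tr)
    train_time = time.perf_counter() - start
    acc = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: accuracy={acc:.3f}, training time={train_time:.2f}s")
```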

    Per-host DDoS mitigation by direct-control reinforcement learning

    DDoS attacks plague the availability of online services today, yet, like many cybersecurity problems, they are evolving and non-stationary. Normal and attack patterns shift as new protocols and applications are introduced, further compounded by burstiness and seasonal variation. Accordingly, it is difficult to apply machine learning-based techniques and defences in practice. Reinforcement learning (RL) may overcome this detection problem for DDoS attacks by managing and monitoring consequences; an agent's role is to learn to optimise performance criteria (which are always available) in an online manner. We advance the state of the art in RL-based DDoS mitigation by introducing two agent classes designed to act on a per-flow basis, in a protocol-agnostic manner, for any network topology. This is supported by an in-depth investigation of feature suitability and empirical evaluation. Our results show the existence of flow features with high predictive power for different traffic classes when used as the basis for feedback-loop-like control. We show that the new RL agent models can offer a significant increase in the goodput of legitimate TCP traffic for many choices of host density.
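
    The feedback-loop control described above can be sketched with tabular Q-learning: the agent maps a discretised observation to a per-flow rate-limit action and is rewarded with measured goodput. The environment hooks (observe_state, apply_limit, measure_goodput) are hypothetical placeholders, and the paper's actual agent designs differ; this is only an assumed outline of the control loop.

```python
# A minimal sketch of a tabular Q-learning loop for per-flow rate limiting.
import random
from collections import defaultdict

ACTIONS = [0.0, 0.25, 0.5, 0.75, 1.0]          # fraction of a flow's traffic to drop
q_table = defaultdict(lambda: [0.0] * len(ACTIONS))
alpha, gamma, epsilon = 0.1, 0.9, 0.1          # learning rate, discount, exploration rate

def choose_action(state):
    if random.random() < epsilon:               # explore
        return random.randrange(len(ACTIONS))
    return max(range(len(ACTIONS)), key=lambda a: q_table[state][a])  # exploit

def q_update(state, action, reward, next_state):
    best_next = max(q_table[next_state])
    q_table[state][action] += alpha * (reward + gamma * best_next - q_table[state][action])

# Control loop outline (environment calls are hypothetical placeholders):
# state = observe_state(flow)
# while True:
#     action = choose_action(state)
#     apply_limit(flow, ACTIONS[action])
#     reward = measure_goodput()               # performance signal that is always available
#     next_state = observe_state(flow)
#     q_update(state, action, reward, next_state)
#     state = next_state
```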