
    Change Point Detection for Streaming Data Using Support Vector Methods

    Sequential multiple change point detection concerns the identification of multiple points in time where the systematic behavior of a statistical process changes. A special case of this problem, called online anomaly detection, occurs when the goal is to detect the first change and then signal an alert to an analyst for further investigation. This dissertation concerns the use of methods based on kernel functions and support vectors to detect changes. A variety of support vector-based methods are considered, but the primary focus is Least Squares Support Vector Data Description (LS-SVDD). LS-SVDD constructs a hypersphere in a kernel space to bound a set of multivariate vectors using a closed-form solution. The mathematical tractability of LS-SVDD facilitates closed-form updates for the LS-SVDD Lagrange multipliers. The update formulae cover either adding a block of observations to, or removing one from, an existing LS-SVDD description; thus LS-SVDD can be constructed or updated sequentially, which makes it attractive for online problems with sequential data streams. LS-SVDD is applied to a variety of scenarios, including online anomaly detection and sequential multiple change point detection.
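
The closed-form character of LS-SVDD can be illustrated with a minimal sketch. It assumes one common LS-SVDD formulation (squared slack penalty with equality constraints), under which the Lagrange multipliers alpha solve a single linear system, (2K + I/C) alpha + b 1 = diag(K) with 1^T alpha = 1, and the squared radius is R^2 = b + alpha^T K alpha. The RBF kernel, the function names, and the values of `C` and `gamma` are illustrative assumptions, not the dissertation's code, and the block add/remove updates are not shown.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    # Gaussian (RBF) kernel matrix between the rows of X and the rows of Y
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def fit_ls_svdd(X, C=10.0, gamma=0.5):
    """Solve the assumed LS-SVDD KKT system for alpha and the squared radius."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    # Bordered system: (2K + I/C) alpha + b * 1 = diag(K),  1^T alpha = 1
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = 2 * K + np.eye(n) / C
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    rhs = np.append(np.diag(K), 1.0)
    sol = np.linalg.solve(A, rhs)
    alpha, b = sol[:n], sol[n]
    aKa = alpha @ K @ alpha  # squared norm of the centre in feature space
    return {"X": X, "alpha": alpha, "aKa": aKa, "R2": b + aKa, "gamma": gamma, "C": C}


def sq_distance(model, Z):
    """Squared kernel-space distance from each row of Z to the sphere centre."""
    Kz = rbf_kernel(Z, model["X"], model["gamma"])
    # k(z, z) = 1 for the RBF kernel
    return 1.0 - 2 * Kz @ model["alpha"] + model["aKa"]
```

Points whose squared distance exceeds `R2` fall outside the description and could be flagged as anomalies. Because fitting is one linear solve, refitting after adding or removing observations is cheap, which is the property the sequential updates exploit.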

    Featured Anomaly Detection Methods and Applications

    Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people's daily activities, e.g., mobile fraud detection, anomaly detection has become a vital first line of defence for protecting public and personal property. Although anomaly detection methods have been under continual development over the years, the explosive growth of data volume and the continued dramatic variation of data patterns pose great challenges to anomaly detection systems and are fuelling strong demand for more intelligent anomaly detection methods with distinct characteristics to cope with various needs. To this end, this thesis starts by presenting a thorough review of existing anomaly detection strategies and methods, elaborating their advantages and disadvantages. Afterward, four distinctive anomaly detection methods, especially for time series, are proposed, each aimed at a specific need of anomaly detection in a different scenario, e.g., enhanced accuracy, interpretable results, and self-evolving models. Experiments are presented and analysed to give a better understanding of the performance of the methods and their distinct features. More specifically, the key contributions of this thesis are summarised as follows:
    1) Support Vector Data Description (SVDD) is investigated as a primary method for accurate anomaly detection. The applicability of SVDD to noisy time series datasets is carefully examined, and it is demonstrated that relaxing the decision boundary of SVDD always results in better accuracy in network time series anomaly detection. A theoretical analysis of the model parameter is also presented to ensure the validity of the relaxation of the decision boundary.
    2) To support a clear explanation of the detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of time series data is treated as contextual information and integrated into SVDD for anomaly detection. The formulation of SVDD with contextual information maintains multiple discriminants, which help distinguish the root causes of the anomalies.
    3) To further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed to realise one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with extreme points, which constitute a dictionary of data representatives. Using this dictionary, CHDD can represent and cluster all the normal data instances, so that anomaly detection comes with a degree of interpretation.
    4) Beyond better anomaly detection accuracy and interpretability, better solutions for anomaly detection over streaming data with evolving patterns are also investigated. Under the framework of Reinforcement Learning (RL), a time series anomaly detector that is continually trained to cope with evolving patterns is designed. Because the detector is trained on labeled time series, it avoids cumbersome threshold setting and the uncertain definitions of anomalies in time series anomaly detection tasks.
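
The core idea of item 3), that the extreme points of the convex hull form a small dictionary of representatives and that points outside the hull are suspicious, can be illustrated in two dimensions. The sketch below is a much simpler stand-in for CHDD, not CHDD itself: it computes the exact 2-D hull with Andrew's monotone chain algorithm and flags a query point as anomalous when it lies outside. The function names are hypothetical, and CHDD's hull approximation, representation, and clustering steps are not reproduced.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def hull_dictionary(points):
    """Extreme points of a 2-D point set, in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one
    return lower[:-1] + upper[:-1]


def is_anomalous(hull, q):
    """True if q lies strictly outside the convex polygon given by hull."""
    n = len(hull)
    return any(cross(hull[i], hull[(i + 1) % n], q) < 0 for i in range(n))
```

For example, the points (0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5) yield a four-point dictionary of the square's corners; the interior point is represented by them rather than stored.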

    Deep Anomaly Detection under Labeling Budget Constraints

    Selecting informative data points for expert feedback can significantly improve the performance of anomaly detection (AD) in various contexts, such as medical diagnostics or fraud detection. In this paper, we determine a set of theoretical conditions under which anomaly scores generalize from labeled queries to unlabeled data. Motivated by these results, we propose a data labeling strategy with optimal data coverage under labeling budget constraints. In addition, we propose a new learning framework for semi-supervised AD. Extensive experiments on image, tabular, and video data sets show that our approach achieves state-of-the-art semi-supervised AD performance under labeling budget constraints. Comment: deep anomaly detection, active learning, semi-supervised learning
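
One standard way to spread a fixed labeling budget over the data is greedy farthest-first (k-center) selection, sketched below on raw feature vectors. It is assumed here purely as an illustrative coverage heuristic and is not necessarily the query strategy proposed in the paper; the function name, the Euclidean distance, and the fixed starting index are assumptions.

```python
import numpy as np


def farthest_first_queries(X, budget, start=0):
    """Greedy k-center selection of `budget` indices to send for labeling."""
    budget = min(budget, len(X))
    chosen = [start]
    # distance from every point to its nearest already-chosen query
    dist = np.linalg.norm(X - X[start], axis=1)
    while len(chosen) < budget:
        nxt = int(np.argmax(dist))  # the least-covered point
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen
```

Each new query goes to the point worst covered by the queries so far, so with three well-separated clusters and a budget of three, one query lands in each cluster.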

    eXplainable and Reliable Against Adversarial Machine Learning in Data Analytics

    Machine learning (ML) algorithms are now widely adopted in different contexts to make autonomous decisions and predictions. Owing to the high volume of data shared in recent years, ML algorithms have become more accurate and reliable, since their training and testing phases can be more precise. An important concern when designing ML algorithms is adversarial machine learning attacks, which aim to create manipulated datasets that mislead ML algorithm decisions. In this work, we propose new approaches able to detect and mitigate malicious adversarial machine learning attacks against an ML system. In particular, we investigate the Carlini-Wagner (CW), fast gradient sign method (FGSM), and Jacobian-based saliency map (JSMA) attacks. The aim of this work is to use detection algorithms as countermeasures to these attacks. Initially, we performed tests using canonical ML algorithms with hyperparameter optimization to improve metrics. Then, we adopted original reliable AI algorithms, based either on eXplainable AI (Logic Learning Machine) or on Support Vector Data Description (SVDD). The results show that the classical algorithms may fail to identify an adversarial attack, while the reliable AI methodologies are more likely to correctly detect a possible adversarial machine learning attack. The proposed methodology was evaluated in terms of the balance between false positive rate (FPR) and false negative rate (FNR) on real-world application datasets: Domain Name System (DNS) tunneling, Vehicle Platooning, and Remaining Useful Life (RUL). In addition, a statistical analysis was performed to improve the robustness of the trained models, including an evaluation of their runtime and memory consumption.
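
To make one of the named attacks concrete: FGSM perturbs an input by a step of size epsilon along the sign of the gradient of the loss with respect to that input. The sketch below applies it to a plain logistic-regression classifier, chosen because its input gradient has the closed form (sigmoid(w.x + b) - y) * w; the model, the weights, and epsilon are illustrative assumptions unrelated to the systems evaluated in the work.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model on one sample."""
    p = sigmoid(w @ x + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def fgsm(w, b, x, y, eps):
    """Fast gradient sign method: x_adv = x + eps * sign(d loss / d x)."""
    grad_x = (sigmoid(w @ x + b) - y) * w  # closed-form input gradient
    return x + eps * np.sign(grad_x)
```

With w = [1, -2], b = 0, x = [0.5, 0.5], y = 1 and eps = 0.1, the attack moves x to [0.4, 0.6], which increases the loss while changing no feature by more than eps; that small, bounded shift is what a detector has to notice.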

    Advances in Streaming Novelty Detection

    153 p. First, this thesis addresses a confusion between terms and problems, in which the same term is used to refer to different problems and, similarly, the same problem is referred to by different terms interchangeably. This hinders progress in the field, since related literature is hard to find, and it encourages duplicated work. The first contribution assigns a distinct term to each problem and formalizes the learning scenarios in an attempt to standardize the field. Second, the thesis addresses the Streaming Novelty Detection problem. In this problem, a model is learned from a supervised dataset. The model then receives new, unlabeled instances and predicts their class online, i.e., in a stream. The model must be updated to cope with concept drift. In this classification scenario, new classes are assumed to be able to emerge dynamically, so the model must be able to discover new classes automatically and without supervision. In this context, the thesis makes two contributions: first, a solution based on Gaussian mixtures in which each class is modeled by one of the mixture components; second, the use of neural networks, such as Autoencoder networks and Deep Support Vector Data Description networks, to work with time series.
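
The first streaming contribution, one Gaussian component per known class, suggests a simple decision rule: assign an instance to the component with the highest likelihood, and treat it as a candidate member of a new class when no component explains it well. The sketch below fits one diagonal Gaussian per class in batch. The log-density threshold, the diagonal covariances, and the class name are hypothetical choices, and the online updating, concept-drift handling, and new-class discovery described in the thesis are not shown.

```python
import numpy as np


class GaussianPerClassNovelty:
    """One diagonal Gaussian per known class; low-likelihood points are flagged as novel."""

    def __init__(self, threshold=-10.0):
        self.threshold = threshold  # hypothetical log-density cut-off

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # a small floor keeps every variance strictly positive
        self.vars_ = np.array([X[y == c].var(axis=0) + 1e-6 for c in self.classes_])
        return self

    def log_density(self, X):
        # log N(x | mu_k, diag(var_k)) for every sample/class pair, shape (n, k)
        diff = X[:, None, :] - self.means_[None, :, :]
        return -0.5 * np.sum(diff**2 / self.vars_ + np.log(2 * np.pi * self.vars_), axis=2)

    def predict(self, X):
        ll = self.log_density(X)
        labels = self.classes_[np.argmax(ll, axis=1)]
        # -1 marks an instance that no known component explains well
        return np.where(ll.max(axis=1) < self.threshold, -1, labels)
```

In a streaming setting, instances marked -1 would be buffered and, once enough of them agree, could seed a new component; that step is left out here.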

    Information Theory and Its Application in Machine Condition Monitoring

    Condition monitoring of machinery is one of the most important aspects of many modern industries. With the rapid advancement of science and technology, machines are becoming increasingly complex. Moreover, an exponential increase in demand is driving an increasing requirement for machine output. As a result, in most modern industries, machines have to work 24 hours a day. All these factors cause machine health to deteriorate at a higher rate than before. Breakdown of key machine components such as bearings, gearboxes, or rollers can have catastrophic effects in terms of both financial and human costs. From this perspective, it is important not only to detect a fault at its earliest point of inception but also to design the overall monitoring process, including fault classification, fault severity assessment, and remaining useful life (RUL) prediction, for better planning of the maintenance schedule. Information theory is one of the pioneering contributions of modern science and has evolved into various forms and algorithms over time. Owing to its ability to address the non-linearity and non-stationarity of machine health deterioration, it has become a popular choice among researchers. Information theory is an effective technique for extracting features of machines under different health conditions. In this context, this book discusses the potential applications, research results, and latest developments of information theory-based condition monitoring of machinery.
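
As one concrete example of an information-theoretic condition-monitoring feature, permutation entropy (Bandt and Pompe) measures how irregular the ordinal patterns of a signal such as a vibration record are, ranging from 0 for a perfectly monotonic signal to 1 (when normalized) for a signal in which all patterns are equally likely. It is offered as a representative technique, assumed for illustration; the book may emphasize other measures, and the `order` and `delay` defaults are arbitrary choices.

```python
import math
from collections import Counter


def permutation_entropy(signal, order=3, delay=1, normalize=True):
    """Bandt-Pompe permutation entropy of a 1-D signal, in bits."""
    n = len(signal) - (order - 1) * delay
    if n <= 0:
        raise ValueError("signal too short for the chosen order/delay")
    patterns = Counter()
    for i in range(n):
        window = [signal[i + j * delay] for j in range(order)]
        # ordinal pattern = the index order that sorts the window
        patterns[tuple(sorted(range(order), key=window.__getitem__))] += 1
    probs = [c / n for c in patterns.values()]
    h = -sum(p * math.log2(p) for p in probs)
    # divide by the maximum possible entropy, log2(order!)
    return h / math.log2(math.factorial(order)) if normalize else h
```

Tracked over successive windows of a vibration signal, a drift in this value can indicate a change in the machine's dynamics; for instance, `permutation_entropy([0, 1, 0, 1, 0], order=2)` is 1.0 because the up and down patterns are equally frequent.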

    Advanced Fault Diagnosis and Health Monitoring Techniques for Complex Engineering Systems

    Over the last few decades, the field of fault diagnostics and structural health management has been developing rapidly. The reliability, availability, and safety of engineering systems can be significantly improved by implementing multifaceted strategies of in situ diagnostics and prognostics. With the development of intelligent algorithms, smart sensors, and advanced data collection and modeling techniques, this challenging research area has been receiving ever-increasing attention in both fundamental research and engineering applications. This attention has been strongly driven by extensive applications ranging from the aerospace, automotive, transport, manufacturing, and processing industries to the defense and infrastructure industries.

    The spatial ecology of an endemic desert shrub

    Using spatial patterns to infer the biotic and abiotic processes underlying plant population dynamics is an important technique in contemporary ecology, with particular utility when investigating shrub population dynamics, for which experimental and observational methodologies are rarely feasible. Using a novel one-class classification technique, the locations of over 17,000 Spartocytisus supranubius individuals were mapped from aerial imagery, generating a spatially extensive (162 ha) yet accurate dataset. The recent rapid increase in studies using pattern-process inference has not been accompanied by a rigorous assessment of the behaviour of these techniques, nor by an appraisal of their utility in addressing ecological research questions. The first part of the thesis addresses these concerns, investigating whether current methodologies are adequate to test hypotheses concerning spatial interactions. A literature review reveals a preponderance of studies of small, little-replicated plots. The results of the research raise concerns about the utility of spatial point pattern analyses as currently applied in the literature. To avoid inaccurate description of fine-scale spatial structures, researchers are recommended to increase plot replication. Furthermore, studies of spatial structure and population dynamics should account for spatial environmental gradients, whatever plot size is used. The second part of the thesis presents a rigorous investigation, incorporating a priori inference and the application of fine-scale spatial statistical and modelling techniques, of the biotic and abiotic mechanisms underlying the spatial structure and population dynamics of S. supranubius, a leguminous shrub species endemic to the Canary Islands. The spatial structure of S. supranubius populations is consistent with the operation of clonal reproduction and intra-specific competition.
    However, the results indicate that spatial environmental heterogeneity (from small to broad scales), in particular topography, can interact with biotic processes to generate quantitatively different S. supranubius patterns in different locations. Future research into the spatial and temporal dynamics of interactions between abiotic and biotic processes is recommended.
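
Spatial point pattern analyses of this kind commonly rest on summary statistics such as Ripley's K function, which, for each radius r, estimates the expected number of further points within distance r of a typical point divided by the point intensity; under complete spatial randomness K(r) is close to pi * r^2, and larger values suggest clustering. The sketch below is the naive estimator without edge correction, included only to show what such a statistic computes; practical analyses need an edge correction, which is deliberately omitted, and the function name is hypothetical.

```python
import numpy as np


def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction) at each radius."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # pairwise Euclidean distances between all points
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2))
    np.fill_diagonal(d, np.inf)  # exclude self-pairs (i == j)
    # K(r) = area / (n (n - 1)) * number of ordered pairs within distance r
    return np.array([area * np.sum(d <= r) / (n * (n - 1)) for r in radii])
```

For the four corners of a unit square, the estimate is 2/3 at r = 1 (only side neighbours count) and 1 at r = 1.5 (diagonals included), showing how K accumulates neighbours as the radius grows.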
