2,799 research outputs found

    Mass Volume Curves and Anomaly Ranking

    This paper formulates the problem of ranking multivariate unlabeled observations by their degree of abnormality as an unsupervised statistical learning task. In the 1-d situation, this problem is usually tackled by means of tail estimation techniques: univariate observations are viewed as all the more 'abnormal' as they are located far in the tail(s) of the underlying probability distribution. It would likewise be desirable to have a scalar-valued 'scoring' function for comparing the degree of abnormality of multivariate observations. Here we formulate the issue of scoring anomalies as an M-estimation problem by means of a novel functional performance criterion, referred to as the Mass Volume curve (MV curve for short), whose optimal elements are strictly increasing transforms of the density almost everywhere on the support of the density. We first study the statistical estimation of the MV curve of a given scoring function and provide a strategy to build confidence regions using a smoothed bootstrap approach. Optimization of this functional criterion over the set of piecewise constant scoring functions is tackled next. This boils down to estimating a sequence of empirical minimum volume sets whose levels are chosen adaptively from the data, so as to adjust to the variations of the optimal MV curve while controlling the bias of its approximation by a stepwise curve. Generalization bounds are then established for the difference in sup norm between the MV curve of the empirical scoring function thus obtained and the optimal MV curve.
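To make the MV curve idea concrete, the following is a minimal sketch of one way to estimate it empirically for a given scoring function. It is not the estimator analysed in the paper. For each mass level alpha, the threshold is the score of the ceil(alpha * n)-th highest-scored observation, and the volume of the corresponding level set is approximated by Monte Carlo sampling in a bounding box. The function name `empirical_mv_curve`, the bounding box, the Gaussian toy data, and the Monte Carlo volume estimate are all assumptions made for illustration.

```python
import math
import random


def empirical_mv_curve(score, sample, box, alphas, n_mc=20000, seed=0):
    """Estimate (mass, volume) pairs for the level sets of `score`.

    score  : callable mapping a point (tuple of floats) to a real score,
             where higher scores mean "more normal" (assumed convention)
    sample : list of observed points
    box    : list of (low, high) pairs bounding the region of interest
    alphas : mass levels in (0, 1]
    """
    rng = random.Random(seed)
    box_volume = math.prod(high - low for low, high in box)
    # Scores of uniform points in the box, used to estimate Lebesgue volumes.
    uniform_scores = [
        score(tuple(rng.uniform(low, high) for low, high in box))
        for _ in range(n_mc)
    ]
    sample_scores = sorted((score(x) for x in sample), reverse=True)
    n = len(sample_scores)
    curve = []
    for alpha in alphas:
        # Fewest top-scored observations whose empirical mass reaches alpha.
        k = max(1, math.ceil(alpha * n))
        threshold = sample_scores[k - 1]
        inside = sum(1 for s in uniform_scores if s >= threshold)
        curve.append((alpha, box_volume * inside / n_mc))
    return curve


# Toy usage: standard Gaussian data in 2-d, scored by negative squared norm,
# which is an increasing transform of the Gaussian density.
data_rng = random.Random(1)
sample = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(500)]
curve = empirical_mv_curve(
    lambda x: -(x[0] ** 2 + x[1] ** 2),
    sample,
    box=[(-4.0, 4.0), (-4.0, 4.0)],
    alphas=[0.5, 0.9, 0.99],
)
```

A scoring function with a lower curve encloses the same fraction of the data in a smaller volume, which is the sense in which the abstract treats the MV curve as a performance criterion.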

    Investigation of PEMFC fault diagnosis with consideration of sensor reliability

    Despite the wide range of applications for the polymer electrolyte membrane fuel cell (PEMFC), its reliability and durability are still major barriers to further commercialisation. As a possible solution, PEMFC fault diagnosis has received considerable attention in the last few decades. Due to the difficulty of developing an accurate PEMFC model incorporating various failure mode effects, data-driven approaches are widely used for diagnosis purposes. These methods depend largely on the quality of sensor measurements from the PEMFC. Therefore, it is necessary to investigate sensor reliability when performing PEMFC fault diagnosis. In this study, sensor reliability is investigated by proposing an identification technique to detect abnormal sensors during PEMFC operation. The identified abnormal sensors are removed from the analysis in order to guarantee reliable diagnostic performance. Moreover, the effectiveness of the proposed technique is investigated using test data from a PEMFC system in which fuel cell flooding is observed. During the test, due to accumulation of liquid water inside the PEMFC, the humidity sensors give misleading readings, and flooding cannot be identified correctly when these humidity sensors are included in the analysis. With the proposed technique, the abnormal humidity measurements can be detected at an early stage. Results demonstrate that by removing the abnormal sensors, flooding can be identified with the remaining sensors, so that reliable health monitoring can be guaranteed during PEMFC operation.

    Featured Anomaly Detection Methods and Applications

    Anomaly detection is a fundamental research topic that has been widely investigated. From critical industrial systems, e.g., network intrusion detection systems, to people’s daily activities, e.g., mobile fraud detection, anomaly detection has become the first line of defence for protecting public and personal property. Although anomaly detection methods have been under consistent development over the years, the explosive growth of data volume and the continued dramatic variation of data patterns pose great challenges to anomaly detection systems and are fuelling demand for more intelligent anomaly detection methods with distinct characteristics to cope with various needs. To this end, this thesis starts by presenting a thorough review of existing anomaly detection strategies and methods. The advantages and disadvantages of the strategies and methods are elaborated. Afterward, four distinctive anomaly detection methods, especially for time series, are proposed in this work, aiming to address specific needs of anomaly detection under different scenarios, e.g., enhanced accuracy, interpretable results, and self-evolving models. Experiments are presented and analysed to offer a better understanding of the performance of the methods and their distinct features. To be more specific, the key contents of this thesis are summarised as follows:
    1) Support Vector Data Description (SVDD) is investigated as a primary method to achieve accurate anomaly detection. The applicability of SVDD to noisy time series datasets is carefully examined, and it is demonstrated that relaxing the decision boundary of SVDD always results in better accuracy in network time series anomaly detection. Theoretical analysis of the parameter utilised in the model is also presented to ensure the validity of the relaxation of the decision boundary.
    2) To support a clear explanation of the detected time series anomalies, i.e., anomaly interpretation, the periodic pattern of time series data is considered as contextual information to be integrated into SVDD for anomaly detection. The formulation of SVDD with contextual information maintains multiple discriminants, which help in distinguishing the root causes of the anomalies.
    3) In an attempt to further analyse a dataset for anomaly detection and interpretation, Convex Hull Data Description (CHDD) is developed for realising one-class classification together with data clustering. CHDD approximates the convex hull of a given dataset with the extreme points, which constitute a dictionary of data representatives. Using this dictionary, CHDD is capable of representing and clustering all the normal data instances, so that anomaly detection is realised with a degree of interpretation.
    4) Besides better anomaly detection accuracy and interpretability, better solutions for anomaly detection over streaming data with evolving patterns are also researched. Under the framework of Reinforcement Learning (RL), a time series anomaly detector that is continually trained to cope with the evolving patterns is designed. Because the anomaly detector is trained with labeled time series, it avoids the cumbersome work of threshold setting and the uncertain definitions of anomalies in time series anomaly detection tasks.

    A data-driven approach for Network Intrusion Detection and Monitoring based on Kernel Null Space

    In this study, we propose a new approach to detect network intrusions in real time, based on a statistical process control technique and the kernel null space method. The training samples in a class are mapped to a single point using the Kernel Null Foley-Sammon Transform. Novelty scores are computed from testing samples in order to determine the threshold for real-time anomaly detection. The efficiency of the proposed method is illustrated on the KDD99 data set. The experimental results show that our new method outperforms the OCSVM and the original Kernel Null Space method by 1.53% and 3.86% respectively in terms of accuracy.

    Constrained manifold learning for the characterization of pathological deviations from normality

    This paper describes a technique to (1) learn the representation of a pathological motion pattern from a given population, and (2) compare individuals to this population. Our hypothesis is that this pattern can be modeled as a deviation from normal motion by means of non-linear embedding techniques. Each subject is represented by a 2D map of local motion abnormalities, obtained from a statistical atlas of myocardial motion built from a healthy population. The algorithm estimates a manifold from a set of patients with varying degrees of the same disease, and compares individuals to the training population using a mapping to the manifold and a distance to normality along the manifold. The approach extends recent manifold learning techniques by constraining the manifold to pass through a physiologically meaningful origin representing a normal motion pattern. Interpolation techniques using locally adjustable kernels improve the accuracy of the method. The technique is applied in the context of cardiac resynchronization therapy (CRT), focusing on a specific motion pattern of intra-ventricular dyssynchrony called septal flash (SF). We estimate the manifold from 50 CRT candidates with SF and test it on 37 CRT candidates and 21 healthy volunteers. Experiments highlight the relevance of nonlinear techniques to model a pathological pattern from the training set and compare new individuals to this pattern.

    Model-Based Environmental Visual Perception for Humanoid Robots

    The visual perception of a robot should answer two fundamental questions: What? and Where? In order to answer these questions properly and efficiently, it is essential to establish a bidirectional coupling between the external stimuli and the internal representations. This coupling links the physical world with the inner abstraction models by sensor transformation, recognition, matching and optimization algorithms. The objective of this PhD is to establish this sensor-model coupling.

    Alarm flood reduction using multiple data sources

    The introduction of distributed control systems in the process industry has increased the number of alarms per operator exponentially. Modern plants present a high level of interconnectivity due to steam recirculation, heat integration and the complex control systems installed in the plant. When there is a disturbance in the plant, it spreads through the plant's material, energy and information connections, affecting the process variables along its path. The alarms associated with these process variables are triggered. The alarm messages may overload the operator in the control room, who will not be able to properly investigate each of these alarms. This undesired situation is called an “alarm flood”. In such situations the operator might not be able to keep the plant within safe operation. The aim of this thesis is to reduce alarm flood periods in process plants. Consequential alarms coming from the same process abnormality are isolated and a causal alarm suggestion is given. The causal alarm in an alarm flood is the alarm associated with the asset originating the disturbance that caused the flood. Multiple information sources are used: an alarm log containing all past alarm messages, process data and a topology model of the plant. The alarm flood reduction is achieved with a combination of alarm log analysis, process data root-cause analysis and connectivity analysis. The research findings are implemented in a software tool that guides the user through the different steps of the method. Finally, the applicability of the method is demonstrated with an industrial case study.
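As a rough illustration of the first step of such an analysis, the sketch below locates alarm flood periods in an alarm log. It is not taken from the thesis. The abstract gives no numerical definition of a flood, so the threshold used here (more than 10 alarms within a trailing 10 minute window) is an assumption borrowed from common alarm-management guidance, and the function name `flood_periods` and the timestamp-only log format are hypothetical.

```python
from collections import deque


def flood_periods(timestamps, window=600.0, threshold=10):
    """Return (start, end) intervals in which more than `threshold` alarms
    occurred within the trailing `window` seconds.

    timestamps : alarm times in seconds, sorted in ascending order
    window     : trailing window length in seconds (assumed: 10 minutes)
    threshold  : alarm count above which a flood is declared (assumed: 10)
    """
    recent = deque()  # alarms inside the current trailing window
    periods = []
    start = None
    last = None
    for t in timestamps:
        recent.append(t)
        # Drop alarms that have fallen out of the trailing window.
        while recent[0] <= t - window:
            recent.popleft()
        if len(recent) > threshold:
            if start is None:
                # The flood is dated from the oldest alarm still in the window.
                start = recent[0]
            last = t
        elif start is not None:
            periods.append((start, last))
            start = None
    if start is not None:
        periods.append((start, last))
    return periods


# Toy log: three isolated alarms, a burst of 30 alarms 5 s apart, then two more.
log = [0, 1000, 2000] + [3000 + 5 * i for i in range(30)] + [5000, 6000]
periods = flood_periods(log)
```

The alarms falling inside each returned interval would then be the candidates for grouping into consequential alarms and for the causal alarm suggestion that the abstract describes.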