127 research outputs found

    Conflict-driven Hybrid Observer-based Anomaly Detection

    This paper presents an anomaly detection method using a hybrid observer, which consists of a discrete state observer and a continuous state observer. We focus our attention on anomalies caused by intelligent attacks, which may bypass existing anomaly detection methods because neither the event sequence nor the observed residuals appears to be anomalous. Based on the relation between the continuous and discrete variables, we define three conflict types and give the conditions under which the detection of the anomalies is guaranteed. We call this method conflict-driven anomaly detection. The effectiveness of this method is demonstrated mathematically and illustrated on a Train-Gate (TG) system.

    A Bi-Criteria Active Learning Algorithm for Dynamic Data Streams

    Active learning (AL) is a promising way to efficiently build up training sets with minimal supervision. A learner deliberately queries specific instances to tune the classifier's model using as few labels as possible. The challenge for streaming is that the data distribution may evolve over time and therefore the model must adapt. Another challenge is sampling bias, where the sampled training set does not reflect the underlying data distribution. In the presence of concept drift, sampling bias is more likely to occur because the training set needs to represent the whole evolving data. To tackle these challenges, we propose a novel bi-criteria AL approach (BAL) that relies on two selection criteria, namely a label uncertainty criterion and a density-based criterion. While the first criterion selects the instances that are most uncertain in terms of class membership, the latter dynamically curbs the sampling bias by weighting the samples to reflect the true underlying distribution. To design and implement these two criteria for learning from streams, BAL adopts a Bayesian online learning approach and combines online classification and online clustering through the use of online logistic regression and online growing Gaussian mixture models, respectively. Empirical results obtained on standard synthetic and real-world benchmarks show the high performance of the proposed BAL method compared to the state-of-the-art AL methods.
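
    The two selection criteria described above can be illustrated with a minimal sketch. The abstract does not state how BAL combines them, so everything here is an assumption made for illustration: uncertainty is measured as binary entropy, the density weight is taken as a number in [0, 1] supplied by some clustering model, the two are multiplied, and a label is requested when the product reaches a hypothetical threshold.

```python
import math


def label_uncertainty(p_positive: float) -> float:
    """Binary entropy (in bits) of a predicted class probability.

    Returns 1.0 for p = 0.5 (most uncertain) and 0.0 for p = 0 or p = 1.
    Entropy is an assumed uncertainty measure, not taken from the abstract.
    """
    if p_positive <= 0.0 or p_positive >= 1.0:
        return 0.0
    q = 1.0 - p_positive
    return -(p_positive * math.log2(p_positive) + q * math.log2(q))


def should_query(p_positive: float, density_weight: float,
                 threshold: float = 0.5) -> bool:
    """Decide whether to request a label for one streaming instance.

    density_weight is assumed to lie in [0, 1] and to come from an online
    clustering model; the product rule and the threshold are hypothetical.
    """
    score = label_uncertainty(p_positive) * density_weight
    return score >= threshold
```

    Under these assumptions, an instance close to the decision boundary in a dense region is queried, while an equally uncertain instance in a sparse region is skipped.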

    A non-parametric hierarchical clustering model

    © 2015 IEEE. We present a novel non-parametric hierarchical clustering model (NHCM) based on Gaussian mixture models. NHCM uses a novel Dirichlet process (DP) prior allowing for more flexible modeling of the data, where the base distribution of the DP is itself an infinite mixture of Gaussian conjugate priors. NHCM can be thought of as a hierarchical clustering model, in which the low-level base prior governs the distribution of the data points forming sub-clusters, and the higher-level prior governs the distribution of the sub-clusters forming clusters. Using this hierarchical configuration, we can maintain low model complexity and allow for clustering skewed, complex data. To perform inference, we propose a Gibbs sampling algorithm. Empirical investigations have been carried out to analyse the efficiency of the proposed clustering model.

    Active Learning for Data Streams under Concept Drift and Concept Evolution

    Data stream classification is an important problem that poses many challenges. Since the length of the data is theoretically infinite, it is impractical to store and process all the historical data. Data streams also experience changes in their underlying distribution (concept drift), so the classifier must adapt. Another challenge of data stream classification is the possible emergence and disappearance of classes, known as the concept evolution problem. On top of these challenges, acquiring labels for such large data is expensive. In this paper, we propose a stream-based active learning (AL) strategy (SAL) that handles the aforementioned challenges. SAL aims to query the labels of the samples that are expected to reduce the future error the most. It handles concept drift and concept evolution by adapting to the change in the stream. Furthermore, as a part of the error reduction process, SAL handles the sampling bias problem and queries the samples that caused the change, i.e., drifted samples or samples coming from new classes. To tackle the lack of prior knowledge about the streaming data, non-parametric Bayesian modelling is adopted, namely two representations of the Dirichlet process: Dirichlet mixture models and the stick-breaking process. Empirical results obtained on real-world benchmarks show the high performance of the proposed SAL method compared to the state-of-the-art methods.
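
    For readers unfamiliar with the stick-breaking process mentioned above, the following is a minimal sketch of the standard truncated construction of Dirichlet process mixture weights. It is not the SAL algorithm itself; the function name, the truncation level, and the concentration value are assumptions chosen for illustration.

```python
import random


def stick_breaking_weights(alpha: float, num_components: int,
                           rng: random.Random) -> list:
    """Draw the first num_components weights of a stick-breaking process.

    Each step breaks off a Beta(1, alpha) fraction of the stick that is
    still unassigned. Truncating at num_components is an approximation,
    so the weights sum to slightly less than 1.
    """
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for _ in range(num_components):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights


# Hypothetical usage: ten components with concentration alpha = 2.0.
weights = stick_breaking_weights(alpha=2.0, num_components=10,
                                 rng=random.Random(0))
```

    A smaller alpha concentrates the mass on the first few components, while a larger alpha spreads it over many, which is what lets the number of classes or clusters stay open-ended.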

    Asynchronous Stochastic Variational Inference

    Stochastic variational inference (SVI) employs stochastic optimization to scale up Bayesian computation to massive data. Since SVI is at its core a stochastic gradient-based algorithm, horizontal parallelism can be harnessed to allow larger scale inference. We propose a lock-free parallel implementation for SVI which allows distributed computations over multiple slaves in an asynchronous style. We show that our implementation leads to linear speed-up while guaranteeing an asymptotic ergodic convergence rate $O(1/\sqrt{T})$, given that the number of slaves is bounded by $\sqrt{T}$ ($T$ is the total number of iterations). The implementation is done in a high-performance computing (HPC) environment using message passing interface (MPI) for Python (MPI4py). The extensive empirical evaluation shows that our parallel SVI is lossless, performing comparably well to its counterpart serial SVI with linear speed-up. Comment: 7 pages, 8 figures, 1 table, 2 algorithms. The paper has been submitted for publication

    Distributed modeling approach of discrete manufacturing systems by Parts of Plant

    The paper presents an original approach to model a discrete manufacturing system by Parts of Plant (PoP). This approach takes into account the technical and technological specifications of each plant element. The aim of this work is to achieve a reliable simulation of discrete manufacturing systems at the design stage, before the production stage. Models are distributed and established from the functional chain of a process. They take into account the distribution of information through each PoP with its sensors, pre-actuators and actuators. A library of PoPs with their corresponding models is proposed. An application example is used to illustrate the approach.

    Discrete Event Model-Based Approach for Fault Detection and Isolation of Manufacturing Systems

    This paper presents a discrete event model-based approach for Fault Detection and Isolation of manufacturing systems. This approach considers a system as a set of independent plant elements. Each plant element is composed of a set of interrelated Parts of Plant (PoPs) modeled by a Moore automaton. Each PoP model is only aware of its local behavior. The degraded and faulty behaviors are added to each PoP model in order to obtain extended PoP models. An extrapolation of Gaussian learning is performed to obtain acceptable temporal intervals between the occurrence times of correlated events. Finally, based on the extended PoP models and the links between them, a fault candidates' tree is established for each plant element. This candidates' tree corresponds to a local on-line fault event occurrence observer, called a diagnoser. Thus, the diagnosis decision is distributed over the plant elements. An application example is used to illustrate the approach.
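
    The idea of learning acceptable temporal intervals between correlated events can be sketched as follows. The abstract does not give the exact procedure, so this is only one plausible reading: fit a Gaussian to observed delays and accept anything within k standard deviations of the mean. The function names and the default k = 3 are assumptions.

```python
import statistics


def acceptable_interval(delays, k=3.0):
    """Return (low, high) bounds for the delay between two correlated events.

    The bounds are mean +/- k sample standard deviations of the observed
    delays, clipped at zero because a delay cannot be negative.
    """
    mean = statistics.mean(delays)
    std = statistics.stdev(delays)  # sample standard deviation
    return max(0.0, mean - k * std), mean + k * std


def is_timely(delay, interval):
    """Check whether an observed delay falls inside the learned interval."""
    low, high = interval
    return low <= delay <= high


# Hypothetical delays (in seconds) between a command and its sensor response.
interval = acceptable_interval([2.0, 2.1, 1.9, 2.0, 2.0])
```

    In this reading, a delay outside the interval would be treated as a timing symptom for the local diagnoser to evaluate.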

    Unconditional decentralized structure for the fault diagnosis of discrete event systems

    This paper proposes an unconditional decentralized structure to realize the fault diagnosis of Discrete Event Systems (DES), especially manufacturing systems with discrete sensors and actuators. This structure is based on a set of local diagnosers, each of which is responsible for a specific part of the plant. These local diagnosers are based on a modular modelling of the plant in order to reduce the state explosion. Each local diagnoser uses event-based, state-based and timed models to take a decision about fault occurrences. These models are obtained using the information provided by the plant, the controller and the reactivity of the actuators. All local diagnosis decisions are then merged by a Boolean operator in order to obtain one global diagnosis decision. Finally, the diagnosers are polynomial-time in the cardinality of the state space of the system. This approach is illustrated using an example manufacturing system.
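
    The merging of local decisions can be pictured with a very small sketch. The abstract says only that a Boolean operator is used; the choice of a logical OR, as well as the diagnoser names, are assumptions made here for illustration.

```python
def merge_decisions(local_decisions):
    """Merge local fault flags into one global decision.

    local_decisions maps a diagnoser name to True (fault detected) or
    False. The OR operator is assumed: one local fault flag is enough
    to raise the global fault decision.
    """
    return any(local_decisions.values())


def faulty_parts(local_decisions):
    """List the plant parts whose local diagnoser reported a fault."""
    return sorted(name for name, fault in local_decisions.items() if fault)


# Hypothetical plant with three local diagnosers.
decisions = {"conveyor": False, "gripper": True, "press": False}
global_fault = merge_decisions(decisions)
suspects = faulty_parts(decisions)
```

    Keeping the list of flagged parts alongside the merged flag preserves the isolation information that each local diagnoser provides.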

    Online Active Learning for Human Activity Recognition from Sensory Data Streams

    Human activity recognition (HAR) is highly relevant to many real-world domains like safety, security, and in particular healthcare. The current machine learning technology for HAR is highly human-dependent, which makes it costly and unreliable in non-stationary environments. Existing HAR algorithms assume that training data is collected and annotated by humans prior to the training phase. Furthermore, the data is assumed to exhibit the true characteristics of the underlying distribution. In this paper, we propose a new autonomous approach that consists of novel algorithms. In particular, we adopt an active learning (AL) strategy to selectively query the user/resident about the label of particular activities in order to improve the model accuracy. This strategy helps overcome the challenge of labelling sequential data with time dependency, which is highly time-consuming and difficult. Because of the changes that may affect the way activities are performed, we regard sensor data as a stream and human activity learning as an online continuous process. In such a process, the learner can adapt to changes, incorporate novel activities and discard obsolete ones. To this end, we propose a novel semi-supervised classifier (OSC) that works together with a novel Bayesian stream-based active learning (BSAL) algorithm. Because of the changes in the sensor layouts across different houses' settings, we use a Conditional Restricted Boltzmann Machine (CRBM) to handle the feature engineering issue by learning the features regardless of the environment settings. CRBM is then applied to extract low-level features from unlabelled raw high-dimensional activity input. The resulting approach tackles the challenges of activity recognition using a three-module architecture composed of a feature extractor (CRBM) and an online semi-supervised classifier (OSC) equipped with BSAL. CRBM-BSAL-OSC allows completely autonomous learning that adjusts to the environment setting, explores the changes and adapts to them. The paper provides the theoretical details of the proposed approach as well as an extensive empirical study to evaluate the performance of the approach.

    Active Learning for Classifying Data Streams with Unknown Number of Classes

    The classification of data streams is an interesting but also challenging problem. A data stream may grow infinitely, making it impractical to store it prior to processing and classification. Due to its dynamic nature, the underlying distribution of the data stream may change over time, resulting in the so-called concept drift or in the possible emergence and fading of classes, known as concept evolution. In addition, acquiring labels of data samples in a stream is admittedly expensive, if not infeasible. In this paper, we propose a novel stream-based active learning algorithm (SAL) which is capable of coping with both concept drift and concept evolution by adapting the classification model to the dynamic changes in the stream. SAL is the first AL algorithm in the literature to explicitly take account of these concepts. Moreover, using SAL, only the labels of samples that are expected to reduce the expected future error are queried. This process is done while tackling the problem of sampling bias, so that samples that induce the change (i.e., drifting samples or samples coming from new classes) are queried. To efficiently implement SAL, the paper proposes the application of non-parametric Bayesian models, which make it possible to cope with the lack of prior knowledge about the data stream. In particular, Dirichlet mixture models and the stick-breaking process are adopted and adapted to meet the requirements of online learning. The empirical results obtained on real-world benchmarks demonstrate the superiority of SAL in terms of classification performance over the state-of-the-art methods, using average accuracy and average class accuracy.