127 research outputs found
Conflict-driven Hybrid Observer-based Anomaly Detection
This paper presents an anomaly detection method using a hybrid observer,
which consists of a discrete state observer and a continuous state observer. We
focus our attention on anomalies caused by intelligent attacks, which may
bypass existing anomaly detection methods because neither the event sequence
nor the observed residuals appear to be anomalous. Based on the relation
between the continuous and discrete variables, we define three conflict types
and give the conditions under which the detection of the anomalies is
guaranteed. We call this method conflict-driven anomaly detection. The
effectiveness of this method is demonstrated mathematically and illustrated on
a Train-Gate (TG) system.
A Bi-Criteria Active Learning Algorithm for Dynamic Data Streams
Active learning (AL) is a promising way to efficiently
build up training sets with minimal supervision. A learner
deliberately queries specific instances to tune the classifier’s
model using as few labels as possible. The challenge for streaming
is that the data distribution may evolve over time and therefore
the model must adapt. Another challenge is the sampling bias
where the sampled training set does not reflect the underlying
data distribution. In the presence of concept drift, sampling bias is
more likely to occur as the training set needs to represent the
whole evolving data. To tackle these challenges, we propose a
novel bi-criteria AL approach (BAL) that relies on two selection
criteria, namely a label uncertainty criterion and a density-based
criterion. While the first criterion selects instances that are the most
uncertain in terms of class membership, the latter dynamically
curbs the sampling bias by weighting the samples to reflect the
true underlying distribution. To design and implement these two
criteria for learning from streams, BAL adopts a Bayesian online
learning approach and combines online classification and online
clustering through the use of online logistic regression and online
growing Gaussian mixture models, respectively. Empirical results
obtained on standard synthetic and real-world benchmarks show
the high performance of the proposed BAL method compared to
the state-of-the-art AL method.
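The two selection criteria described above can be illustrated with a minimal sketch. It is not the BAL algorithm itself: the abstract does not say how the criteria are combined, so the product rule, the `threshold` parameter and all function names below are assumptions made for illustration. The class probability would come from an online classifier and the density weight from an online clustering model, both of which are taken as given inputs here.

```python
import math


def label_uncertainty(p_positive):
    """Binary entropy (in bits) of a predicted class probability."""
    if p_positive <= 0.0 or p_positive >= 1.0:
        return 0.0  # a fully confident prediction carries no uncertainty
    q = 1.0 - p_positive
    return -(p_positive * math.log2(p_positive) + q * math.log2(q))


def query_score(p_positive, density_weight):
    """Assumed combination rule: uncertainty scaled by a density-based weight."""
    return label_uncertainty(p_positive) * density_weight


def should_query(p_positive, density_weight, threshold=0.5):
    """Ask for a label only when the combined score reaches the threshold."""
    return query_score(p_positive, density_weight) >= threshold
```

With this rule, an instance that the classifier is unsure about is queried only if the density model also gives it enough weight, which is one simple way to keep uncertainty sampling from drifting away from the underlying distribution.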
A non-parametric hierarchical clustering model
© 2015 IEEE. We present a novel non-parametric clustering model using Gaussian mixture models (NHCM). NHCM uses a novel Dirichlet process (DP) prior that allows for more flexible modeling of the data, where the base distribution of the DP is itself an infinite mixture of Gaussian conjugate priors. NHCM can be thought of as a hierarchical clustering model in which the low-level base prior governs the distribution of the data points forming sub-clusters, and the higher-level prior governs the distribution of the sub-clusters forming clusters. Using this hierarchical configuration, we can keep the complexity of the model low while still clustering skewed, complex data. To perform inference, we propose a Gibbs sampling algorithm. Empirical investigations have been carried out to analyse the efficiency of the proposed clustering model.
Active Learning for Data Streams under Concept Drift and concept evolution.
Data stream classification is an important problem that poses many challenges. Since the length of the data is theoretically infinite, it is impractical to store and process all the historical data. Data streams also experience changes in their underlying distribution (concept drift), so the classifier must adapt. Another challenge of data stream classification is the possible emergence and disappearance of classes, which is known as the concept evolution problem. On top of these challenges, acquiring labels for such large data is expensive. In this paper, we propose a stream-based active learning (AL) strategy (SAL) that handles the aforementioned challenges. SAL aims at querying the labels of the samples that are expected to reduce the expected future error. It handles concept drift and concept evolution by adapting to the change in the stream. Furthermore, as part of the error reduction process, SAL handles the sampling bias problem and queries the samples that caused the change, i.e., drifted samples or samples coming from new classes. To tackle the lack of prior knowledge about the streaming data, non-parametric Bayesian modelling is adopted, namely two representations of the Dirichlet process: Dirichlet mixture models and the stick-breaking process. Empirical results obtained on real-world benchmarks show the high performance of the proposed SAL method compared to the state-of-the-art methods.
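For readers unfamiliar with the stick-breaking representation of the Dirichlet process mentioned above, the following is a minimal sketch of the standard truncated construction of the mixture weights. It only illustrates the general technique and is not taken from the SAL paper; the function name, the truncation level `num_sticks` and the use of Python's `random` module are choices made for this example.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Draw the first `num_sticks` weights of a stick-breaking process.

    Each step breaks off a Beta(1, alpha) fraction of the stick that is
    still left. Returns the weights and the unassigned remainder.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


# Example: a small concentration parameter puts most mass on few components.
weights, leftover = stick_breaking_weights(alpha=1.0, num_sticks=10, rng=random.Random(0))
```

The leftover mass is what allows new components, and hence new classes or clusters, to appear later without fixing their number in advance.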
Asynchronous Stochastic Variational Inference
Stochastic variational inference (SVI) employs stochastic optimization to
scale up Bayesian computation to massive data. Since SVI is at its core a
stochastic gradient-based algorithm, horizontal parallelism can be harnessed to
allow larger scale inference. We propose a lock-free parallel implementation
for SVI which allows distributed computations over multiple slaves in an
asynchronous style. We show that our implementation leads to linear speed-up
while guaranteeing an asymptotic ergodic convergence rate $O(1/\sqrt{T})$
given that the number of slaves is bounded by $\sqrt{T}$ ($T$ is the total
number of iterations). The implementation is done in a high-performance
computing (HPC) environment using the Message Passing Interface (MPI) for Python
(mpi4py). The extensive empirical evaluation shows that our parallel SVI is
lossless, performing comparably well to its counterpart serial SVI with linear
speed-up.
Comment: 7 pages, 8 figures, 1 table, 2 algorithms. The paper has been
submitted for publication
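As background for the abstract above, the sketch below shows the serial global-parameter update that SVI is generally built on: a convex combination of the current global parameter and an intermediate estimate computed from a sampled data point or mini-batch, with a decaying Robbins-Monro step size. It does not reproduce the paper's asynchronous, lock-free scheme. The function names and the default values of `tau` and `kappa` are assumptions for illustration.

```python
def step_size(t, tau=1.0, kappa=0.7):
    """Decaying step size rho_t = (t + tau) ** (-kappa), with kappa in (0.5, 1]."""
    return (t + tau) ** (-kappa)


def svi_update(global_param, intermediate_param, t):
    """One serial SVI step: blend the current global parameter with the
    intermediate estimate obtained from the sampled data."""
    rho = step_size(t)
    return [(1.0 - rho) * g + rho * h
            for g, h in zip(global_param, intermediate_param)]
```

In a parallel setting, several workers would compute intermediate estimates concurrently and apply updates of this form without waiting for one another, which is where a bound on the number of workers becomes relevant.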
Distributed modeling approach of discrete manufacturing systems by Parts of Plant
The paper presents an original approach to modeling a discrete manufacturing system by Parts of Plant (PoP). This approach takes into account the technical and technological specifications of each plant element. The aim of this work is to achieve a reliable simulation of discrete manufacturing systems at the design stage, before the production stage. Models are distributed and established from the functional chain of a process. They take into account the distribution of information through each PoP with its sensors, pre-actuators and actuators. A library of PoPs with their corresponding models is proposed. An application example is used to illustrate the approach.
Discrete Event Model-Based Approach for Fault Detection and Isolation of Manufacturing Systems
This paper presents a discrete event model-based approach for Fault Detection and Isolation of manufacturing systems. This approach considers a system as a set of independent plant elements. Each plant element is composed of a set of interrelated Parts of Plant (PoPs), each modeled by a Moore automaton. Each PoP model is only aware of its local behavior. The degraded and faulty behaviors are added to each PoP model in order to obtain extended PoP models. An extrapolation of Gaussian learning is used to obtain acceptable temporal intervals between the occurrence times of correlated events. Finally, based on the extended PoP models and the links between them, a fault candidates' tree is established for each plant element. This candidates' tree corresponds to a local on-line fault event occurrence observer, called a diagnoser. Thus, the diagnosis decision is distributed over the plant elements. An application example is used to illustrate the approach.
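One simple way to picture the learning of acceptable temporal intervals between correlated events is to fit a Gaussian to observed delays and accept delays within a few standard deviations of the mean. The sketch below assumes exactly that; the abstract does not give the actual extrapolation rule, so the `k`-sigma bound and the function names are hypothetical.

```python
import statistics


def learn_interval(delays, k=3.0):
    """Fit a Gaussian to observed delays and return an assumed acceptable
    interval [mean - k*std, mean + k*std], clipped at zero."""
    mean = statistics.mean(delays)
    std = statistics.stdev(delays)
    return max(0.0, mean - k * std), mean + k * std


def is_timing_acceptable(delay, interval):
    """Check whether a newly observed delay falls inside the learned interval."""
    low, high = interval
    return low <= delay <= high


# Example: delays (in seconds) between an actuator command and a sensor event.
interval = learn_interval([1.9, 2.0, 2.1, 2.0, 2.0])
```

A delay outside the interval would then be treated as a timing symptom that a diagnoser can use to flag a degraded or faulty behavior.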
Unconditional decentralized structure for the fault diagnosis of discrete event systems
This paper proposes an unconditional decentralized structure for the fault diagnosis of Discrete Event Systems (DES), especially manufacturing systems with discrete sensors and actuators. This structure is based on a set of local diagnosers, each of which is responsible for a specific part of the plant. These local diagnosers are based on a modular model of the plant in order to reduce the state explosion. Each local diagnoser uses event-based, state-based and timed models to decide on fault occurrences. These models are obtained using the information provided by the plant, the controller and the reactivity of the actuators. All local diagnosis decisions are then merged by a Boolean operator in order to obtain one global diagnosis decision. Finally, the diagnosers are polynomial-time in the cardinality of the state space of the system. This approach is illustrated using an example of a manufacturing system.
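The merging of local decisions into a global one can be sketched as follows. The abstract does not name the Boolean operator, so a logical OR is assumed here (the plant is declared faulty as soon as any local diagnoser reports a fault); the part names and function names are purely illustrative.

```python
def global_diagnosis(local_decisions):
    """Merge local fault decisions with an assumed Boolean OR."""
    return any(local_decisions.values())


def faulty_parts(local_decisions):
    """List the plant parts whose local diagnoser reported a fault."""
    return sorted(part for part, faulty in local_decisions.items() if faulty)


# Hypothetical local decisions, one per part of the plant.
decisions = {"conveyor": False, "clamp": True, "drill": False}
plant_is_faulty = global_diagnosis(decisions)
suspects = faulty_parts(decisions)
```

Because each diagnoser decides independently and the merge needs no coordination between them, the structure stays decentralized.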
Online Active Learning for Human Activity Recognition from Sensory Data Streams
Human activity recognition (HAR) is highly relevant to many real-world domains such as safety, security, and in particular healthcare. Current machine learning technology for HAR is highly human-dependent, which makes it costly and unreliable in non-stationary environments. Existing HAR algorithms assume that training data is collected and annotated by a human prior to the training phase. Furthermore, the data is assumed to exhibit the true characteristics of the underlying distribution. In this paper, we propose a new autonomous approach that consists of novel algorithms. In particular, we adopt an active learning (AL) strategy to selectively query the user/resident about the label of particular activities in order to improve the model accuracy. This strategy helps overcome the challenge of labelling sequential data with time dependency, which is highly time-consuming and difficult. Because of the changes that may affect the way activities are performed, we regard sensor data as a stream and human activity learning as an online continuous process. In such a process, the learner can adapt to changes, incorporate novel activities and discard obsolete ones. To this end, we propose a novel semi-supervised classifier (OSC) that works together with a novel Bayesian stream-based active learning (BSAL) strategy. Because of the changes in the sensor layouts across different houses' settings, we use a Conditional Restricted Boltzmann Machine (CRBM) to handle the feature engineering issue by learning the features regardless of the environment settings. CRBM is then applied to extract low-level features from unlabelled raw high-dimensional activity input. The resulting approach tackles the challenges of activity recognition using a three-module architecture composed of a feature extractor (CRBM) and an online semi-supervised classifier (OSC) equipped with BSAL.
CRBM-BSAL-OSC allows completely autonomous learning that adjusts to the environment setting, explores the changes and adapts to them. The paper provides the theoretical details of the proposed approach as well as an extensive empirical study to evaluate the performance of the approach.
Active Learning for Classifying Data Streams with Unknown Number of Classes.
The classification of data streams is an interesting but also challenging problem. A data stream may grow infinitely, making it impractical to store prior to processing and classification. Due to its dynamic nature, the underlying distribution of the data stream may change over time, resulting in so-called concept drift or in the possible emergence and fading of classes, known as concept evolution. In addition, acquiring labels of data samples in a stream is admittedly expensive, if not infeasible. In this paper, we propose a novel stream-based active learning algorithm (SAL) which is capable of coping with both concept drift and concept evolution by adapting the classification model to the dynamic changes in the stream. SAL is the first AL algorithm in the literature to explicitly take account of these concepts. Moreover, using SAL, only labels of samples that are expected to reduce the expected future error are queried. This process is done while tackling the problem of sampling bias so that samples that induce the change (i.e., drifting samples or samples coming from new classes) are queried. To efficiently implement SAL, the paper proposes the application of non-parametric Bayesian models, which make it possible to cope with the lack of prior knowledge about the data stream. In particular, Dirichlet mixture models and the stick-breaking process are adopted and adapted to meet the requirements of online learning. The empirical results obtained on real-world benchmarks demonstrate the superiority of SAL in terms of classification performance over the state-of-the-art methods using average and average class accuracy.
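The two evaluation measures named at the end of the abstract can be sketched as follows, assuming that "average accuracy" means the overall fraction of correct predictions and "average class accuracy" means the unweighted mean of per-class accuracies. The abstract does not define them, so both readings, and the function names, are assumptions.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Overall fraction of correctly classified samples."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def average_class_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class accuracies (assumed definition)."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)
```

The second measure matters under concept evolution: a classifier that ignores a newly emerged minority class can still score well on overall accuracy while its average class accuracy drops sharply.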