Multiple Instance Learning for Detecting Anomalies over Sequential Real-World Datasets
Detecting anomalies over real-world datasets remains a challenging task. Data
annotation is labor-intensive, particularly for sequential datasets, where the
start and end times of anomalies are not known. As a result,
data collected from sequential real-world processes can be largely unlabeled or
contain inaccurate labels. These characteristics challenge the application of
anomaly detection techniques based on supervised learning. In contrast,
Multiple Instance Learning (MIL) has been shown effective on problems with
incomplete knowledge of labels in the training dataset, mainly due to the
notion of bags. Although largely under-leveraged for anomaly detection, MIL
provides an appealing formulation for anomaly detection over real-world
datasets, and this formulation is the primary contribution of this paper. We
propose an MIL-based formulation and various algorithmic instantiations of this
framework based on different design decisions for key components of the
framework. We evaluate the resulting algorithms on four datasets that capture
different physical processes along different modalities. The experimental
evaluation yields several observations. The MIL-based formulation performs
no worse than single-instance learning on easy-to-moderate datasets and
outperforms single-instance learning on more challenging datasets. Altogether,
the results show that the framework generalizes well over diverse datasets
resulting from different real-world application domains.
Comment: 9 pages, 5 figures, Anomaly and Novelty Detection, Explanation and
Accommodation (ANDEA 2022)
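To illustrate why bags suit sequences whose anomaly boundaries are unknown, here is a minimal sketch of the standard MIL assumption: a bag is anomalous if at least one of its instances is. It assumes, purely for illustration, that bags are non-overlapping windows of a 1-D sequence and that instances are scored by an absolute z-score against nominal statistics with a threshold of 3.0; the helper names (`make_bags`, `label_bags`, etc.) are hypothetical and are not the design decisions evaluated in the paper.

```python
from statistics import mean, stdev

def make_bags(sequence, window):
    """Split a sequence into consecutive, non-overlapping windows (the bags)."""
    return [sequence[i:i + window] for i in range(0, len(sequence), window)]

def instance_scores(bag, ref_mean, ref_std):
    """Hypothetical instance scorer: absolute z-score against nominal statistics."""
    return [abs(x - ref_mean) / ref_std for x in bag]

def bag_score(bag, ref_mean, ref_std):
    """Standard MIL assumption: a bag is as anomalous as its most anomalous instance."""
    return max(instance_scores(bag, ref_mean, ref_std))

def label_bags(bags, ref_mean, ref_std, threshold=3.0):
    """Return 1 if any instance in the bag exceeds the (assumed) threshold, else 0."""
    return [int(bag_score(b, ref_mean, ref_std) > threshold) for b in bags]

# Only bag-level labels are produced: we never state *which* time step is anomalous.
nominal = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8]
mu, sd = mean(nominal), stdev(nominal)
bags = make_bags([1.0, 1.1, 0.9, 1.0, 5.0, 1.1], window=3)
print(label_bags(bags, mu, sd))  # [0, 1]
```

Because supervision lives at the bag level, an annotator only has to mark a window as containing an anomaly, not its exact start and end times.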
Graph machine learning for assembly modeling
Assembly modeling refers to the design engineering process of composing assemblies (e.g., machines or machine components) from a common catalog of existing parts. There is a natural correspondence between assemblies and graphs, which can be exploited for services based on graph machine learning, such as part recommendation, clustering/taxonomy creation, or anomaly detection. However, this domain imposes particular challenges, such as the treatment of unknown or new parts, ambiguously extracted edges, incomplete information about the design sequence, and interaction with design engineers as users. Along with open research questions, we present a novel data set.
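The assembly-to-graph correspondence can be made concrete with a small sketch: parts become nodes and connections become edges. The recommender below is a naive neighbour-voting baseline, not graph machine learning and not a method from this work; the part names and the `build_graph`/`recommend` helpers are hypothetical.

```python
from collections import Counter, defaultdict

def build_graph(edges):
    """Assembly as an undirected graph: parts are nodes, connections are edges."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def recommend(current_parts, catalog, k=3):
    """Naive baseline: suggest parts often connected to the current parts in prior assemblies."""
    votes = Counter()
    for edges in catalog:
        adj = build_graph(edges)
        for part in current_parts:
            for neighbour in adj.get(part, ()):
                if neighbour not in current_parts:
                    votes[neighbour] += 1
    return [part for part, _ in votes.most_common(k)]

catalog = [
    [("motor", "shaft"), ("shaft", "bearing"), ("bearing", "housing")],
    [("motor", "shaft"), ("shaft", "gear")],
    [("motor", "bracket"), ("shaft", "bearing")],
]
print(recommend({"shaft"}, catalog, k=2))  # motor and bearing, 2 votes each (tie order may vary)
```

A part that never appears in the catalog receives no votes and gets no suggestions, which is one concrete face of the "unknown or new parts" challenge mentioned above.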
Data Imputation through the Identification of Local Anomalies
We introduce a comprehensive statistical framework in a model-free
setting for a complete treatment of localized data corruptions due to severe
noise sources, e.g., an occluder in the case of a visual recording. Within this
framework, we propose i) a novel algorithm to efficiently separate, i.e.,
detect and localize, possible corruptions from a given suspicious data instance
and ii) a Maximum A Posteriori (MAP) estimator to impute the corrupted data. As
a generalization of the Euclidean distance, we also propose a novel distance
measure, which is based on the ranked deviations among the data attributes and
is empirically shown to be superior in separating the corruptions. Our algorithm
first splits the suspicious instance into parts through a binary partitioning
tree in the space of data attributes and iteratively tests those parts to
detect local anomalies using the nominal statistics extracted from an
uncorrupted (clean) reference data set. Once each part is labeled as anomalous
or normal, the corresponding binary patterns over this tree that characterize
corruptions are identified and the affected attributes are imputed. Under a
certain conditional independence structure assumed for the binary patterns, we
analytically show that the false alarm rate of the introduced algorithm in
detecting the corruptions is independent of the data and can be directly set
without any parameter tuning. The proposed framework is tested on several
well-known machine learning data sets with synthetically generated corruptions
and is experimentally shown to produce remarkable improvements in
classification, along with strong corruption-separation capabilities. Our
experiments also indicate that the proposed algorithms outperform typical
approaches and are robust to varying training-phase conditions.
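The detect-then-impute loop described above can be sketched as follows. This is a simplified stand-in under stated assumptions, not the authors' algorithm: attributes are halved in index order to form the binary partitioning tree, each part is tested with a mean squared z-score against the clean reference statistics (threshold 9.0, i.e. roughly 3 sigma), the search descends only into parts that fail the test, and flagged attributes are replaced with the reference mean instead of the paper's MAP estimate and ranked-deviation distance. All function names are hypothetical.

```python
import numpy as np

def fit_reference(clean):
    """Nominal per-attribute mean and std from an uncorrupted reference set."""
    return clean.mean(axis=0), clean.std(axis=0) + 1e-12

def detect(x, mu, sd, indices, min_size, threshold):
    """Top-down test over a binary partition of attribute indices (assumed test statistic)."""
    z2 = ((x[indices] - mu[indices]) / sd[indices]) ** 2
    if z2.mean() <= threshold:
        return []                      # this part looks nominal: stop descending
    if len(indices) < 2 * min_size:
        return indices.tolist()        # anomalous leaf: flag all its attributes
    mid = len(indices) // 2            # halve the part and test both children
    return (detect(x, mu, sd, indices[:mid], min_size, threshold)
            + detect(x, mu, sd, indices[mid:], min_size, threshold))

def impute(x, mu, sd, min_size=2, threshold=9.0):
    """Flag locally anomalous attributes, then fill them with the reference mean."""
    flagged = np.array(detect(x, mu, sd, np.arange(x.size), min_size, threshold), dtype=int)
    out = x.astype(float)
    out[flagged] = mu[flagged]         # simple stand-in for the MAP estimator
    return out, sorted(flagged.tolist())

rng = np.random.default_rng(0)
mu, sd = fit_reference(rng.normal(0.0, 1.0, size=(500, 8)))
x = np.zeros(8)
x[4:6] = 20.0                          # a localized "occluder" on attributes 4 and 5
out, flagged = impute(x, mu, sd)
print(flagged)                         # [4, 5]
```

If a parent part fails the test, at least one child must also fail, because the parent's mean squared z-score is an average of its children's; the descent therefore always ends at some flagged leaf.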
Practical Model-Based Diagnosis with Qualitative Possibilistic Uncertainty
An approach to fault isolation that exploits vastly incomplete models is
presented. It relies on separate descriptions of each component's behavior,
together with the links between them, which makes it possible to focus the
reasoning on the relevant part of the system. As normal observations do not need
explanation, the behavior of the components is limited to anomaly propagation.
Diagnostic solutions are disorders (fault modes or abnormal signatures) that
are consistent with the observations, as well as abductive explanations. An
ordinal representation of uncertainty based on possibility theory provides a
simple exception-tolerant description of the component behaviors. We can, for
instance, distinguish between effects that are more or less certainly present
(or absent) and effects that are more or less certainly present (or absent)
when a given anomaly is present. A realistic example illustrates the benefits
of this approach.
Comment: Appears in Proceedings of the Eleventh Conference on Uncertainty in
Artificial Intelligence (UAI1995)
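A minimal sketch of how an ordinal, possibility-based description can rank fault modes by consistency with the observations. It assumes a finite scale 0..4 and, for each fault and effect, a pair of certainty degrees (present, absent). The possibility of the observations given a fault is the scale's top value minus the strongest conflict, which amounts to a min over the per-observation possibilities. The fault and effect names and the helper functions are hypothetical, and this covers only the consistency side, not the paper's full diagnosis procedure.

```python
TOP = 4  # hypothetical finite ordinal scale: 0 (not at all) .. TOP (fully)

def possibility(fault_model, observations):
    """Ordinal possibility of the observations given one fault mode.

    fault_model: effect -> (N_present, N_absent), how certainly the effect is
    present / absent when the fault occurs (missing effect = no information).
    observations: effect -> True (observed present) or False (observed absent).
    """
    conflict = 0
    for effect, observed_present in observations.items():
        n_present, n_absent = fault_model.get(effect, (0, 0))
        # Observing the contrary of an effect the fault makes certain is a conflict.
        conflict = max(conflict, n_absent if observed_present else n_present)
    return TOP - conflict  # order-reversing map: possibility = TOP - max conflict

def rank_faults(models, observations):
    """Sort fault modes from most to least possible (ties keep insertion order)."""
    scores = {fault: possibility(m, observations) for fault, m in models.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

models = {
    "pump_leak": {"low_pressure": (4, 0), "noise": (1, 0)},
    "valve_stuck": {"low_pressure": (2, 0), "noise": (0, 3)},
    "sensor_drift": {"low_pressure": (0, 0)},
}
obs = {"low_pressure": True, "noise": True}
print(rank_faults(models, obs))
# [('pump_leak', 4), ('sensor_drift', 4), ('valve_stuck', 1)]
```

Note that `sensor_drift` stays fully possible simply because it predicts nothing that conflicts with the observations. Consistency alone cannot separate it from `pump_leak`, which is why the approach also seeks abductive explanations: disorders that more or less certainly produce the observed anomalies.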