39,730 research outputs found

    Multiple Instance Learning for Detecting Anomalies over Sequential Real-World Datasets

    Full text link
    Detecting anomalies over real-world datasets remains a challenging task. Data annotation is an intensive human labor problem, particularly in sequential datasets, where the start and end time of anomalies are not known. As a result, data collected from sequential real-world processes can be largely unlabeled or contain inaccurate labels. These characteristics challenge the application of anomaly detection techniques based on supervised learning. In contrast, Multiple Instance Learning (MIL) has been shown effective on problems with incomplete knowledge of labels in the training dataset, mainly due to the notion of bags. While largely under-leveraged for anomaly detection, MIL provides an appealing formulation for anomaly detection over real-world datasets, and it is the primary contribution of this paper. In this paper, we propose an MIL-based formulation and various algorithmic instantiations of this framework based on different design decisions for key components of the framework. We evaluate the resulting algorithms over four datasets that capture different physical processes along different modalities. The experimental evaluation draws out several observations. The MIL-based formulation performs no worse than single instance learning on easy to moderate datasets and outperforms single-instance learning on more challenging datasets. Altogether, the results show that the framework generalizes well over diverse datasets resulting from different real-world application domains.Comment: 9 pages,5 figures, Anomaly and Novelty Detection, Explanation and Accommodation (ANDEA 2022

    Graph machine learning for assembly modeling

    Get PDF
    Assembly modeling refers to the design engineering process of composing assemblies (e.g., machines or machine components) from a common catalog of existing parts. There is a natural correspondence of assemblies to graphs which can be exploited for services based on graph machine learning such as part recommendation, clustering/taxonomy creation, or anomaly detection. However, this domain imposes particular challenges such as the treatment of unknown or new parts, ambiguously extracted edges, incomplete information about the design sequence, interaction with design engineers as users, to name a few. Along with open research questions, we present a novel data set

    Data Imputation through the Identification of Local Anomalies

    Get PDF
    We introduce a comprehensive and statistical framework in a model free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose i) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and ii) a Maximum A Posteriori (MAP) estimator to impute the corrupted data. As a generalization to Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous vs normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independency structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be directly set without any parameter tuning. The proposed framework is tested over several well-known machine learning data sets with synthetically generated corruptions; and experimentally shown to produce remarkable improvements in terms of classification purposes with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions

    Practical Model-Based Diagnosis with Qualitative Possibilistic Uncertainty

    Full text link
    An approach to fault isolation that exploits vastly incomplete models is presented. It relies on separate descriptions of each component behavior, together with the links between them, which enables focusing of the reasoning to the relevant part of the system. As normal observations do not need explanation, the behavior of the components is limited to anomaly propagation. Diagnostic solutions are disorders (fault modes or abnormal signatures) that are consistent with the observations, as well as abductive explanations. An ordinal representation of uncertainty based on possibility theory provides a simple exception-tolerant description of the component behaviors. We can for instance distinguish between effects that are more or less certainly present (or absent) and effects that are more or less certainly present (or absent) when a given anomaly is present. A realistic example illustrates the benefits of this approach.Comment: Appears in Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (UAI1995
    corecore