    Active Anomaly Detection in Heterogeneous Processes

    An active inference problem of detecting anomalies among heterogeneous processes is considered. At each time, a subset of the processes can be probed. The objective is to design a sequential probing strategy that dynamically determines which processes to observe at each time and when to terminate the search, so that the expected detection time is minimized under a constraint on the probability of misclassifying any process. This problem falls into the general setting of sequential design of experiments pioneered by Chernoff in 1959, in which a randomized strategy, referred to as the Chernoff test, was proposed and shown to be asymptotically optimal as the error probability approaches zero. For the problem considered in this paper, a low-complexity deterministic test is shown to enjoy the same asymptotic optimality while offering significantly better performance in the finite regime and faster convergence to the optimal rate function, especially when the number of processes is large. The computational complexity of the proposed test is also of a significantly lower order.

    Comment: This work has been accepted for publication in IEEE Transactions on Information Theory.
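
    The abstract does not specify the observation model or the probing rule, so the Python sketch below only illustrates the general template it describes: probe one process at a time, accumulate evidence, and stop once every process is confidently classified. The Gaussian observation model, the sample(i) interface, the "probe the least-resolved process" rule, and the fixed log-likelihood-ratio threshold are all illustrative assumptions, not the deterministic test proposed in the paper (which can also probe a subset of processes per time step).

import numpy as np

def sequential_anomaly_search(sample, n_proc, mu_normal=0.0, mu_anomalous=1.0,
                              sigma=1.0, threshold=5.0, max_steps=10000):
    """Illustrative sequential probing loop (not the paper's exact test).

    Each process is assumed to emit Gaussian observations with mean
    mu_normal if normal and mu_anomalous if anomalous; sample(i) returns
    one observation from process i. One process is probed per time step,
    its log-likelihood ratio (LLR) of "anomalous vs. normal" is updated,
    and the search stops once every LLR exceeds the threshold in magnitude.
    """
    llr = np.zeros(n_proc)              # accumulated LLR per process
    t = 0
    for t in range(1, max_steps + 1):
        # Assumed deterministic probing rule: probe the least-resolved
        # process, i.e. the one whose LLR is closest to zero.
        i = int(np.argmin(np.abs(llr)))
        x = sample(i)
        # Gaussian LLR increment: log N(x; mu_a, sigma) - log N(x; mu_n, sigma)
        llr[i] += ((x - mu_normal) ** 2 - (x - mu_anomalous) ** 2) / (2.0 * sigma ** 2)
        if np.all(np.abs(llr) >= threshold):
            break
    declared_anomalous = np.where(llr >= threshold)[0]
    return declared_anomalous, t        # declared anomalies, stopping time

    A hypothetical run, with process 2 anomalous:

rng = np.random.default_rng(0)
truth = np.array([0, 0, 1, 0, 0])
observe = lambda i: rng.normal(1.0 if truth[i] else 0.0, 1.0)
anomalies, stop_time = sequential_anomaly_search(observe, n_proc=5)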

    Data Imputation through the Identification of Local Anomalies

    We introduce a comprehensive statistical framework, in a model-free setting, for the complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose i) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and ii) a Maximum A Posteriori (MAP) estimator to impute the corrupted data. As a generalization of the Euclidean distance, we also propose a novel distance measure that is based on the ranked deviations among the data attributes and is empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and then iteratively tests those parts to detect local anomalies, using nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous vs. normal, the binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independence structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be set directly without any parameter tuning. The proposed framework is tested on several well-known machine learning data sets with synthetically generated corruptions and is experimentally shown to produce remarkable improvements in classification performance, together with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform typical approaches and are robust to varying training phase conditions.
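
    As a rough illustration of the "split, test, impute" pipeline described in the abstract, the Python snippet below recursively bisects the attribute indices (a simple stand-in for the binary partitioning tree), tests each part against per-attribute nominal statistics from a clean reference set, and imputes flagged attributes with the reference mean. The z-score test, the mean imputation, and all names are assumptions made for illustration; they replace the paper's ranked-deviation distance measure and MAP estimator, whose details the abstract does not give.

import numpy as np

def detect_and_impute(x, reference, alpha=3.0, min_size=1):
    """Simplified sketch of split-test-impute (not the paper's algorithm).

    reference is a clean (n_samples, n_attrs) array used to estimate
    nominal per-attribute means and standard deviations. The suspicious
    instance x is recursively bisected over its attribute indices; a part
    is considered suspicious when its mean absolute z-score exceeds alpha.
    Flagged attributes are imputed with the reference mean, a crude
    stand-in for the MAP estimator described in the abstract.
    """
    mu = reference.mean(axis=0)
    sigma = reference.std(axis=0) + 1e-12
    corrupted = np.zeros(x.shape[0], dtype=bool)

    def part_is_suspicious(idx):
        # Nominal-statistics test for one node of the partitioning tree.
        z = np.abs((x[idx] - mu[idx]) / sigma[idx])
        return z.mean() > alpha

    def recurse(idx):
        if not part_is_suspicious(idx):
            return                      # part looks nominal; stop descending
        if len(idx) <= min_size:
            corrupted[idx] = True       # leaf flagged as locally anomalous
            return
        mid = len(idx) // 2
        recurse(idx[:mid])
        recurse(idx[mid:])

    recurse(np.arange(x.shape[0]))
    x_imputed = x.copy()
    x_imputed[corrupted] = mu[corrupted]
    return x_imputed, corrupted

    A hypothetical usage with a synthetic localized corruption:

rng = np.random.default_rng(1)
reference = rng.normal(size=(500, 16))      # clean reference data set
x = rng.normal(size=16)
x[4:8] += 10.0                              # simulated local corruption
x_hat, mask = detect_and_impute(x, reference)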