9,400 research outputs found
Data Imputation through the Identification of Local Anomalies
We introduce a comprehensive and statistical framework in a model free
setting for a complete treatment of localized data corruptions due to severe
noise sources, e.g., an occluder in the case of a visual recording. Within this
framework, we propose i) a novel algorithm to efficiently separate, i.e.,
detect and localize, possible corruptions from a given suspicious data instance
and ii) a Maximum A Posteriori (MAP) estimator to impute the corrupted data. As
a generalization to Euclidean distance, we also propose a novel distance
measure, which is based on the ranked deviations among the data attributes and
empirically shown to be superior in separating the corruptions. Our algorithm
first splits the suspicious instance into parts through a binary partitioning
tree in the space of data attributes and iteratively tests those parts to
detect local anomalies using the nominal statistics extracted from an
uncorrupted (clean) reference data set. Once each part is labeled as anomalous
vs normal, the corresponding binary patterns over this tree that characterize
corruptions are identified and the affected attributes are imputed. Under a
certain conditional independency structure assumed for the binary patterns, we
analytically show that the false alarm rate of the introduced algorithm in
detecting the corruptions is independent of the data and can be directly set
without any parameter tuning. The proposed framework is tested over several
well-known machine learning data sets with synthetically generated corruptions;
and experimentally shown to produce remarkable improvements in terms of
classification purposes with strong corruption separation capabilities. Our
experiments also indicate that the proposed algorithms outperform the typical
approaches and are robust to varying training phase conditions
Multi-criteria Anomaly Detection using Pareto Depth Analysis
We consider the problem of identifying patterns in a data set that exhibit
anomalous behavior, often referred to as anomaly detection. In most anomaly
detection algorithms, the dissimilarity between data samples is calculated by a
single criterion, such as Euclidean distance. However, in many cases there may
not exist a single dissimilarity measure that captures all possible anomalous
patterns. In such a case, multiple criteria can be defined, and one can test
for anomalies by scalarizing the multiple criteria using a linear combination
of them. If the importance of the different criteria are not known in advance,
the algorithm may need to be executed multiple times with different choices of
weights in the linear combination. In this paper, we introduce a novel
non-parametric multi-criteria anomaly detection method using Pareto depth
analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies
under multiple criteria without having to run an algorithm multiple times with
different choices of weights. The proposed PDA approach scales linearly in the
number of criteria and is provably better than linear combinations of the
criteria.Comment: Removed an unnecessary line from Algorithm
- …