2,191 research outputs found
Conditional Anomaly Detection with Soft Harmonic Functions
International audienceIn this paper, we consider the problem of conditional anomaly detection that aims to identify data instances with an unusual response or a class label. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of the label to detect anomalous mislabeling. We further regularize the solution to avoid the detection of isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method on several synthetic and UCI ML datasets in detecting unusual labels when compared to several baseline approaches. We also evaluate the performance of our method on a real-world electronic health record dataset where we seek to identify unusual patient-management decisions
Conditional Anomaly Detection Using Soft Harmonic Functions: An Application to Clinical Alerting
International audienceTimely detection of concerning events is an important problem in clinical practice. In this paper, we consider the problem of conditional anomaly detection that aims to identify data instances with an unusual response, such as the omission of an important lab test. We develop a new non-parametric approach for conditional anomaly detection based on the soft harmonic solution, with which we estimate the confidence of the label to detect anomalous mislabeling. We further regularize the solution to avoid the detection of isolated examples and examples on the boundary of the distribution support. We demonstrate the efficacy of the proposed method in detecting unusual labels on a real-world electronic health record dataset and compare it to several baseline approaches
Adaptive Graph-Based Algorithms for Conditional Anomaly Detection and Semi-Supervised Learning
We develop graph-based methods for conditional anomaly detection and semi-supervised learning based on label propagation on a data similarity graph. When data is abundant or arrive in a stream, the problems of computation and data storage arise for any graph-based method. We propose a fast approximate online algorithm that solves for the harmonic solution on an approximate graph. We show, both empirically and theoretically, that good behavior can be achieved by collapsing nearby points into a set of local representative points that minimize distortion. Moreover, we regularize the harmonic solution to achieve better stability properties. Anomaly detection techniques are used to identify anomalous (unusual) patterns in data. In clinical settings, these may concern identifications of unusual patient--state outcomes or unusual patient-management decisions. Therefore, we also present graph-based methods for detecting conditional anomalies and apply it to the identification of unusual clinical actions in hospitals. Our hypothesis is that patient-management actions that are unusual with respect to the past patients may be due to errors and that it is worthwhile to raise an alert if such a condition is encountered. Conditional anomaly detection extends standard unconditional anomaly framework but also faces new problems known as fringe and isolated points. We devise novel nonparametric graph-based methods to tackle these problems. Our methods rely on graph connectivity analysis and soft harmonic solution. Finally, we conduct an extensive human evaluation study of our conditional anomaly methods by 15 experts in critical care
Detecting Outliers in Data with Correlated Measures
Advances in sensor technology have enabled the collection of large-scale
datasets. Such datasets can be extremely noisy and often contain a significant
amount of outliers that result from sensor malfunction or human operation
faults. In order to utilize such data for real-world applications, it is
critical to detect outliers so that models built from these datasets will not
be skewed by outliers.
In this paper, we propose a new outlier detection method that utilizes the
correlations in the data (e.g., taxi trip distance vs. trip time). Different
from existing outlier detection methods, we build a robust regression model
that explicitly models the outliers and detects outliers simultaneously with
the model fitting.
We validate our approach on real-world datasets against methods specifically
designed for each dataset as well as the state of the art outlier detectors.
Our outlier detection method achieves better performances, demonstrating the
robustness and generality of our method. Last, we report interesting case
studies on some outliers that result from atypical events.Comment: 10 page
Multiscale Dictionary Learning for Estimating Conditional Distributions
Nonparametric estimation of the conditional distribution of a response given
high-dimensional features is a challenging problem. It is important to allow
not only the mean but also the variance and shape of the response density to
change flexibly with features, which are massive-dimensional. We propose a
multiscale dictionary learning model, which expresses the conditional response
density as a convex combination of dictionary densities, with the densities
used and their weights dependent on the path through a tree decomposition of
the feature space. A fast graph partitioning algorithm is applied to obtain the
tree decomposition, with Bayesian methods then used to adaptively prune and
average over different sub-trees in a soft probabilistic manner. The algorithm
scales efficiently to approximately one million features. State of the art
predictive performance is demonstrated for toy examples and two neuroscience
applications including up to a million features
- …