Functional Bipartite Ranking: a Wavelet-Based Filtering Approach
The main goal of this article is to address the bipartite ranking problem
from the perspective of functional data analysis (FDA). Given a training set of
independent realizations of a (possibly sampled) second-order random function
with a (locally) smooth autocorrelation structure, each randomly assigned a
binary label, the objective is to learn a scoring function s with an
optimal ROC curve. Based on linear/nonlinear wavelet-based approximations, it
is shown how to adaptively select compact finite-dimensional representations of
the input curves in order to build accurate ranking rules, using recent
advances in the ranking problem for multivariate data with binary feedback.
Beyond theoretical considerations, the performance of the learning methods for
functional bipartite ranking proposed in this paper is illustrated by
numerical experiments.
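The pipeline described in this abstract can be illustrated roughly in Python. The steps are: represent sampled curves in a wavelet basis, keep a few coefficients chosen from the data, then assess a scoring rule by its empirical AUC (area under the ROC curve). This minimal sketch assumes curves sampled on a dyadic grid of 64 points and uses a hand-rolled orthonormal Haar transform. The energy-based coefficient selection, the difference-of-class-means linear score, and the toy generator `make_curves` are hypothetical stand-ins, not the estimators or ranking algorithms studied in the paper.

```python
import numpy as np


def haar_dwt(X):
    """Orthonormal Haar transform of each row of X (row length must be a power of two).

    Coefficients are ordered coarse to fine:
    [scaling coefficient, coarsest detail, ..., finest details].
    """
    a = np.atleast_2d(np.asarray(X, dtype=float))
    n = a.shape[1]
    if n & (n - 1):
        raise ValueError("curve length must be a power of two")
    details = []
    while a.shape[1] > 1:
        even, odd = a[:, 0::2], a[:, 1::2]
        details.append((even - odd) / np.sqrt(2.0))  # details at the current scale
        a = (even + odd) / np.sqrt(2.0)              # next coarser approximation
    return np.hstack([a] + details[::-1])


def select_by_energy(C, k):
    """Indices of the k coefficients with the largest mean squared value (training energy)."""
    energy = np.mean(C ** 2, axis=0)
    return np.argsort(energy)[::-1][:k]


def empirical_auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size


def make_curves(rng, n, length=64):
    """Hypothetical toy data: white noise, plus a bump on [1/4, 3/8) when the label is 1."""
    labels = rng.integers(0, 2, size=n)
    X = 0.5 * rng.standard_normal((n, length))
    X[labels == 1, length // 4 : 3 * length // 8] += 1.0
    return X, labels


rng = np.random.default_rng(0)
X_train, y_train = make_curves(rng, 200)
X_test, y_test = make_curves(rng, 200)

C_train, C_test = haar_dwt(X_train), haar_dwt(X_test)
idx = select_by_energy(C_train, k=4)

# Plug-in linear score on the kept coefficients: difference of class means (a stand-in rule)
w = C_train[y_train == 1][:, idx].mean(axis=0) - C_train[y_train == 0][:, idx].mean(axis=0)
auc = empirical_auc(C_test[:, idx] @ w, y_test)
print(f"kept coefficients: {sorted(idx.tolist())}, test AUC: {auc:.3f}")
```

Because the transform is orthonormal, each curve's squared norm is preserved. A bump aligned with a dyadic interval shows up in only a handful of coarse coefficients, which is why a very compact representation can already rank well in this toy setting.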
How to Evaluate the Quality of Unsupervised Anomaly Detection Algorithms?
When sufficient labeled data are available, classical criteria based on
Receiver Operating Characteristic (ROC) or Precision-Recall (PR) curves can be
used to compare the performance of unsupervised anomaly detection algorithms.
However, in many situations, few or no data are labeled. This calls for
alternative criteria that can be computed on unlabeled data. In this paper, two
criteria that do not require labels are empirically shown to discriminate
accurately (w.r.t. ROC or PR based criteria) between algorithms. These criteria
are based on existing Excess-Mass (EM) and Mass-Volume (MV) curves, which
generally cannot be well estimated in high dimension. A methodology based on
feature subsampling and aggregation is also described and tested, extending
the use of these criteria to high-dimensional datasets and overcoming major
drawbacks inherent to standard EM and MV curves.
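The two label-free criteria can be made concrete for a scoring function s, where a higher score means "more normal". The Excess-Mass curve is EM(t) = sup_u [P(s(X) >= u) - t * Leb(s >= u)]. The Mass-Volume curve is MV(alpha) = inf_u {Leb(s >= u) : P(s(X) >= u) >= alpha}. The minimal Python sketch below estimates both curves. It approximates Lebesgue volumes by Monte Carlo sampling in the training data's bounding box, then averages the resulting areas over random feature subsets. The integration ranges (t up to the inverse box volume, alpha in [0.9, 0.999]), the subset size, the number of draws, and the two toy detectors are assumptions chosen for illustration. This is not the paper's reference implementation.

```python
import numpy as np


def frac_at_least(sorted_vals, thresholds):
    """Fraction of the ascending array sorted_vals that is >= each threshold."""
    return 1.0 - np.searchsorted(sorted_vals, thresholds, side="left") / len(sorted_vals)


def em_mv_criteria(fit_score, X_train, X_test, rng, n_unif=10_000):
    """Areas under the empirical EM curve (higher is better) and MV curve (lower is better).

    Volumes are Monte Carlo estimates inside the bounding box of X_train, expressed as a
    fraction of the box volume; t is rescaled accordingly (tau = t * box volume).
    """
    lo, hi = X_train.min(axis=0), X_train.max(axis=0)
    score = fit_score(X_train)
    s_test = np.sort(score(X_test))
    s_unif = np.sort(score(rng.uniform(lo, hi, size=(n_unif, X_train.shape[1]))))

    # Candidate level sets {s >= u}, u ranging over the observed test scores
    mass = frac_at_least(s_test, s_test)  # empirical P(s(X) >= u)
    vol = frac_at_least(s_unif, s_test)   # estimated Leb(s >= u) / box volume

    # EM(tau) = max_u [mass(u) - tau * vol(u)], with the empty set contributing 0
    taus = np.linspace(0.0, 1.0, 100)
    em = np.array([max(0.0, np.max(mass - tau * vol)) for tau in taus])

    # MV(alpha) = smallest candidate volume among level sets of mass >= alpha
    alphas = np.linspace(0.9, 0.999, 100)
    mv = np.array([vol[mass >= a].min() for a in alphas])

    def area(x, y):  # trapezoidal rule
        return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

    return area(taus, em), area(alphas, mv)


def subsampled_criteria(fit_score, X_train, X_test, d_sub, n_draws, rng):
    """Average the EM/MV criteria over n_draws random subsets of d_sub features."""
    results = []
    for _ in range(n_draws):
        feats = rng.choice(X_train.shape[1], size=d_sub, replace=False)
        results.append(em_mv_criteria(fit_score, X_train[:, feats], X_test[:, feats], rng))
    em, mv = np.mean(results, axis=0)
    return float(em), float(mv)


def fit_gaussian_score(X):
    """Toy detector: diagonal Gaussian log-density up to a constant (higher = more normal)."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    return lambda Z: -0.5 * np.sum(((Z - mu) / sd) ** 2, axis=1) - np.sum(np.log(sd))


def fit_reversed_score(X):
    """Deliberately poor detector: ranks the tails as the most normal region."""
    s = fit_gaussian_score(X)
    return lambda Z: -s(Z)


rng = np.random.default_rng(0)
X_train = rng.standard_normal((1000, 20))
X_test = rng.standard_normal((1000, 20))
em_good, mv_good = subsampled_criteria(fit_gaussian_score, X_train, X_test, 3, 10, rng)
em_bad, mv_bad = subsampled_criteria(fit_reversed_score, X_train, X_test, 3, 10, rng)
print(f"Gaussian: EM={em_good:.3f} MV={mv_good:.4f} | reversed: EM={em_bad:.3f} MV={mv_bad:.4f}")
```

On this Gaussian toy data, the well-specified detector should obtain a larger EM area and a smaller MV area than the reversed one, without any labels being used. Working on 3-feature subsets keeps the Monte Carlo volume estimates usable even though the full data has 20 features.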
Multi-criteria Anomaly Detection using Pareto Depth Analysis
We consider the problem of identifying patterns in a data set that exhibit
anomalous behavior, often referred to as anomaly detection. In most anomaly
detection algorithms, the dissimilarity between data samples is calculated by a
single criterion, such as Euclidean distance. However, in many cases there may
not exist a single dissimilarity measure that captures all possible anomalous
patterns. In such a case, multiple criteria can be defined, and one can test
for anomalies by scalarizing the multiple criteria using a linear combination
of them. If the importance of the different criteria is not known in advance,
the algorithm may need to be executed multiple times with different choices of
weights in the linear combination. In this paper, we introduce a novel
non-parametric multi-criteria anomaly detection method using Pareto depth
analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies
under multiple criteria without having to run an algorithm multiple times with
different choices of weights. The proposed PDA approach scales linearly in the
number of criteria and is provably better than linear combinations of the
criteria.
Comment: Removed an unnecessary line from Algorithm
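The core idea behind PDA can be sketched as follows. Dissimilarity vectors of sample pairs (dyads) are sorted into successive Pareto fronts, and the front index (depth) measures how unusual a pair is, with no weights needed. This is a simplified, hypothetical Python rendering. The criteria (absolute difference along each feature), the use of all training pairs as dyads, the neighbor count `k`, and the mean-depth score are all assumptions. The naive quadratic non-dominated sort also does not reflect the complexity of the authors' algorithm.

```python
import numpy as np


def pareto_fronts(V):
    """Naive non-dominated sorting of the rows of V (smaller is better in every column).

    Returns the Pareto depth of each row: 1 for the first Pareto front, 2 for the front
    obtained after removing it, and so on.
    """
    # dom[i, j] is True when row i dominates row j: <= in every criterion, < in at least one
    dom = (np.all(V[:, None, :] <= V[None, :, :], axis=2)
           & np.any(V[:, None, :] < V[None, :, :], axis=2))
    depth = np.zeros(len(V), dtype=int)
    remaining = np.ones(len(V), dtype=bool)
    level = 0
    while remaining.any():
        level += 1
        dominated = (dom & remaining[:, None]).any(axis=0)  # dominated by a remaining row
        front = remaining & ~dominated
        depth[front] = level
        remaining &= ~front
    return depth


def dyad_depth(v, V, depth):
    """Depth a new dissimilarity vector v would get: 1 + deepest training dyad dominating it."""
    dominators = np.all(V <= v, axis=1) & np.any(V < v, axis=1)
    return 1 + (int(depth[dominators].max()) if dominators.any() else 0)


def pda_score(x, X_train, V_train, depth, k=5):
    """Hypothetical anomaly score: mean depth of x's dyads with its k nearest training
    points under each criterion (criterion j = absolute difference along feature j)."""
    D = np.abs(X_train - x)                              # one row per dyad (x, training point)
    nbrs = np.unique(np.argsort(D, axis=0)[:k].ravel())  # union of per-criterion k-NN
    return float(np.mean([dyad_depth(D[n], V_train, depth) for n in nbrs]))


rng = np.random.default_rng(0)
X_train = rng.standard_normal((60, 2))
iu, ju = np.triu_indices(len(X_train), k=1)
V_train = np.abs(X_train[iu] - X_train[ju])  # dissimilarity vectors of all training dyads
depth = pareto_fronts(V_train)

s_nominal = pda_score(np.array([0.0, 0.0]), X_train, V_train, depth)
s_anomaly = pda_score(np.array([0.0, 6.0]), X_train, V_train, depth)
print(f"fronts: {depth.max()}, nominal score: {s_nominal:.1f}, anomaly score: {s_anomaly:.1f}")
```

The point (0, 6) is unremarkable under the first criterion alone but extreme under the second. Its dyads are dominated by many training dyads on both criteria jointly, so they land on deep fronts, and it should receive a clearly larger mean depth than the central point (0, 0).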