Functional Bipartite Ranking: a Wavelet-Based Filtering Approach
The main goal of this article is to address the bipartite ranking issue
from the perspective of functional data analysis (FDA). Given a training set of
independent realizations of a (possibly sampled) second-order random function
with a (locally) smooth autocorrelation structure and to which a binary label
is randomly assigned, the objective is to learn a scoring function s with
optimal ROC curve. Based on linear/nonlinear wavelet-based approximations, it
is shown how to select compact finite dimensional representations of the input
curves adaptively, in order to build accurate ranking rules, using recent
advances in the ranking problem for multivariate data with binary feedback.
Beyond theoretical considerations, the performance of the learning methods for
functional bipartite ranking proposed in this paper is illustrated by
numerical experiments.
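As a rough illustration of the pipeline described in this abstract, the sketch below reduces each sampled curve to a few coarse Haar coefficients, scores it linearly, and evaluates the ranking by the empirical AUC (the area under the ROC curve). The function names `haar_coarse`, `linear_score` and `empirical_auc` and the fixed weights are assumptions made for illustration only. The paper selects representations adaptively; this sketch uses a fixed, non-adaptive approximation instead.

```python
def haar_coarse(curve, levels):
    """Coarse Haar approximation: repeated pairwise averages.

    These are the Haar scaling coefficients up to normalization.
    The curve length must be divisible by 2**levels.
    """
    coeffs = [float(v) for v in curve]
    for _ in range(levels):
        if len(coeffs) % 2:
            raise ValueError("length must be divisible by 2**levels")
        coeffs = [(coeffs[i] + coeffs[i + 1]) / 2.0
                  for i in range(0, len(coeffs), 2)]
    return coeffs


def linear_score(coeffs, weights):
    """Hypothetical scoring rule: weighted sum of the coefficients."""
    return sum(w * c for w, c in zip(weights, coeffs))


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count for one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    total = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos) * len(neg))


# Toy usage: four sampled curves with binary labels (made-up data).
curves = [[1, 3, 5, 7], [2, 2, 6, 6], [0, 0, 1, 1], [1, 1, 0, 0]]
labels = [1, 1, 0, 0]
weights = [0.5, 1.0]  # assumed, not learned
scores = [linear_score(haar_coarse(c, 1), weights) for c in curves]
auc = empirical_auc(scores, labels)
```

In practice the weights would be learned from the training set and the number of retained coefficients chosen from the data; both are fixed here to keep the example short.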
Mass Volume Curves and Anomaly Ranking
This paper aims at formulating the issue of ranking multivariate unlabeled
observations depending on their degree of abnormality as an unsupervised
statistical learning task. In the 1-d situation, this problem is usually
tackled by means of tail estimation techniques: univariate observations are
viewed as all the more `abnormal' as they are located far in the tail(s) of the
underlying probability distribution. It would also be desirable to have
a scalar-valued `scoring' function for comparing the degree of
abnormality of multivariate observations. Here we formulate the issue of
scoring anomalies as an M-estimation problem by means of a novel functional
performance criterion, referred to as the Mass Volume curve (MV curve in
short), whose optimal elements are strictly increasing transforms of the
density almost everywhere on the support of the density. We first study the
statistical estimation of the MV curve of a given scoring function and we
provide a strategy to build confidence regions using a smoothed bootstrap
approach. Optimization of this functional criterion over the set of piecewise
constant scoring functions is next tackled. This boils down to estimating a
sequence of empirical minimum volume sets whose levels are chosen adaptively
from the data, so as to adjust to the variations of the optimal MV curve, while
controlling the bias of its approximation by a stepwise curve. Generalization
bounds are then established for the difference in sup norm between the MV curve
of the empirical scoring function thus obtained and the optimal MV curve.
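To make the MV curve more concrete, here is a minimal sketch of one plug-in estimate of a single point on it. It assumes that MV_s(alpha) is the Lebesgue volume of the smallest level set of the scoring function s that contains a fraction alpha of the data. The function name `empirical_mv`, the bounding box and the Monte Carlo volume estimate are assumptions made for this example, not the estimator analyzed in the paper.

```python
import math
import random


def empirical_mv(score, data, alpha, box, n_mc=20000, seed=0):
    """Estimate the volume of {x : score(x) >= t_alpha}.

    t_alpha is the largest threshold whose level set contains at
    least a fraction `alpha` of the data. `box` is a list of
    (low, high) bounds, assumed to contain the level set; the
    volume is estimated by uniform Monte Carlo sampling in the box.
    """
    values = sorted((score(x) for x in data), reverse=True)
    k = max(1, math.ceil(alpha * len(values)))
    threshold = values[k - 1]  # k-th largest score on the sample

    box_volume = 1.0
    for low, high in box:
        box_volume *= (high - low)

    rng = random.Random(seed)
    hits = 0
    for _ in range(n_mc):
        u = tuple(rng.uniform(low, high) for low, high in box)
        if score(u) >= threshold:
            hits += 1
    return box_volume * hits / n_mc


# Toy usage in one dimension: points on a grid over [-1, 1],
# scored by closeness to the origin (made-up data).
data = [(-1 + 2 * i / 200,) for i in range(201)]
score = lambda x: -abs(x[0])
mv_half = empirical_mv(score, data, 0.5, box=[(-1.0, 1.0)])
```

For this toy example the level set holding half of the mass is roughly the interval [-0.5, 0.5], so the estimate should be close to 1. Evaluating the function over a grid of alpha values traces an empirical MV curve.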
PAC-Bayesian High Dimensional Bipartite Ranking
This paper is devoted to the bipartite ranking problem, a classical
statistical learning task, in a high dimensional setting. We propose a scoring
and ranking strategy based on the PAC-Bayesian approach. We consider nonlinear
additive scoring functions, and we derive non-asymptotic risk bounds under a
sparsity assumption. In particular, oracle inequalities in probability holding
under a margin condition assess the performance of our procedure, and prove its
minimax optimality. An MCMC-flavored algorithm is proposed to implement our
method, and its behavior is illustrated on synthetic and real-life datasets.
Reconstructing dynamical networks via feature ranking
Empirical data on real complex systems are becoming increasingly available.
Parallel to this is the need for new methods of reconstructing (inferring) the
topology of networks from time-resolved observations of their node-dynamics.
The methods based on physical insights often rely on strong assumptions about
the properties and dynamics of the scrutinized network. Here, we use the
insights from machine learning to design a new method of network reconstruction
that essentially makes no such assumptions. Specifically, we interpret the
available trajectories (data) as features, and use two independent feature
ranking approaches -- Random forest and RReliefF -- to rank the importance of
each node for predicting the value of each other node, which yields the
reconstructed adjacency matrix. We show that our method is fairly robust to
coupling strength, system size, trajectory length and noise. We also find that
the reconstruction quality strongly depends on the dynamical regime.
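The general idea of treating trajectories as features and ranking them per target node can be sketched as follows. This example deliberately swaps in a much simpler ranking score, the absolute lagged Pearson correlation, in place of the Random forest and RReliefF rankers named in the abstract. It also assumes that the prediction target is each node's next value. The names `rank_links` and `simulate_toy` and the toy dynamics are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def rank_links(trajectories):
    """Score matrix: entry [i][j] rates node j as a predictor of node i.

    The feature is node j at time t, the target is node i at time
    t + 1. The diagonal is left at zero (no self-links are scored).
    """
    n = len(trajectories)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        target = trajectories[i][1:]
        for j in range(n):
            if i == j:
                continue
            feature = trajectories[j][:-1]
            scores[i][j] = abs(pearson(feature, target))
    return scores


def simulate_toy(steps=500, seed=0):
    """Toy system: node 0 drives node 1; node 2 is independent noise."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(steps)]
    x2 = [rng.gauss(0, 1) for _ in range(steps)]
    x1 = [0.0] + [0.9 * x0[t] + 0.1 * rng.gauss(0, 1)
                  for t in range(steps - 1)]
    return [x0, x1, x2]


scores = rank_links(simulate_toy())
```

Thresholding or ranking the entries of `scores` gives a candidate adjacency matrix. A correlation score only captures linear, one-step dependence, which is one reason a nonlinear ranker such as those used in the paper may be preferable.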