
    Functional Bipartite Ranking: a Wavelet-Based Filtering Approach

    The main goal of this article is to address the bipartite ranking problem from the perspective of functional data analysis (FDA). Given a training set of independent realizations of a (possibly sampled) second-order random function with a (locally) smooth autocorrelation structure, to each of which a binary label is randomly assigned, the objective is to learn a scoring function s with optimal ROC curve. Based on linear/nonlinear wavelet-based approximations, it is shown how to select compact finite-dimensional representations of the input curves adaptively, in order to build accurate ranking rules, using recent advances in the ranking problem for multivariate data with binary feedback. Beyond theoretical considerations, the performance of the learning methods for functional bipartite ranking proposed in this paper is illustrated by numerical experiments.
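    As a rough illustration of the pipeline described in this abstract, the sketch below compresses sampled curves with an orthonormal Haar transform, keeps the k coarsest coefficients (a linear wavelet approximation), and evaluates a scoring function through the empirical AUC. The function names, the fixed choice of k, and the use of the Haar basis are assumptions made for the example; the article selects the representation adaptively and this is not its implementation.

```python
from typing import List, Sequence


def haar_transform(signal: Sequence[float]) -> List[float]:
    """Orthonormal Haar transform; the length must be a power of two.

    Output order: overall approximation first, then detail coefficients
    from the coarsest scale to the finest.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    approx = list(signal)
    details: List[float] = []
    root2 = 2 ** 0.5
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        level_details = [(a - b) / root2 for a, b in pairs]
        approx = [(a + b) / root2 for a, b in pairs]
        details = level_details + details  # coarser scales go in front
    return approx + details


def truncated_features(curve: Sequence[float], k: int) -> List[float]:
    """Hypothetical helper: keep the k coarsest Haar coefficients."""
    return haar_transform(curve)[:k]


def empirical_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

    A ranking rule can then be any scoring function of the truncated coefficients, for example a linear combination learned by a multivariate bipartite ranking method; the empirical AUC summarizes how well the resulting scores separate the two classes.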

    Mass Volume Curves and Anomaly Ranking

    This paper aims to formulate the problem of ranking multivariate unlabeled observations according to their degree of abnormality as an unsupervised statistical learning task. In the 1-d situation, this problem is usually tackled by means of tail estimation techniques: univariate observations are viewed as all the more 'abnormal' as they are located far in the tail(s) of the underlying probability distribution. It would likewise be desirable to have a scalar-valued 'scoring' function that allows the degree of abnormality of multivariate observations to be compared. Here we formulate the issue of scoring anomalies as an M-estimation problem by means of a novel functional performance criterion, referred to as the Mass Volume curve (MV curve for short), whose optimal elements are strictly increasing transforms of the density almost everywhere on the support of the density. We first study the statistical estimation of the MV curve of a given scoring function and we provide a strategy to build confidence regions using a smoothed bootstrap approach. Optimization of this functional criterion over the set of piecewise constant scoring functions is tackled next. This boils down to estimating a sequence of empirical minimum volume sets whose levels are chosen adaptively from the data, so as to adjust to the variations of the optimal MV curve, while controlling the bias of its approximation by a stepwise curve. Generalization bounds are then established for the difference in sup norm between the MV curve of the empirical scoring function thus obtained and the optimal MV curve.
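    To make the criterion concrete, the sketch below estimates the MV curve of a given scoring function s: for each mass level alpha it picks the threshold t such that roughly a fraction alpha of the sample satisfies s(x) >= t, then estimates the Lebesgue volume of the set {x : s(x) >= t}. The Monte Carlo volume estimate over a bounding box, the box itself, and all names are assumptions made for the example; it does not reproduce the smoothed bootstrap or the adaptive level selection described in the abstract.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def empirical_mv_curve(
    score: Callable[[Point], float],
    sample: Sequence[Point],
    box: Sequence[Tuple[float, float]],
    alphas: Sequence[float],
    n_mc: int = 20000,
    seed: int = 0,
) -> List[float]:
    """Plug-in estimate of alpha -> volume of {x : score(x) >= t_alpha}.

    t_alpha is the empirical threshold such that at least a fraction alpha
    of the sample scores at or above it. Volumes are estimated by uniform
    Monte Carlo sampling inside `box` (one (low, high) pair per coordinate).
    """
    rng = random.Random(seed)
    box_volume = math.prod(hi - lo for lo, hi in box)
    sample_scores = sorted((score(x) for x in sample), reverse=True)
    mc_scores = [
        score(tuple(rng.uniform(lo, hi) for lo, hi in box)) for _ in range(n_mc)
    ]
    curve = []
    for alpha in alphas:
        k = min(len(sample_scores), max(1, math.ceil(alpha * len(sample_scores))))
        threshold = sample_scores[k - 1]
        inside = sum(1 for s in mc_scores if s >= threshold)
        curve.append(box_volume * inside / n_mc)
    return curve
```

    For a fixed mass level, a lower volume indicates a scoring function whose level sets concentrate the data more tightly, which is the sense in which increasing transforms of the density are optimal.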

    PAC-Bayesian High Dimensional Bipartite Ranking

    This paper is devoted to the bipartite ranking problem, a classical statistical learning task, in a high-dimensional setting. We propose a scoring and ranking strategy based on the PAC-Bayesian approach. We consider nonlinear additive scoring functions, and we derive non-asymptotic risk bounds under a sparsity assumption. In particular, oracle inequalities in probability holding under a margin condition assess the performance of our procedure, and prove its minimax optimality. An MCMC-flavored algorithm is proposed to implement our method, and its behavior is illustrated on synthetic and real-life datasets.

    Reconstructing dynamical networks via feature ranking

    Empirical data on real complex systems are becoming increasingly available. Parallel to this is the need for new methods of reconstructing (inferring) the topology of networks from time-resolved observations of their node dynamics. Methods based on physical insights often rely on strong assumptions about the properties and dynamics of the scrutinized network. Here, we use insights from machine learning to design a new method of network reconstruction that makes essentially no such assumptions. Specifically, we interpret the available trajectories (data) as features, and use two independent feature ranking approaches -- Random Forest and RReliefF -- to rank the importance of each node for predicting the value of each other node, which yields the reconstructed adjacency matrix. We show that our method is fairly robust to coupling strength, system size, trajectory length and noise. We also find that the reconstruction quality strongly depends on the dynamical regime.
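    The reconstruction idea can be sketched as a loop over target nodes in which every other node receives an importance score that fills one column of the score matrix. In the sketch below the ranking score is the absolute lagged Pearson correlation, a deliberately simple stand-in for the Random Forest and RReliefF rankings named in the abstract; predicting the next value of the target from current values of the other nodes is also an assumption of the example, as are all function names.

```python
from typing import List, Sequence


def pearson(u: Sequence[float], v: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0.0 or var_v == 0.0:
        return 0.0
    return cov / (var_u * var_v) ** 0.5


def rank_links(trajectories: List[List[float]]) -> List[List[float]]:
    """Score every directed pair (i, j) from node time series.

    trajectories[i][t] is the value of node i at time t. scores[i][j] is the
    stand-in importance of node i for predicting the next value of node j.
    """
    n = len(trajectories)
    scores = [[0.0] * n for _ in range(n)]
    for j in range(n):
        target = trajectories[j][1:]  # node j, one step ahead
        for i in range(n):
            if i != j:
                scores[i][j] = abs(pearson(trajectories[i][:-1], target))
    return scores
```

    Thresholding the score matrix, or keeping the top-ranked entries, gives a candidate adjacency matrix; swapping in a different feature ranker only changes how each column of scores is computed.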