18 research outputs found
PAC-Bayesian High Dimensional Bipartite Ranking
This paper is devoted to the bipartite ranking problem, a classical
statistical learning task, in a high dimensional setting. We propose a scoring
and ranking strategy based on the PAC-Bayesian approach. We consider nonlinear
additive scoring functions, and we derive non-asymptotic risk bounds under a
sparsity assumption. In particular, oracle inequalities in probability holding
under a margin condition assess the performance of our procedure, and prove its
minimax optimality. An MCMC-flavored algorithm is proposed to implement our
method, and its behavior is illustrated on synthetic and real-life datasets.
Improving prediction performance of stellar parameters using functional models
This paper investigates the problem of prediction of stellar parameters,
based on the star's electromagnetic spectrum. Knowledge of these parameters
makes it possible to infer the evolutionary state of the star. From a statistical
point of view, the spectra of different stars can be represented as functional
data. We therefore propose a two-step procedure that decomposes the spectra in a
functional basis and then applies a regression method for prediction. We also use
a bootstrap methodology to build prediction intervals for the stellar
parameters. A practical application illustrates the numerical performance of
our approach.
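The two-step idea described in this abstract can be sketched as follows. This is a minimal illustration on synthetic curves, not the authors' implementation: the cosine basis, the linear regression, the pairs-plus-residual bootstrap, and all sizes and names (`project`, `fit_linear`, `predict`) are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for spectra: n curves on a common grid (all sizes hypothetical).
n, m, k = 200, 300, 8
grid = np.linspace(0.0, 1.0, m)
# Assumed functional basis: the first k cosine functions.
basis = np.stack([np.cos(np.pi * j * grid) for j in range(k)], axis=1)  # shape (m, k)

true_coefs = rng.normal(size=(n, k))
spectra = true_coefs @ basis.T + 0.05 * rng.normal(size=(n, m))
# Fake "stellar parameter": a linear functional of the curve plus noise.
target = true_coefs @ np.arange(1, k + 1) + 0.1 * rng.normal(size=n)

def project(curves):
    """Step 1: least-squares coefficients of each curve in the basis."""
    coefs, *_ = np.linalg.lstsq(basis, curves.T, rcond=None)
    return coefs.T  # shape (n_curves, k)

def fit_linear(features, y):
    """Step 2: ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(features)), features])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict(beta, features):
    return np.column_stack([np.ones(len(features)), features]) @ beta

train, test = slice(0, 150), slice(150, n)
Z_train, Z_test = project(spectra[train]), project(spectra[test])
y_train = target[train]

beta = fit_linear(Z_train, y_train)
residuals = y_train - predict(beta, Z_train)

# Bootstrap prediction intervals: refit on resampled pairs, add a resampled residual.
B = 200
draws = np.empty((B, Z_test.shape[0]))
for b in range(B):
    idx = rng.integers(0, len(y_train), size=len(y_train))
    beta_b = fit_linear(Z_train[idx], y_train[idx])
    noise = rng.choice(residuals, size=Z_test.shape[0], replace=True)
    draws[b] = predict(beta_b, Z_test) + noise

lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)
coverage = np.mean((target[test] >= lower) & (target[test] <= upper))
print(f"empirical coverage of nominal 95% intervals: {coverage:.2f}")
```

Projecting first keeps the regression low-dimensional (k coefficients instead of m grid values); any other basis or regression method could be substituted in the same two slots.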
Méthodes d'apprentissage statistique pour le ranking : théorie, algorithmes et applications
Multipartite ranking is a statistical learning problem that consists in ordering observations that belong to a high dimensional feature space in the same order as their labels, so that the observations with the highest label appear at the top of the list. This work aims to understand the probabilistic nature of the multipartite ranking problem in order to obtain theoretical guarantees for ranking algorithms. In this context, the output of a ranking algorithm takes the form of a scoring function, a function that maps the observation space to the real line, so that the final ordering is the one induced by the values on the real line. The contributions of this manuscript are as follows. First, we focus on the characterization of optimal solutions to multipartite ranking. The second research theme is the design of algorithms to produce scoring functions. We offer two methods, the first using an aggregation procedure, the second an approximation scheme. Finally, we return to the binary ranking problem to establish adaptive minimax rates of convergence.
Statistical learning methods for ranking : theory, algorithms and applications
Multipartite ranking is a statistical learning problem that consists in ordering observations that belong to a high dimensional feature space in the same order as their labels, so that the observations with the highest label appear at the top of the list. This work aims to understand the probabilistic nature of the multipartite ranking problem in order to obtain theoretical guarantees for ranking algorithms. In this context, the output of a ranking algorithm takes the form of a scoring function, a function that maps the observation space to the real line, so that the final ordering is the one induced by the values on the real line. The contributions of this manuscript are as follows. First, we focus on the characterization of optimal solutions to multipartite ranking. The second research theme is the design of algorithms to produce scoring functions. We offer two methods, the first using an aggregation procedure, the second an approximation scheme. Finally, we return to the binary ranking problem to establish adaptive minimax rates of convergence.
Anomaly Ranking as Supervised Bipartite Ranking
The Mass Volume (MV) curve is a visual tool to evaluate the performance of a scoring function with regard to its capacity to rank data in the same order as the underlying density function. Anomaly ranking refers to the unsupervised learning task which consists in building a scoring function, based on unlabeled data, with an MV curve as low as possible at any point. In this paper, it is proved that, in the case where the data generating probability distribution has compact support, anomaly ranking is equivalent to (supervised) bipartite ranking, where the goal is to discriminate between the underlying probability distribution and the uniform distribution with the same support. In this situation, the MV curve can then be seen as a simple transform of the corresponding ROC curve. Exploiting this view, we then show how to use bipartite ranking algorithms, possibly combined with random sampling, to solve the MV curve minimization problem. Numerical experiments based on a variety of bipartite ranking algorithms well documented in the literature are displayed in order to illustrate the relevance of our approach.
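The reduction described in this abstract can be illustrated with a small sketch. It is an assumption-laden toy, not the paper's procedure: the data are drawn from a Beta(2, 5) law on the assumed support [0, 1], the "bipartite ranking algorithm" is a simple per-bin histogram scorer, and the names `fit_scorer` and `mass_volume` are hypothetical. The empirical MV value is read off as the support volume times the false positive rate against uniform samples at the threshold that captures mass alpha of the data, which is one reading of the "simple transform of the ROC curve" mentioned above.

```python
import random

random.seed(1)
BINS = 20
VOLUME = 1.0  # Lebesgue measure of the assumed support [0, 1]

def bin_of(x):
    return min(int(x * BINS), BINS - 1)

def fit_scorer(data, uniform):
    """Bipartite step: per-bin share of data points among data + uniform points."""
    pos = [0] * BINS
    tot = [0] * BINS
    for x in data:
        pos[bin_of(x)] += 1
        tot[bin_of(x)] += 1
    for u in uniform:
        tot[bin_of(u)] += 1
    share = [p / t if t else 0.0 for p, t in zip(pos, tot)]
    return lambda x: share[bin_of(x)]

def mass_volume(scorer, data, uniform, alpha):
    """Empirical MV value: volume of the score level set holding mass >= alpha.

    With tied scores the level set can hold slightly more than alpha.
    """
    scores = sorted((scorer(x) for x in data), reverse=True)
    threshold = scores[max(int(alpha * len(scores)) - 1, 0)]
    fpr = sum(scorer(u) >= threshold for u in uniform) / len(uniform)
    return VOLUME * fpr

# "Positive" class: the unlabeled data; "negative" class: uniform draws on the support.
data_train = [random.betavariate(2, 5) for _ in range(3000)]
unif_train = [random.random() for _ in range(3000)]
data_test = [random.betavariate(2, 5) for _ in range(2000)]
unif_test = [random.random() for _ in range(2000)]

scorer = fit_scorer(data_train, unif_train)
mv_learned = mass_volume(scorer, data_test, unif_test, alpha=0.9)
mv_reversed = mass_volume(lambda x: -scorer(x), data_test, unif_test, alpha=0.9)
print(f"MV(0.9): learned scorer {mv_learned:.2f}, reversed scorer {mv_reversed:.2f}")
```

A scorer that ranks like the density concentrates 90% of the data mass in a small region, so its MV value should be lower than that of the deliberately reversed scorer.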
Minimax learning rates for bipartite ranking and plug-in rules
While it is now well known that, in the standard binary classification setup and under suitable margin assumptions and complexity conditions on the regression function, plug-in classifiers can achieve fast or even super-fast rates (i.e. rates faster than $n^{-1/2}$ or even faster than $n^{-1}$), no result of this nature has yet been proved in the context of bipartite ranking, although the problem is akin to classification. The main purpose of the present paper is to investigate this issue by considering bipartite ranking as a nested continuous collection of cost-sensitive classification problems. A global low noise condition is exhibited under which certain (plug-in) ranking rules are proved to achieve fast (but not super-fast) rates over a wide nonparametric class of models. A lower bound result is also stated in a specific situation, establishing that such rates are optimal from a minimax perspective.
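To make the notion of a plug-in ranking rule concrete, here is a minimal sketch under stated assumptions: a toy model on [0, 1] in which the regression function is eta(x) = x, a histogram estimate of eta used as the scoring function, and the empirical AUC as the ranking criterion. The estimator, the model, and the function names are illustrative choices, not taken from the paper.

```python
import random

def empirical_auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def fit_histogram_eta(xs, ys, bins=10):
    """Plug-in estimate of eta(x) = P(Y=1 | X=x) on [0, 1]: per-bin label averages."""
    counts = [0] * bins
    positives = [0] * bins
    for x, y in zip(xs, ys):
        b = min(int(x * bins), bins - 1)
        counts[b] += 1
        positives[b] += y
    rates = [p / c if c else 0.5 for p, c in zip(positives, counts)]
    return lambda x: rates[min(int(x * bins), bins - 1)]

def sample(n):
    """Toy model (assumed): X uniform on [0, 1] and P(Y=1 | X=x) = x."""
    xs = [random.random() for _ in range(n)]
    ys = [1 if random.random() < x else 0 for x in xs]
    return xs, ys

random.seed(0)
x_train, y_train = sample(2000)
x_test, y_test = sample(1000)

# Plug-in rule: rank test points by the estimated regression function.
eta_hat = fit_histogram_eta(x_train, y_train)
pos = [eta_hat(x) for x, y in zip(x_test, y_test) if y == 1]
neg = [eta_hat(x) for x, y in zip(x_test, y_test) if y == 0]
auc_plugin = empirical_auc(pos, neg)

# Reference: ranking by the true eta(x) = x, which is optimal in this toy model.
auc_oracle = empirical_auc(
    [x for x, y in zip(x_test, y_test) if y == 1],
    [x for x, y in zip(x_test, y_test) if y == 0],
)
print(f"AUC plug-in: {auc_plugin:.3f}, AUC oracle: {auc_oracle:.3f}")
```

The gap between the two AUC values is the empirical counterpart of the ranking excess risk whose rate of decay the abstract discusses.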
Building confidence regions for the ROC surface
An Ensemble Learning Technique for Multipartite Ranking
Ranking data with ordinal labels: optimality and pairwise aggregation