8 research outputs found

Adaptive partitioning schemes for bipartite ranking

Author: A. Nobel
A. Tsybakov
B. Ripley
C. Burges
C. Ferri
E. Hüllermeier
F. Provost
G. Lugosi
J. Friedman
L. Devroye
Marine Depecker
Nicolas Vayatis
P. Flach
P. Massart
R. Serfling
S. Arlot
S. Boucheron
S. Clémençon
S. Mallat
Stéphan Clémençon
T. Hastie
T. Joachims
Y. Freund
Publication venue: 'Springer Science and Business Media LLC'

An Empirical Analysis on Point-wise Machine Learning Techniques using Regression Trees for Web-search Ranking

Author: Mohan Ananth
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2010

Learning how to rank a set of objects relative to a user-defined query has received much interest in the machine learning community during the past decade. In fact, there have been two recent competitions hosted by internationally prominent search companies to encourage research on ranking web site documents. Recent literature on learning to rank has focused on three approaches: point-wise, pair-wise, and list-wise. Many different kinds of classifiers, including boosted decision trees, neural networks, and SVMs, have proven successful in the field. This thesis surveys traditional point-wise techniques that use regression trees for web-search ranking. The thesis contains empirical studies on Random Forests and Gradient Boosted Decision Trees, with novel augmentations to them, on real-world data sets. We also analyze how these point-wise techniques perform on new areas of research for web-search ranking: transfer learning and feature-cost aware models.

Washington University St. Louis: Open Scholarship

Bipartite Ranking: a Risk-Theoretic Perspective

Author: Menon Aditya
Williamson Robert
Publication venue: 'MIT Press - Journals'
Publication date: 29/11/2018

We present a systematic study of the bipartite ranking problem, with the aim of explicating its connections to the class-probability estimation problem. Our study focuses on the properties of the statistical risk for bipartite ranking with general losses, which is closely related to a generalised notion of the area under the ROC curve: we establish alternate representations of this risk, relate the Bayes-optimal risk to a class of probability divergences, and characterise the set of Bayes-optimal scorers for the risk. We further study properties of a generalised class of bipartite risks, based on the p-norm push of Rudin (2009). Our analysis is based on the rich framework of proper losses, which are the central tool in the study of class-probability estimation. We show how this analytic tool makes transparent the generalisations of several existing results, such as the equivalence of the minimisers for four seemingly disparate risks from bipartite ranking and class-probability estimation. A novel practical implication of our analysis is the design of new families of losses for scenarios where accuracy at the head of the ranked list is paramount, with comparable empirical performance to the p-norm push.

The Australian National University

Structured learning for information retrieval

Author: Petterson James
Publication date: 21/11/2018

Information retrieval is the area of study concerned with the process of searching, recovering and interpreting information from large amounts of data. In this thesis we show that many of the problems in information retrieval consist of structured learning, where the goal is to learn predictors of complex output structures, consisting of many inter-dependent variables. We then attack these problems using principled machine learning methods that are specifically suited for such scenarios. In the process of doing so, we develop new models, new model extensions and new algorithms that, when integrated with existing methodology, comprise a new set of tools for solving a variety of information retrieval problems. Firstly, we cover the multi-label classification problem, where we seek to predict a set of labels associated with a given object; the output in this case is structured, as the output variables are interdependent. Secondly, we focus on document ranking, where, given a query and a set of documents associated with it, we want to rank them according to their relevance with respect to the query; here, again, we have a structured output: a ranking of documents. Thirdly, we address topic models, where we are given a set of documents and attempt to find a compact representation of them, by learning latent topics and associating a topic distribution with each document; the output is again structured, consisting of word and topic distributions. For all the above problems, we obtain state-of-the-art solutions, as attested by empirical performance on publicly available real-world datasets.

The Australian National University

Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

Publication venue: AUAI Press
Publication date: 01/09/2018

UCL Discovery