Preference Learning
This report documents the program and the outcomes of Dagstuhl Seminar 14101 "Preference Learning". Preferences have recently received considerable attention in disciplines such as machine learning, knowledge discovery, information retrieval, statistics, social choice theory, multiple criteria decision making, decision under risk and uncertainty, operations research, and others. The motivation for this seminar was to showcase recent progress in these different areas with the goal of working towards a common basis of understanding, which should help facilitate future synergies.
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we discuss several challenges for future research.
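To make the "special cases of one framework" idea concrete, here is a minimal sketch (assumed, not from the paper) of an MTP baseline that fits one model per target column of a target matrix Y. The same interface covers multivariate regression (real-valued Y) and multi-label classification (0/1-valued Y, thresholded at prediction time); the ridge penalty `lam` is an illustrative hyperparameter.

```python
import numpy as np

def fit_mtp_baseline(X, Y, lam=1e-3):
    """Independent ridge models, one per target column.

    A deliberately simple MTP baseline: each of the m targets is fit
    separately, ignoring dependencies between targets. X is (n, d),
    Y is (n, m); the closed-form ridge solution is shared across targets.
    """
    d = X.shape[1]
    W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
    return W  # shape (d, m): one weight vector per target

def predict_mtp(X, W, binarize=False):
    """Real-valued scores for regression; thresholded for multi-label."""
    scores = X @ W
    return (scores > 0.5).astype(int) if binarize else scores
```

Methods that model target dependencies (the focus of much of the survey) would replace the per-column fit with a joint one, but the input/output shapes stay the same.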
Learning from eXtreme Bandit Feedback
We study the problem of batch learning from bandit feedback in the setting of
extremely large action spaces. Learning from extreme bandit feedback is
ubiquitous in recommendation systems, in which billions of decisions are made
over sets consisting of millions of choices in a single day, yielding massive
observational data. In these large-scale real-world applications, supervised
learning frameworks such as eXtreme Multi-label Classification (XMC) are widely
used despite the fact that they incur significant biases due to the mismatch
between bandit feedback and supervised labels. Such biases can be mitigated by
importance sampling techniques, but these techniques suffer from impractical
variance when dealing with a large number of actions. In this paper, we
introduce a selective importance sampling estimator (sIS) that operates in a
significantly more favorable bias-variance regime. The sIS estimator is
obtained by performing importance sampling on the conditional expectation of
the reward with respect to a small subset of actions for each instance (a form
of Rao-Blackwellization). We employ this estimator in a novel algorithmic
procedure -- named Policy Optimization for eXtreme Models (POXM) -- for
learning from bandit feedback on XMC tasks. In POXM, the selected actions for
the sIS estimator are the top-p actions of the logging policy, where p is
adjusted from the data and is significantly smaller than the size of the action
space. We use a supervised-to-bandit conversion on three XMC datasets to
benchmark our POXM method against three competing methods: BanditNet, a
previously applied partial matching pruning strategy, and a supervised learning
baseline. Whereas BanditNet sometimes improves marginally over the logging
policy, our experiments show that POXM systematically and significantly
improves over all baselines.
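The top-p restriction behind the sIS estimator can be sketched as follows. This is a simplified illustration of the idea, not the exact estimator from the paper: samples whose logged action falls outside the top-p actions of the logging policy are discarded, and both policies are renormalized over the retained set, trading a small bias for much lower variance; `p` is assumed to be tuned on held-out data.

```python
import numpy as np

def is_estimate(rewards, logged_actions, pi_log, pi_new):
    """Vanilla importance sampling value estimate from logged bandit data."""
    idx = np.arange(len(logged_actions))
    w = pi_new[idx, logged_actions] / pi_log[idx, logged_actions]
    return np.mean(w * rewards)

def sis_estimate(rewards, logged_actions, pi_log, pi_new, p):
    """Selective IS sketch: per instance, keep only the top-p actions
    of the logging policy and renormalize over the retained set."""
    n = len(logged_actions)
    est = np.zeros(n)
    for i in range(n):
        top = np.argsort(pi_log[i])[::-1][:p]       # top-p logging actions
        if logged_actions[i] not in top:
            continue                                 # outside the retained set
        pi_new_top = pi_new[i, top] / pi_new[i, top].sum()
        pi_log_top = pi_log[i, top] / pi_log[i, top].sum()
        j = int(np.where(top == logged_actions[i])[0][0])
        est[i] = rewards[i] * pi_new_top[j] / pi_log_top[j]
    return est.mean()
```

When `p` equals the full action-space size, the selective estimator reduces to vanilla importance sampling; shrinking `p` is what keeps the weights bounded in extreme action spaces.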
Incorporating label dependencies in multilabel stance detection
© 2019 Association for Computational Linguistics. Stance detection in social media is a well-studied task in a variety of domains. Nevertheless, previous work has mostly focused on multiclass versions of the problem, where the labels are mutually exclusive and typically positive, negative, or neutral. In this paper, we address versions of the task in which an utterance can have multiple labels, thus corresponding to multilabel classification. We propose a method that explicitly incorporates label dependencies in the training objective and compare it against a variety of baselines, as well as a reduction of multilabel to multiclass learning. In experiments with three datasets, we find that our proposed method improves upon all baselines on two out of three datasets. We also show that the reduction of multilabel to multiclass classification can be very competitive, especially in cases where the output consists of a small number of labels and one can enumerate over all label combinations.
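The multilabel-to-multiclass reduction mentioned above (often called the label-powerset reduction) can be sketched in a few lines: each observed combination of labels becomes one class of an ordinary multiclass problem. This is a generic illustration of the reduction, not code from the paper, and it is only feasible when the number of distinct combinations is small enough to enumerate.

```python
def to_powerset(label_sets):
    """Map each example's label set to a single multiclass label.

    label_sets: iterable of sets of label names.
    Returns (y, classes) where y[i] is the class index of example i and
    classes[k] is the label combination that class k stands for.
    """
    classes = sorted({frozenset(s) for s in label_sets}, key=sorted)
    index = {c: i for i, c in enumerate(classes)}
    return [index[frozenset(s)] for s in label_sets], classes
```

Any standard multiclass classifier can then be trained on `y`; predictions are mapped back to label sets via `classes`.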
Counterfactual Risk Minimization: Learning from Logged Bandit Feedback
We develop a learning principle and an efficient algorithm for batch learning
from logged bandit feedback. This learning setting is ubiquitous in online
systems (e.g., ad placement, web search, recommendation), where an algorithm
makes a prediction (e.g., ad ranking) for a given input (e.g., query) and
observes bandit feedback (e.g., user clicks on presented ads). We first address
the counterfactual nature of the learning problem through propensity scoring.
Next, we prove generalization error bounds that account for the variance of the
propensity-weighted empirical risk estimator. These constructive bounds give
rise to the Counterfactual Risk Minimization (CRM) principle. We show how CRM
can be used to derive a new learning method -- called Policy Optimizer for
Exponential Models (POEM) -- for learning stochastic linear rules for
structured output prediction. We present a decomposition of the POEM objective
that enables efficient stochastic gradient optimization. POEM is evaluated on
several multi-label classification problems showing substantially improved
robustness and generalization performance compared to the state-of-the-art.
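The CRM objective described above, the propensity-weighted empirical risk plus a penalty on its variance, can be sketched as follows. This is a minimal illustration of the principle; the clipping constant and the variance-penalty weight `lam` are assumed hyperparameters, and the paper derives the penalty from generalization bounds rather than positing it.

```python
import numpy as np

def crm_objective(losses, propensities, pi_probs, lam=0.5, clip=10.0):
    """Counterfactual Risk Minimization objective (sketch).

    losses:       observed losses for the logged actions
    propensities: logging policy's probabilities of those actions
    pi_probs:     candidate policy's probabilities of the same actions
    """
    n = len(losses)
    w = np.minimum(pi_probs / propensities, clip)  # clipped IS weights
    r = w * losses                                 # per-sample weighted loss
    risk = r.mean()                                # empirical risk estimate
    var = r.var(ddof=1)                            # its sample variance
    return risk + lam * np.sqrt(var / n)           # risk + variance penalty
```

Minimizing this over a stochastic policy class (exponential models, in POEM) prefers policies whose risk estimate is both low and low-variance, which is what distinguishes CRM from plain inverse-propensity risk minimization.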
Extreme Multilabel Classification for Specialist Doctor Recommendation with Implicit Feedback and Limited Patient Metadata
Recommendation Systems (RS) are often used to address the issue of medical
doctor referrals. However, these systems require access to patient feedback and
medical records, which may not always be available in real-world scenarios. Our
research focuses on medical referrals and aims to predict recommendations in
different specialties of physicians for both new patients and those with a
consultation history. We use Extreme Multilabel Classification (XML), commonly
employed in text-based classification tasks, to encode available features and
explore different scenarios. While its potential for recommendation tasks has
often been suggested, this has not been thoroughly explored in the literature.
Motivated by the doctor referral case, we show how to recast a traditional
recommender setting into a multilabel classification problem that current XML
methods can solve. Further, we propose a unified model leveraging patient
history across different specialties. Compared to state-of-the-art RS using the
same features, our approach consistently improves standard recommendation
metrics up to approximately for patients with a previous consultation
history. For new patients, XML proves better at exploiting available features,
outperforming the benchmark in favorable scenarios, with particular emphasis on
recall metrics. Thus, our approach brings us one step closer to creating more
effective and personalized doctor referral systems. Additionally, it highlights
XML as a promising alternative to current hybrid or content-based RS, while
identifying key aspects to take into account when using XML for recommendation
tasks.
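The recast from a recommender setting to multilabel classification can be sketched as follows: each user becomes one training instance whose label set is the set of items (here, doctor specialties) they interacted with, and an XML classifier trained on (features, label set) pairs then ranks specialties for new users. The names and data layout are illustrative, not from the paper.

```python
def recast_as_multilabel(interactions, all_items):
    """Turn implicit-feedback interactions into multilabel training rows.

    interactions: dict mapping user id -> set of consulted specialties
    all_items:    the full set of specialty labels
    Returns (rows, item_idx) where each row is (user, binary label vector)
    and item_idx maps a specialty name to its column.
    """
    item_idx = {it: j for j, it in enumerate(sorted(all_items))}
    rows = []
    for user, items in interactions.items():
        y = [0] * len(item_idx)
        for it in items:
            y[item_idx[it]] = 1      # one bit per consulted specialty
        rows.append((user, y))
    return rows, item_idx
```

Pairing each binary label vector with the user's feature encoding yields exactly the (features, labels) format that off-the-shelf XML methods consume.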