A Contextual Bandit Bake-off
Contextual bandit algorithms are essential for solving many real-world
interactive machine learning problems. Despite multiple recent successes on
statistically and computationally efficient methods, the practical behavior of
these algorithms is still poorly understood. We leverage the availability of
large numbers of supervised learning datasets to empirically evaluate
contextual bandit algorithms, focusing on practical methods that learn by
relying on optimization oracles from supervised learning. We find that a recent
method (Foster et al., 2018) using optimism under uncertainty works the best
overall. A surprisingly close second is a simple greedy baseline that only
explores implicitly through the diversity of contexts, followed by a variant of
Online Cover (Agarwal et al., 2014) which tends to be more conservative but
robust to problem specification by design. Along the way, we also evaluate
various components of contextual bandit algorithm design such as loss
estimators. Overall, this is a thorough study and review of contextual bandit
methodology.
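The evaluation idea described above, turning a supervised dataset into a bandit problem by revealing only the reward of the chosen action, can be sketched roughly as follows. This is a minimal illustration and not the paper's experimental setup: the synthetic dataset, the per-action ridge regression used as the "oracle", and the names `make_dataset` and `run_greedy` are all assumptions made for the example. With `epsilon=0.0` the learner is purely greedy and explores only through the diversity of contexts.

```python
import numpy as np

def make_dataset(n=2000, d=5, k=3, seed=0):
    """Synthetic multiclass data (hypothetical stand-in for a real dataset)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(d, k))
    X = rng.normal(size=(n, d))
    y = np.argmax(X @ W, axis=1)  # the label is the best action
    return X, y

def run_greedy(X, y, k, epsilon=0.0, lam=1.0, seed=0):
    """Simulate bandit feedback; return the average reward over all rounds."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # One ridge regression per action: Gram matrix A[a] and target vector b[a].
    A = np.stack([lam * np.eye(d) for _ in range(k)])
    b = np.zeros((k, d))
    total = 0.0
    for x, label in zip(X, y):
        theta = np.stack([np.linalg.solve(A[a], b[a]) for a in range(k)])
        scores = theta @ x  # predicted reward of each action
        if rng.random() < epsilon:
            a = int(rng.integers(k))   # explicit uniform exploration
        else:
            a = int(np.argmax(scores))  # greedy choice
        r = 1.0 if a == label else 0.0  # only the chosen action's reward is seen
        A[a] += np.outer(x, x)
        b[a] += r * x
        total += r
    return total / n

if __name__ == "__main__":
    X, y = make_dataset()
    print("greedy:        ", run_greedy(X, y, k=3, epsilon=0.0))
    print("epsilon-greedy:", run_greedy(X, y, k=3, epsilon=0.1))
```

Comparing the two printed averages on several datasets is the kind of head-to-head measurement the abstract refers to, although a real study would use many datasets and tuned hyperparameters.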
RELEAF: An Algorithm for Learning and Exploiting Relevance
Recommender systems, medical diagnosis, network security, etc., require
on-going learning and decision-making in real time. These -- and many others --
represent perfect examples of the opportunities and difficulties presented by
Big Data: the available information often arrives from a variety of sources and
has diverse features so that learning from all the sources may be valuable but
integrating what is learned is subject to the curse of dimensionality. This
paper develops and analyzes algorithms that allow efficient learning and
decision-making while avoiding the curse of dimensionality. We formalize the
information available to the learner/decision-maker at a particular time as a
context vector which the learner should consider when taking actions. In
general the context vector is very high dimensional, but in many settings, the
most relevant information is embedded into only a few relevant dimensions. If
these relevant dimensions were known in advance, the problem would be simple --
but they are not. Moreover, the relevant dimensions may be different for
different actions. Our algorithm learns the relevant dimensions for each
action, and makes decisions based on what it has learned. Formally, we build on
the structure of a contextual multi-armed bandit by adding and exploiting a
relevance relation. We prove a general regret bound for our algorithm whose
time order depends only on the maximum number of relevant dimensions among all
the actions, which in the special case where the relevance relation is
single-valued (a function), reduces to $\tilde{O}(T^{2(\sqrt{2}-1)})$; in the
absence of a relevance relation, the best known contextual bandit algorithms
achieve regret $\tilde{O}(T^{(D+1)/(D+2)})$, where $D$ is the full dimension of
the context vector.
Comment: to appear in IEEE Journal of Selected Topics in Signal Processing,
201
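To get a feel for the comparison made in this abstract, the following sketch evaluates the two time exponents numerically. It assumes the bounds take the forms $\tilde{O}(T^{2(\sqrt{2}-1)})$ for a single-valued relevance relation and $\tilde{O}(T^{(D+1)/(D+2)})$ without one; the function names are hypothetical and the code only does arithmetic on those assumed exponents.

```python
import math

def relevance_exponent():
    """Assumed time exponent with a single-valued relevance relation."""
    return 2 * (math.sqrt(2) - 1)

def full_context_exponent(D):
    """Assumed time exponent when all D context dimensions are used."""
    return (D + 1) / (D + 2)

if __name__ == "__main__":
    print(f"relevance-based exponent: {relevance_exponent():.4f}")
    for D in (1, 2, 3, 4, 10, 100):
        print(f"D = {D:3d}: full-context exponent = {full_context_exponent(D):.4f}")
```

Under these assumptions the relevance-based exponent is about 0.83 and does not grow with D, while the full-context exponent approaches 1 as D increases and already exceeds it at D = 4.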
Offline and Online Models for Learning Pairwise Relations in Data
Pairwise relations between data points are essential for numerous machine learning algorithms. Many representation learning methods consider pairwise relations to identify the latent features and patterns in the data. This thesis investigates the learning of pairwise relations from two perspectives: offline learning and online learning. The first part of the thesis focuses on offline learning. It begins by modeling the performance of a synchronization method in concurrent programming using a Markov chain whose state transition matrix models pairwise relations between the cores involved in a computer process. The thesis then focuses on a particular pairwise distance measure, the minimax distance, and explores memory-efficient approaches to computing it. It proposes a hierarchical representation of the data whose memory requirement is linear in the number of data points and from which the exact pairwise minimax distances can be derived. A memory-efficient sampling method is then proposed that follows this hierarchical representation and samples the data points so that the minimax distances between all data points are maximally preserved. Finally, the thesis proposes a practical non-parametric clustering of vehicle motion trajectories to annotate traffic scenarios based on transitive relations between trajectories in an embedded space. The second part of the thesis takes an online learning perspective. It starts by presenting an online learning method for identifying bottlenecks in a road network by extracting the minimax path, where bottlenecks are the road segments with the highest cost, e.g., in terms of travel time. Inspired by real-world road networks, the thesis assumes a stochastic traffic environment in which the road-specific probability distribution of travel time is unknown. The method therefore learns the parameters of this distribution from observations, modeling the bottleneck identification task as a combinatorial semi-bandit problem. The proposed approach takes prior knowledge into account and follows a Bayesian approach to update the parameters. Moreover, it develops a combinatorial variant of Thompson Sampling and derives an upper bound for the corresponding Bayesian regret. Furthermore, the thesis proposes an approximate algorithm to address the associated computational intractability. Finally, the thesis incorporates contextual information about road network segments by extending the proposed model to a contextual combinatorial semi-bandit framework, and it investigates and develops various algorithms for this contextual combinatorial setting.
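The online bottleneck identification described above can be illustrated with a small sketch. It is not the thesis's algorithm: the Gaussian prior and noise model, the four-node toy network, and the names `minimax_path` and `thompson_bottleneck` are assumptions chosen for the example. Each round samples a travel time per road segment from its posterior, extracts the minimax path under the sample, observes noisy travel times only on that path (semi-bandit feedback), and updates the posteriors.

```python
import heapq
import numpy as np

def minimax_path(n_nodes, edges, cost, source, target):
    """Edge indices of a path minimizing the maximum edge cost.

    Dijkstra-style search in which a path's cost is its largest edge cost.
    Assumes an undirected, connected graph.
    """
    adj = [[] for _ in range(n_nodes)]
    for i, (u, v) in enumerate(edges):
        adj[u].append((v, i))
        adj[v].append((u, i))
    best = [float("inf")] * n_nodes
    prev = [None] * n_nodes  # (previous node, edge index used)
    best[source] = float("-inf")
    heap = [(best[source], source)]
    while heap:
        b, u = heapq.heappop(heap)
        if b > best[u]:
            continue  # stale heap entry
        if u == target:
            break
        for v, i in adj[u]:
            nb = max(b, cost[i])
            if nb < best[v]:
                best[v] = nb
                prev[v] = (u, i)
                heapq.heappush(heap, (nb, v))
    path, node = [], target
    while node != source:
        node, i = prev[node]
        path.append(i)
    return path[::-1]

def thompson_bottleneck(n_nodes, edges, true_mean, source, target, rounds=500,
                        noise_sd=0.5, prior_mean=5.0, prior_sd=5.0, seed=0):
    """Thompson Sampling with Gaussian posteriors over per-edge mean travel time."""
    rng = np.random.default_rng(seed)
    m = len(edges)
    mu = np.full(m, prior_mean)            # posterior means
    prec = np.full(m, 1.0 / prior_sd**2)   # posterior precisions
    noise_prec = 1.0 / noise_sd**2
    for _ in range(rounds):
        sampled = rng.normal(mu, 1.0 / np.sqrt(prec))
        path = minimax_path(n_nodes, edges, sampled, source, target)
        for i in path:  # feedback only for edges on the chosen path
            obs = rng.normal(true_mean[i], noise_sd)
            mu[i] = (prec[i] * mu[i] + noise_prec * obs) / (prec[i] + noise_prec)
            prec[i] += noise_prec
    final = minimax_path(n_nodes, edges, mu, source, target)
    bottleneck = max(final, key=lambda i: mu[i])
    return final, bottleneck, mu

if __name__ == "__main__":
    # Toy network: two routes from node 0 to node 3.
    edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
    true_mean = [2.0, 9.0, 3.0, 5.0]
    path, bottleneck, mu = thompson_bottleneck(4, edges, true_mean, 0, 3)
    print("minimax path (edge indices):", path)
    print("estimated bottleneck edge:", edges[bottleneck])
```

In the toy network the route through node 2 has the smaller worst segment, so the learner should settle on it and report edge (2, 3) as the bottleneck. A Gaussian model can produce negative sampled travel times, which is harmless for this sketch but would need a more suitable likelihood in practice.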