New Models and Algorithms for Bandits and Markets
Inspired by advertising markets, we consider large-scale sequential decision making problems in which a learner must deploy an algorithm to behave optimally under uncertainty. Although many of these problems can be modeled as contextual bandit problems, we argue that the tools and techniques for analyzing bandit problems with large numbers of actions and contexts can be greatly expanded. While convexity and metric-similarity assumptions on the process generating rewards have yielded some algorithms in existing literature, certain types of assumptions that have been fruitful in offline supervised learning settings have yet to even be considered. Notably missing, for example, is any kind of graphical model approach to assuming structured rewards, despite the success such assumptions have
achieved in inducing scalable learning and inference with high-dimensional distributions. Similarly, we observe that there are countless tools for understanding the relationship between a choice of model class in supervised learning and the generalization error of the best fit from that class, such as the celebrated VC theory. However, an analogous notion of dimensionality, which relates a generic structural assumption on rewards to regret rates in an online optimization problem, is not fully developed. The primary goal of this dissertation, therefore, will be to fill out the space of models, algorithms, and assumptions used in sequential decision making problems. Toward this end, we will develop a theory for bandit problems with structured rewards that permit a graphical model representation. We will give an efficient algorithm for regret-minimization in such a setting, and along the way will develop a deeper connection between online supervised learning and regret-minimization. This dissertation will also introduce a complexity measure for generic structural assumptions on reward functions, which we call the Haystack Dimension. We will prove that the Haystack Dimension characterizes the optimal rates achievable up to log factors. Finally, we will describe more application-oriented techniques for solving problems in advertising markets, which again demonstrate how methods from traditional disciplines, such as statistical survival analysis, can be leveraged to design novel algorithms for optimization in markets.
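To make the graphical-model idea concrete, here is a minimal toy sketch (not the dissertation's algorithm) of why factored reward structure helps: when the reward of a composite action decomposes as a sum of local terms over the edges of a known graph, each local term can be estimated separately, instead of estimating all 2^n joint actions independently. The chain graph, the epsilon-greedy rule, and the semi-bandit feedback model (each local term is observed) are all illustrative assumptions.

```python
import itertools
import random

random.seed(0)

n = 4                                  # length of the binary action vector
edges = [(0, 1), (1, 2), (2, 3)]       # assumed reward graph: a chain
# Hidden per-edge reward tables: one value per (bit_i, bit_j) configuration.
theta = {e: {(a, b): random.uniform(0, 1) for a in (0, 1) for b in (0, 1)}
         for e in edges}

def pull(x):
    """Semi-bandit feedback: a noisy observation of each edge's local term."""
    return {(i, j): theta[(i, j)][(x[i], x[j])] + random.gauss(0, 0.1)
            for i, j in edges}

# Running counts and means per edge configuration (4 per edge, not 2^n total).
counts = {e: {k: 0 for k in theta[e]} for e in edges}
means = {e: {k: 0.0 for k in theta[e]} for e in edges}

def score(x):
    """Estimated total reward of an action: sum of estimated edge terms."""
    return sum(means[(i, j)][(x[i], x[j])] for i, j in edges)

actions = list(itertools.product((0, 1), repeat=n))
for t in range(3000):
    # Epsilon-greedy: explore uniformly 10% of the time, else play the
    # action with the highest estimated factored reward.
    x = random.choice(actions) if random.random() < 0.1 else max(actions, key=score)
    for e, r in pull(x).items():
        k = (x[e[0]], x[e[1]])
        counts[e][k] += 1
        means[e][k] += (r - means[e][k]) / counts[e][k]   # incremental mean
```

The point of the sketch is the statistical budget: the learner maintains 4 estimates per edge (12 total here) rather than 16 joint-action estimates, and the gap widens exponentially with n.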
RELEAF: An Algorithm for Learning and Exploiting Relevance
Recommender systems, medical diagnosis, network security, etc., require
ongoing learning and decision-making in real time. These -- and many others --
represent perfect examples of the opportunities and difficulties presented by
Big Data: the available information often arrives from a variety of sources and
has diverse features so that learning from all the sources may be valuable but
integrating what is learned is subject to the curse of dimensionality. This
paper develops and analyzes algorithms that allow efficient learning and
decision-making while avoiding the curse of dimensionality. We formalize the
information available to the learner/decision-maker at a particular time as a
context vector which the learner should consider when taking actions. In
general the context vector is very high dimensional, but in many settings, the
most relevant information is embedded into only a few relevant dimensions. If
these relevant dimensions were known in advance, the problem would be simple --
but they are not. Moreover, the relevant dimensions may be different for
different actions. Our algorithm learns the relevant dimensions for each
action, and makes decisions based on what it has learned. Formally, we build on
the structure of a contextual multi-armed bandit by adding and exploiting a
relevance relation. We prove a general regret bound for our algorithm whose
time order depends only on the maximum number of relevant dimensions among all
the actions, which in the special case where the relevance relation is
single-valued (a function), reduces to Õ(T^(2(√2−1))); in the
absence of a relevance relation, the best known contextual bandit algorithms
achieve regret Õ(T^((D+1)/(D+2))), where D is the full dimension of
the context vector.
Comment: to appear in IEEE Journal of Selected Topics in Signal Processing, 201
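The relevance idea above can be sketched in a few lines. This is a toy illustration, not the paper's RELEAF algorithm: each arm's reward is assumed to depend on a single hidden context dimension, contexts are discretized into bins, and the learner ranks dimensions by the spread of the binned conditional reward means (a large spread suggests the dimension is relevant for that arm). The spread heuristic, the bin count, and all parameters are assumptions made for the sketch.

```python
import random

random.seed(1)

D, K, BINS = 10, 3, 4                 # context dims, arms, bins per dim
rel = {a: random.randrange(D) for a in range(K)}   # hidden relevant dim per arm

def reward(a, ctx):
    """Each arm's mean reward depends only on its one relevant dimension."""
    return (ctx[rel[a]] > 0.5) * 1.0 + random.gauss(0, 0.1)

# Per arm, per dimension, per bin: [count, running mean of observed rewards].
stats = [[[[0, 0.0] for _ in range(BINS)] for _ in range(D)] for _ in range(K)]

def bin_of(v):
    return min(int(v * BINS), BINS - 1)

def spread(a, d):
    """Range of conditional reward means across visited bins of dimension d."""
    ms = [m for c, m in stats[a][d] if c > 0]
    return max(ms) - min(ms) if len(ms) > 1 else 0.0

def predict(a, ctx):
    """Predict arm a's reward using its currently most relevant-looking dim."""
    d = max(range(D), key=lambda d: spread(a, d))
    c, m = stats[a][d][bin_of(ctx[d])]
    return m if c else 0.5            # optimistic-ish default for empty bins

for t in range(5000):
    ctx = [random.random() for _ in range(D)]
    if random.random() < 0.2:         # explore
        a = random.randrange(K)
    else:                             # exploit the per-arm predictions
        a = max(range(K), key=lambda a: predict(a, ctx))
    r = reward(a, ctx)
    for d in range(D):                # update every dimension's bin for arm a
        s = stats[a][d][bin_of(ctx[d])]
        s[0] += 1
        s[1] += (r - s[1]) / s[0]

learned = {a: max(range(D), key=lambda d: spread(a, d)) for a in range(K)}
```

Because only the relevant dimension produces conditional means that actually vary across bins, the spread statistic separates it from the D−1 irrelevant dimensions, which is the intuition behind regret bounds that depend on the number of relevant dimensions rather than on D.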