58 research outputs found
New Models and Algorithms for Bandits and Markets
Inspired by advertising markets, we consider large-scale sequential decision making problems in which a learner must deploy an algorithm to behave optimally under uncertainty. Although many of these problems can be modeled as contextual bandit problems, we argue that the tools and techniques for analyzing bandit problems with large numbers of actions and contexts can be greatly expanded. While convexity and metric-similarity assumptions on the process generating rewards have yielded some algorithms in existing literature, certain types of assumptions that have been fruitful in offline supervised learning settings have yet to even be considered. Notably missing, for example, is any kind of graphical model approach to assuming structured rewards, despite the success such assumptions have
achieved in inducing scalable learning and inference with high-dimensional distributions. Similarly, we observe that there are countless tools for understanding the relationship between a choice of model class in supervised learning and the generalization error of the best fit from that class, such as the celebrated VC-theory. However, an analogous notion of dimensionality, which relates a generic structural assumption on rewards to regret rates in an online optimization problem, is not fully developed. The primary goal of this dissertation, therefore, will be to fill out the space of models, algorithms, and assumptions used in sequential decision making problems. Toward this end, we will develop a theory for bandit problems with structured rewards that permit a graphical model representation. We will give an efficient algorithm for regret-minimization in such a setting, and along the way will develop a deeper connection between online supervised learning and regret-minimization. This dissertation will also introduce a complexity measure for generic structural assumptions on reward functions, which we call the Haystack Dimension. We will prove that the Haystack Dimension characterizes the optimal rates achievable up to log factors. Finally, we will describe more application-oriented techniques for solving problems in advertising markets, which again demonstrate how methods from traditional disciplines, such as statistical survival analysis, can be leveraged to design novel algorithms for optimization in markets.
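As a toy illustration (in Python) of the kind of question the Haystack Dimension is meant to answer, the sketch below performs noiseless hypothesis elimination over a small finite class of candidate reward functions and counts the queries spent before the optimum is pinned down. It is not the dissertation's algorithm; the dictionary representation of reward functions, the disagreement-based query rule, and the noiseless feedback are illustrative assumptions.

    # Toy sketch, not the dissertation's algorithm: the unknown reward function is
    # known to lie in a finite class F; each query reveals the true reward of one
    # action, and candidates inconsistent with the observation are discarded.
    # Assumes noiseless feedback and pairwise-distinct candidates over a shared
    # action set.
    def identify_and_optimize(F, true_reward):
        candidates = list(F)
        n_queries = 0
        while len(candidates) > 1:
            # query an action on which surviving candidates still disagree
            x = next(a for a in candidates[0]
                     if len({g[a] for g in candidates}) > 1)
            y = true_reward[x]
            n_queries += 1
            candidates = [g for g in candidates if g[x] == y]
        best_action = max(candidates[0], key=candidates[0].get)
        return best_action, n_queries

    F = [{"a": 1.0, "b": 0.2, "c": 0.5},
         {"a": 0.1, "b": 0.9, "c": 0.5},
         {"a": 0.1, "b": 0.2, "c": 0.8}]
    print(identify_and_optimize(F, F[1]))   # -> ('b', 2)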
Learning Thresholds with Latent Values and Censored Feedback
In this paper, we investigate the problem of actively learning a threshold in latent space, where the unknown reward depends on the proposed threshold and a latent value, and the reward is obtained only if the threshold is lower than or equal to the unknown latent value. This problem has broad applications in practical scenarios, e.g., reserve price optimization in online auctions, online task assignment in crowdsourcing, and setting recruiting bars in hiring. We first characterize the query complexity of learning a threshold whose expected reward is at most $\epsilon$ smaller than the optimum, and prove that the number of queries needed can be infinitely large even when the reward function is monotone with respect to both the threshold and the latent value. On the positive side, we provide a tight query complexity bound when the reward function is monotone and the CDF of the value distribution is Lipschitz. Moreover, we show that a tight query complexity can be achieved as long as the reward function satisfies one-sided Lipschitzness, which provides a complete characterization for this problem. Finally, we extend this model to an online learning setting and demonstrate a tight regret bound using continuous-arm bandit techniques and the aforementioned query complexity results. Comment: 18 pages.
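To make the query model concrete, here is a naive uniform-exploration baseline (not the paper's algorithm): discretize the thresholds on an epsilon-grid, query each grid point repeatedly, and keep the empirically best one. The specific reward rule (the threshold is collected on acceptance, zero on rejection) and the uniform value distribution are illustrative assumptions.

    import random

    # Censored feedback: a query proposes a threshold, a latent value is drawn from
    # an unknown distribution, and reward is received only if the threshold does
    # not exceed the value. Illustrative reward rule: collect the threshold itself.
    def query(threshold, draw_value):
        v = draw_value()
        return threshold if threshold <= v else 0.0

    def grid_search_threshold(draw_value, eps=0.05, queries_per_point=200):
        grid = [i * eps for i in range(int(1 / eps) + 1)]
        best, best_mean = None, float("-inf")
        for g in grid:
            mean = sum(query(g, draw_value) for _ in range(queries_per_point)) / queries_per_point
            if mean > best_mean:
                best, best_mean = g, mean
        return best, best_mean

    # Values uniform on [0, 1]: the revenue-maximizing threshold is 0.5.
    print(grid_search_threshold(lambda: random.random()))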
Adversarially Robust Optimization with Gaussian Processes
In this paper, we consider the problem of Gaussian process (GP) optimization
with an added robustness requirement: The returned point may be perturbed by an
adversary, and we require the function value to remain as high as possible even
after this perturbation. This problem is motivated by settings in which the
underlying functions during optimization and implementation stages are
different, or when one is interested in finding an entire region of good inputs
rather than only a single point. We show that standard GP optimization
algorithms do not exhibit the desired robustness properties, and provide a
novel confidence-bound based algorithm StableOpt for this purpose. We
rigorously establish the required number of samples for StableOpt to find a
near-optimal point, and we complement this guarantee with an
algorithm-independent lower bound. We experimentally demonstrate several
potential applications of interest using real-world data sets, and we show that
StableOpt consistently succeeds in finding a stable maximizer where several
baseline methods fail. Comment: Corrected typo.
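Below is a rough sketch of a StableOpt-style selection rule on a one-dimensional grid, based only on the description above: pick the point whose worst-case upper confidence bound over an epsilon-neighbourhood is largest, sample at the pessimistic neighbour, and finally report the point with the best worst-case lower confidence bound. The objective, RBF kernel, perturbation radius, and sampling choice are illustrative assumptions rather than the paper's exact specification.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def f(x):                         # unknown objective, used only to simulate queries
        return np.sin(3 * x) + 0.5 * np.cos(7 * x)

    grid = np.linspace(0.0, 2.0, 200)
    eps = 0.15                        # adversary may shift the returned point by up to eps
    beta = 2.0                        # confidence-width parameter

    def neighbours(i):                # grid indices reachable by an eps-perturbation
        return np.where(np.abs(grid - grid[i]) <= eps)[0]

    X, y = [grid[0]], [f(grid[0])]
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=1e-3)

    for t in range(30):
        gp.fit(np.array(X).reshape(-1, 1), y)
        mu, sd = gp.predict(grid.reshape(-1, 1), return_std=True)
        ucb, lcb = mu + beta * sd, mu - beta * sd
        # robust acquisition: maximize the worst-case UCB over the neighbourhood
        robust_ucb = np.array([ucb[neighbours(i)].min() for i in range(len(grid))])
        i_t = int(robust_ucb.argmax())
        # sample where the adversary would push the chosen point
        nb = neighbours(i_t)
        j_t = nb[lcb[nb].argmin()]
        X.append(grid[j_t])
        y.append(f(grid[j_t]))

    # report the point with the best worst-case lower confidence bound
    robust_lcb = np.array([lcb[neighbours(i)].min() for i in range(len(grid))])
    print("robust choice:", grid[int(robust_lcb.argmax())])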
Bounded Regret for Finite-Armed Structured Bandits
We study a new type of K-armed bandit problem where the expected return of
one arm may depend on the returns of other arms. We present a new algorithm for
this general class of problems and show that under certain circumstances it is
possible to achieve finite expected cumulative regret. We also give
problem-dependent lower bounds on the cumulative regret, showing that at least in special cases the new algorithm is nearly optimal. Comment: 16 pages.
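A toy two-armed example (not the paper's algorithm) of the kind of structure meant here: both arms' means are known functions of a single unknown parameter, so every pull of either arm is informative, and a greedy strategy tracking a shared estimate stops incurring regret after finitely many mistakes. The particular mean functions and Gaussian noise are illustrative assumptions.

    import random

    # Structured two-armed bandit sketch: mu_1(theta) = theta, mu_2(theta) = 1 - theta,
    # with theta unknown. Rewards from either arm are inverted through the known mean
    # functions into unbiased estimates of theta, so the shared estimate keeps
    # improving no matter which arm is pulled.
    def structured_greedy(theta=0.7, horizon=1000, noise=0.1):
        mean_fns = [lambda th: th, lambda th: 1.0 - th]
        theta_samples = []
        regret = 0.0
        for t in range(horizon):
            th_hat = sum(theta_samples) / len(theta_samples) if theta_samples else 0.5
            arm = max(range(2), key=lambda k: mean_fns[k](th_hat))
            reward = mean_fns[arm](theta) + random.gauss(0.0, noise)
            theta_samples.append(reward if arm == 0 else 1.0 - reward)
            regret += max(theta, 1.0 - theta) - mean_fns[arm](theta)
        return regret

    print(structured_greedy())        # stays bounded as the horizon grows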
Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems
Multi-armed bandit problems are the most basic examples of sequential
decision problems with an exploration-exploitation trade-off. This is the
balance between staying with the option that gave highest payoffs in the past
and exploring new options that might give higher payoffs in the future.
Although the study of bandit problems dates back to the Thirties,
exploration-exploitation trade-offs arise in several modern applications, such
as ad placement, website optimization, and packet routing. Mathematically, a
multi-armed bandit is defined by the payoff process associated with each
option. In this survey, we focus on two extreme cases in which the analysis of
regret is particularly simple and elegant: i.i.d. payoffs and adversarial
payoffs. Besides the basic setting of finitely many actions, we also analyze
some of the most important variants and extensions, such as the contextual
bandit model. Comment: To appear in Foundations and Trends in Machine Learning.
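For the i.i.d. case, the classical UCB1 index policy is a minimal illustration of the exploration-exploitation balance described above: pull each arm once, then always pull the arm with the largest empirical mean plus confidence bonus. The Bernoulli arms below are an illustrative assumption.

    import math, random

    def ucb1(means, horizon=5000):
        K = len(means)
        n = [0] * K                   # number of pulls per arm
        s = [0.0] * K                 # summed rewards per arm
        for t in range(1, horizon + 1):
            if t <= K:
                arm = t - 1           # initialization: pull each arm once
            else:
                arm = max(range(K),
                          key=lambda k: s[k] / n[k] + math.sqrt(2 * math.log(t) / n[k]))
            reward = 1.0 if random.random() < means[arm] else 0.0
            n[arm] += 1
            s[arm] += reward
        return n                      # pull counts concentrate on the best arm

    print(ucb1([0.3, 0.5, 0.7]))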
Interactive User Intent Modeling
In information retrieval systems, users often have difficulty forming precise queries to express their information need. One approach is to let users explore the information space by providing relevance feedback on recommended items; this feedback is then used to model the user's search intent. Studies have shown that retrieval performance can be improved by allowing users to give feedback on multiple item types, such as keywords and documents, rather than on keywords only. In this thesis, I extend an existing user model that uses document-level and keyword-level feedback to also include session-level feedback, and study the usefulness of this extension. By conducting simulation studies in various settings, I investigate the effect of session-level feedback. Based on these simulation results, I conclude that additional session-level feedback helps in finding relevant documents by improving the F1-score, and that the more session-level feedback is available, the larger the improvement. However, trading document-level and keyword-level feedback for session-level feedback results in a drop in document F1-score, indicating that session-level feedback is less informative than document-level and keyword-level feedback.
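Since the comparison above is stated in terms of document F1-score, a minimal illustration of that metric is given below: the harmonic mean of precision and recall over retrieved versus relevant documents. The document identifiers are made up for illustration.

    def f1_score(retrieved, relevant):
        retrieved, relevant = set(retrieved), set(relevant)
        true_positives = len(retrieved & relevant)
        if true_positives == 0:
            return 0.0
        precision = true_positives / len(retrieved)
        recall = true_positives / len(relevant)
        return 2 * precision * recall / (precision + recall)

    print(f1_score(["d1", "d2", "d3"], ["d2", "d3", "d4", "d5"]))   # 0.571...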
Lower Bounds on the Worst-Case Complexity of Efficient Global Optimization
Efficient global optimization is a widely used method for optimizing expensive black-box functions, for example when tuning hyperparameters or designing new materials. Despite its popularity, little attention has been paid to analyzing the inherent hardness of the problem, although, given its extensive
use, it is important to understand the fundamental limits of efficient global
optimization algorithms. In this paper, we study the worst-case complexity of
the efficient global optimization problem and, in contrast to existing
kernel-specific results, we derive a unified lower bound for the complexity of
efficient global optimization in terms of the metric entropy of a ball in its
corresponding reproducing kernel Hilbert space (RKHS). Specifically, we show that if there exists a deterministic algorithm that achieves a suboptimality gap smaller than $\epsilon$ for any function $f \in B$ within $T$ function evaluations, then $T$ is at least $\Omega\big(\log \mathcal{N}(B|_{\mathcal{X}}, 4\epsilon, \|\cdot\|_{\infty}) / \log(R/\epsilon)\big)$, where $\mathcal{N}(\cdot, \cdot, \cdot)$ is the covering number, $B$ is the ball centered at $0$ with radius $R$ in the RKHS, and $B|_{\mathcal{X}}$ is the restriction of $B$ over the feasible set $\mathcal{X}$. Moreover, we show that this lower bound nearly matches the upper bound attained by non-adaptive search algorithms for the commonly used squared exponential kernel and the Mat\'ern kernel with a large smoothness parameter $\nu$, up to a replacement of $d/2$ by $d$ and a logarithmic term $\log(R/\epsilon)$. That is to say, our lower bound is nearly optimal for these kernels.
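As a sketch of the kind of non-adaptive search referred to above, the routine below fixes a uniform grid over the feasible set in advance, evaluates the black-box function at every grid point without looking at earlier observations, and returns the best point found. The test function and grid resolution are illustrative assumptions; the abstract's result concerns how finely such a cover must resolve an RKHS ball in the worst case.

    import numpy as np

    def non_adaptive_search(f, lower, upper, n_points_per_dim):
        # all queries are fixed up front: a uniform grid over the box [lower, upper]
        axes = [np.linspace(lo, hi, n_points_per_dim) for lo, hi in zip(lower, upper)]
        mesh = np.meshgrid(*axes, indexing="ij")
        points = np.stack([m.ravel() for m in mesh], axis=1)
        values = np.array([f(p) for p in points])        # one evaluation per grid point
        best = values.argmax()
        return points[best], values[best]

    # Example: a smooth test function on [0, 1]^2.
    f = lambda p: np.exp(-10 * ((p[0] - 0.3) ** 2 + (p[1] - 0.7) ** 2))
    print(non_adaptive_search(f, [0.0, 0.0], [1.0, 1.0], n_points_per_dim=25))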
- …