
    New Models and Algorithms for Bandits and Markets

    Inspired by advertising markets, we consider large-scale sequential decision making problems in which a learner must deploy an algorithm to behave optimally under uncertainty. Although many of these problems can be modeled as contextual bandit problems, we argue that the tools and techniques for analyzing bandit problems with large numbers of actions and contexts can be greatly expanded. While convexity and metric-similarity assumptions on the process generating rewards have yielded some algorithms in the existing literature, certain types of assumptions that have been fruitful in offline supervised learning settings have yet to even be considered. Notably missing, for example, is any kind of graphical model approach to assuming structured rewards, despite the success such assumptions have achieved in inducing scalable learning and inference with high-dimensional distributions. Similarly, we observe that there are countless tools for understanding the relationship between a choice of model class in supervised learning and the generalization error of the best fit from that class, such as the celebrated VC theory. However, an analogous notion of dimensionality, which relates a generic structural assumption on rewards to regret rates in an online optimization problem, is not fully developed. The primary goal of this dissertation, therefore, will be to fill out the space of models, algorithms, and assumptions used in sequential decision making problems. Toward this end, we will develop a theory for bandit problems with structured rewards that permit a graphical model representation. We will give an efficient algorithm for regret minimization in such a setting, and along the way will develop a deeper connection between online supervised learning and regret minimization. This dissertation will also introduce a complexity measure for generic structural assumptions on reward functions, which we call the Haystack Dimension. We will prove that the Haystack Dimension characterizes the optimal rates achievable up to log factors. Finally, we will describe more application-oriented techniques for solving problems in advertising markets, which again demonstrate how methods from traditional disciplines, such as statistical survival analysis, can be leveraged to design novel algorithms for optimization in markets.
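    To make the setting concrete, here is a minimal sketch of the contextual bandit interaction loop the abstract refers to, using plain epsilon-greedy exploration as a placeholder policy; the structured (graphical-model) learners the dissertation develops would replace the per-(context, action) empirical means below. All names and the reward_fn interface are illustrative, not the dissertation's code.

```python
import random

def contextual_bandit_loop(contexts, actions, reward_fn, rounds=1000, eps=0.1):
    """Epsilon-greedy over (context, action) pairs; reward_fn is the unknown
    environment, observed only for the action actually chosen (bandit feedback)."""
    counts, totals = {}, {}
    for _ in range(rounds):
        x = random.choice(contexts)            # environment reveals a context
        if random.random() < eps:              # explore uniformly
            a = random.choice(actions)
        else:                                  # exploit current empirical means
            a = max(actions, key=lambda act:
                    totals.get((x, act), 0.0) / max(counts.get((x, act), 0), 1))
        r = reward_fn(x, a)
        counts[(x, a)] = counts.get((x, a), 0) + 1
        totals[(x, a)] = totals.get((x, a), 0.0) + r
    return totals, counts
```

    With many actions and contexts this table-based learner is exactly what the abstract argues against: it shares no information across pairs, which is why structural assumptions on the rewards are needed to scale.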

    Learning Thresholds with Latent Values and Censored Feedback

    In this paper, we investigate the problem of actively learning a threshold in latent space, where the unknown reward $g(\gamma, v)$ depends on the proposed threshold $\gamma$ and the latent value $v$, and can only be achieved if the threshold is lower than or equal to the unknown latent value. This problem has broad applications in practical scenarios, e.g., reserve price optimization in online auctions, online task assignment in crowdsourcing, and setting recruiting bars in hiring. We first characterize the query complexity of learning a threshold whose expected reward is at most $\epsilon$ smaller than the optimum, and prove that the number of queries needed can be infinitely large even when $g(\gamma, v)$ is monotone with respect to both $\gamma$ and $v$. On the positive side, we provide a tight query complexity $\tilde{\Theta}(1/\epsilon^3)$ when $g$ is monotone and the CDF of the value distribution is Lipschitz. Moreover, we show that a tight $\tilde{\Theta}(1/\epsilon^3)$ query complexity can be achieved as long as $g$ satisfies one-sided Lipschitzness, which provides a complete characterization for this problem. Finally, we extend this model to an online learning setting and demonstrate a tight $\Theta(T^{2/3})$ regret bound using continuous-arm bandit techniques and the aforementioned query complexity results.
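    Below is a minimal sketch of the censored query model described above, together with a naive grid-and-average strategy whose budget mirrors the $\tilde{\Theta}(1/\epsilon^3)$ flavor (a grid of roughly $1/\epsilon$ thresholds, each queried about $1/\epsilon^2$ times). This is an assumption-laden illustration, not the paper's algorithm.

```python
import random

def query(gamma, g, sample_v):
    """One censored query: the reward is realized only when the threshold
    clears the latent value v, which itself is never observed."""
    v = sample_v()
    return g(gamma, v) if gamma <= v else 0.0

def estimate_best_threshold(g, sample_v, eps=0.05):
    """Naive strategy: ~1/eps grid points, each queried ~1/eps^2 times."""
    grid = [i * eps for i in range(int(1 / eps) + 1)]
    reps = int(1 / eps ** 2)
    means = {gam: sum(query(gam, g, sample_v) for _ in range(reps)) / reps
             for gam in grid}
    return max(means, key=means.get)

# Example: g(gamma, v) = gamma (a posted price), v uniform on [0, 1];
# the expected reward gamma * (1 - gamma) peaks near gamma = 0.5.
best = estimate_best_threshold(lambda gam, v: gam, random.random)
```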

    Adversarially Robust Optimization with Gaussian Processes

    In this paper, we consider the problem of Gaussian process (GP) optimization with an added robustness requirement: The returned point may be perturbed by an adversary, and we require the function value to remain as high as possible even after this perturbation. This problem is motivated by settings in which the underlying functions during optimization and implementation stages are different, or when one is interested in finding an entire region of good inputs rather than only a single point. We show that standard GP optimization algorithms do not exhibit the desired robustness properties, and provide a novel confidence-bound based algorithm StableOpt for this purpose. We rigorously establish the required number of samples for StableOpt to find a near-optimal point, and we complement this guarantee with an algorithm-independent lower bound. We experimentally demonstrate several potential applications of interest using real-world data sets, and we show that StableOpt consistently succeeds in finding a stable maximizer where several baseline methods fail.
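    The following is a hedged sketch of the max-min confidence-bound selection rule the abstract describes: be optimistic over the reported point but pessimistic over the adversary's perturbation. The posterior callable, the discrete candidate and perturbation sets, and the beta parameter are all assumptions made for illustration, not the paper's code.

```python
def stableopt_step(candidates, perturbations, posterior, beta=2.0):
    """One selection step in the spirit of StableOpt (sketch only).

    posterior(x) -> (mu, sigma): GP posterior mean and standard deviation at x.
    """
    def ucb(x):
        mu, sigma = posterior(x)
        return mu + beta * sigma

    def lcb(x):
        mu, sigma = posterior(x)
        return mu - beta * sigma

    # Optimistic over the reported point, pessimistic over the perturbation.
    x_t = max(candidates, key=lambda x: min(ucb(x + d) for d in perturbations))
    # Query where the adversary hurts most under the current lower bound.
    d_t = min(perturbations, key=lambda d: lcb(x_t + d))
    return x_t, x_t + d_t  # report x_t; evaluate f at the perturbed point
```

    The contrast with standard GP-UCB is the inner min: a point that is excellent but sits on a narrow peak loses to a slightly lower point on a wide plateau.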

    Bounded Regret for Finite-Armed Structured Bandits

    We study a new type of K-armed bandit problem where the expected return of one arm may depend on the returns of other arms. We present a new algorithm for this general class of problems and show that under certain circumstances it is possible to achieve finite expected cumulative regret. We also give problem-dependent lower bounds on the cumulative regret showing that at least in special cases the new algorithm is nearly optimal.
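    A toy sketch of why cross-arm structure can help, under the assumed, illustrative model that every arm's mean is $\mu_k(\theta)$ for a single unknown parameter $\theta$: pulls of any arm then carry information about all arms, which is what makes bounded (rather than growing) regret possible in special cases. This is not the paper's algorithm.

```python
import random

def structured_bandit(pull, mu, thetas, arms, rounds=1000):
    """Illustrative only: all arm means share one unknown parameter theta
    (mu is a dict arm -> callable), so every pull informs every arm."""
    obs = {k: [pull(k)] for k in arms}          # pull each arm once to start
    theta_hat = random.choice(thetas)
    for _ in range(rounds - len(arms)):
        # Fit theta by least squares against the empirical arm means.
        theta_hat = min(thetas, key=lambda th: sum(
            (sum(r) / len(r) - mu[k](th)) ** 2 for k, r in obs.items()))
        k = max(arms, key=lambda a: mu[a](theta_hat))   # greedy under the fit
        obs[k].append(pull(k))
    return theta_hat, obs
```

    Once the fitted $\theta$ pins down which arm is best, the greedy step stops paying for exploration, which is the intuition behind finite expected cumulative regret.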

    Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems

    Multi-armed bandit problems are the most basic examples of sequential decision problems with an exploration-exploitation trade-off: the balance between staying with the option that gave the highest payoffs in the past and exploring new options that might give higher payoffs in the future. Although the study of bandit problems dates back to the 1930s, exploration-exploitation trade-offs arise in several modern applications, such as ad placement, website optimization, and packet routing. Mathematically, a multi-armed bandit is defined by the payoff process associated with each option. In this survey, we focus on two extreme cases in which the analysis of regret is particularly simple and elegant: i.i.d. payoffs and adversarial payoffs. Besides the basic setting of finitely many actions, we also analyze some of the most important variants and extensions, such as the contextual bandit model. (To appear in Foundations and Trends in Machine Learning.)
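    For the i.i.d. case the survey covers, the canonical algorithm is UCB1; a standard textbook implementation (not code from the survey) is:

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1 for i.i.d. rewards in [0, 1]; pull(arm) returns one reward."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1                 # pull each arm once to initialize
        else:                         # optimism: empirical mean + confidence width
            a = max(range(n_arms),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        r = pull(a)
        counts[a] += 1
        sums[a] += r
    return sums, counts
```

    The confidence width shrinks as an arm is pulled more often, so under-explored arms are periodically revisited; in the adversarial case this style of argument breaks down, which is why the survey treats the two regimes separately.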

    Interactive User Intent Modeling

    In information retrieval systems, users often have difficulty forming precise queries to express their information need. One approach is to let users explore the information space by providing relevance feedback on recommended items; this feedback is then used to model the user's search intent. Studies have shown that retrieval performance can be improved by allowing users to give feedback on multiple item types, such as keywords and documents, instead of keywords only. In this thesis, I extend an existing user model that uses document-level and keyword-level feedback to also include session-level feedback, and study the usefulness of this extension. By conducting simulation studies in various settings, I investigate the effect of session-level feedback. Based on these simulation results, I conclude that additional session-level feedback helps in finding relevant documents by improving the F1-score, and that the more session-level feedback is provided, the greater the improvement. However, trading document- and keyword-level feedback for session-level feedback results in a drop in document F1-score, indicating that session-level feedback is less informative than document and keyword feedback.

    Lower Bounds on the Worst-Case Complexity of Efficient Global Optimization

    Efficient global optimization is a widely used method for optimizing expensive black-box functions arising in tasks such as hyperparameter tuning and new material design. Despite its popularity, less attention has been paid to analyzing the inherent hardness of the problem, although, given its extensive use, it is important to understand the fundamental limits of efficient global optimization algorithms. In this paper, we study the worst-case complexity of the efficient global optimization problem and, in contrast to existing kernel-specific results, we derive a unified lower bound for the complexity of efficient global optimization in terms of the metric entropy of a ball in its corresponding reproducing kernel Hilbert space (RKHS). Specifically, we show that if there exists a deterministic algorithm that achieves a suboptimality gap smaller than $\epsilon$ for any function $f \in S$ in $T$ function evaluations, then $T$ must be at least $\Omega\left(\frac{\log \mathcal{N}(S(\mathcal{X}), 4\epsilon, \|\cdot\|_\infty)}{\log(R/\epsilon)}\right)$, where $\mathcal{N}(\cdot,\cdot,\cdot)$ is the covering number, $S$ is the ball of radius $R$ centered at $0$ in the RKHS, and $S(\mathcal{X})$ is the restriction of $S$ to the feasible set $\mathcal{X}$. Moreover, we show that this lower bound nearly matches the upper bound attained by non-adaptive search algorithms for the commonly used squared exponential kernel and the Mat\'ern kernel with a large smoothness parameter $\nu$, up to a replacement of $d/2$ by $d$ and a logarithmic term $\log\frac{R}{\epsilon}$. That is to say, our lower bound is nearly optimal for these kernels.
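    The non-adaptive search the abstract refers to can be as simple as fixing an evaluation grid in advance, querying the function everywhere on it, and returning the best point seen. The sketch below assumes a box-shaped feasible set and leaves the grid resolution, which drives the achievable $\epsilon$, to the caller; it is an illustration of the technique, not code from the paper.

```python
import itertools

def nonadaptive_grid_search(f, bounds, pts_per_dim):
    """Evaluate f on a fixed uniform grid over a box and return the best point.

    bounds: list of (lo, hi) per dimension; pts_per_dim >= 2 sets the
    resolution. The query locations never depend on observed values,
    which is what makes the search non-adaptive.
    """
    axes = [[lo + (hi - lo) * i / (pts_per_dim - 1) for i in range(pts_per_dim)]
            for lo, hi in bounds]
    best = max(itertools.product(*axes), key=lambda x: f(list(x)))
    return list(best)
```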