
139 research outputs found

Further Optimal Regret Bounds for Thompson Sampling

Authors: Agrawal Shipra, Goyal Navin
Publication date: 14/09/2012

Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a randomized algorithm based on Bayesian ideas, and has recently generated significant interest after several studies demonstrated it to have better empirical performance compared to the state-of-the-art methods. In this paper, we provide a novel regret analysis for Thompson Sampling that simultaneously proves both the optimal problem-dependent bound of $(1+\epsilon)\sum_i \frac{\ln T}{\Delta_i}+O(\frac{N}{\epsilon^2})$ and the first near-optimal problem-independent bound of $O(\sqrt{NT\ln T})$ on the expected regret of this algorithm. Our near-optimal problem-independent bound solves a COLT 2012 open problem of Chapelle and Li. The optimal problem-dependent regret bound for this problem was first proven recently by Kaufmann et al. [ALT 2012]. Our novel martingale-based analysis techniques are conceptually simple, easily extend to distributions other than the Beta distribution, and also extend to the more general contextual bandits setting [Manuscript, Agrawal and Goyal, 2012].

Comment: arXiv admin note: substantial text overlap with arXiv:1111.179
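For readers unfamiliar with the algorithm the abstract analyzes, the following is a minimal sketch of Thompson Sampling for Bernoulli rewards with Beta posteriors. It is not taken from the paper: the function name `thompson_sampling`, the uniform Beta(1, 1) prior, the fixed seed, and the use of pseudo-regret (computed from the true means, which a real learner would not know) are all illustrative assumptions.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling on a simulated bandit.

    Returns (pulls per arm, cumulative pseudo-regret).
    Assumes a uniform Beta(1, 1) prior on every arm (illustrative choice).
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [0] * n_arms  # number of reward-1 observations per arm
    failures = [0] * n_arms   # number of reward-0 observations per arm
    pulls = [0] * n_arms
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Draw one plausible mean per arm from its Beta(S + 1, F + 1) posterior.
        samples = [
            rng.betavariate(successes[i] + 1, failures[i] + 1)
            for i in range(n_arms)
        ]
        # Play the arm whose sampled mean is largest.
        arm = max(range(n_arms), key=samples.__getitem__)
        # Simulate a Bernoulli reward from the (hidden) true mean.
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        # Pseudo-regret: gap between the best arm and the arm played.
        regret += best_mean - true_means[arm]

    return pulls, regret
```

Calling `thompson_sampling([0.2, 0.5, 0.8], 2000)` simulates N = 3 arms over T = 2000 rounds; the gaps Δ_i in the bound above correspond to `best_mean - true_means[i]` for the suboptimal arms.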

arXiv.org e-Print Archive

CiteSeerX

Discretizing Continuous Action Space for On-Policy Optimization

Authors: Agrawal Shipra, Tang Yunhao
Publication date: 19/03/2020

In this work, we show that discretizing action space for continuous control is a simple yet powerful technique for on-policy optimization. The explosion in the number of discrete actions can be efficiently addressed by a policy with factorized distribution across action dimensions. We show that the discrete policy achieves significant performance gains with state-of-the-art on-policy optimization algorithms (PPO, TRPO, ACKTR) especially on high-dimensional tasks with complex dynamics. Additionally, we show that an ordinal parameterization of the discrete distribution can introduce the inductive bias that encodes the natural ordering between discrete actions. This ordinal architecture further significantly improves the performance of PPO/TRPO.

Comment: Accepted at AAAI Conference on Artificial Intelligence (2020) in New York, NY, USA. An open source implementation can be found at https://github.com/robintyh1/onpolicybaseline
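The factorization idea in this abstract can be illustrated with a small sketch: with M action dimensions and K bins per dimension, a joint categorical policy would need K^M outputs, whereas one independent categorical per dimension needs only K·M logits. The code below is a hedged illustration, not the authors' implementation (see the linked repository for that). All function names are hypothetical, and the ordinal form used in `ordinal_logits` is one possible ordinal encoding assumed here, since the abstract does not spell it out.

```python
import math
import random


def make_bins(low, high, k):
    """Return k evenly spaced action values covering [low, high] (k >= 2)."""
    step = (high - low) / (k - 1)
    return [low + i * step for i in range(k)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ordinal_logits(raw):
    """Assumed ordinal encoding: with s_j = sigmoid(raw_j),
    logit_i = sum_{j <= i} log s_j + sum_{j > i} log(1 - s_j),
    so neighbouring bins share most of their terms."""
    s = [1.0 / (1.0 + math.exp(-r)) for r in raw]
    k = len(s)
    return [
        sum(math.log(s[j]) for j in range(i + 1))
        + sum(math.log(1.0 - s[j]) for j in range(i + 1, k))
        for i in range(k)
    ]


def sample_action(logits_per_dim, bins_per_dim, rng):
    """Sample each action dimension independently (factorized policy).

    Returns the continuous action vector and the joint log-probability,
    which is the sum of the per-dimension log-probabilities.
    """
    action, log_prob = [], 0.0
    for logits, bins in zip(logits_per_dim, bins_per_dim):
        probs = softmax(logits)
        idx = rng.choices(range(len(bins)), weights=probs)[0]
        action.append(bins[idx])
        log_prob += math.log(probs[idx])
    return action, log_prob
```

In a real agent the per-dimension logits would come from a neural network conditioned on the state; here they are plain lists so the sampling logic can be read in isolation.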

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A Framework for High-Accuracy Privacy-Preserving Mining

Authors: Agrawal Shipra, Haritsa Jayant R.
Publication date: 01/01/2004

To preserve client privacy in the data mining process, a variety of techniques based on random perturbation of data records have been proposed recently. In this paper, we present a generalized matrix-theoretic model of random perturbation, which facilitates a systematic approach to the design of perturbation mechanisms for privacy-preserving mining. Specifically, we demonstrate that (a) the prior techniques differ only in their settings for the model parameters, and (b) through appropriate choice of parameter settings, we can derive new perturbation techniques that provide highly accurate mining results even under strict privacy guarantees. We also propose a novel perturbation mechanism wherein the model parameters are themselves characterized as random variables, and demonstrate that this feature provides significant improvements in privacy at a very marginal cost in accuracy. While our model is valid for random-perturbation-based privacy-preserving mining in general, we specifically evaluate its utility here with regard to frequent-itemset mining on a variety of real datasets. The experimental results indicate that our mechanisms incur substantially lower identity and support errors than the prior techniques.
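To make the matrix view of random perturbation concrete, here is a minimal sketch for a single categorical attribute. It assumes a simple perturbation matrix A with `keep_prob` on the diagonal and an equal probability q = (1 - keep_prob) / (n - 1) everywhere else, so that the expected perturbed counts satisfy E[Y] = A X and the original counts X can be estimated by inverting A. This specific matrix and the function names are illustrative assumptions, not necessarily the parameter settings the paper proposes.

```python
import random


def perturb(value, domain_size, keep_prob, rng):
    """Report the true category with probability keep_prob; otherwise
    report one of the other categories chosen uniformly at random."""
    if rng.random() < keep_prob:
        return value
    other = rng.randrange(domain_size - 1)
    # Skip over the true value so the result always differs from it.
    return other if other < value else other + 1


def reconstruct_counts(perturbed_counts, keep_prob):
    """Estimate original category counts from perturbed counts.

    For the assumed matrix, E[Y_v] = (p - q) * X_v + q * N, where p is
    keep_prob and N the total number of records, so the inverse has the
    closed form X_v = (Y_v - q * N) / (p - q).
    Requires keep_prob != 1 / n, otherwise the matrix is singular.
    """
    n = len(perturbed_counts)
    total = sum(perturbed_counts)
    q = (1.0 - keep_prob) / (n - 1)
    return [(y - q * total) / (keep_prob - q) for y in perturbed_counts]
```

A lower `keep_prob` hides individual records better but makes the matrix closer to singular, which amplifies sampling noise in the reconstructed counts; this is the privacy and accuracy trade-off the abstract refers to.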

arXiv.org e-Print Archive

CiteSeerX

Open Access Repository of IISc Research Publications