Stochastic Linear Bandits with Hidden Low Rank Structure

Author: Anandkumar Anima
Azizzadenesheli Kamyar
Hassibi Babak
Lale Sahin
Publication date: 27/01/2019

High-dimensional representations often have a lower-dimensional underlying structure. This is particularly the case in many decision-making settings. For example, when the representation of actions is generated from a deep neural network, it is reasonable to expect a low-rank structure, whereas conventional structures like sparsity are no longer valid. Subspace recovery methods, such as Principal Component Analysis (PCA), can find the underlying low-rank structures in the feature space and reduce the complexity of the learning tasks. In this work, we propose Projected Stochastic Linear Bandit (PSLB), an algorithm for high-dimensional stochastic linear bandits (SLB) when the representation of actions has an underlying low-dimensional subspace structure. PSLB deploys PCA-based projection to iteratively find the low-rank structure in SLBs. We show that deploying projection methods assures dimensionality reduction and results in a tighter regret upper bound that is in terms of the dimensionality of the subspace and its properties, rather than the dimensionality of the ambient space. We modify the image classification task into the SLB setting and empirically show that, when a pre-trained DNN provides the high-dimensional feature representations, deploying PSLB results in a significant reduction of regret and faster convergence to an accurate model compared to the state-of-the-art algorithm.
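The PCA-based projection step mentioned in the abstract can be illustrated with a minimal sketch. This is not the PSLB algorithm itself: it only shows how action vectors that lie near a low-dimensional subspace can be projected onto an estimated basis. The function name `top_subspace`, the synthetic data, and the choice of uncentered PCA via SVD are all assumptions made for illustration.

```python
import numpy as np


def top_subspace(actions: np.ndarray, m: int) -> np.ndarray:
    """Return a d x m orthonormal basis for the top-m principal subspace.

    Assumption: uncentered PCA (SVD of the raw action matrix), so the
    estimated subspace passes through the origin.
    """
    # Rows of `actions` are d-dimensional action vectors.
    _, _, vt = np.linalg.svd(actions, full_matrices=False)
    return vt[:m].T


# Hypothetical synthetic setup: d ambient dimensions, m hidden dimensions.
rng = np.random.default_rng(0)
d, m, n = 50, 3, 200
true_basis, _ = np.linalg.qr(rng.standard_normal((d, m)))
latent = rng.standard_normal((n, m))
actions = latent @ true_basis.T + 0.01 * rng.standard_normal((n, d))

basis = top_subspace(actions, m)   # estimated subspace, shape (d, m)
reduced = actions @ basis          # projected actions, shape (n, m)

# Relative error when mapping the projected actions back to d dimensions.
reconstructed = reduced @ basis.T
rel_err = np.linalg.norm(actions - reconstructed) / np.linalg.norm(actions)
```

A bandit learner could then estimate its reward model on `reduced` (dimension `m`) instead of `actions` (dimension `d`), which is the intuition behind a regret bound that depends on the subspace dimension.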

arXiv.org e-Print Archive

Caltech Authors

Stochastic Linear Bandits with Hidden Low Rank Structure

Author: Anandkumar Anima
Azizzadenesheli Kamyar
Hassibi Babak
Lale Sahin
Publication date: 28/01/2019

Linear Bandits with Feature Feedback

Author: Bhargava Aniruddha
Nowak Robert
Oswal Urvashi
Publication date: 11/03/2019

This paper explores a new form of the linear bandit problem in which the algorithm receives the usual stochastic rewards as well as stochastic feedback about which features are relevant to the rewards, the latter feedback being the novel aspect. The focus of this paper is the development of new theory and algorithms for linear bandits with feature feedback. We show that linear bandits with feature feedback can achieve regret over time horizon $T$ that scales like $k\sqrt{T}$, without prior knowledge of which features are relevant nor the number $k$ of relevant features. In comparison, the regret of traditional linear bandits is $d\sqrt{T}$, where $d$ is the total number of (relevant and irrelevant) features, so the improvement can be dramatic if $k \ll d$. The computational complexity of the new algorithm is proportional to $k$ rather than $d$, making it much more suitable for real-world applications compared to traditional linear bandits. We demonstrate the performance of the new algorithm with synthetic and real human-labeled data.
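To make the size of the claimed improvement concrete, the two regret scalings in the abstract can be compared numerically. The sketch below ignores constants and logarithmic factors, and the values of `d`, `k`, and `T` are hypothetical, chosen only for illustration.

```python
import math


def regret_scale(dim: int, horizon: int) -> float:
    """Leading-order regret scaling dim * sqrt(horizon), constants dropped."""
    return dim * math.sqrt(horizon)


# Hypothetical problem sizes: d total features, k relevant features.
d, k, T = 1000, 10, 10_000

traditional = regret_scale(d, T)    # d * sqrt(T)
with_feedback = regret_scale(k, T)  # k * sqrt(T)
ratio = traditional / with_feedback
```

Because both scalings share the same $\sqrt{T}$ factor, the ratio reduces to $d/k$ regardless of the horizon, which is why the gain is large when $k \ll d$.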

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications