We consider a contextual online learning (multi-armed bandit) problem with
a high-dimensional covariate $x$ and a decision $y$. The reward
function to learn, $f(x,y)$, does not have a particular
parametric form. The literature has shown that the optimal regret is
$\tilde{O}(T^{(d_x+d_y+1)/(d_x+d_y+2)})$, where $d_x$ and $d_y$ are the
dimensions of $x$ and $y$, and thus it suffers from the curse
of dimensionality. In many applications, only a small subset of the variables in
the covariate affects the value of $f$, a property referred to as
\textit{sparsity} in statistics. To take advantage of the sparsity structure of
the covariate, we propose a variable selection algorithm called
\textit{BV-LASSO}, which incorporates novel ideas such as binning and voting to
apply LASSO to nonparametric settings. Our algorithm achieves the regret
$\tilde{O}(T^{(d_x^*+d_y+1)/(d_x^*+d_y+2)})$, where $d_x^*$ is the effective
covariate dimension. This matches the optimal regret when the covariate
is $d_x^*$-dimensional and thus cannot be improved. Our algorithm may serve as
a general recipe to achieve dimension reduction via variable selection in
nonparametric settings.
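To make the binning-and-voting idea concrete, the following is a minimal, hypothetical sketch, not the paper's BV-LASSO algorithm: LASSO is fit on local subsets of the data, and the subsets vote on which covariate coordinates are relevant. The actual algorithm bins the covariate space; here random batches stand in for bins, and all names, the synthetic reward, and the tuning constants (\texttt{alpha}, the vote threshold) are illustrative assumptions.

```python
# Hypothetical sketch of binning + voting for nonparametric variable selection.
# Random batches stand in for covariate-space bins; illustration only.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d = 2000, 10                      # sample size, ambient covariate dimension
X = rng.uniform(0.0, 1.0, size=(n, d))

# Nonparametric reward depending only on coordinates 0 and 3 (sparse truth).
y = 2.0 * np.sin(X[:, 0]) + X[:, 3] ** 2 + 0.1 * rng.standard_normal(n)

B = 10                               # number of "bins" (batches in this sketch)
votes = np.zeros(d)
for batch in np.array_split(rng.permutation(n), B):
    lasso = Lasso(alpha=0.02).fit(X[batch], y[batch])
    votes += np.abs(lasso.coef_) > 1e-6   # coordinates this bin selects

selected = np.where(votes >= B / 2)[0]    # keep coordinates with a majority vote
print(sorted(selected.tolist()))
```

Voting across bins makes the selection robust: a coordinate that a local linear (LASSO) fit picks up spuriously in one bin is unlikely to be picked up in most bins, while a truly relevant coordinate is selected consistently.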