12 research outputs found
Transfer Learning for Contextual Multi-armed Bandits
Motivated by a range of applications, we study in this paper the problem of
transfer learning for nonparametric contextual multi-armed bandits under the
covariate shift model, where we have data collected on source bandits before
the start of the target bandit learning. The minimax rate of convergence for
the cumulative regret is established and a novel transfer learning algorithm
that attains the minimax regret is proposed. The results quantify the
contribution of the data from the source domains for learning in the target
domain in the context of nonparametric contextual multi-armed bandits.
In view of the general impossibility of adaptation to unknown smoothness, we
develop a data-driven algorithm that achieves near-optimal statistical
guarantees (up to a logarithmic factor) while automatically adapting to the
unknown parameters over a large collection of parameter spaces under an
additional self-similarity assumption. A simulation study is carried out to
illustrate the benefits of utilizing the data from the auxiliary source domains
for learning in the target domain.
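The warm-start idea behind this transfer setting can be illustrated with a toy sketch: a nonparametric (k-nearest-neighbor) contextual bandit whose per-arm reward histories are seeded with source-domain samples before target learning begins. Everything below — the `knn_value` estimator, the ε-greedy rule, the function names — is a hypothetical illustration, not the paper's minimax-optimal algorithm.

```python
import numpy as np

def knn_value(history, x, k=5):
    """Mean reward of the k nearest past contexts (a crude nonparametric estimate)."""
    if not history:
        return 0.0
    ctx = np.array([h[0] for h in history])
    rew = np.array([h[1] for h in history])
    idx = np.argsort(np.abs(ctx - x))[:k]
    return float(rew[idx].mean())

def run_bandit(target_contexts, pull, source_data=None, n_arms=2, eps=0.1, seed=0):
    """Epsilon-greedy nonparametric bandit; source_data warm-starts each arm's history."""
    rng = np.random.default_rng(seed)
    history = {a: list((source_data or {}).get(a, [])) for a in range(n_arms)}
    total = 0.0
    for x in target_contexts:
        if rng.random() < eps:                      # occasional exploration
            a = int(rng.integers(n_arms))
        else:                                       # greedy w.r.t. the k-NN estimates
            a = max(range(n_arms), key=lambda arm: knn_value(history[arm], x))
        r = pull(a, x)
        history[a].append((x, r))
        total += r
    return total
```

Under covariate shift, the source samples remain informative about each arm's reward function even though the context distribution differs, which is why seeding `history` can reduce early regret.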
Uncertainty quantification for nonconvex tensor completion: Confidence intervals, heteroscedasticity and optimality
We study the distribution and uncertainty of nonconvex optimization for noisy
tensor completion -- the problem of estimating a low-rank tensor given
incomplete and corrupted observations of its entries. Focusing on a two-stage
estimation algorithm proposed by Cai et al. (2019), we characterize the
distribution of this nonconvex estimator down to fine scales. This
distributional theory in turn allows one to construct valid and short
confidence intervals for both the unseen tensor entries and the unknown tensor
factors. The proposed inferential procedure enjoys several important features:
(1) it is fully adaptive to noise heteroscedasticity, and (2) it is data-driven
and automatically adapts to unknown noise distributions. Furthermore, our
findings unveil the statistical optimality of nonconvex tensor completion: it
attains un-improvable ℓ2 accuracy -- including both the rates and the
pre-constants -- when estimating both the unknown tensor and the underlying
tensor factors. Comment: Accepted in part to ICML 202
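At the user-facing level, the confidence intervals described here reduce to a normal interval around each entry with an estimated standard deviation. A minimal sketch, assuming the point estimates and standard deviations are already in hand (the paper's actual procedure derives them from the distributional theory of the nonconvex estimator):

```python
import numpy as np

def entrywise_ci(estimate, sigma_hat, z=1.96):
    """Approximate 95% two-sided normal confidence intervals, entry by entry.

    `estimate` and `sigma_hat` are arrays of point estimates and their
    (estimated) standard deviations; z = 1.96 is the usual 97.5% normal
    quantile. Both inputs are taken as given here.
    """
    estimate, sigma_hat = np.asarray(estimate, float), np.asarray(sigma_hat, float)
    return estimate - z * sigma_hat, estimate + z * sigma_hat
```

The adaptivity claims in the abstract concern how `sigma_hat` is obtained — it is estimated from data, entry by entry, so the intervals track heteroscedastic noise without knowing its distribution.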
Minimax Estimation of Linear Functions of Eigenvectors in the Face of Small Eigen-Gaps
Eigenvector perturbation analysis plays a vital role in various data science
applications. A large body of prior work, however, has focused on establishing
eigenvector perturbation bounds, which are often highly inadequate for
tasks that rely on the fine-grained behavior of an eigenvector. This
paper makes progress on this front by studying the perturbation of linear functions
of an unknown eigenvector. Focusing on two fundamental problems -- matrix
denoising and principal component analysis -- in the presence of Gaussian
noise, we develop a suite of statistical theory that characterizes the
perturbation of arbitrary linear functions of an unknown eigenvector. In order
to mitigate a non-negligible bias issue inherent to the natural "plug-in"
estimator, we develop de-biased estimators that (1) achieve minimax lower
bounds for a family of scenarios (modulo some logarithmic factor), and (2) can
be computed in a data-driven manner without sample splitting. Noteworthily, the
proposed estimators are nearly minimax optimal even when the associated
eigen-gap is substantially smaller than what is required in prior
statistical theory.
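The object of study — a linear function a^T u of an unknown eigenvector — is easy to set up numerically. The sketch below builds a rank-1 matrix-denoising instance with symmetric Gaussian noise and evaluates the natural plug-in estimator; all parameters are illustrative, and the paper's de-biasing correction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam, sigma = 200, 20.0, 0.1
u_star = rng.standard_normal(n)
u_star /= np.linalg.norm(u_star)                  # unknown unit eigenvector
M_star = lam * np.outer(u_star, u_star)           # rank-1 signal with eigenvalue lam
E = sigma * rng.standard_normal((n, n))
E = (E + E.T) / np.sqrt(2.0)                      # symmetrized Gaussian noise
_, vecs = np.linalg.eigh(M_star + E)
u_hat = vecs[:, -1]                               # leading eigenvector of the noisy matrix
u_hat *= np.sign(u_hat @ u_star)                  # resolve the global sign ambiguity
a = np.zeros(n)
a[0] = 1.0                                        # linear function a^T u: first coordinate
plug_in = float(a @ u_hat)                        # plug-in estimate of a^T u_star
```

The bias the paper mitigates shows up at second order in the noise-to-gap ratio; with a large eigen-gap (as chosen above) the plug-in estimate is already close, and the interesting regime is precisely when the gap shrinks.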
Nonconvex Low-Rank Tensor Completion from Noisy Data
We study a noisy tensor completion problem of broad practical interest,
namely, the reconstruction of a low-rank tensor from highly incomplete and
randomly corrupted observations of its entries. While a variety of prior work
has been dedicated to this problem, prior algorithms either are computationally
too expensive for large-scale applications, or come with sub-optimal
statistical guarantees. Focusing on "incoherent" and well-conditioned tensors
of a constant CP rank, we propose a two-stage nonconvex algorithm -- (vanilla)
gradient descent following a rough initialization -- that achieves the best of
both worlds. Specifically, the proposed nonconvex algorithm faithfully
completes the tensor and retrieves all individual tensor factors within nearly
linear time, while at the same time enjoying near-optimal statistical
guarantees (i.e. minimal sample complexity and optimal estimation accuracy).
The estimation errors are evenly spread out across all entries, thus achieving
optimal statistical accuracy. We also discuss how to
extend our approach to accommodate asymmetric tensors. The insight conveyed
through our analysis of nonconvex optimization might have implications for
other tensor estimation problems. Comment: Accepted to Operations Research.
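A minimal numerical illustration of the two-stage template — a rough spectral initialization followed by vanilla gradient descent — on a noiseless, symmetric, rank-1 tensor. The unfolding-based initialization and every parameter choice below are simplifying assumptions for the sketch, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
d, p = 20, 0.5                                    # dimension, observation rate
u_star = rng.standard_normal(d)
u_star *= 2.0 / np.linalg.norm(u_star)            # normalize so ||u*|| = 2
T_star = np.einsum('i,j,k->ijk', u_star, u_star, u_star)
mask = rng.random((d, d, d)) < p                  # Bernoulli(p) sampling pattern

def loss(u):
    R = mask * (np.einsum('i,j,k->ijk', u, u, u) - T_star)
    return 0.5 * float(np.sum(R ** 2))

# Stage 1: rough spectral initialization from the mode-1 unfolding of the
# zero-filled, inverse-probability-weighted observations (a crude stand-in
# for the paper's initialization).
unfold = (mask * T_star / p).reshape(d, d * d)
U, s, _ = np.linalg.svd(unfold, full_matrices=False)
u0 = np.cbrt(s[0]) * U[:, 0]                      # top singular value ~ ||u*||^3
if loss(-u0) < loss(u0):                          # resolve the sign ambiguity
    u0 = -u0

# Stage 2: vanilla gradient descent on the squared loss over observed entries.
u, eta = u0.copy(), 0.02
for _ in range(300):
    R = mask * (np.einsum('i,j,k->ijk', u, u, u) - T_star)
    grad = (np.einsum('ijk,j,k->i', R, u, u)
            + np.einsum('ijk,i,k->j', R, u, u)
            + np.einsum('ijk,i,j->k', R, u, u))
    u -= eta * grad

rel_err = np.linalg.norm(u - u_star) / np.linalg.norm(u_star)
```

Even this crude initialization lands close enough to the truth that plain gradient descent finishes the job — the "best of both worlds" phenomenon the abstract describes, in miniature.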
Efficient Estimation and Inference in Nonconvex Low-Complexity Models
Low-complexity models serve as a pivotal tool for extracting key information from large-scale data across a varied array of machine learning applications. However, owing to computational limits and the nonconvexity issue in high dimensions, modern data analysis calls for new procedures that significantly reduce sample size and computational cost while preserving near-optimal statistical accuracy. This thesis is devoted to the development of efficient estimation and inference methods for low-rank models, and to the exploration of the theoretical foundations underlying these approaches.
We start with statistical estimation of the column space of an unknown matrix given noisy and partial observations, and focus on the highly unbalanced case where the column dimension far exceeds the row dimension. We investigate an efficient spectral method and establish near-optimal statistical guarantees in terms of both ℓ2 and ℓ2,∞ estimation accuracy. When applied to concrete statistical applications---tensor completion, principal component analysis and community recovery---the general framework leads to significant performance improvements over the prior literature.
Moving beyond matrix-type data, we study a natural higher-order generalization---noisy tensor completion. Given that existing methods either are computationally expensive or fail to achieve statistically optimal performance, we propose a two-stage nonconvex algorithm that achieves near-optimal computational efficiency (i.e. linear time complexity) and statistical accuracy (i.e. minimal sample complexity and optimal estimation accuracy) at once.
In addition to estimation, we further characterize the non-asymptotic distribution of the proposed nonconvex estimator down to fine scales, and develop a data-driven inferential procedure to construct optimal entrywise confidence intervals for the unknowns, which fully adapts to unknown noise distributions and noise heteroscedasticity. As a byproduct, the distributional theory justifies the statistical optimality of the nonconvex estimator---its estimation error is un-improvable, including the pre-constant. All of this is attained through the integrated consideration of statistics and nonconvex optimization, and fine-grained analysis of spectral methods.
Conditional Rényi Divergence Saddlepoint and the Maximization of α-Mutual Information
Rényi-type generalizations of entropy, relative entropy and mutual information have found numerous applications throughout information theory and beyond. While there is consensus that the ways A. Rényi generalized entropy and relative entropy in 1961 are the “right” ones, several candidates have been put forth as possible mutual informations of order α. In this paper we lend further evidence to the notion that a Bayesian measure of statistical distinctness introduced by R. Sibson in 1969 (closely related to Gallager’s E_0 function) is the most natural generalization, lending itself to explicit computation and maximization, as well as closed-form formulas. This paper considers general (not necessarily discrete) alphabets and extends the major analytical results on the saddle-point and saddle-level of the conditional relative entropy to the conditional Rényi divergence. Several examples illustrate the main application of these results, namely the maximization of α-mutual information with and without constraints.
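In the discrete case, Sibson's measure admits the standard closed form I_α(X;Y) = (α/(α−1)) log Σ_y (Σ_x P(x) P(y|x)^α)^(1/α), which is straightforward to compute. The sketch below is a direct transcription of that discrete formula; the paper itself works with general alphabets.

```python
import numpy as np

def alpha_mutual_information(p_x, p_y_given_x, alpha):
    """Sibson's alpha-mutual information I_alpha(X;Y) for a discrete channel.

    p_x: input distribution, shape (nx,)
    p_y_given_x: channel transition matrix, shape (nx, ny)
    Uses natural logarithms (nats); requires alpha > 0, alpha != 1.
    """
    inner = p_x @ (p_y_given_x ** alpha)          # sum_x P(x) P(y|x)^alpha, one value per y
    return alpha / (alpha - 1.0) * float(np.log(np.sum(inner ** (1.0 / alpha))))
```

Sanity checks: a channel whose output is independent of the input gives I_α = 0, while a noiseless identity channel on two symbols with a uniform input gives I_α = log 2, for any order α.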
Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis
Q-learning, which seeks to learn the optimal Q-function of a Markov decision
process (MDP) in a model-free fashion, lies at the heart of reinforcement
learning. When it comes to the synchronous setting (such that independent
samples for all state-action pairs are drawn from a generative model in each
iteration), substantial progress has been made towards understanding the sample
efficiency of Q-learning. Consider a γ-discounted infinite-horizon MDP
with state space S and action space A: to yield an
entrywise ε-approximation of the optimal Q-function,
state-of-the-art theory for Q-learning requires a sample size exceeding the
order of |S||A| / ((1-γ)^5 ε^2),
which fails to match existing minimax lower bounds. This gives rise to natural
questions: what is the sharp sample complexity of Q-learning? Is Q-learning
provably sub-optimal? This paper addresses these questions for the synchronous
setting: (1) when |A| = 1 (so that Q-learning reduces to TD
learning), we prove that the sample complexity of TD learning is minimax
optimal and scales as |S| / ((1-γ)^3 ε^2) (up to
log factor); (2) when |A| ≥ 2, we settle the sample complexity of
Q-learning to be on the order of |S||A| / ((1-γ)^4 ε^2) (up to log
factor). Our theory unveils the strict sub-optimality of Q-learning when
|A| ≥ 2, and rigorizes the negative impact of over-estimation in
Q-learning. Finally, we extend our analysis to accommodate asynchronous
Q-learning (i.e., the case with Markovian samples), sharpening the horizon
dependency of its sample complexity to 1/(1-γ)^4. Comment: v3 adds two main theorems: (1) we prove the minimax optimality of TD
learning in the synchronous case; (2) for asynchronous Q-learning, we sharpen
the horizon dependency of sample complexity to 1/(1-γ)^4.
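The synchronous setting is easy to simulate: each iteration draws one next-state sample for every (state, action) pair from a generative model and applies the Q-learning update. The toy MDP, step-size schedule, and iteration count below are illustrative choices, not the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))     # transition kernel P(.|s, a)
R = rng.random((nS, nA))                          # deterministic rewards in [0, 1]

# Ground-truth optimal Q-function via value iteration, for comparison.
Q_star = np.zeros((nS, nA))
for _ in range(2000):
    Q_star = R + gamma * P @ Q_star.max(axis=1)

# Synchronous Q-learning: one generative-model sample per (s, a) each iteration.
Q = np.zeros((nS, nA))
T = 10000
for t in range(1, T + 1):
    eta = 1.0 / (1.0 + (1 - gamma) * t)           # a common rescaled-linear step size
    next_states = np.array([[rng.choice(nS, p=P[s, a]) for a in range(nA)]
                            for s in range(nS)])
    target = R + gamma * Q.max(axis=1)[next_states]
    Q += eta * (target - Q)
```

The quantity the abstract analyzes is how large T must be, as a function of |S|, |A|, 1/(1-γ), and ε, before the entrywise gap between Q and Q_star falls below ε.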
Subspace estimation from unbalanced and incomplete data matrices: ℓ2,∞ statistical guarantees
This paper is concerned with estimating the column space of an unknown low-rank matrix A⋆ ∈ Rd1×d2, given noisy and partial observations of its entries. There is no shortage of scenarios where the observations—while being too noisy to support faithful recovery of the entire matrix—still convey sufficient information to enable reliable estimation of the column space of interest. This is particularly evident and crucial for the highly unbalanced case where the column dimension d2 far exceeds the row dimension d1, which is the focal point of the current paper.
We investigate an efficient spectral method, which operates upon the sample Gram matrix with diagonal deletion. While this algorithmic idea has been studied before, we establish new statistical guarantees for this method in terms of both ℓ2 and ℓ2,∞ estimation accuracy, which improve upon prior results if d2 is substantially larger than d1. To illustrate the effectiveness of our findings, we derive matching minimax lower bounds with respect to the noise levels, and develop consequences of our general theory for three applications of practical importance: (1) tensor completion from noisy data, (2) covariance
estimation/principal component analysis with missing data and (3) community recovery in bipartite graphs. Our theory leads to improved performance guarantees for all three cases.
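The diagonal-deleted spectral method has a compact implementation: form the sample Gram matrix of the zero-filled, rescaled observations, zero out its diagonal to remove the noise- and sampling-induced bias, and take the top-r eigenvectors. The sketch below runs it on a synthetic unbalanced instance with arbitrary toy parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d1, d2, r, p, sigma = 30, 2000, 2, 0.2, 0.5       # unbalanced: d2 >> d1
U = np.linalg.qr(rng.standard_normal((d1, r)))[0] # true column space of A*
A_star = 10.0 * U @ rng.standard_normal((r, d2))  # low-rank signal
mask = rng.random((d1, d2)) < p                   # Bernoulli(p) observations
Y = mask * (A_star + sigma * rng.standard_normal((d1, d2)))

# Off-diagonal entries of Y Y^T are (nearly) unbiased for p^2 (A* A*^T);
# the diagonal carries an additive bias, which diagonal deletion removes.
G = Y @ Y.T / p ** 2
np.fill_diagonal(G, 0.0)
eigvals, eigvecs = np.linalg.eigh(G)
U_hat = eigvecs[:, -r:]                           # top-r eigenvectors

# Subspace error measured via the difference of orthogonal projectors.
dist = float(np.linalg.norm(U_hat @ U_hat.T - U @ U.T, ord=2))
```

The unbalanced regime is what makes this work: with d2 large, the d1 x d1 Gram matrix averages over many columns, so the column space is estimable even when individual entries of A* are not.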
Regional Difference in the Association between the Trajectory of Selenium Intake and Hypertension: A 20-Year Cohort Study.
The effect of selenium on hypertension is inconclusive. We aimed to study the relationship between selenium intake and incident hypertension. Adults (age ≥20 years) in the China Health and Nutrition Survey were followed up from 1991 to 2011 (n = 13,668). The latent class modeling method was used to identify trajectory groups of selenium intake. A total of 4039 respondents developed hypertension. The incidence of hypertension was 30.1, 30.5, 30.6, and 31.2 per 1000 person-years among participants with cumulative average selenium intakes of 21.0 ± 5.1, 33.2 ± 2.8, 43.8 ± 3.6, and 68.3 ± 25.2 µg/day, respectively. The interaction between region and selenium intake in relation to hypertension was significant. In the multivariable model, cumulative selenium intake was inversely associated with incident hypertension only in northern participants (low-selenium zone), and not in southern participants. Compared to selenium intake trajectory Group 1 (stable low intake), all three other trajectory groups had a low hazard ratio for hypertension among the northern participants. However, Group 4 (high intake and decreased) showed an increasing trend of hypertension risk in the south. In conclusion, the association between selenium intake and the incidence of hypertension varied by region in China. In the low soil selenium zone, high selenium intake might be beneficial for hypertension prevention.