
    Transfer Learning for Contextual Multi-armed Bandits

    Motivated by a range of applications, we study in this paper the problem of transfer learning for nonparametric contextual multi-armed bandits under the covariate shift model, where we have data collected on source bandits before the start of the target bandit learning. The minimax rate of convergence for the cumulative regret is established, and a novel transfer learning algorithm that attains the minimax regret is proposed. The results quantify the contribution of the data from the source domains for learning in the target domain in the context of nonparametric contextual multi-armed bandits. In view of the general impossibility of adaptation to unknown smoothness, we develop a data-driven algorithm that achieves near-optimal statistical guarantees (up to a logarithmic factor) while automatically adapting to the unknown parameters over a large collection of parameter spaces under an additional self-similarity assumption. A simulation study is carried out to illustrate the benefits of utilizing the data from the auxiliary source domains for learning in the target domain.
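
    A minimal Python sketch (under illustrative assumptions, not the paper's minimax-optimal procedure) of the pooling idea behind such transfer: auxiliary source observations simply enter the nonparametric estimate of each arm's reward function on the target domain, here a k-nearest-neighbour average combined with an epsilon-greedy rule. The function names, the env(x, a) interface, and the uniform target covariates are hypothetical.

    import numpy as np

    def knn_arm_estimate(x, contexts, rewards, k=25):
        """Average reward over the k stored contexts nearest to x (0.5 if no data yet)."""
        if len(contexts) == 0:
            return 0.5
        d = np.linalg.norm(np.asarray(contexts) - x, axis=1)
        idx = np.argsort(d)[:k]
        return float(np.mean(np.asarray(rewards)[idx]))

    def run_target_bandit(source, env, T=1000, n_arms=2, eps=0.1, seed=0):
        """source: dict arm -> (contexts, rewards) collected before target learning starts.
        env(x, a) returns a noisy reward for context x and arm a (hypothetical interface)."""
        rng = np.random.default_rng(seed)
        data = {a: (list(source[a][0]), list(source[a][1])) for a in range(n_arms)}
        for _ in range(T):
            x = rng.uniform(size=2)                      # target covariate (shifted vs. source)
            if rng.uniform() < eps:                      # simple epsilon-greedy exploration
                a = int(rng.integers(n_arms))
            else:
                a = int(np.argmax([knn_arm_estimate(x, *data[b]) for b in range(n_arms)]))
            data[a][0].append(x)
            data[a][1].append(env(x, a))
        return data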

    Uncertainty quantification for nonconvex tensor completion: Confidence intervals, heteroscedasticity and optimality

    We study the distribution and uncertainty of nonconvex optimization for noisy tensor completion -- the problem of estimating a low-rank tensor given incomplete and corrupted observations of its entries. Focusing on a two-stage estimation algorithm proposed by Cai et al. (2019), we characterize the distribution of this nonconvex estimator down to fine scales. This distributional theory in turn allows one to construct valid and short confidence intervals for both the unseen tensor entries and the unknown tensor factors. The proposed inferential procedure enjoys several important features: (1) it is fully adaptive to noise heteroscedasticity, and (2) it is data-driven and automatically adapts to unknown noise distributions. Furthermore, our findings unveil the statistical optimality of nonconvex tensor completion: it attains un-improvable $\ell_2$ accuracy -- including both the rates and the pre-constants -- when estimating both the unknown tensor and the underlying tensor factors. Comment: Accepted in part to ICML 202
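
    As a hedged illustration of the kind of entrywise inference described above (the variance estimator here is a generic placeholder, not the paper's construction), a distributional characterization of the form

    $$\frac{\hat{T}_{i,j,k} - T^{\star}_{i,j,k}}{\hat{\sigma}_{i,j,k}} \;\overset{d}{\longrightarrow}\; \mathcal{N}(0,1)$$

    immediately yields a level-$(1-\alpha)$ confidence interval for the unseen entry,

    $$\hat{T}_{i,j,k} \,\pm\, z_{1-\alpha/2}\,\hat{\sigma}_{i,j,k},$$

    where $\hat{\sigma}_{i,j,k}$ is a data-driven estimate of the entrywise standard deviation (accommodating heteroscedastic noise) and $z_{1-\alpha/2}$ is the standard normal quantile.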

    Minimax Estimation of Linear Functions of Eigenvectors in the Face of Small Eigen-Gaps

    Eigenvector perturbation analysis plays a vital role in various data science applications. A large body of prior works, however, focused on establishing $\ell_2$ eigenvector perturbation bounds, which are often highly inadequate in addressing tasks that rely on fine-grained behavior of an eigenvector. This paper makes progress on this by studying the perturbation of linear functions of an unknown eigenvector. Focusing on two fundamental problems -- matrix denoising and principal component analysis -- in the presence of Gaussian noise, we develop a suite of statistical theory that characterizes the perturbation of arbitrary linear functions of an unknown eigenvector. In order to mitigate a non-negligible bias issue inherent to the natural "plug-in" estimator, we develop de-biased estimators that (1) achieve minimax lower bounds for a family of scenarios (modulo some logarithmic factor), and (2) can be computed in a data-driven manner without sample splitting. Noteworthily, the proposed estimators are nearly minimax optimal even when the associated eigen-gap is substantially smaller than what is required in prior statistical theory.
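
    For concreteness, a generic sketch of the matrix denoising setting referred to above (notation is illustrative, not necessarily the paper's): one observes $M = M^{\star} + E$ with Gaussian noise $E$, lets $u^{\star}$ denote a leading eigenvector of $M^{\star}$ and $\hat{u}$ the corresponding eigenvector of $M$, and targets a linear functional $a^{\top}u^{\star}$. The natural plug-in estimate $a^{\top}\hat{u}$ can carry a non-negligible bias when the eigen-gap is small, which motivates a correction of the generic form

    $$\widehat{a^{\top}u^{\star}} \;=\; a^{\top}\hat{u} \;-\; \widehat{\mathrm{bias}}(a,\hat{u},M),$$

    with the bias term estimated from the data itself, without sample splitting.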

    Nonconvex Low-Rank Tensor Completion from Noisy Data

    We study a noisy tensor completion problem of broad practical interest, namely, the reconstruction of a low-rank tensor from highly incomplete and randomly corrupted observations of its entries. While a variety of prior work has been dedicated to this problem, prior algorithms either are computationally too expensive for large-scale applications, or come with sub-optimal statistical guarantees. Focusing on "incoherent" and well-conditioned tensors of a constant CP rank, we propose a two-stage nonconvex algorithm -- (vanilla) gradient descent following a rough initialization -- that achieves the best of both worlds. Specifically, the proposed nonconvex algorithm faithfully completes the tensor and retrieves all individual tensor factors within nearly linear time, while at the same time enjoying near-optimal statistical guarantees (i.e. minimal sample complexity and optimal estimation accuracy). The estimation errors are evenly spread out across all entries, thus achieving optimal $\ell_\infty$ statistical accuracy. We have also discussed how to extend our approach to accommodate asymmetric tensors. The insight conveyed through our analysis of nonconvex optimization might have implications for other tensor estimation problems. Comment: Accepted to Operations Researc
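
    A minimal numpy sketch of the generic two-stage template described above: an initialization followed by vanilla gradient descent on the observed entries of a symmetric rank-r CP model. The random initialization, step size, and plain squared loss are illustrative assumptions; the paper's algorithm uses a spectral initialization and comes with the stated guarantees, which this toy sketch does not claim.

    import numpy as np

    def cp_entry(U, i, j, k):
        """Entry (i, j, k) of the symmetric CP tensor sum_r U[:, r] (x) U[:, r] (x) U[:, r]."""
        return float(np.sum(U[i] * U[j] * U[k]))

    def gd_tensor_completion(obs, d, r, steps=200, eta=0.05, seed=0):
        """obs: list of ((i, j, k), y) noisy observed entries (distinct indices assumed)
        of a d x d x d symmetric tensor. Returns an estimated factor matrix U (d x r).
        Random initialization stands in for the paper's spectral initialization."""
        rng = np.random.default_rng(seed)
        U = 0.1 * rng.standard_normal((d, r))
        for _ in range(steps):
            grad = np.zeros_like(U)
            for (i, j, k), y in obs:
                res = cp_entry(U, i, j, k) - y
                grad[i] += res * U[j] * U[k]     # d/dU[i] of 0.5 * (entry - y)^2
                grad[j] += res * U[i] * U[k]
                grad[k] += res * U[i] * U[j]
            U -= eta * grad / max(len(obs), 1)   # averaged gradient step
        return U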

    Efficient Estimation and Inference in Nonconvex Low-Complexity Models

    Low-complexity models serve as a pivotal tool for extraction of key information from large-scale data, spanning a varied array of machine learning applications. However, due to the limits of computation and the nonconvexity issue in high dimensions, modern data analysis calls for new procedures that allow significant reduction of sample size and computational costs, while at the same time preserving near-optimal statistical accuracy. This thesis is devoted to the development of efficient estimation and inference methods for low-rank models, and the exploration of the theoretical foundations underlying these approaches. We start with statistical estimation of the column space of an unknown matrix given noisy and partial observations, and focus on the highly unbalanced case where the column dimension far exceeds the row dimension. We investigate an efficient spectral method and establish near-optimal statistical guarantees in terms of both $\ell_2$ and $\ell_{2,\infty}$ estimation accuracy. When applied to concrete statistical applications---tensor completion, principal component analysis and community recovery---the general framework leads to significant performance improvement over the prior literature. Moving beyond matrix-type data, we study a natural higher-order generalization---noisy tensor completion. Given that existing methods either are computationally expensive or fail to achieve statistically optimal performance, we propose a two-stage nonconvex algorithm achieving near-optimal computational efficiency (i.e. linear time complexity) and statistical accuracy (i.e. minimal sample complexity and optimal estimation accuracy) at once. In addition to estimation, we further characterize the non-asymptotic distribution of the proposed nonconvex estimator down to fine scales, and develop a data-driven inferential procedure to construct optimal entrywise confidence intervals for the unknowns, which fully adapts to unknown noise distributions and noise heteroscedasticity. As a byproduct, the distributional theory justifies the statistical optimality of the nonconvex estimator---its $\ell_2$ estimation error is un-improvable including the pre-constant. All of this is attained through the integrated consideration of statistics and nonconvex optimization, and fine-grained analysis of spectral methods.
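
    To make the spectral step concrete, here is a generic sketch (an assumption-laden illustration, not necessarily the thesis's exact estimator) of column-space estimation for a highly unbalanced noisy matrix: form the small Gram matrix, optionally zero its diagonal to reduce the bias introduced by noise, and keep the top-r eigenvectors.

    import numpy as np

    def column_space_estimate(A, r, delete_diag=True):
        """Estimate the rank-r column space of a noisy n1 x n2 matrix A with n2 >> n1
        from the top-r eigenvectors of its Gram matrix. Deleting the Gram diagonal is a
        common bias-reduction device under entrywise noise (an illustrative choice here)."""
        G = A @ A.T
        if delete_diag:
            np.fill_diagonal(G, 0.0)
        vals, vecs = np.linalg.eigh(G)            # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:r]
        return vecs[:, top]                       # columns span the estimated subspace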

    Conditional Rényi Divergence Saddlepoint and the Maximization of α-Mutual Information

    Rényi-type generalizations of entropy, relative entropy and mutual information have found numerous applications throughout information theory and beyond. While there is consensus that the ways A. Rényi generalized entropy and relative entropy in 1961 are the “right” ones, several candidates have been put forth as possible mutual informations of order α. In this paper we lend further evidence to the notion that a Bayesian measure of statistical distinctness introduced by R. Sibson in 1969 (closely related to Gallager’s E_0 function) is the most natural generalization, lending itself to explicit computation and maximization, as well as closed-form formulas. This paper considers general (not necessarily discrete) alphabets and extends the major analytical results on the saddle-point and saddle-level of the conditional relative entropy to the conditional Rényi divergence. Several examples illustrate the main application of these results, namely, the maximization of α-mutual information with and without constraints.
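
    For reference, the standard discrete-alphabet forms (the paper itself treats general alphabets): the Rényi divergence of order $\alpha \in (0,1)\cup(1,\infty)$ is

    $$D_\alpha(P\|Q) \;=\; \frac{1}{\alpha-1}\,\log \sum_{x} P(x)^{\alpha}\, Q(x)^{1-\alpha},$$

    and Sibson's α-mutual information admits the closed form

    $$I_\alpha(X;Y) \;=\; \min_{Q_Y} D_\alpha\!\left(P_{XY}\,\|\,P_X \times Q_Y\right) \;=\; \frac{\alpha}{\alpha-1}\,\log \sum_{y}\Big(\sum_{x} P_X(x)\,P_{Y|X}(y|x)^{\alpha}\Big)^{1/\alpha},$$

    where the inner minimization over $Q_Y$, together with the maximization over the input distribution, is what connects this quantity to the saddle-point results discussed above.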

    Is Q-Learning Minimax Optimal? A Tight Sample Complexity Analysis

    Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning. When it comes to the synchronous setting (such that independent samples for all state-action pairs are drawn from a generative model in each iteration), substantial progress has been made towards understanding the sample efficiency of Q-learning. Consider a $\gamma$-discounted infinite-horizon MDP with state space $\mathcal{S}$ and action space $\mathcal{A}$: to yield an entrywise $\varepsilon$-approximation of the optimal Q-function, state-of-the-art theory for Q-learning requires a sample size exceeding the order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^5\varepsilon^{2}}$, which fails to match existing minimax lower bounds. This gives rise to natural questions: what is the sharp sample complexity of Q-learning? Is Q-learning provably sub-optimal? This paper addresses these questions for the synchronous setting: (1) when $|\mathcal{A}|=1$ (so that Q-learning reduces to TD learning), we prove that the sample complexity of TD learning is minimax optimal and scales as $\frac{|\mathcal{S}|}{(1-\gamma)^3\varepsilon^2}$ (up to log factor); (2) when $|\mathcal{A}|\geq 2$, we settle the sample complexity of Q-learning to be on the order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ (up to log factor). Our theory unveils the strict sub-optimality of Q-learning when $|\mathcal{A}|\geq 2$, and rigorizes the negative impact of over-estimation in Q-learning. Finally, we extend our analysis to accommodate asynchronous Q-learning (i.e., the case with Markovian samples), sharpening the horizon dependency of its sample complexity to be $\frac{1}{(1-\gamma)^4}$. Comment: v3 adds two main theorems: (1) we prove the minimax optimality of TD learning in the synchronous case; (2) for asynchronous Q-learning, we sharpen the horizon dependency of sample complexity to $\frac{1}{(1-\gamma)^4}$.
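
    A minimal sketch of the synchronous Q-learning iteration analyzed above: in every iteration a fresh next-state sample is drawn from a generative model for every state-action pair, and each entry is updated toward the empirical Bellman target. The constant step size and the callable interfaces sample_next_state and reward are illustrative assumptions.

    import numpy as np

    def synchronous_q_learning(sample_next_state, reward, S, A, gamma, T, eta=0.1, seed=0):
        """Synchronous Q-learning sketch: each entry follows the stochastic fixed-point update
            Q(s,a) <- (1 - eta) Q(s,a) + eta * (r(s,a) + gamma * max_a' Q(s',a')),
        with s' ~ P(.|s,a) drawn afresh from a generative model every iteration.
        sample_next_state(s, a, rng) and reward(s, a) stand in for that model."""
        rng = np.random.default_rng(seed)
        Q = np.zeros((S, A))
        for _ in range(T):
            Q_new = Q.copy()
            for s in range(S):
                for a in range(A):
                    s_next = sample_next_state(s, a, rng)
                    target = reward(s, a) + gamma * np.max(Q[s_next])
                    Q_new[s, a] = (1 - eta) * Q[s, a] + eta * target
            Q = Q_new
        return Q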

    Regional Difference in the Association between the Trajectory of Selenium Intake and Hypertension: A 20-Year Cohort Study.

    The effect of selenium on hypertension is inconclusive. We aimed to study the relationship between selenium intake and incident hypertension. Adults (age ≥20 years) in the China Health and Nutrition Survey were followed up from 1991 to 2011 (n = 13,668). The latent class modeling method was used to identify trajectory groups of selenium intake. A total of 4039 respondents developed hypertension. The incidence of hypertension was 30.1, 30.5, 30.6, and 31.2 per 1000 person-years among participants with cumulative average selenium intake of 21.0 ± 5.1, 33.2 ± 2.8, 43.8 ± 3.6, and 68.3 ± 25.2 µg/day, respectively. The interaction between region and selenium intake in relation to hypertension was significant. In the multivariable model, cumulative intake of selenium was inversely associated with incident hypertension only in northern participants (low selenium zone), and not in southern participants. Compared to selenium intake trajectory Group 1 (stable low intake), the other three trajectory groups had lower hazard ratios for hypertension among the northern participants. However, Group 4 (high intake and decreased) showed an increasing trend of hypertension risk in the south. In conclusion, the association between selenium intake and the incidence of hypertension varied across regions in China. In the low soil selenium zone, high selenium intake might be beneficial for hypertension prevention.