    Minimax risks for sparse regressions: Ultra-high-dimensional phenomenons

    Consider the standard Gaussian linear regression model $Y=X\theta+\epsilon$, where $Y\in R^n$ is a response vector and $X\in R^{n\times p}$ is a design matrix. Numerous works have been devoted to building efficient estimators of $\theta$ when $p$ is much larger than $n$. In such a situation, a classical approach amounts to assuming that $\theta_0$ is approximately sparse. This paper studies the minimax risks of estimation and testing over classes of $k$-sparse vectors $\theta$. These bounds shed light on the limitations due to high-dimensionality. The results encompass the problem of prediction (estimation of $X\theta$), the inverse problem (estimation of $\theta_0$) and linear testing (testing $X\theta=0$). Interestingly, an elbow effect occurs when the number of variables $k\log(p/k)$ becomes large compared to $n$. Indeed, the minimax risks and hypothesis separation distances blow up in this ultra-high dimensional setting. We also prove that even dimension reduction techniques cannot provide satisfying results in an ultra-high dimensional setting. Moreover, we compute the minimax risks when the variance of the noise is unknown. The knowledge of this variance is shown to play a significant role in the optimal rates of estimation and testing. All these minimax bounds provide a characterization of statistical problems that are so difficult that no procedure can provide satisfying results.
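
To make the quantity in the abstract above concrete, the following minimal sketch computes k log(p/k) and compares it with n. The function names and the `threshold` parameter are hypothetical: the abstract only says the elbow appears when this quantity is "large compared to n" and gives no explicit cutoff.

```python
import math


def sparsity_complexity(k: int, p: int) -> float:
    """Return k * log(p / k), the quantity the abstract compares with n."""
    if not 0 < k <= p:
        raise ValueError("need 0 < k <= p")
    return k * math.log(p / k)


def is_ultra_high_dimensional(n: int, k: int, p: int, threshold: float = 1.0) -> bool:
    """Flag the regime where k * log(p / k) exceeds threshold * n.

    `threshold` is an illustrative assumption, not a constant from the paper.
    """
    return sparsity_complexity(k, p) > threshold * n
```

For instance, with n = 100, k = 5 and p = 1000 the quantity is about 26.5, well below n, whereas n = 50, k = 20 and p = 10^6 gives roughly 216, which falls in the regime the abstract calls ultra-high dimensional under this illustrative cutoff.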

    Adaptive robust variable selection

    Heavy-tailed high-dimensional data are commonly encountered in various scientific fields and pose great challenges to modern statistical analysis. A natural procedure to address this problem is to use penalized quantile regression with weighted $L_1$-penalty, called weighted robust Lasso (WR-Lasso), in which weights are introduced to ameliorate the bias problem induced by the $L_1$-penalty. In the ultra-high dimensional setting, where the dimensionality can grow exponentially with the sample size, we investigate the model selection oracle property and establish the asymptotic normality of the WR-Lasso. We show that only mild conditions on the model error distribution are needed. Our theoretical results also reveal that adaptive choice of the weight vector is essential for the WR-Lasso to enjoy these nice asymptotic properties. To make the WR-Lasso practically feasible, we propose a two-step procedure, called adaptive robust Lasso (AR-Lasso), in which the weight vector in the second step is constructed based on the $L_1$-penalized quantile regression estimate from the first step. This two-step procedure is justified theoretically to possess the oracle property and the asymptotic normality. Numerical studies demonstrate the favorable finite-sample performance of the AR-Lasso. Comment: Published at http://dx.doi.org/10.1214/13-AOS1191 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Bayesian Conditional Tensor Factorizations for High-Dimensional Classification

    In many application areas, data are collected on a categorical response and high-dimensional categorical predictors, with the goals of building a parsimonious model for classification and performing inference on the important predictors. In settings such as genomics, there can be complex interactions among the predictors. By using a carefully-structured Tucker factorization, we define a model that can characterize any conditional probability, while facilitating variable selection and modeling of higher-order interactions. Following a Bayesian approach, we propose a Markov chain Monte Carlo algorithm for posterior computation accommodating uncertainty in the predictors to be included. Under near sparsity assumptions, the posterior distribution for the conditional probability is shown to achieve close to the parametric rate of contraction even in ultra high-dimensional settings. The methods are illustrated using simulation examples and biomedical applications.

    Private Incremental Regression

    Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where, as is common in practice, the data is not given at once, but rather arrives incrementally over time. We introduce the problems of private incremental ERM and private incremental regression, where the general goal is to always maintain a good empirical risk minimizer for the history observed under differential privacy. Our first contribution is a generic transformation of private batch ERM mechanisms into private incremental ERM mechanisms, based on a simple idea of invoking the private batch ERM procedure at some regular time intervals. We take this construction as a baseline for comparison. We then provide two mechanisms for the private incremental regression problem. Our first mechanism is based on privately constructing a noisy incremental gradient function, which is then used in a modified projected gradient procedure at every timestep. This mechanism has an excess empirical risk of $\approx\sqrt{d}$, where $d$ is the dimensionality of the data. While from the results of [Bassily et al. 2014] this bound is tight in the worst-case, we show that certain geometric properties of the input and constraint set can be used to derive significantly better results for certain interesting regression problems. Comment: To appear in PODS 201
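
The baseline transformation described in the abstract above (re-running a private batch procedure at regular intervals) can be sketched roughly as follows. Everything here is an illustrative assumption rather than the paper's construction: `incremental_from_batch`, `toy_private_mean`, the `interval` and `initial` parameters, and the use of a Laplace-noised mean as a stand-in for a private batch ERM solver. The sketch also does not track how the privacy budget composes across repeated invocations, which a real mechanism would have to account for.

```python
import random
from typing import Callable, Iterable, Iterator, List


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def toy_private_mean(history: List[float], epsilon: float, rng: random.Random) -> float:
    """Hypothetical batch mechanism: Laplace-noised mean of values in [0, 1].

    Replacing one point moves the clipped mean by at most 1/n, so the
    noise scale is 1 / (n * epsilon). It stands in for a batch ERM solver.
    """
    n = len(history)
    clipped = [min(1.0, max(0.0, x)) for x in history]
    return sum(clipped) / n + laplace_noise(1.0 / (n * epsilon), rng)


def incremental_from_batch(
    stream: Iterable[float],
    batch_mechanism: Callable[[List[float]], float],
    interval: int,
    initial: float,
) -> Iterator[float]:
    """Yield one estimate per timestep, refreshed every `interval` steps.

    Between refreshes the previous output is reused, so the batch
    mechanism touches the data only at timesteps that are multiples of
    `interval`. Before the first refresh, the data-independent `initial`
    value is returned.
    """
    history: List[float] = []
    current = initial
    for t, point in enumerate(stream, start=1):
        history.append(point)
        if t % interval == 0:
            current = batch_mechanism(history)
        yield current
```

With a noiseless mean as the batch procedure, `list(incremental_from_batch([1, 2, 3, 4, 5], lambda h: sum(h) / len(h), interval=2, initial=0.0))` yields `[0.0, 1.5, 1.5, 2.5, 2.5]`, which shows the estimate being held fixed between refreshes. A private variant would pass something like `lambda h: toy_private_mean(h, epsilon=0.5, rng=random.Random(0))` instead.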

    $L_1$-Penalization in Functional Linear Regression with Subgaussian Design

    We study functional regression with random subgaussian design and real-valued response. The focus is on the problems in which the regression function can be well approximated by a functional linear model with the slope function being "sparse" in the sense that it can be represented as a sum of a small number of well separated "spikes". This can be viewed as an extension of now classical sparse estimation problems to the case of infinite dictionaries. We study an estimator of the regression function based on penalized empirical risk minimization with quadratic loss and the complexity penalty defined in terms of $L_1$-norm (a continuous version of LASSO). The main goal is to introduce several important parameters characterizing sparsity in this class of problems and to prove sharp oracle inequalities showing how the $L_2$-error of the continuous LASSO estimator depends on the underlying sparsity of the problem.

    Entropy-based convergence rates of greedy algorithms

    We present convergence estimates of two types of greedy algorithms in terms of the metric entropy of underlying compact sets. In the first part, we measure the error of a standard greedy reduced basis method for parametric PDEs by the metric entropy of the solution manifold in Banach spaces. This contrasts with the classical analysis based on the Kolmogorov n-widths and enables us to obtain direct comparisons between the greedy algorithm error and the entropy numbers, where the multiplicative constants are explicit and simple. The entropy-based convergence estimate is sharp and improves upon the classical width-based analysis of reduced basis methods for elliptic model problems. In the second part, we derive a novel and simple convergence analysis of the classical orthogonal greedy algorithm for nonlinear dictionary approximation using the metric entropy of the symmetric convex hull of the dictionary. This also improves upon existing results by giving a direct comparison between the algorithm error and the metric entropy. Comment: 22 pages, no figure
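
For readers unfamiliar with the orthogonal greedy algorithm mentioned in the abstract above, here is a minimal sketch for a finite dictionary of vectors in Euclidean space. The finite-dimensional setting, the function name `orthogonal_greedy`, the `tol` parameter and the assumption that atoms have unit norm are all illustrative choices; the paper's analysis concerns general dictionaries and is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthogonal_greedy(
    dictionary: Sequence[Sequence[float]],
    target: Sequence[float],
    n_steps: int,
    tol: float = 1e-12,
) -> Tuple[List[int], List[float]]:
    """Return (selected atom indices, approximation of `target`).

    Each step picks the atom with the largest absolute inner product
    with the current residual, then replaces the approximation by the
    orthogonal projection of `target` onto the span of all selected
    atoms. Atoms are assumed to be normalized.
    """
    residual = [float(x) for x in target]
    basis: List[List[float]] = []  # orthonormal basis of the selected span
    selected: List[int] = []
    for _ in range(n_steps):
        # Greedy selection against the current residual.
        j = max(range(len(dictionary)), key=lambda i: abs(dot(dictionary[i], residual)))
        if abs(dot(dictionary[j], residual)) <= tol:
            break  # residual is (numerically) orthogonal to every atom
        # Gram-Schmidt: orthogonalize the new atom against the basis.
        q = [float(x) for x in dictionary[j]]
        for b in basis:
            c = dot(q, b)
            q = [qi - c * bi for qi, bi in zip(q, b)]
        norm = math.sqrt(dot(q, q))
        if norm <= tol:
            break  # atom already lies in the selected span
        q = [qi / norm for qi in q]
        basis.append(q)
        selected.append(j)
        # Updating along q keeps the residual orthogonal to the whole span.
        c = dot(residual, q)
        residual = [ri - c * qi for ri, qi in zip(residual, q)]
    approximation = [t - r for t, r in zip(target, residual)]
    return selected, approximation
```

Maintaining an orthonormal basis avoids solving a least-squares problem at every step: because the residual is already orthogonal to the earlier atoms, subtracting its component along the newly orthogonalized atom gives the same result as re-projecting the target onto the enlarged span.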