4 research outputs found

    Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing

    We develop two methods for the following fundamental statistical task: given an $\epsilon$-corrupted set of $n$ samples from a $d$-dimensional sub-Gaussian distribution, return an approximate top eigenvector of the covariance matrix. Our first robust PCA algorithm runs in polynomial time, returns a $(1 - O(\epsilon\log\epsilon^{-1}))$-approximate top eigenvector, and is based on a simple iterative filtering approach. Our second, which attains a slightly worse approximation factor, runs in nearly-linear time and sample complexity under a mild spectral gap assumption. These are the first polynomial-time algorithms yielding non-trivial information about the covariance of a corrupted sub-Gaussian distribution without requiring additional algebraic structure of moments. As a key technical tool, we develop the first width-independent solvers for Schatten-$p$ norm packing semidefinite programs, giving a $(1 + \epsilon)$-approximate solution in $O(p\log(\tfrac{nd}{\epsilon})\epsilon^{-1})$ input-sparsity time iterations (where $n$, $d$ are problem dimensions). Comment: 35 pages.
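
    As a rough illustration of the iterative-filtering idea behind the first algorithm, the Python sketch below repeatedly computes the top eigenvector of a weighted empirical covariance and downweights samples whose energy along that direction is outlying. It is a minimal sketch of the general filtering template under illustrative choices (a quantile threshold, a fixed iteration budget), not the paper's algorithm or its guarantees.

        import numpy as np

        def filtered_top_eigenvector(X, eps, n_iters=20):
            """Illustrative iterative-filtering sketch for robust PCA.
            X: (n, d) samples, an eps-fraction of which may be corrupted.
            Hypothetical helper; not the paper's algorithm."""
            n, d = X.shape
            w = np.ones(n) / n                       # uniform sample weights
            for _ in range(n_iters):
                mu = w @ X                           # weighted mean
                Xc = X - mu
                cov = (w[:, None] * Xc).T @ Xc       # weighted covariance
                v = np.linalg.eigh(cov)[1][:, -1]    # top eigenvector
                scores = (Xc @ v) ** 2               # energy along v
                thresh = np.quantile(scores, 1 - eps)
                mask = scores <= thresh              # filter outlying scores
                if mask.all():
                    break
                w = w * mask
                w = w / w.sum()                      # renormalize weights
            return v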

    Robust Regression Revisited: Acceleration and Improved Estimation Rates

    We study fast algorithms for statistical regression problems under the strong contamination model, where the goal is to approximately optimize a generalized linear model (GLM) given adversarially corrupted samples. Prior works in this line of research were based on the robust gradient descent framework of Prasad et al., a first-order method using biased gradient queries, or the Sever framework of Diakonikolas et al., an iterative outlier-removal method calling a stationary point finder. We present nearly-linear time algorithms for robust regression problems with improved runtime or estimation guarantees compared to the state of the art. For the general case of smooth GLMs (e.g. logistic regression), we show that the robust gradient descent framework of Prasad et al. can be accelerated, and we show our algorithm extends to optimizing the Moreau envelopes of Lipschitz GLMs (e.g. support vector machines), answering several open questions in the literature. For the well-studied case of robust linear regression, we present an alternative approach obtaining improved estimation rates over prior nearly-linear time algorithms. Interestingly, our method starts with an identifiability proof introduced in the context of the sum-of-squares algorithm of Bakshi and Prasad, which achieved optimal error rates while requiring large polynomial runtime and sample complexity. We reinterpret their proof within the Sever framework and obtain a dramatically faster and more sample-efficient algorithm under fewer distributional assumptions. Comment: 47 pages.
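
    To make the robust gradient descent template concrete, the sketch below runs gradient descent on a logistic-regression loss while aggregating per-sample gradients with a coordinate-wise trimmed mean instead of a plain average. The aggregator and step size are illustrative stand-ins; this shows only the framework being accelerated, not the paper's accelerated method or its estimation rates.

        import numpy as np

        def trimmed_mean(G, eps):
            """Coordinate-wise trimmed mean: a simple stand-in for a
            robust gradient aggregator (illustrative, not the paper's)."""
            lo, hi = np.quantile(G, [eps, 1.0 - eps], axis=0)
            return np.clip(G, lo, hi).mean(axis=0)

        def robust_gd_logistic(X, y, eps, lr=0.1, n_iters=200):
            """Robust gradient descent for logistic regression with
            labels y in {0, 1} and an eps-fraction of corruptions."""
            theta = np.zeros(X.shape[1])
            for _ in range(n_iters):
                p = 1.0 / (1.0 + np.exp(-(X @ theta)))  # sigmoid predictions
                G = (p - y)[:, None] * X                # per-sample gradients
                theta -= lr * trimmed_mean(G, eps)      # robust aggregation
            return theta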

    Solving SDP Faster: A Robust IPM Framework and Efficient Implementation

    This paper introduces a new robust interior point method (IPM) analysis for semidefinite programming (SDP). The new robust analysis can be combined with either the logarithmic barrier or the hybrid barrier. Under this framework, we improve the running time for solving SDPs with variable size $n \times n$ and $m$ constraints up to $\epsilon$ accuracy. We show that for the case $m = \Omega(n^2)$, we can solve SDPs in $m^{\omega}$ time, where $\omega$ is the matrix multiplication exponent. This suggests that solving SDP is nearly as fast as solving a linear system with an equal number of variables and constraints. This is the first result showing that tall dense SDPs can be solved in nearly-optimal running time, and it improves on the state-of-the-art SDP solver [Jiang, Kathuria, Lee, Padmanabhan and Song, FOCS 2020]. In addition to our new IPM analysis, we propose a number of techniques that may be of further interest, such as maintaining the inverse of a Kronecker product using lazy updates and a general amortization scheme for positive semidefinite matrices.
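
    To convey the inverse-maintenance flavor in isolation, the sketch below uses the Sherman-Morrison-Woodbury identity to refresh a maintained inverse after a rank-k change in O(n^2 k) time instead of recomputing it in O(n^3). This is a generic toy illustration of amortized inverse maintenance; the paper's lazy-update data structure for Kronecker-product inverses is substantially more involved.

        import numpy as np

        def woodbury_update(M_inv, U, V):
            """Given M_inv = M^{-1}, return (M + U V^T)^{-1} via the
            Sherman-Morrison-Woodbury identity. U, V are (n, k) with
            small k, so the update costs O(n^2 k) rather than O(n^3)."""
            k = U.shape[1]
            S = np.eye(k) + V.T @ M_inv @ U      # k x k capacitance matrix
            return M_inv - M_inv @ U @ np.linalg.solve(S, V.T @ M_inv)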

    Robust and Differentially Private Mean Estimation

    In statistical learning and analysis from shared data, increasingly adopted in platforms such as federated learning and meta-learning, there are two major concerns: privacy and robustness. Each participating individual should be able to contribute without fear of leaking their sensitive information. At the same time, the system should be robust in the presence of malicious participants inserting corrupted data. Recent algorithmic advances in learning from shared data focus on either one of these threats, leaving the system vulnerable to the other. We bridge this gap for the canonical problem of estimating the mean from i.i.d. samples. We introduce PRIME, the first efficient algorithm that achieves both privacy and robustness for a wide range of distributions. We further complement this result with a novel exponential-time algorithm that improves the sample complexity of PRIME, achieving a near-optimal guarantee and matching a known lower bound for (non-robust) private mean estimation. This proves that there is no extra statistical cost to simultaneously guaranteeing privacy and robustness. Comment: 58 pages, 2 figures; both the exponential-time and efficient algorithms no longer require a known bound on the true mean.
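
    For intuition about why one mechanism can serve both goals, the sketch below shows a simple baseline (not PRIME): clip samples to an L2 ball, which bounds any single sample's influence and hence both the outliers' effect and the mean's sensitivity, then release the mean through the standard Gaussian mechanism. The clipping radius and the (eps, delta) privacy parameters are assumed inputs; the noise scale follows the textbook Gaussian-mechanism calibration.

        import numpy as np

        def private_clipped_mean(X, radius, eps_dp, delta, rng=None):
            """Baseline robust + private mean: clip to an L2 ball of the
            given radius, then add Gaussian-mechanism noise. Illustrative
            baseline only; this is not the PRIME algorithm."""
            rng = np.random.default_rng() if rng is None else rng
            n, d = X.shape
            norms = np.linalg.norm(X, axis=1, keepdims=True)
            Xc = X * np.minimum(1.0, radius / np.maximum(norms, 1e-12))
            sens = 2.0 * radius / n              # L2 sensitivity of the mean
            sigma = sens * np.sqrt(2.0 * np.log(1.25 / delta)) / eps_dp
            return Xc.mean(axis=0) + rng.normal(0.0, sigma, size=d)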