4 research outputs found
Robust Sub-Gaussian Principal Component Analysis and Width-Independent Schatten Packing
We develop two methods for the following fundamental statistical task: given
an ε-corrupted set of samples from a d-dimensional sub-Gaussian
distribution, return an approximate top eigenvector of the covariance matrix.
Our first robust PCA algorithm runs in polynomial time, returns a (1 − O(ε log(1/ε)))-approximate top eigenvector, and is based on a
simple iterative filtering approach. Our second, which attains a slightly worse
approximation factor, runs in nearly-linear time and sample complexity under a
mild spectral gap assumption. These are the first polynomial-time algorithms
yielding non-trivial information about the covariance of a corrupted
sub-Gaussian distribution without requiring additional algebraic structure of
moments. As a key technical tool, we develop the first width-independent
solvers for Schatten-p norm packing semidefinite programs, giving a (1 + ε)-approximate solution in input-sparsity-time iterations
(where n, d are the problem dimensions).
Comment: 35 pages
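The iterative filtering idea behind the first algorithm can be illustrated with a minimal sketch. This is a generic filtering loop, not the paper's algorithm; the function name, the fixed round count, and the hard ε-tail removal rule are assumptions made for the example:

```python
import numpy as np

def filtered_top_eigenvector(X, eps, iters=3):
    """Toy filtering loop for robust PCA (illustrative, not the paper's method).

    X is an (n, d) sample matrix in which an eps-fraction of rows may be
    adversarially corrupted. Each round: take the top eigenvector v of the
    empirical second-moment matrix, score every sample by its squared
    projection on v, and discard the eps-tail of highest-scoring rows --
    corruptions that inflate the estimate must concentrate in that tail.
    """
    for _ in range(iters):
        cov = X.T @ X / len(X)
        eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues ascending
        v = eigvecs[:, -1]                      # current top eigenvector
        scores = (X @ v) ** 2                   # energy along v
        k = max(1, int(eps * len(X)))
        X = X[np.argsort(scores)[:-k]]          # drop the top-scoring tail
    return v
```

With a small number of rounds, removing a fixed ε-tail per round only mildly trims clean samples while eliminating directions inflated by corruptions.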
Robust Regression Revisited: Acceleration and Improved Estimation Rates
We study fast algorithms for statistical regression problems under the strong
contamination model, where the goal is to approximately optimize a generalized
linear model (GLM) given adversarially corrupted samples. Prior works in this
line of research were based on the robust gradient descent framework of Prasad
et al., a first-order method using biased gradient queries, or the Sever
framework of Diakonikolas et al., an iterative outlier-removal method calling
a stationary point finder.
We present nearly-linear time algorithms for robust regression problems with
improved runtime or estimation guarantees compared to the state-of-the-art. For
the general case of smooth GLMs (e.g. logistic regression), we show that the
robust gradient descent framework of Prasad et al. can be accelerated, and
show our algorithm extends to optimizing the Moreau envelopes of Lipschitz GLMs
(e.g. support vector machines), answering several open questions in the
literature.
For the well-studied case of robust linear regression, we present an
alternative approach obtaining improved estimation rates over prior
nearly-linear time algorithms. Interestingly, our method starts with an
identifiability proof introduced in the context of the sum-of-squares algorithm
of Bakshi and Prasad, which achieved optimal error rates while requiring large
polynomial runtime and sample complexity. We reinterpret their proof within the
Sever framework and obtain a dramatically faster and more sample-efficient
algorithm under fewer distributional assumptions.
Comment: 47 pages
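The Sever-style outer loop referenced above can be sketched for least-squares regression. This is a minimal illustration of the framework, not the accelerated algorithm of this paper; the function name, the use of closed-form least squares as the "stationary point finder", and the hard ε-tail removal are assumptions made for the example:

```python
import numpy as np

def sever_least_squares(X, y, eps, rounds=4):
    """Toy Sever-style loop for robust linear regression (illustrative).

    Alternate between (1) fitting the current inlier set with a cheap
    stationary-point finder (here: ordinary least squares) and
    (2) removing the samples whose per-sample gradients project most
    heavily on the top singular direction of the centered gradient matrix.
    """
    idx = np.arange(len(X))
    for _ in range(rounds):
        Xi, yi = X[idx], y[idx]
        theta, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        resid = Xi @ theta - yi
        G = resid[:, None] * Xi            # per-sample gradients of the squared loss
        Gc = G - G.mean(axis=0)            # center the gradients
        _, _, Vt = np.linalg.svd(Gc, full_matrices=False)
        scores = (Gc @ Vt[0]) ** 2         # outlyingness along top direction
        k = max(1, int(eps * len(idx)))
        idx = idx[np.argsort(scores)[:-k]] # drop the eps-tail
    theta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    return theta
```

Outliers that bias the fit produce large, correlated gradients, so they dominate the tail of the projection scores and are removed within a few rounds.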
Solving SDP Faster: A Robust IPM Framework and Efficient Implementation
This paper introduces a new robust interior point method analysis for
semidefinite programming (SDP). This new robust analysis can be combined with
either the logarithmic barrier or the hybrid barrier.
Under this new framework, we can improve the running time of semidefinite
programming (SDP) with variable size n × n and m constraints up to ε
accuracy.
We show that for the case m = Ω(n^2), we can solve SDPs in m^ω
time. This suggests solving SDP is nearly as fast as solving the
linear system with an equal number of variables and constraints. This is the first
result that tall dense SDP can be solved in the nearly-optimal running time,
and it also improves the state-of-the-art SDP solver [Jiang, Kathuria, Lee,
Padmanabhan and Song, FOCS 2020].
In addition to our new IPM analysis, we also propose a number of techniques
that might be of further interest, such as maintaining the inverse of a
Kronecker product using lazy updates and a general amortization scheme for
positive semidefinite matrices.
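The Kronecker-product maintenance idea rests on two classical identities: (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}, and the Sherman–Morrison formula for rank-one updates of an explicit inverse. A toy sketch of combining them (this is not the paper's amortized data structure; the helper name and the well-conditioned test matrices are assumptions for the example):

```python
import numpy as np

def sherman_morrison(Ainv, u, v):
    """Rank-one update: returns (A + u v^T)^{-1} given A^{-1}."""
    Au = Ainv @ u
    vA = v @ Ainv
    return Ainv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n)) + n * np.eye(n)   # well-conditioned factors
B = rng.normal(size=(n, n)) + n * np.eye(n)
Ainv, Binv = np.linalg.inv(A), np.linalg.inv(B)

# A rank-one change to the factor A: the n^2 x n^2 Kronecker inverse is
# maintained by updating only the small n x n factor inverse, instead of
# refactorizing the full product.
u, v = rng.normal(size=n), rng.normal(size=n)
maintained = np.kron(sherman_morrison(Ainv, u, v), Binv)
recomputed = np.linalg.inv(np.kron(A + np.outer(u, v), B))
```

Because (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}, `maintained` and `recomputed` agree, while the update touched only an n × n matrix.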
Robust and Differentially Private Mean Estimation
In statistical learning and analysis from shared data, which is increasingly
widely adopted in platforms such as federated learning and meta-learning, there
are two major concerns: privacy and robustness. Each participating individual
should be able to contribute without fear of leaking their sensitive
information. At the same time, the system should be robust in the presence of
malicious participants inserting corrupted data. Recent algorithmic advances in
learning from shared data focus on either one of these threats, leaving the
system vulnerable to the other. We bridge this gap for the canonical problem of
estimating the mean from i.i.d. samples. We introduce PRIME, which is the first
efficient algorithm that achieves both privacy and robustness for a wide range
of distributions. We further complement this result with a novel exponential
time algorithm that improves the sample complexity of PRIME, achieving a
near-optimal guarantee and matching a known lower bound for (non-robust)
private mean estimation. This proves that there is no extra statistical cost to
simultaneously guaranteeing privacy and robustness.
Comment: 58 pages, 2 figures; both the exponential-time and efficient algorithms
no longer require a known bound on the true mean
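The interplay between the two guarantees can be seen in a minimal sketch. This is a generic clip-and-noise Gaussian mechanism, not the PRIME algorithm; the function name, the clipping rule, and the noise calibration are assumptions for the example. The key point is that one clipping step serves both goals: it bounds an adversarial sample's influence (robustness) and bounds the query's sensitivity (privacy):

```python
import numpy as np

def private_clipped_mean(X, radius, eps, delta, rng):
    """Clip-and-noise mean estimation (illustrative; not PRIME).

    Projecting each sample into a ball of the given radius bounds the
    influence of any single corrupted point, and simultaneously bounds
    the L2 sensitivity of the average at 2 * radius / n, so Gaussian
    noise calibrated to that sensitivity yields (eps, delta)-DP.
    """
    norms = np.linalg.norm(X, axis=1)
    scale = np.minimum(1.0, radius / np.maximum(norms, 1e-12))
    mean = (X * scale[:, None]).mean(axis=0)       # mean of clipped samples
    sensitivity = 2.0 * radius / len(X)            # replace-one sensitivity
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return mean + rng.normal(scale=sigma, size=X.shape[1])
```

Gross corruptions are clipped to the ball boundary, so their bias is at most on the order of the corruption fraction times the radius, while the added Gaussian noise shrinks as 1/n.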