Sublinear Time Numerical Linear Algebra for Structured Matrices
We show how to solve a number of problems in numerical linear algebra, such
as least squares regression, \ell_p-regression for any p \geq 1, low rank
approximation, and kernel regression, in time T(A) \poly(\log(nd)), where for
a given input matrix A \in \mathbb{R}^{n \times d}, T(A) is the time needed
to compute A \cdot y for an arbitrary vector y \in \mathbb{R}^d. Since T(A)
\leq O(\nnz(A)), where \nnz(A) denotes the number of non-zero entries of
A, the time is no worse, up to polylogarithmic factors, than all of the recent
advances for such problems that run in input-sparsity time. However, for many
applications, T(A) can be much smaller than \nnz(A), yielding significantly
sublinear time algorithms. For example, in the overconstrained
(1+\epsilon)-approximate polynomial interpolation problem, A is a
Vandermonde matrix and T(A) = O(n \log n); in this case our running time is
n \cdot \poly(\log n) + \poly(d/\epsilon) and we recover the results of
\cite{avron2013sketching} as a special case. For overconstrained
autoregression, which is a common problem arising in dynamical systems,
T(A) = O(n \log n), and we immediately obtain n \cdot \poly(\log n) +
\poly(d/\epsilon) time. For kernel autoregression, we significantly improve
the running time of prior algorithms for general kernels. For the important
case of autoregression with the polynomial kernel and arbitrary target vector
b \in \mathbb{R}^n, we obtain even faster algorithms. Our algorithms show that,
perhaps surprisingly, most of these optimization problems do not require much
more time than that of a polylogarithmic number of matrix-vector
multiplications.
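To make the central primitive concrete: every running time above is driven by T(A), the cost of one matrix-vector product with the structured input. The sketch below is ours rather than the paper's; it shows the classical O(n \log n) Toeplitz matrix-vector product via circulant embedding and the FFT, the Toeplitz structure being the one underlying the autoregression design matrix. All names in the code are illustrative.

```python
import numpy as np

def toeplitz_matvec(c, r, y):
    """Compute T @ y in O(m log m) time, where T is the n x d Toeplitz
    matrix with first column c and first row r (assumes r[0] == c[0])."""
    n, d = len(c), len(r)
    m = n + d - 1
    # Embed T in an m x m circulant matrix; a circulant matvec is a circular
    # convolution, which the FFT evaluates in O(m log m).
    col = np.concatenate([c, r[:0:-1]])          # first column of the circulant
    y_pad = np.concatenate([y, np.zeros(m - d)])
    return np.fft.ifft(np.fft.fft(col) * np.fft.fft(y_pad))[:n].real

# Sanity check against the dense product, whose cost is nnz(T) = n * d.
rng = np.random.default_rng(0)
n, d = 512, 64
c, r, y = rng.standard_normal(n), rng.standard_normal(d), rng.standard_normal(d)
r[0] = c[0]
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(d)] for i in range(n)])
assert np.allclose(T @ y, toeplitz_matvec(c, r, y))
```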
Input Sparsity and Hardness for Robust Subspace Approximation
In the subspace approximation problem, we seek a k-dimensional subspace F of
R^d that minimizes the sum of p-th powers of Euclidean distances to a given set
of n points a_1, ..., a_n in R^d, for p >= 1. More generally than minimizing
sum_i dist(a_i,F)^p, we may wish to minimize sum_i M(dist(a_i,F)) for some loss
function M(), for example, M-Estimators, which include the Huber and Tukey loss
functions. Such subspaces provide alternatives to the singular value
decomposition (SVD), which is the p=2 case, finding such an F that minimizes
the sum of squares of distances. For p in [1,2), and for typical M-Estimators,
the minimizing F gives a solution that is more robust to outliers than that
provided by the SVD. We give several algorithmic and hardness results for these
robust subspace approximation problems.
We think of the n points as forming an n x d matrix A, and let nnz(A)
denote the number of non-zero entries of A. Our results hold for p in [1,2). We
use poly(n) to denote n^{O(1)} as n -> infty. We obtain: (1) for minimizing
sum_i dist(a_i,F)^p, we give an algorithm running in O(nnz(A) +
(n+d) poly(k/eps) + exp(poly(k/eps))) time; (2) we show that the problem of
minimizing sum_i dist(a_i,F)^p is NP-hard, even to output a
(1+1/poly(d))-approximation, answering a question of Kannan and Vempala, and
complementing prior results which held for p > 2; (3) for loss functions from a
wide class of M-Estimators, we give a problem-size reduction: for a parameter
K = (log n)^{O(log k)}, our reduction takes O(nnz(A) log n + (n+d) poly(K/eps))
time to reduce the problem to a constrained version involving matrices whose
dimensions are poly(K eps^{-1} log n), and we also give bicriteria solutions;
(4) our techniques lead to the first O(nnz(A) + poly(d/eps)) time algorithms for
(1+eps)-approximate regression for a wide class of convex M-Estimators.
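As a concrete reference point for the objective (this merely evaluates it; it is not the paper's nnz(A)-time algorithm), the sketch below computes sum_i dist(a_i,F)^p and a Huber-loss variant for a subspace F given by an orthonormal basis; names are ours.

```python
import numpy as np

def subspace_cost(A, V, p=1.0):
    """sum_i dist(a_i, F)^p, where the rows of A are the points a_i and
    the columns of V are an orthonormal basis of F."""
    resid = A - (A @ V) @ V.T                       # residual after projecting onto F
    return np.sum(np.linalg.norm(resid, axis=1) ** p)

def huber_subspace_cost(A, V, tau=1.0):
    """sum_i M(dist(a_i, F)) for the Huber loss M with threshold tau."""
    dist = np.linalg.norm(A - (A @ V) @ V.T, axis=1)
    return np.sum(np.where(dist <= tau, 0.5 * dist**2, tau * (dist - 0.5 * tau)))

# For p = 2 the minimizer is the top-k right singular subspace (the SVD case);
# p in [1,2) and the Huber loss penalize large residuals less, hence robustness.
rng = np.random.default_rng(1)
A = rng.standard_normal((100, 10))
V = np.linalg.svd(A, full_matrices=False)[2][:3].T  # k = 3 SVD subspace
print(subspace_cost(A, V, p=2.0), subspace_cost(A, V, p=1.0))
```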
Pruning Neural Networks via Coresets and Convex Geometry: Towards No Assumptions
Pruning is one of the predominant approaches for compressing deep neural
networks (DNNs). Lately, coresets (provable data summarizations) were leveraged
for pruning DNNs, adding the advantage of theoretical guarantees on the
trade-off between the compression rate and the approximation error. However,
coresets in this domain were either data-dependent or generated under
restrictive assumptions on both the model's weights and inputs. In real-world
scenarios, such assumptions are rarely satisfied, limiting the applicability of
coresets. To address this, we propose a novel and robust framework for computing
such coresets under mild assumptions on the model's weights and without any
assumption on the training data. The idea is to compute the importance of each
neuron in each layer with respect to the output of the following layer. This is
achieved by a combination of the Löwner ellipsoid and Carathéodory's theorem. Our
method is simultaneously data-independent, applicable to various networks and
datasets (due to the simplified assumptions), and theoretically supported.
Experimental results show that our method outperforms existing coreset-based
neural pruning approaches across a wide range of networks and datasets. For
example, our method achieved a substantial compression rate on ResNet50 on
ImageNet with only a minor drop in accuracy.
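Of the two ingredients named above, Carathéodory's theorem has a classical constructive proof that is easy to sketch. The code below is a generic illustration of that construction only, not the paper's pruning pipeline: it rewrites a convex combination of n points in R^d as an equivalent combination supported on at most d + 1 of the points.

```python
import numpy as np

def caratheodory(X, u, tol=1e-12):
    """Given points X (n x d) and weights u >= 0 summing to 1, return
    (indices, weights) expressing the same point p = u @ X as a convex
    combination of at most d + 1 of the points."""
    idx = np.arange(len(X))
    u = u.astype(float).copy()
    d = X.shape[1]
    while len(idx) > d + 1:
        # Find v != 0 with sum_i v_i x_i = 0 and sum_i v_i = 0: a null-space
        # vector of the (d+1) x m matrix stacking the points over a row of ones.
        A = np.vstack([X[idx].T, np.ones(len(idx))])
        v = np.linalg.svd(A)[2][-1]            # m > d + 1, so A has a null space
        pos = v > tol                          # sum v_i = 0, so some entries are positive
        alpha = np.min(u[idx][pos] / v[pos])   # largest step keeping weights >= 0
        u[idx] -= alpha * v                    # zeroes at least one weight; p and
        idx = idx[u[idx] > tol]                # the total weight are unchanged
    return idx, u[idx]

rng = np.random.default_rng(2)
X, u = rng.standard_normal((200, 5)), rng.random(200)
u /= u.sum()
idx, w = caratheodory(X, u)
assert len(idx) <= 6 and np.allclose(w @ X[idx], u @ X) and np.isclose(w.sum(), 1.0)
```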