    Space-Efficient Interior Point Method, with Applications to Linear Programming and Maximum Weight Bipartite Matching

    L1 Regression with Lewis Weights Subsampling

    Sublinear Time Numerical Linear Algebra for Structured Matrices

    We show how to solve a number of problems in numerical linear algebra, such as least squares regression, $\ell_p$-regression for any $p \geq 1$, low rank approximation, and kernel regression, in time $T(A) \cdot \mathrm{poly}(\log(nd))$, where, for a given input matrix $A \in \mathbb{R}^{n \times d}$, $T(A)$ is the time needed to compute $A \cdot y$ for an arbitrary vector $y \in \mathbb{R}^d$. Since $T(A) \leq O(\mathrm{nnz}(A))$, where $\mathrm{nnz}(A)$ denotes the number of non-zero entries of $A$, the time is, up to polylogarithmic factors, no worse than that of all the recent input-sparsity-time algorithms for these problems. However, for many applications $T(A)$ can be much smaller than $\mathrm{nnz}(A)$, yielding significantly sublinear time algorithms. For example, in the overconstrained $(1+\epsilon)$-approximate polynomial interpolation problem, $A$ is a Vandermonde matrix and $T(A) = O(n \log n)$; in this case our running time is $n \cdot \mathrm{poly}(\log n) + \mathrm{poly}(d/\epsilon)$, and we recover the results of \cite{avron2013sketching} as a special case. For overconstrained autoregression, a common problem arising in dynamical systems, $T(A) = O(n \log n)$, and we immediately obtain $n \cdot \mathrm{poly}(\log n) + \mathrm{poly}(d/\epsilon)$ time. For kernel autoregression, we significantly improve the running time of prior algorithms for general kernels. For the important case of autoregression with the polynomial kernel and an arbitrary target vector $b \in \mathbb{R}^n$, we obtain even faster algorithms. Our algorithms show that, perhaps surprisingly, most of these optimization problems do not require much more time than a polylogarithmic number of matrix-vector multiplications.
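    The autoregression bound above rests on fast matrix-vector products. As a minimal sketch, assuming the common convention that row i of the n x d design matrix is the length-d window (x[i+d-1], ..., x[i]) of a time series x (which makes A a Toeplitz matrix), the product A·y is a slice of the linear convolution of x and y, so it can be computed with the FFT without ever forming A. The helper names `ar_matrix` and `ar_matvec_fft` are hypothetical, and the sketch illustrates only the T(A) bound, not the paper's regression algorithm.

```python
import numpy as np

def ar_matrix(x, d):
    """Dense n x d autoregression design matrix (for checking only).

    Assumed convention: row i is (x[i+d-1], x[i+d-2], ..., x[i]), so A is Toeplitz.
    """
    n = len(x) - d + 1
    return np.array([[x[i + d - 1 - j] for j in range(d)] for i in range(n)])

def ar_matvec_fft(x, y):
    """Compute ar_matrix(x, len(y)) @ y via the FFT, without forming the matrix."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = len(y)
    n = len(x) - d + 1
    L = len(x) + d - 1  # length of the full linear convolution x * y
    # Zero-padding both inputs to length L makes circular convolution equal linear convolution.
    conv = np.fft.irfft(np.fft.rfft(x, L) * np.fft.rfft(y, L), L)
    # (A y)_i = sum_j x[i+d-1-j] * y[j] = conv[i+d-1]
    return conv[d - 1 : d - 1 + n]

# Usage: compare against the explicit dense product.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = rng.standard_normal(8)
print(np.allclose(ar_matvec_fft(x, y), ar_matrix(x, 8) @ y))
```

The FFT route costs O(m log m) for a series of length m, versus O(nd) for the dense product, which is the kind of gap between T(A) and nnz(A) the abstract exploits.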

    Input Sparsity and Hardness for Robust Subspace Approximation

    In the subspace approximation problem, we seek a k-dimensional subspace F of R^d that minimizes the sum of p-th powers of Euclidean distances to a given set of n points a_1, ..., a_n in R^d, for p >= 1. More generally than minimizing sum_i dist(a_i,F)^p, we may wish to minimize sum_i M(dist(a_i,F)) for some loss function M(), for example an M-Estimator such as the Huber or Tukey loss. Such subspaces provide alternatives to the singular value decomposition (SVD), which is the p=2 case: it finds an F that minimizes the sum of squared distances. For p in [1,2), and for typical M-Estimators, the minimizing F is more robust to outliers than the one provided by the SVD. We give several algorithmic and hardness results for these robust subspace approximation problems. We think of the n points as forming an n x d matrix A, and let nnz(A) denote the number of non-zero entries of A. Our results hold for p in [1,2). We use poly(n) to denote n^{O(1)} as n -> infty. We obtain the following: (1) for minimizing sum_i dist(a_i,F)^p, an algorithm running in O(nnz(A) + (n+d)poly(k/eps) + exp(poly(k/eps))) time; (2) a proof that minimizing sum_i dist(a_i,F)^p is NP-hard, even to output a (1+1/poly(d))-approximation, answering a question of Kannan and Vempala and complementing prior results which held for p > 2; (3) for a wide class of M-Estimator loss functions, a problem-size reduction: for a parameter K = (log n)^{O(log k)}, our reduction takes O(nnz(A) log n + (n+d) poly(K/eps)) time to reduce the problem to a constrained version involving matrices whose dimensions are poly(K eps^{-1} log n), and we also give bicriteria solutions; (4) our techniques lead to the first O(nnz(A) + poly(d/eps)) time algorithms for (1+eps)-approximate regression for a wide class of convex M-Estimators.
    Comment: paper appeared in FOCS, 201
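    To make the objective concrete, the sketch below simply evaluates sum_i dist(a_i,F)^p, or sum_i M(dist(a_i,F)), for a candidate subspace F given by a d x k matrix V with orthonormal columns; it is not the paper's algorithm. The names `subspace_cost` and `huber` and the Huber threshold `delta` are hypothetical choices for illustration. For p=2 the SVD subspace is optimal, so its cost equals the sum of the squared trailing singular values.

```python
import numpy as np

def subspace_cost(A, V, p=1.0, loss=None):
    """Sum over rows a_i of A of dist(a_i, F)^p, or of loss(dist) if a loss is given.

    F is the span of the orthonormal columns of V (shape d x k); rows of A are the points.
    """
    R = A - (A @ V) @ V.T              # residual of each point after projecting onto F
    dist = np.linalg.norm(R, axis=1)   # Euclidean distance of each point to F
    if loss is not None:
        return float(np.sum(loss(dist)))
    return float(np.sum(dist ** p))

def huber(t, delta=1.0):
    """Huber loss on nonnegative distances: quadratic up to delta, linear beyond it."""
    return np.where(t <= delta, 0.5 * t ** 2, delta * (t - 0.5 * delta))

# Usage: SVD subspace (optimal for p=2) versus a random 2-dimensional subspace.
rng = np.random.default_rng(1)
A = rng.standard_normal((50, 5))
_, s, Vt = np.linalg.svd(A, full_matrices=False)
V_svd = Vt[:2].T
V_rand, _ = np.linalg.qr(rng.standard_normal((5, 2)))
print(subspace_cost(A, V_svd, p=1), subspace_cost(A, V_rand, p=1))
print(subspace_cost(A, V_svd, loss=huber))
```

For p in [1,2) or for Huber-type losses, the SVD subspace is generally not the minimizer, which is what makes these robust variants harder to optimize.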

    Pruning Neural Networks via Coresets and Convex Geometry: Towards No Assumptions

    Pruning is one of the predominant approaches for compressing deep neural networks (DNNs). Lately, coresets (provable data summarizations) have been leveraged for pruning DNNs, adding the advantage of theoretical guarantees on the trade-off between the compression rate and the approximation error. However, coresets in this domain were either data-dependent or generated under restrictive assumptions on both the model's weights and inputs. In real-world scenarios, such assumptions are rarely satisfied, limiting the applicability of coresets. To this end, we suggest a novel and robust framework for computing such coresets under mild assumptions on the model's weights and without any assumption on the training data. The idea is to compute the importance of each neuron in each layer with respect to the output of the following layer. This is achieved by a combination of the Löwner ellipsoid and Carathéodory's theorem. Our method is simultaneously data-independent, applicable to various networks and datasets (due to the simplified assumptions), and theoretically supported. Experimental results show that our method outperforms existing coreset-based neural pruning approaches across a wide range of networks and datasets. For example, our method achieved a 62% compression rate on ResNet50 on ImageNet with a 1.09% drop in accuracy.
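    The abstract names Carathéodory's theorem as one ingredient. As a hedged illustration of that ingredient alone (not the paper's pruning method, and without the Löwner-ellipsoid step), the textbook constructive proof reduces any convex combination of n points in R^d to one supported on at most d+1 of those points with the same weighted mean. The function name `caratheodory` and the tolerance `tol` are hypothetical.

```python
import numpy as np

def caratheodory(P, w, tol=1e-12):
    """Textbook Carathéodory reduction (a sketch, not the paper's algorithm).

    P: (n, d) points; w: (n,) nonnegative weights summing to 1.
    Returns new weights with at most d+1 nonzeros and the same weighted mean w @ P.
    """
    P = np.asarray(P, dtype=float)
    w = np.asarray(w, dtype=float).copy()
    d = P.shape[1]
    idx = np.flatnonzero(w > tol)
    while len(idx) > d + 1:
        Q = P[idx]
        # M has m-1 > d columns (q_i - q_0), so it has a nontrivial null vector u.
        M = (Q[1:] - Q[0]).T
        _, _, Vt = np.linalg.svd(M)
        u = Vt[-1]
        # v satisfies sum(v) = 0 and sum_i v_i q_i = M @ u = 0.
        v = np.concatenate(([-u.sum()], u))
        wi = w[idx]
        pos = np.flatnonzero(v > 0)
        ratios = wi[pos] / v[pos]
        k = pos[np.argmin(ratios)]
        # Shifting along v keeps the mean and total weight; the largest feasible step zeroes entry k.
        wi = wi - ratios.min() * v
        wi[k] = 0.0
        wi[wi < 0] = 0.0  # clip round-off
        w[idx] = wi
        idx = np.flatnonzero(w > tol)
    w[w <= tol] = 0.0
    return w

# Usage: 20 points in R^3 reduce to at most 4 with the same weighted mean.
rng = np.random.default_rng(2)
P = rng.standard_normal((20, 3))
w = rng.random(20)
w /= w.sum()
w2 = caratheodory(P, w)
print(np.count_nonzero(w2), np.allclose(w2 @ P, w @ P))
```

Each loop iteration removes at least one point from the support, so the reduction terminates after at most n - (d+1) SVDs.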