3 research outputs found

    Total Least Squares Regression in Input Sparsity Time

    In the total least squares problem, one is given an $m \times n$ matrix $A$ and an $m \times d$ matrix $B$, and one seeks to "correct" both $A$ and $B$, obtaining matrices $\hat{A}$ and $\hat{B}$, so that there exists an $X$ satisfying the equation $\hat{A}X = \hat{B}$. Typically the problem is overconstrained, meaning that $m \gg \max(n,d)$. The cost of the solution $\hat{A}, \hat{B}$ is given by $\|A-\hat{A}\|_F^2 + \|B-\hat{B}\|_F^2$. We give an algorithm for finding a solution $X$ to the linear system $\hat{A}X = \hat{B}$ for which the cost $\|A-\hat{A}\|_F^2 + \|B-\hat{B}\|_F^2$ is at most a multiplicative $(1+\epsilon)$ factor times the optimal cost, up to an additive error $\eta$ that may be an arbitrarily small function of $n$. Importantly, our running time is $\tilde{O}(\mathrm{nnz}(A) + \mathrm{nnz}(B)) + \mathrm{poly}(n/\epsilon) \cdot d$, where for a matrix $C$, $\mathrm{nnz}(C)$ denotes its number of non-zero entries; in particular, the running time does not directly depend on the large parameter $m$. As total least squares regression is known to be solvable via low rank approximation, a natural approach is to invoke fast algorithms for approximate low rank approximation, obtain matrices $\hat{A}$ and $\hat{B}$ from this low rank approximation, and then solve for $X$ so that $\hat{A}X = \hat{B}$. However, existing algorithms do not apply, since in total least squares the rank of the low rank approximation needs to be $n$, and so the running time of known methods would be at least $mn^2$. In contrast, we achieve a much faster running time for finding $X$ by never explicitly forming the equation $\hat{A}X = \hat{B}$, but instead solving for an $X$ that satisfies an implicit such equation. Finally, we generalize our algorithm to the total least squares problem with regularization.
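    As a point of reference, the classical SVD-based solution that the abstract contrasts with (the one whose running time is at least $mn^2$) fits in a few lines. The sketch below is a minimal NumPy illustration of that baseline, not the paper's input-sparsity algorithm; the function name and the assumption that the trailing $d \times d$ block of $V$ is invertible are mine.

        import numpy as np

        def tls_baseline(A, B):
            """Classical total least squares via one SVD of [A B].

            This is the mn^2-time baseline mentioned in the abstract, not the
            paper's O~(nnz(A) + nnz(B)) + poly(n/eps) * d algorithm.
            """
            m, n = A.shape
            d = B.shape[1]
            C = np.hstack([A, B])                            # m x (n + d)
            U, s, Vt = np.linalg.svd(C, full_matrices=False)
            V = Vt.T
            V12, V22 = V[:n, n:], V[n:, n:]                  # blocks of the trailing d right singular vectors
            X = -V12 @ np.linalg.inv(V22)                    # assumes V22 is invertible
            # Rank-n correction of [A B]: keep only the top n singular directions.
            C_hat = U[:, :n] @ np.diag(s[:n]) @ Vt[:n, :]
            A_hat, B_hat = C_hat[:, :n], C_hat[:, n:]
            cost = np.linalg.norm(A - A_hat, 'fro')**2 + np.linalg.norm(B - B_hat, 'fro')**2
            return X, A_hat, B_hat, cost

        # Tiny usage example with m >> max(n, d).
        A = np.random.randn(1000, 5)
        B = A @ np.random.randn(5, 2) + 0.01 * np.random.randn(1000, 2)
        X, A_hat, B_hat, cost = tls_baseline(A, B)
        print(np.linalg.norm(A_hat @ X - B_hat))             # ~0: the corrected system is consistent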

    A near-optimal algorithm for approximating the John Ellipsoid

    We develop a simple and efficient algorithm for approximating the John Ellipsoid of a symmetric polytope. Our algorithm is near optimal in the sense that its time complexity matches that of the current best verification algorithm. We also provide MATLAB code for further research. Comment: COLT 2019.
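    The abstract does not spell out how the approximation is computed, so the following is only an illustrative sketch of a standard multiplicative fixed-point iteration for approximate John-ellipsoid weights of a symmetric polytope $\{x : |a_i^\top x| \le 1\}$; the function name, initialization, and iteration count are assumptions, and the paper's actual update rule and analysis may differ.

        import numpy as np

        def approx_john_weights(A, num_iters=200):
            """Illustrative fixed-point iteration for John-ellipsoid weights of the
            symmetric polytope {x : |a_i^T x| <= 1}, where a_i are the rows of A.
            Details here are assumptions, not necessarily the paper's algorithm.
            """
            m, n = A.shape
            w = np.full(m, n / m)                    # weights sum to n, as at the true fixed point
            for _ in range(num_iters):
                Q = A.T @ (w[:, None] * A)           # Q = sum_i w_i a_i a_i^T
                Qinv = np.linalg.inv(Q)
                scores = np.einsum('ij,jk,ik->i', A, Qinv, A)   # a_i^T Q^{-1} a_i
                w = w * scores                       # multiplicative update; the weights still sum to n
            Q = A.T @ (w[:, None] * A)
            return w, Q                              # {x : x^T Q x <= 1} approximates the John ellipsoid

        # Example: a random symmetric polytope in R^3 cut out by 50 constraint pairs.
        A = np.random.randn(50, 3)
        w, Q = approx_john_weights(A)
        print(w.sum())                               # close to n = 3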

    Sketching Transformed Matrices with Applications to Natural Language Processing

    Suppose we are given a large matrix $A = (a_{i,j})$ that cannot be stored in memory but resides on disk or is presented as a data stream, and we need to compute a matrix decomposition of the entrywise-transformed matrix $f(A) := (f(a_{i,j}))$ for some function $f$. Is it possible to do this in a space-efficient way? Many machine learning applications do need to work with such large transformed matrices; for example, word embedding methods in NLP work with the pointwise mutual information (PMI) matrix, and the entrywise transformation makes it difficult to apply known linear algebraic tools. Existing approaches for this problem either store the whole matrix and perform the entrywise transformation afterwards, which is space consuming or infeasible, or redesign the learning method, which is application specific and requires substantial remodeling. In this paper, we first propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix. It works for a general family of transformations with provable small error bounds and can thus be used as a primitive in downstream learning tasks. We then apply this primitive to a concrete application: low-rank approximation. We show that our approach obtains small error and is efficient in both space and time. We complement our theoretical results with experiments on synthetic and real data. Comment: AISTATS 2020.
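    To make the primitive concrete, here is a toy illustration of multiplying a given small matrix $G$ by $f(A)$ while streaming over the entries of $A$, without ever materializing $f(A)$. It assumes each entry arrives exactly once and applies $f$ exactly rather than sketching it, so it is a simplification of the setting rather than the paper's algorithm; all names are illustrative.

        import numpy as np

        def product_with_transformed(entry_stream, G, f, n):
            """Accumulate G @ f(A) from a stream of entries of A in O(k * n) space.

            entry_stream yields (i, j, a_ij) with each entry of A appearing exactly
            once (a simplifying assumption; this toy version is not the paper's
            sketch). G is k x m, the result is k x n, and f(A) is never stored.
            """
            k = G.shape[0]
            P = np.zeros((k, n))
            for i, j, a in entry_stream:
                P[:, j] += G[:, i] * f(a)            # f(a_ij) contributes to column j of G @ f(A)
            return P

        # Toy usage with a PMI-like transformation f(x) = log(1 + x).
        m, n, k = 10000, 50, 20
        A = np.abs(np.random.randn(m, n))
        G = np.random.randn(k, m) / np.sqrt(k)       # e.g. a random sketching matrix
        stream = ((i, j, A[i, j]) for i in range(m) for j in range(n))
        P = product_with_transformed(stream, G, np.log1p, n)
        print(np.linalg.norm(P - G @ np.log1p(A)))   # ~0 for this exact accumulation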