Improved Practical Matrix Sketching with Guarantees

Author: Desai Amey
Ghashami Mina
Phillips Jeff M.
Publication date: 01/01/2014

Matrices have become essential data representations for many large-scale problems in data analytics, and hence matrix sketching is a critical task. Although much research has focused on improving the error/size trade-off under various sketching paradigms, the many forms of error bounds make these approaches hard to compare in theory and in practice. This paper attempts to categorize and compare most known methods under row-wise streaming updates with provable guarantees, and then to tweak some of these methods to gain practical improvements while retaining guarantees. For instance, we observe that a simple heuristic, iSVD, which has no guarantees, tends to outperform all known approaches in terms of the size/error trade-off. We modify FrequentDirections, the best-performing method with guarantees under the size/error trade-off, to match the performance of iSVD while retaining its guarantees. We also demonstrate some adversarial datasets on which iSVD performs quite poorly. When comparing techniques in the time/error trade-off, those based on hashing or sampling tend to perform better. In this setting, we modify the most studied sampling regime to retain its error guarantee while obtaining dramatic improvements in the time/error trade-off. Finally, we provide easy replication of our studies on APT, a new testbed which makes available not only code and datasets, but also a computing platform with fixed environmental settings.

Comment: 27 pages
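The abstract contrasts the iSVD heuristic with FrequentDirections, a row-streaming sketch that keeps a provable guarantee. As a hedged illustration only, the NumPy sketch below implements the basic Frequent Directions update, not the modified variant proposed in the paper; the function name `frequent_directions` and the parameter `ell` (number of sketch rows) are hypothetical. Each incoming row is written into a zero row of the sketch, and then the smallest squared singular value is subtracted from all squared singular values. For this basic update, $A^\top A - B^\top B$ is positive semidefinite and $\|A^\top A - B^\top B\|_2 \le \|A\|_F^2/\ell$.

```python
import numpy as np


def frequent_directions(A: np.ndarray, ell: int) -> np.ndarray:
    """Stream the rows of A (n x d) into an ell x d sketch B (basic variant).

    Hypothetical helper: keep ell rows, write each incoming row into the
    zero row, then subtract the smallest squared singular value from all
    squared singular values, which zeroes the last row again.
    """
    _, d = A.shape
    if not 1 <= ell <= d:
        raise ValueError("need 1 <= ell <= d")
    B = np.zeros((ell, d))
    for row in A:
        # Invariant: the last row of B is zero before each insertion.
        B[-1] = row
        # Thin SVD of the ell x d buffer; s is sorted in descending order.
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2
        # Shrink: sigma_j^2 -> sigma_j^2 - delta (the smallest becomes exactly 0).
        s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s_shrunk[:, None] * Vt
    return B


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((300, 30))
    ell = 8
    B = frequent_directions(A, ell)
    err = np.linalg.norm(A.T @ A - B.T @ B, 2)
    bound = np.linalg.norm(A, "fro") ** 2 / ell
    print(f"covariance error {err:.1f} <= bound {bound:.1f}")
```

Computing an SVD after every row costs $O(d\ell^2)$ per row. A common speedup is to buffer more rows (for example $2\ell$) and shrink only when the buffer fills, which keeps a guarantee of the same form with a different constant.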

arXiv.org e-Print Archive

CiteSeerX

Crossref

Fast linear algebra is stable

Author: A. Borodin
A. Edelman
A. Schönhage
A.N. Malyshev
A.Ya. Bulgakov
C. Bischof
D. Bini
D. Coppersmith
D. Heller
E. Elmroth
G. Golub
G.W. Stewart
Ioana Dumitriu
J. Demmel
J. Roberts
J. Varah
James Demmel
M. Gu
N. Higham
N.J. Higham
Olga Holtz
P. Bürgisser
P. Hong
R. Cormen
R. Schreiber
R.J. Muirhead
S. Chandrasekaran
S. Huss
S. Toledo
S.K. Godunov
T. Chan
T.W. Anderson
V. Strassen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

In an earlier paper, we showed that a large class of fast recursive matrix multiplication algorithms is stable in a normwise sense, and that in fact if multiplication of $n$-by-$n$ matrices can be done by any algorithm in $O(n^{\omega + \eta})$ operations for any $\eta > 0$, then it can be done stably in $O(n^{\omega + \eta})$ operations for any $\eta > 0$. Here we extend this result to show that essentially all standard linear algebra operations, including LU decomposition, QR decomposition, linear equation solving, matrix inversion, solving least squares problems, (generalized) eigenvalue problems and the singular value decomposition, can also be done stably (in a normwise sense) in $O(n^{\omega + \eta})$ operations.

Comment: 26 pages; final version; to appear in Numerische Mathematik
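Strassen's algorithm is the classic example of a fast recursive matrix multiplication algorithm: it replaces eight half-size products with seven, giving exponent $\log_2 7 \approx 2.807$. The NumPy sketch below is offered only to make "fast recursive" concrete and is not the construction from the paper. The helper name `strassen` and the `leaf` cutoff are hypothetical, and the code assumes square matrices whose dimension stays even at every level above the cutoff (for example a power of two). The demo reports the normwise relative error $\|\hat C - AB\|_F / (\|A\|_F \|B\|_F)$, which is the kind of quantity a normwise stability bound controls.

```python
import numpy as np


def strassen(A: np.ndarray, B: np.ndarray, leaf: int = 64) -> np.ndarray:
    """Multiply square n x n matrices with Strassen's 7-product recursion.

    Hypothetical helper: falls back to the classical product once n <= leaf,
    and assumes n stays even at every level above the cutoff.
    """
    n = A.shape[0]
    if A.shape != (n, n) or B.shape != (n, n):
        raise ValueError("A and B must be square with the same shape")
    if n <= leaf:
        return A @ B  # classical product at the leaves
    if n % 2:
        raise ValueError("dimension must be even above the leaf size")
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven recursive half-size products instead of eight.
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    # Recombine the products into the four blocks of C = A @ B.
    C = np.empty((n, n), dtype=np.result_type(A, B))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((256, 256))
    B = rng.standard_normal((256, 256))
    C = strassen(A, B)
    # Normwise relative error against the classical product.
    rel = np.linalg.norm(C - A @ B) / (np.linalg.norm(A) * np.linalg.norm(B))
    print(f"normwise relative error: {rel:.2e}")
```

The `leaf` cutoff reflects a common practical choice: below some size the classical product is faster, and fewer recursion levels also keep the constant in the normwise error bound smaller.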

arXiv.org e-Print Archive

CiteSeerX

Crossref

eScholarship - University of California

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

Author: Cohen Michael B.
Elder Sam
Musco Cameron
Musco Christopher
Persu Madalina
Publication date: 02/04/2015

We show how to approximate a data matrix $\mathbf{A}$ with a much smaller sketch $\mathbf{\tilde A}$ that can be used to solve a general class of constrained $k$-rank approximation problems to within $(1+\epsilon)$ error. Importantly, this class of problems includes $k$-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just $O(k)$ dimensions, our methods generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For $k$-means dimensionality reduction, we provide $(1+\epsilon)$ relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for $\mathbf{A}$, but can be used directly to compute this subspace. Finally, for $k$-means clustering, we show how to achieve a $(9+\epsilon)$ approximation by Johnson-Lindenstrauss projecting data points to just $O(\log k/\epsilon^2)$ dimensions. This gives the first result that leverages the specific structure of $k$-means to achieve dimension independent of input size and sublinear in $k$.
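The final result above relies on Johnson-Lindenstrauss projection. The minimal NumPy sketch below checks how the $k$-means cost of one fixed clustering changes after a random projection; it does not implement the paper's algorithm. It assumes a dense Gaussian projection matrix with entries scaled by $1/\sqrt{m}$ (one of several JL constructions), and the helpers `jl_project` and `kmeans_cost` are hypothetical. The target dimension `m = 100` in the demo is an arbitrary choice, not the $O(\log k/\epsilon^2)$ bound, whose constants the abstract does not state. Because the projection is linear, the centroid of the projected points is the projection of the original centroid, so the projected cost is $\sum_i \|(x_i - \mu_{c(i)})\Pi\|^2$.

```python
import numpy as np


def kmeans_cost(X: np.ndarray, labels: np.ndarray) -> float:
    """k-means cost of a fixed partition: squared distances to cluster centroids."""
    cost = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        cost += float(((pts - pts.mean(axis=0)) ** 2).sum())
    return cost


def jl_project(X: np.ndarray, m: int, rng: np.random.Generator) -> np.ndarray:
    """Project the rows of X (n x d) to m dimensions with a dense Gaussian map."""
    d = X.shape[1]
    # Entries N(0, 1/m), so E||x @ Pi||^2 = ||x||^2 for every row x.
    Pi = rng.standard_normal((d, m)) / np.sqrt(m)
    return X @ Pi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 300))
    labels = rng.integers(0, 5, size=1000)  # an arbitrary fixed 5-way partition
    ratio = kmeans_cost(jl_project(X, 100, rng), labels) / kmeans_cost(X, labels)
    print(f"projected / original cost: {ratio:.3f}")
```

A ratio close to 1 for a single partition is the easy part. The guarantee stated in the abstract is stronger: the clustering found in the projected space must be near-optimal in the original space.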

arXiv.org e-Print Archive

CiteSeerX

Randomized Algorithms for Computation of Tucker decomposition and Higher Order SVD (HOSVD)

Author: Abukhovich Stanislav
Ahmadi-Asl Salman
Asante-Mensah Maame G.
Cichocki Andrzej
Oseledets Ivan
Phan Anh Huy
Tanaka Toshihisa
Publication date: 01/01/2021

Big data analysis has become a crucial part of emerging technologies such as the Internet of Things, cyber-physical analysis, deep learning, and anomaly detection. Among many other techniques, dimensionality reduction plays a key role in such analyses and facilitates feature selection and feature extraction. Randomized algorithms are efficient tools for handling big data tensors. They accelerate the decomposition of large-scale data tensors by reducing the computational complexity of deterministic algorithms and the communication among different levels of the memory hierarchy, which is the main bottleneck in modern computing environments and architectures. In this paper, we review recent advances in randomization for the computation of the Tucker decomposition and the Higher Order SVD (HOSVD). We discuss random projection and sampling approaches, single-pass and multi-pass randomized algorithms, and how to utilize them in the computation of the Tucker decomposition and the HOSVD. Simulations on synthetic and real datasets are provided to compare the performance of some of the best and most promising algorithms.
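The random projection approach mentioned above can be illustrated with a randomized HOSVD: for each mode, sketch the range of the mode-$n$ unfolding with a Gaussian test matrix, orthonormalize, and truncate to the target rank; the core tensor is then obtained by projecting the tensor onto all factor matrices. The NumPy sketch below follows that recipe under stated assumptions and is not a specific algorithm from the review. The helper names (`unfold`, `mode_dot`, `randomized_hosvd`, `tucker_to_tensor`), the `oversample` parameter (extra sketch columns) and its default of 5 are hypothetical choices, and each rank is assumed not to exceed the corresponding tensor dimension. The function reads the full tensor several times per mode, so it is a multi-pass method.

```python
from typing import List, Optional, Sequence, Tuple

import numpy as np


def unfold(X: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)


def mode_dot(X: np.ndarray, M: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n product X x_n M for a matrix M of shape (J, X.shape[mode])."""
    Y = np.tensordot(M, X, axes=(1, mode))  # the new axis of size J comes first
    return np.moveaxis(Y, 0, mode)


def randomized_hosvd(
    X: np.ndarray,
    ranks: Sequence[int],
    oversample: int = 5,
    rng: Optional[np.random.Generator] = None,
) -> Tuple[np.ndarray, List[np.ndarray]]:
    """Hypothetical randomized truncated HOSVD (Gaussian range finder per mode)."""
    if len(ranks) != X.ndim:
        raise ValueError("need one rank per tensor mode")
    rng = np.random.default_rng() if rng is None else rng
    factors = []
    for mode, r in enumerate(ranks):
        if not 1 <= r <= X.shape[mode]:
            raise ValueError("each rank must satisfy 1 <= r <= X.shape[mode]")
        Xn = unfold(X, mode)
        # Sketch the column space of the unfolding with r + oversample probes.
        Omega = rng.standard_normal((Xn.shape[1], r + oversample))
        Q, _ = np.linalg.qr(Xn @ Omega)
        # Truncate to rank r with a small SVD of the projected unfolding.
        U_small, _, _ = np.linalg.svd(Q.T @ Xn, full_matrices=False)
        factors.append(Q @ U_small[:, :r])
    # Core tensor: G = X x_1 U1^T x_2 U2^T ... x_N UN^T.
    core = X
    for mode, U in enumerate(factors):
        core = mode_dot(core, U.T, mode)
    return core, factors


def tucker_to_tensor(core: np.ndarray, factors: Sequence[np.ndarray]) -> np.ndarray:
    """Rebuild the full tensor core x_1 U1 x_2 U2 ... x_N UN."""
    X = core
    for mode, U in enumerate(factors):
        X = mode_dot(X, U, mode)
    return X


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 40, 50))
    core, factors = randomized_hosvd(X, (5, 5, 5), rng=rng)
    err = np.linalg.norm(tucker_to_tensor(core, factors) - X) / np.linalg.norm(X)
    print(core.shape, f"relative error {err:.3f}")
```

If the tensor's multilinear rank does not exceed `ranks`, the sketch recovers it up to rounding error (with probability one over the Gaussian draws); otherwise the error depends on how fast the singular values of each unfolding decay. The dense random tensor in the demo is a worst case with no such decay.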

arXiv.org e-Print Archive

Directory of Open Access Journals