Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

Authors: Canon Shane
Chhugani Jatin
Demmel James
Devarakonda Aditya
Gerhardt Lisa
Gittens Alex
Harrell Jim
Kottalam Jey
Krishnamurthy Venkat
Liu Jialin
Mahoney Michael W.
Maschhoff Kristyn
Prabhat
Racah Evan
Ringenburg Michael
Sharma Pramod
Yang Jiyan
Publication date: 12/05/2016

We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling and bioimaging. The data matrices are tall and skinny, which enables the algorithms to map conveniently into Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
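As a rough illustration of why tall-and-skinny matrices suit a data-parallel model, the sketch below splits a matrix into row blocks, sums the small per-block Gram matrices, and eigendecomposes the result to recover the leading singular values and right singular vectors. This is a minimal NumPy sketch and not the paper's Spark or C+MPI implementation. The function names, block count, and matrix sizes are hypothetical, and the matrix is left uncentered (PCA would subtract column means first).

```python
import numpy as np


def gram_by_blocks(row_blocks):
    """Sum the per-block Gram matrices B.T @ B.

    Each block is a slab of rows; in a data-parallel setting every
    partial product could be computed on a different worker and the
    small (d x d) results reduced by summation.
    """
    gram = None
    for block in row_blocks:
        partial = block.T @ block
        gram = partial if gram is None else gram + partial
    return gram


def top_right_singular_vectors(row_blocks, k):
    """Return the k largest singular values and right singular vectors."""
    gram = gram_by_blocks(row_blocks)
    eigvals, eigvecs = np.linalg.eigh(gram)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest
    sigma = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return sigma, eigvecs[:, order]


# Hypothetical tall-and-skinny input: many rows, few columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 8))
blocks = np.array_split(A, 16, axis=0)  # stand-in for distributed partitions

sigma, V = top_right_singular_vectors(blocks, k=3)
```

Forming the Gram matrix squares the condition number, so this route is only a reasonable sketch when the matrix is well conditioned; it is shown here because the reduction step is a plain sum over partitions.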

arXiv.org e-Print Archive

eScholarship - University of California

Online Tensor Methods for Learning Latent Variable Models

Authors: Anandkumar Animashree
Hakeem Mohammad Umar
Huang Furong
Niranjan U. N.
Publication date: 01/01/2015

We introduce an online tensor decomposition based approach for two latent variable modeling problems, namely (1) community detection, in which we learn the latent communities that the social actors in social networks belong to, and (2) topic modeling, in which we infer hidden topics of text articles. We consider decomposition of moment tensors using stochastic gradient descent. We conduct optimization of multilinear operations in SGD and avoid directly forming the tensors, to save computational and storage costs. We present optimized algorithms on two platforms. Our GPU-based implementation exploits the parallelism of SIMD architectures to allow for maximum speed-up by a careful optimization of storage and data transfer, whereas our CPU-based implementation uses efficient sparse matrix computations and is suitable for large sparse datasets. For the community detection problem, we demonstrate accuracy and computational efficiency on Facebook, Yelp and DBLP datasets, and for the topic modeling problem, we also demonstrate good performance on the New York Times dataset. We compare our results to state-of-the-art algorithms such as the variational method, and report a gain in accuracy and a gain of several orders of magnitude in execution time. Comment: JMLR 201
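The idea of applying multilinear operations without forming the tensor can be illustrated on a plain empirical third-moment tensor T = (1/n) Σ x_i ⊗ x_i ⊗ x_i. The contraction T(I, v, v) equals (1/n) Σ (x_i · v)² x_i, which needs only matrix-vector products. The sketch below is a simplified assumption for illustration: it is not the authors' algorithm, which works with whitened and corrected moment tensors, and the function names and data sizes are hypothetical.

```python
import numpy as np


def contract_implicit(X, v):
    """Compute T(I, v, v) for T = mean_i x_i (x) x_i (x) x_i without building T.

    Cost is O(n * d) time and O(n + d) extra memory.
    """
    proj = X @ v                           # x_i . v for every sample
    return X.T @ (proj ** 2) / X.shape[0]  # (1/n) * sum_i (x_i . v)^2 x_i


def contract_explicit(X, v):
    """Same contraction, but materializing the d x d x d tensor first.

    Included only as a reference; storage grows as O(d^3).
    """
    T = np.einsum("ni,nj,nk->ijk", X, X, X) / X.shape[0]
    return np.einsum("ijk,j,k->i", T, v, v)


# Hypothetical small data set so the explicit tensor stays tiny.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
v = rng.standard_normal(5)

implicit = contract_implicit(X, v)
explicit = contract_explicit(X, v)
```

Both functions return the same vector up to floating-point error; only the implicit form remains practical as the dimension grows.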

arXiv.org e-Print Archive

eScholarship - University of California