Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

Author: Canon Shane
Chhugani Jatin
Demmel James
Devarakonda Aditya
Gerhardt Lisa
Gittens Alex
Harrell Jim
Kottalam Jey
Krishnamurthy Venkat
Liu Jialin
Mahoney Michael W.
Maschhoff Kristyn
Prabhat
Racah Evan
Ringenburg Michael
Sharma Pramod
Yang Jiyan
Publication date: 12/05/2016

We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling and bioimaging. The data matrices are tall and skinny, which enables the algorithms to map conveniently into Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
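To illustrate why a tall-and-skinny matrix fits a data-parallel model, the following minimal sketch (not the authors' Spark or C+MPI code) splits the rows of a toy matrix into blocks, computes a small Gram matrix per block, and sums the results. The leading right singular vector is then found by power iteration on the small summed matrix. The helper names (`gram`, `partition`, `top_eigenvector`) and the toy data are assumptions made for illustration. A true PCA would also center the columns first.

```python
from functools import reduce

def gram(block):
    """Return block^T block for a list of rows (a small n x n matrix)."""
    n = len(block[0])
    return [[sum(row[i] * row[j] for row in block) for j in range(n)]
            for i in range(n)]

def add(g1, g2):
    """Element-wise sum of two n x n matrices (the reduce step)."""
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(g1, g2)]

def partition(rows, num_parts):
    """Split rows into contiguous blocks, standing in for distributed partitions."""
    size = -(-len(rows) // num_parts)  # ceiling division
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def top_eigenvector(g, iters=200):
    """Power iteration on a small symmetric positive semi-definite matrix."""
    n = len(g)
    v = [1.0] * n
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Toy tall-and-skinny matrix: 8 rows, 2 columns (hypothetical data).
A = [[float(i), 2.0 * i + (-1) ** i] for i in range(1, 9)]

blocks = partition(A, 3)            # "map" side: independent row blocks
G = reduce(add, map(gram, blocks))  # "reduce" side: sum of small Gram matrices
v = top_eigenvector(G)              # leading right singular vector of A
```

Each block only ever produces an n x n matrix, so the data exchanged between workers depends on the number of columns rather than the number of rows.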

arXiv.org e-Print Archive

eScholarship - University of California

Compressive Sensing for Spectroscopy and Polarimetry

Author: Ariste A. Lopez
Ramos A. Asensio
Publication venue: 'EDP Sciences'
Publication date: 24/09/2009

We demonstrate through numerical simulations with real data the feasibility of using compressive sensing techniques for the acquisition of spectro-polarimetric data. This allows us to combine the measurement and the compression process into one consistent framework. Signals are recovered thanks to a sparse reconstruction scheme from projections of the signal of interest onto appropriately chosen vectors, typically noise-like vectors. The compressibility properties of spectral lines are analyzed in detail. The results shown in this paper demonstrate that, thanks to the compressibility properties of spectral lines, it is feasible to reconstruct the signals using only a small fraction of the information that is measured nowadays. We investigate in depth the quality of the reconstruction as a function of the amount of data measured and the influence of noise. This change of paradigm also allows us to define new instrumental strategies and to propose modifications to existing instruments in order to take advantage of compressive sensing techniques.

Comment: 11 pages, 9 figures, accepted for publication in A&
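The abstract does not say which sparse reconstruction scheme is used. As one hedged illustration of the general idea, the sketch below measures a sparse signal by projecting it onto random Gaussian (noise-like) vectors and recovers it with orthogonal matching pursuit, a standard greedy method swapped in here for concreteness. All names (`omp`, `measure`, `solve`) and the problem sizes are assumptions. The demo uses a single-spike signal, the case in which the greedy selection is guaranteed to pick the correct column; signals with more nonzeros are typically recovered when enough measurements are taken, but that is not guaranteed.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(M, b):
    """Solve the small square system M c = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def measure(columns, x):
    """Return y = Phi x, where columns[j] is the j-th column of Phi."""
    m = len(columns[0])
    return [sum(x[j] * columns[j][i] for j in range(len(columns))) for i in range(m)]

def omp(columns, y, sparsity, tol=1e-10):
    """Orthogonal matching pursuit: greedy recovery of a sparse x from y = Phi x."""
    support, coeffs, residual = [], [], y[:]
    for _ in range(sparsity):
        if dot(residual, residual) ** 0.5 < tol:
            break  # measurements already explained
        # Pick the unused column most correlated with the current residual.
        j = max((k for k in range(len(columns)) if k not in support),
                key=lambda k: abs(dot(columns[k], residual))
                / dot(columns[k], columns[k]) ** 0.5)
        support.append(j)
        # Least-squares fit of y on the selected columns (normal equations).
        S = [columns[k] for k in support]
        gram = [[dot(a, b) for b in S] for a in S]
        coeffs = solve(gram, [dot(a, y) for a in S])
        fit = [sum(c * col[i] for c, col in zip(coeffs, S)) for i in range(len(y))]
        residual = [yi - fi for yi, fi in zip(y, fit)]
    x = [0.0] * len(columns)
    for j, c in zip(support, coeffs):
        x[j] = c
    return x

random.seed(0)
m, n = 12, 30  # 12 measurements of a length-30 signal (hypothetical sizes)
columns = [[random.gauss(0.0, 1.0) for _ in range(m)] for _ in range(n)]
x_true = [0.0] * n
x_true[7] = 1.5  # a single spike, loosely standing in for one spectral line
y = measure(columns, x_true)
x_hat = omp(columns, y, sparsity=1)
```

Here 12 measurements stand in for a length-30 signal, a toy version of reconstructing from a small fraction of the data.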

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

A Generative-Discriminative Basis Learning Framework to Predict Clinical Severity from Resting State Functional MRI Data

Author: A Venkataraman
D Sridharan
H Eavani
KP Murphy
MB Nebel
MD Fox
N Parikh
N Payakachat
NK Batmanghelich
Publication date: 24/07/2018

We propose a matrix factorization technique that decomposes the resting state fMRI (rs-fMRI) correlation matrices for a patient population into a sparse set of representative subnetworks, as modeled by rank-one outer products. The subnetworks are combined using patient-specific non-negative coefficients; these coefficients are also used to model, and subsequently predict, the clinical severity of a given patient via a linear regression. Our generative-discriminative framework is able to exploit the structure of rs-fMRI correlation matrices to capture group-level effects, while simultaneously accounting for patient variability. We employ ten-fold cross validation to demonstrate the predictive power of our model on a cohort of fifty-eight patients diagnosed with Autism Spectrum Disorder. Our method outperforms classical semi-supervised frameworks, which perform dimensionality reduction on the correlation features followed by non-linear regression to predict the clinical scores.
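The structure described above can be sketched as a forward model: a patient's correlation matrix is approximated by a non-negatively weighted sum of rank-one outer products of shared subnetwork vectors, and the same weights feed a linear readout of severity. This is only an illustration of that structure, not the authors' fitting procedure. The basis vectors, coefficients, regression weights and function names below are all hypothetical.

```python
def outer(b):
    """Rank-one outer product b b^T."""
    return [[bi * bj for bj in b] for bi in b]

def reconstruct(basis, coeffs):
    """Sum of rank-one subnetworks, weighted by non-negative patient coefficients."""
    if any(c < 0 for c in coeffs):
        raise ValueError("patient coefficients must be non-negative")
    p = len(basis[0])
    total = [[0.0] * p for _ in range(p)]
    for b, c in zip(basis, coeffs):
        for i, row in enumerate(outer(b)):
            for j, val in enumerate(row):
                total[i][j] += c * val
    return total

def predict_severity(coeffs, weights):
    """Linear regression readout applied to the same patient coefficients."""
    return sum(c * w for c, w in zip(coeffs, weights))

# Hypothetical values: two sparse subnetworks over three brain regions.
basis = [[1.0, 1.0, 0.0], [0.0, 1.0, -1.0]]
coeffs = [2.0, 0.5]    # patient-specific, non-negative
weights = [3.0, 4.0]   # regression weights shared across patients

gamma = reconstruct(basis, coeffs)          # approximate correlation matrix
score = predict_severity(coeffs, weights)   # predicted clinical severity
```

The basis is shared across the cohort while the coefficients vary per patient, which is how such a model can capture group-level structure and individual variability at once.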

arXiv.org e-Print Archive

Crossref