Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies
We explore the trade-offs of performing linear algebra using Apache Spark,
compared to traditional C and MPI implementations on HPC platforms. Spark is
designed for data analytics on cluster computing platforms with access to local
disks and is optimized for data-parallel tasks. We examine three widely-used
and important matrix factorizations: NMF (for physical plausibility), PCA (for
its ubiquity) and CX (for data interpretability). We apply these methods to
TB-sized problems in particle physics, climate modeling and bioimaging. The
data matrices are tall and skinny, which enables the algorithms to map
conveniently into Spark's data-parallel model. We perform scaling experiments
on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide
tuning guidance to obtain high performance.
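Why a tall-and-skinny matrix suits a data-parallel model can be illustrated with a small sketch. It is not the paper's implementation: it assumes the columns are already centered, uses hypothetical toy data, and swaps in plain power iteration on the Gram matrix to find the leading principal direction. Each partition of rows computes a small d x d partial Gram matrix independently (the "map"), and the partial results are summed (the "reduce").

```python
import math

def gram_of_rows(rows):
    """Return the d x d sum of outer products x x^T over the given rows."""
    d = len(rows[0])
    g = [[0.0] * d for _ in range(d)]
    for x in rows:
        for i in range(d):
            for j in range(d):
                g[i][j] += x[i] * x[j]
    return g

def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def top_eigenvector(g, iters=200):
    """Power iteration for the dominant eigenvector of a symmetric PSD matrix."""
    d = len(g)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(g[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Hypothetical toy data: 6 rows, 2 centered columns, split into 2 row partitions.
partitions = [
    [[2.0, 1.9], [-1.0, -1.1], [0.5, 0.4]],
    [[-2.0, -2.1], [1.0, 1.2], [-0.5, -0.3]],
]

partial = [gram_of_rows(p) for p in partitions]  # "map": one small matrix per partition
gram = partial[0]
for g in partial[1:]:
    gram = add_matrices(gram, g)                 # "reduce": sum the partial Gram matrices

direction = top_eigenvector(gram)                # leading principal direction
print(direction)
```

Only d x d matrices are exchanged between partitions, so the communication cost does not grow with the number of rows. Forming the Gram matrix squares the condition number, which is one reason production codes may prefer other approaches.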
Compressive Sensing for Spectroscopy and Polarimetry
We demonstrate through numerical simulations with real data the feasibility
of using compressive sensing techniques for the acquisition of
spectro-polarimetric data. This allows us to combine the measurement and the
compression process into one consistent framework. Signals are recovered thanks
to a sparse reconstruction scheme from projections of the signal of interest
onto appropriately chosen vectors, typically noise-like vectors. The
compressibility properties of spectral lines are analyzed in detail. The
results shown in this paper demonstrate that, thanks to the compressibility
properties of spectral lines, it is feasible to reconstruct the signals using
only a small fraction of the information that is measured nowadays. We
investigate in depth the quality of the reconstruction as a function of the
amount of data measured and the influence of noise. This change of paradigm
also allows us to define new instrumental strategies and to propose
modifications to existing instruments in order to take advantage of compressive
sensing techniques.
Comment: 11 pages, 9 figures, accepted for publication in A&
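The idea of recovering a sparse signal from projections onto noise-like vectors can be sketched as follows. This is a minimal illustration under stated assumptions, not the reconstruction scheme used in the paper: the measurement matrix is Gaussian, the signal is sparse in the identity basis, and the solver is plain iterative soft thresholding (ISTA) for the L1-regularized least-squares problem. All sizes, the seed, and the regularization weight `lam` are hypothetical.

```python
import random

def matvec(a, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def rmatvec(a, r):
    """Compute A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]

def soft_threshold(v, t):
    """Shrink every entry toward zero by t (proximal operator of t * L1 norm)."""
    return [max(abs(c) - t, 0.0) * (1.0 if c > 0 else -1.0) for c in v]

def objective(a, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1."""
    r = [yi - pi for yi, pi in zip(y, matvec(a, x))]
    return 0.5 * sum(c * c for c in r) + lam * sum(abs(c) for c in x)

def ista(a, y, lam=0.01, iters=1000):
    """Iterative soft thresholding with a conservative fixed step size."""
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so this step is always small enough for ISTA to be stable.
    step = 1.0 / sum(c * c for row in a for c in row)
    x = [0.0] * len(a[0])
    for _ in range(iters):
        r = [yi - pi for yi, pi in zip(y, matvec(a, x))]
        moved = [xj + step * gj for xj, gj in zip(x, rmatvec(a, r))]
        x = soft_threshold(moved, step * lam)
    return x

# Hypothetical setup: 40 unknowns, 20 noise-like measurements, 2 nonzero entries.
rng = random.Random(0)
m, n = 20, 40
a = [[rng.gauss(0.0, 1.0) / m ** 0.5 for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[5], x_true[17] = 1.5, -2.0
y = matvec(a, x_true)

x_hat = ista(a, y)
print(objective(a, y, x_hat, 0.01))
```

The number of measurements is half the signal length here; how few measurements suffice in practice depends on the sparsity and the noise level, which is what the abstract says the paper investigates.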
A Generative-Discriminative Basis Learning Framework to Predict Clinical Severity from Resting State Functional MRI Data
We propose a matrix factorization technique that decomposes the resting state
fMRI (rs-fMRI) correlation matrices for a patient population into a sparse set
of representative subnetworks, as modeled by rank-one outer products. The
subnetworks are combined using patient-specific non-negative coefficients;
these coefficients are also used to model, and subsequently predict, the
clinical severity of a given patient via a linear regression. Our
generative-discriminative framework is able to exploit the structure of rs-fMRI
correlation matrices to capture group-level effects, while simultaneously
accounting for patient variability. We employ ten-fold cross-validation to
demonstrate the predictive power of our model on a cohort of fifty-eight
patients diagnosed with Autism Spectrum Disorder. Our method outperforms
classical semi-supervised frameworks, which perform dimensionality reduction on
the correlation features followed by non-linear regression to predict the
clinical scores.
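The forward model described above can be sketched in a few lines. This shows only how a correlation matrix would be assembled from rank-one subnetworks and how a severity score would be read off the coefficients; it does not show how the paper learns these quantities. The basis vectors, coefficients, regression weights, and function names are all hypothetical.

```python
def reconstruct(bases, coeffs):
    """Sum of c_k * b_k b_k^T over subnetworks, with non-negative c_k."""
    if any(c < 0 for c in coeffs):
        raise ValueError("coefficients must be non-negative")
    d = len(bases[0])
    matrix = [[0.0] * d for _ in range(d)]
    for b, c in zip(bases, coeffs):
        for i in range(d):
            for j in range(d):
                matrix[i][j] += c * b[i] * b[j]
    return matrix

def predict_severity(coeffs, weights, intercept=0.0):
    """Linear regression on the patient-specific coefficients."""
    return intercept + sum(w * c for w, c in zip(weights, coeffs))

# Hypothetical values: 2 subnetworks over 3 brain regions, 1 patient.
bases = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
coeffs = [2.0, 0.5]       # patient-specific, non-negative
weights = [3.0, -1.0]     # regression weights shared across patients

matrix = reconstruct(bases, coeffs)
score = predict_severity(coeffs, weights)
print(matrix)  # [[2.0, 0.0, 2.0], [0.0, 0.5, 0.0], [2.0, 0.0, 2.0]]
print(score)   # 5.5
```

The same coefficient vector appears in both functions, which is what couples the generative part (explaining the correlation matrix) to the discriminative part (predicting severity).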