87,026 research outputs found

A flexible framework for sparse simultaneous component based data integration

Author: AE Hoerl
AL Barabasi
Anestis Antoniadis
D Lee
DM Witten
GJ McLachlan
H Kiers
H Zou
HAL Kiers
I Borg
I Jolliffe
IT Jolliffe
Iven Van Mechelen
J de Leeuw
J Friedman
J Huang
JMF Ten Berge
K Lange
K Lemmens
K Van Deun
KA Le Cao
Katrijn Van Deun
KR Gabriel
L Meier
M de Tayrac
M Kowalski
M Yuan
MJ van der Werf
N Ishii
O Alter
P Zhao
PJF Groenen
R Jenatton
R Tibshirani
R van den Berg
Robert A van den Berg
S Hochreiter
S Ma
T Wilderjans
TF Wilderjans
Tom F Wilderjans
WJ Heiser
Y Kim
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract

Background: High-throughput data are complex, and methods that reveal the structure underlying the data are most useful. Principal component analysis, frequently implemented as a singular value decomposition, is a popular technique in this respect. Nowadays the challenge is often to reveal structure in several sources of information (e.g., transcriptomics, proteomics) that are available for the same biological entities under study. Simultaneous component methods are most promising in this respect. However, interpreting the principal and simultaneous components is often daunting because the contributions of each of the biomolecules (transcripts, proteins) have to be taken into account.

Results: We propose a sparse simultaneous component method that makes many of the parameters redundant by shrinking them to zero. It includes principal component analysis, sparse principal component analysis, and ordinary simultaneous component analysis as special cases. Several penalties can be tuned that account in different ways for the block structure present in the integrated data. This yields known sparse approaches such as the lasso, the ridge penalty, the elastic net, the group lasso, the sparse group lasso, and the elitist lasso. In addition, the algorithmic results can be easily transposed to the context of regression. Metabolomics data obtained with two measurement platforms for the same set of Escherichia coli samples are used to illustrate the proposed methodology and the properties of different penalties with respect to sparseness across and within data blocks.

Conclusion: Sparse simultaneous component analysis is a useful method for data integration. First, simultaneous analyses of multiple blocks offer advantages over sequential and separate analyses. Second, interpretation of the results is highly facilitated by their sparseness. The approach offered is flexible and allows the block structure to be taken into account in different ways. As such, structures can be found that are exclusively tied to one data platform (group lasso approach) as well as structures that involve all data platforms (elitist lasso approach).

Availability: The additional file contains a MATLAB implementation of the sparse simultaneous component method.
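The difference between sparseness within a block and sparseness of a whole block can be illustrated with the standard proximal (shrinkage) steps for the lasso and the group lasso. This is a minimal sketch under stated assumptions, not the MATLAB implementation from the additional file: the function names, the `blocks` layout (one list of loadings per data block), and the tuning value `lam` are all hypothetical.

```python
import math


def soft_threshold(x, lam):
    """Lasso shrinkage: move x toward zero by lam and clip at zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def prox_lasso(weights, lam):
    """Shrink every loading individually (sparseness within blocks)."""
    return [soft_threshold(w, lam) for w in weights]


def prox_group_lasso(blocks, lam):
    """Shrink each block's Euclidean norm by lam.

    A block whose norm is at most lam is set to zero as a whole,
    which gives sparseness across blocks.
    `blocks` is a hypothetical layout: one list of loadings per block.
    """
    shrunk = []
    for block in blocks:
        norm = math.sqrt(sum(w * w for w in block))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0.0 else 0.0
        shrunk.append([scale * w for w in block])
    return shrunk
```

With `lam = 1.0`, `prox_group_lasso([[3.0, 4.0], [0.3, 0.4]], 1.0)` keeps the first block (scaled down) and removes the second block entirely, whereas `prox_lasso` would zero out small loadings one at a time regardless of the block they belong to.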

Lirias

Crossref

Hal - Université Grenoble Alpes

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Info-Greedy sequential adaptive compressed sensing

Author: Braun Gabor
Pokutta Sebastian
Xie Yao
Publication date: 02/02/2015

We present an information-theoretic framework for sequential adaptive compressed sensing, Info-Greedy Sensing, where measurements are chosen to maximize the extracted information conditioned on the previous measurements. We show that the widely used bisection approach is Info-Greedy for a family of k-sparse signals by connecting compressed sensing and blackbox complexity of sequential query algorithms, and present Info-Greedy algorithms for Gaussian and Gaussian Mixture Model (GMM) signals, as well as ways to design sparse Info-Greedy measurements. Numerical examples demonstrate the good performance of the proposed algorithms using simulated and real data: Info-Greedy Sensing shows significant improvement over random projection for signals with sparse and low-rank covariance matrices, and adaptivity brings robustness when there is a mismatch between the assumed and the true distributions. Comment: Preliminary results presented at Allerton Conference 2014. To appear in IEEE Journal Selected Topics on Signal Processing
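The bisection idea mentioned in the abstract can be sketched for the simplest case. The following assumes a noise-free signal with exactly one nonzero entry and 0/1 measurement vectors; it is an illustration of adaptive bisection only, not the paper's Info-Greedy algorithms, and the names `locate_support` and `measure` are hypothetical.

```python
def locate_support(measure, n):
    """Find the index of the single nonzero entry of a length-n signal.

    `measure(a)` is assumed to return the noise-free inner product of
    the hidden signal with the 0/1 measurement vector `a`.
    Returns the index and the number of measurements used.
    """
    lo, hi = 0, n  # invariant: the nonzero entry lies in [lo, hi)
    queries = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Measure the left half of the current candidate range.
        a = [1.0 if lo <= i < mid else 0.0 for i in range(n)]
        queries += 1
        if measure(a) != 0.0:
            hi = mid  # the nonzero entry is in the left half
        else:
            lo = mid  # otherwise it must be in the right half
    return lo, queries
```

Each measurement depends on the outcomes of the previous ones, which is what makes the scheme adaptive. For a signal of length 16 the loop uses 4 measurements, compared with up to 15 if the entries were probed one by one.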

arXiv.org e-Print Archive

CiteSeerX

Parallel Algorithms for Summing Floating-Point Numbers

Author: Eldawy Ahmed
Goodrich Michael T.
Publication date: 17/05/2016

The problem of exactly summing n floating-point numbers is a fundamental problem that has many applications in large-scale simulations and computational geometry. Unfortunately, due to the round-off error in standard floating-point operations, this problem becomes very challenging. Moreover, all existing solutions rely on sequential algorithms which cannot scale to the huge datasets that need to be processed. In this paper, we provide several efficient parallel algorithms for summing n floating-point numbers, so as to produce a faithfully rounded floating-point representation of the sum. We present algorithms in PRAM, external-memory, and MapReduce models, and we also provide an experimental analysis of our MapReduce algorithms, due to their simplicity and practical efficiency. Comment: Conference version appears in SPAA 201
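The round-off problem, and the general shape of a map-then-reduce solution, can be sketched in a few lines. This sketch uses exact rational arithmetic from the Python standard library as a stand-in for the paper's data structures, which the abstract does not describe; the function names and the `chunk_size` parameter are hypothetical, and inputs are assumed to be finite floats.

```python
from fractions import Fraction
import math


def exact_partial(chunk):
    """Map step: exact rational sum of one chunk (no rounding occurs)."""
    return sum((Fraction(x) for x in chunk), Fraction(0))


def exact_sum_chunked(values, chunk_size):
    """Reduce step: add the exact partial sums, then round to float once.

    The chunks are independent, so in principle each partial sum could
    be computed by a different worker.
    """
    partials = [exact_partial(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
    return float(sum(partials, Fraction(0)))


values = [1e16, 1.0, -1e16, 1.0]
naive = sum(values)                    # rounds after every addition
exact = exact_sum_chunked(values, 2)   # rounds only at the very end
reference = math.fsum(values)          # sequential accurate summation
```

Here the naive left-to-right sum loses one of the two `1.0` terms, because `1e16 + 1.0` is not representable as a double, while the chunked version agrees with `math.fsum`. Rational arithmetic is convenient for a demonstration but is not meant as an efficient choice for large inputs.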

arXiv.org e-Print Archive

eScholarship - University of California

Sequential Gaussian Processes for Online Learning of Nonstationary Functions

Author: Dumitrascu Bianca
Engelhardt Barbara E.
Williamson Sinead A.
Zhang Michael Minyi
Publication date: 16/10/2019

Many machine learning problems can be framed in the context of estimating functions, and often these are time-dependent functions that are estimated in real-time as observations arrive. Gaussian processes (GPs) are an attractive choice for modeling real-valued nonlinear functions due to their flexibility and uncertainty quantification. However, the typical GP regression model suffers from several drawbacks: i) conventional GP inference scales as O(N^3) with respect to the number of observations; ii) updating a GP model sequentially is not trivial; and iii) covariance kernels often enforce stationarity constraints on the function, while GPs with non-stationary covariance kernels are often intractable to use in practice. To overcome these issues, we propose an online sequential Monte Carlo algorithm to fit mixtures of GPs that capture non-stationary behavior while allowing for fast, distributed inference. By formulating hyperparameter optimization as a multi-armed bandit problem, we accelerate mixing for real-time inference. Our approach empirically improves performance over state-of-the-art methods for online GP estimation in the context of prediction for simulated non-stationary data and hospital time series data.

arXiv.org e-Print Archive

Block Coordinate Descent for Sparse NMF

Author: Calhoun Vince D.
Hayes Thomas P.
Pearlmutter Barak A.
Plis Sergey M.
Potluru Vamsi K.
Roux Jonathan Le
Publication date: 01/01/2013

Nonnegative matrix factorization (NMF) has become a ubiquitous tool for data analysis. An important variant is the sparse NMF problem, which arises when we explicitly require the learnt features to be sparse. A natural measure of sparsity is the L_0 norm; however, its optimization is NP-hard. Mixed norms, such as the L_1/L_2 measure, have been shown to model sparsity robustly, based on intuitive attributes that such measures need to satisfy. This is in contrast to computationally cheaper alternatives such as the plain L_1 norm. However, present algorithms designed for optimizing the mixed norm L_1/L_2 are slow, and other formulations for sparse NMF have been proposed, such as those based on the L_1 and L_0 norms. Our proposed algorithm allows us to solve the mixed norm sparsity constraints while not sacrificing computation time. We present experimental evidence on real-world datasets that shows our new algorithm performs an order of magnitude faster than the current state-of-the-art solvers optimizing the mixed norm and is suitable for large-scale datasets.
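A common way to turn the L_1/L_2 ratio into a sparsity score between 0 and 1 is the normalization sketched below. The abstract does not spell out the formula, so treating this particular normalization as the paper's measure is an assumption, and the function name is hypothetical.

```python
import math


def l1_l2_sparsity(x):
    """Sparsity score based on the ratio of the L_1 and L_2 norms.

    Returns 1.0 when exactly one entry is nonzero and 0.0 when all
    entries have the same magnitude. Assumed normalization:
    (sqrt(n) - L1/L2) / (sqrt(n) - 1).
    """
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two entries and a nonzero vector")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)
```

Unlike the plain L_1 norm, this score does not change when the vector is rescaled, so shrinking all entries uniformly does not make a feature look sparser.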

arXiv.org e-Print Archive

CiteSeerX

MURAL - Maynooth University Research Archive Library

NUI Maynooth Eprint Archive

Maynooth University ePrints and eTheses Archive