
192 research outputs found

Unsupervised Bayesian linear unmixing of gene expression microarrays

Author: A Hyvärinen
Aimee K Zaas
AK Zaas
Alfred O Hero III
B Chen
CM Carvalho
CP Robert
Cécile Bazot
D Dueck
DD Lee
EJ Fertig
Geoffrey S Ginsburg
GJ McLachlan
J Baek
J Paisley
Jean-Yves Tourneret
JM Nascimento
KY Yeung
M West
ME Winter
N Dobigeon
Nicolas Dobigeon
P Fogel
PJ Green
RO Duda
TD Moloshok
TF Cox
V Nikulin
WR Gilks
Y Huang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Background: This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high-dimensional assays such as gene expression microarrays. The basis for uBLU is a Bayesian model for the data samples, which are represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. The particularity of the proposed method is that uBLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Furthermore, it also provides estimates of the number of factors. A Gibbs sampling strategy is adopted here to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters.
Results: Firstly, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non-negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Secondly, we illustrate the application of uBLU on a real time-evolving gene expression dataset from a recent viral challenge study in which individuals have been inoculated with influenza A/H3N2/Wisconsin. We show that the uBLU method significantly outperforms the other methods on the simulated and real datasets considered here.
Conclusions: The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method when compared to other factor decomposition methods from the literature (PCA, NMF, BFRM, and GB-GMF). The uBLU method identifies an inflammatory component closely associated with clinical symptom scores collected during the study. Using a constrained model allows recovery of all the inflammatory genes in a single factor.
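The constrained mixing model described in this abstract (non-negative factors, factor scores that form a probability distribution over the factors for each sample) can be illustrated with a small simulation. This is only a sketch under stated assumptions: the gamma and Dirichlet distributions, the Gaussian noise, and the function name `simulate_linear_mixture` are hypothetical choices for illustration, not the priors or code of the uBLU paper, and no Gibbs sampler is implemented here.

```python
import numpy as np

def simulate_linear_mixture(n_genes=50, n_samples=20, n_factors=3,
                            noise_sd=0.05, seed=0):
    """Draw data = factors @ scores + noise under the two constraints.

    Assumed (hypothetical) distributions: gamma for the non-negative
    factors, flat Dirichlet for the per-sample factor scores.
    """
    rng = np.random.default_rng(seed)
    # Non-negative gene signatures, one column per factor.
    factors = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_factors))
    # Each column of `scores` is non-negative and sums to one.
    scores = rng.dirichlet(np.ones(n_factors), size=n_samples).T
    noise = rng.normal(0.0, noise_sd, size=(n_genes, n_samples))
    data = factors @ scores + noise
    return data, factors, scores

data, factors, scores = simulate_linear_mixture()
print(data.shape, scores.sum(axis=0)[:3])
```

Because every column of `scores` sums to one, each simulated sample is a convex combination of the factor signatures, which is what lets a score be read as the relative contribution of a signature to that sample.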

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Springer - Publisher Connector

Open Archive Toulouse Archive Ouverte

PubMed Central

Deep Blue Documents at the University of Michigan

Simultaneous non-negative matrix factorization for multiple large scale gene expression datasets in toxicology

Author: A Weiss
AD Pascual-Montano
C Dreyer
C Farah
C. Roland Wolf
Clare M. Lee
D Jones
D. R. Haggart
Daniel Crowther
DD Lee
Desmond J. Higham
DR Artis
DW Huang
E Scotti
G Florvall
Gino Miele
H Baumann
H Kim
J Jin
J Oldgren
J Rhee
J. Keith Vass
JP Brunet
K Devarajan
K Matsukuma
L Zhang
M Parmacek
M T
Manikhandan A. V. Mudaliar
P Carmona-Saez
P Fogel
P Gervois
R Gadaleta
Ramin Homayouni
S Kersten
S Monit
S Perry
SA Kliewer
T Kobayashi
Y Gao
Y Guo
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2012

Non-negative matrix factorization is a useful tool for reducing the dimension of large datasets. This work considers simultaneous non-negative matrix factorization of multiple sources of data. In particular, we perform the first study that involves more than two datasets. We discuss the algorithmic issues required to convert the approach into a practical computational tool and apply the technique to new gene expression data quantifying the molecular changes in four tissue types due to different dosages of an experimental panPPAR agonist in mouse. This study is of interest in toxicology because, whilst PPARs form potential therapeutic targets for diabetes, it is known that they can induce serious side-effects. Our results show that the practical simultaneous non-negative matrix factorization developed here can add value to the data analysis. In particular, we find that factorizing the data as a single object allows us to distinguish between the four tissue types, but does not correctly reproduce the known dosage level groups. Applying our new approach, which treats the four tissue types as providing distinct, but related, datasets, we find that the dosage level groups are respected. The new algorithm then provides separate gene list orderings that can be studied for each tissue type, and compared with the ordering arising from the single factorization. We find that many of our conclusions can be corroborated with known biological behaviour, and others offer new insights into the toxicological effects. Overall, the algorithm shows promise for early detection of toxicity in the drug discovery process.

Public Library of Science (PLOS)

CiteSeerX

Crossref

University of Strathclyde Institutional Repository

Directory of Open Access Journals

PubMed Central

Enlighten

University of Dundee Online Publications

DNA meets the SVD

Author: Grindrod Peter
Higham Desmond J.
Kalna Gabriela
Spence Alistair
Stoyanov Zhivko
Vass J. Keith
Publication date: 01/01/2008

This paper introduces an important area of computational cell biology where complex, publicly available genomic data is being examined by linear algebra methods, with the aim of revealing biological and medical insights.

University of Strathclyde Institutional Repository

Edinburgh Research Explorer

Non-Negative Matrix Factorization for the Analysis of Complex Gene Expression Data: Identification of Clinically Relevant Tumor Subtypes

Author: Frigyesi Attila
Höglund Mattias
Publication venue: Libertas Academica
Publication date: 01/01/2008

Non-negative matrix factorization (NMF) is a relatively new approach to analyzing gene expression data that models data by additive combinations of non-negative basis vectors (metagenes). The non-negativity constraint makes sense biologically, as genes may either be expressed or not, but never show negative expression. We applied NMF to five different microarray data sets. We estimated the appropriate number of metagenes by comparing the residual error of the NMF reconstruction of the data to that of the NMF reconstruction of permuted data, thus finding when a given solution contained more information than noise. This analysis also revealed that NMF could not factorize one of the data sets in a meaningful way. We used GO categories and predefined gene sets to evaluate the biological significance of the obtained metagenes. By analyzing metagenes specific to the same GO categories, we could show that individual metagenes activated different aspects of the same biological processes. Several of the obtained metagenes correlated with tumor subtypes and tumors with characteristic chromosomal translocations, indicating that metagenes may correspond to specific disease entities. Hence, NMF extracts biologically relevant structures from microarray expression data and may thus contribute to a deeper understanding of tumor behavior.
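The rank-selection idea in this abstract (compare the NMF residual on the data with the residual on permuted data) can be sketched as follows. This is a minimal illustration under assumptions, not the authors' procedure: it uses the standard Lee and Seung multiplicative updates for the Frobenius loss, it permutes each gene's values independently across samples (the abstract does not say how the data were permuted), and the helper names `nmf`, `residual`, and `permute_rows` as well as the toy data are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF (Frobenius loss): V is approximated by W @ H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # The updates keep W and H non-negative by construction.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def residual(V, rank):
    """Frobenius norm of the reconstruction error at a given rank."""
    W, H = nmf(V, rank)
    return np.linalg.norm(V - W @ H)

def permute_rows(V, seed=0):
    """Shuffle each row independently (assumed permutation scheme)."""
    rng = np.random.default_rng(seed)
    return np.stack([rng.permutation(row) for row in V])

# Toy data with three planted metagenes plus a little non-negative noise.
rng = np.random.default_rng(1)
V = rng.random((40, 3)) @ rng.random((3, 30)) + 0.01 * rng.random((40, 30))
V_perm = permute_rows(V)

for k in range(1, 6):
    print(k, residual(V, k), residual(V_perm, k))
```

A rank is informative when the residual on the real data falls clearly below the residual on the permuted data; once the two curves decrease at a similar rate, additional metagenes are mostly fitting noise.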

Lund University Publications

Directory of Open Access Journals

PubMed Central

NMFP: a non-negative matrix factorization based preselection method to increase accuracy of identifying mRNA isoforms from RNA-seq data

Author: A Pascual-montano
A Roberts
C Li
C Trapnell
D Risso
DD Lee
G-S Wang
H Thorvaldsdóttir
J Fan
J Li
J-P Brunet
Jingyi Jessica Li
JT Robinson
KD Hansen
M Guttman
MA Anton
Ma Anton
NA Faustino
P Fogel
R Tibshirani
T Griebel
T Steijger
W Li
Yuting Ye
Z Wang
Publication venue: 'Springer Science and Business Media LLC'

Crossref

A primer on correlation-based dimension reduction methods for multi-omics analysis

Author: Angelopoulos Nicos
Downing Tim
Publication date: 27/05/2023

The continuing advances of omic technologies mean that it is now more feasible to measure the numerous features collectively reflecting the molecular properties of a sample. When multiple omic methods are used, statistical and computational approaches can exploit these large, connected profiles. Multi-omics is the integration of different omic data sources from the same biological sample. In this review, we focus on correlation-based dimension reduction approaches for single omic datasets, followed by methods for pairs of omic datasets, before detailing further techniques for three or more omic datasets. We also briefly detail network methods that complement correlation-oriented tools when three or more omic datasets are available. To aid readers new to this area, these are all linked to relevant R packages that can implement these procedures. Finally, we discuss scenarios of experimental design and present road maps that simplify the selection of appropriate analysis methods. This review will help researchers navigate the emerging methods for multi-omics, integrate diverse omic datasets appropriately, and embrace the opportunity of population multi-omics.
Comment: 30 pages, 2 figures, 6 tables

arXiv.org e-Print Archive

Exact Dimensionality Selection for Bayesian PCA

Author: Bouveyron Charles
Latouche Pierre
Mattei Pierre-Alexandre
Publication date: 21/05/2019

We present a Bayesian model selection approach to estimate the intrinsic dimensionality of a high-dimensional dataset. To this end, we introduce a novel formulation of the probabilistic principal component analysis model based on a normal-gamma prior distribution. In this context, we exhibit a closed-form expression of the marginal likelihood, which allows us to infer an optimal number of components. We also propose a heuristic based on the expected shape of the marginal likelihood curve in order to choose the hyperparameters. In non-asymptotic frameworks, we show on simulated data that this exact dimensionality selection approach is competitive with both Bayesian and frequentist state-of-the-art methods.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes