Search CORE

609 research outputs found

Discovering transcriptional modules by Bayesian data integration

Author: Antoniak
Bar-Joseph
Bernard J. de la Cruz
Bähler
Cho
Dahl
Datta
David L. Wild
Eisen
Falcon
Ferguson
Fritsch
Gasch
Gerber
Geweke
Harbison
Ideker
Ihmels
Jim E. Griffin
Kundaje
Lee
Liu
Liu
Medvedovic
Medvedovic
Qin
Rasmussen
Rasmussen
Reid
Richard S. Savage
Savage
Segal
Segal
Teh
Teh
Wild
Yao
Yeung
Zoubin Ghahramani
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2010
Field of study

Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Kent Academic Repository

CUED - Cambridge University Engineering Department

Bayesian correlated clustering to integrate multiple datasets

Author: Balasubramanian
Barash
Brock
Carlson
Cheng
Cherry
Cho
Cooke
Datta
David L. Wild
Dempster
Friedman
Fritsch
Granovskaia
Green
Harbison
Hubert
Huttenhower
Ideker
Ishwaran
Jackson
Jackson
Jansen
Jim E. Griffin
Kirk
Lee
Liu
Liu
Lockhart
Mistry
Myers
Myers
Neal
Neal
Nieto-Barajas
Paul Kirk
Puig
Rand
Rasmussen
Rasmussen
Reiss
Rhodes
Richard S. Savage
Rigaut
Rogers
Rogers
Rousseau
Santisteban
Savage
Schena
Shen
Solomon
Stark
Suchard
Troyanskaya
Wei
Wong
Yeung
Yuan
Zoubin Ghahramani
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012
Field of study

Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct – but often complementary – information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured via parameters that describe the agreement among the datasets. Results: Using a set of 6 artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real S. cerevisiae datasets. In the 2-dataset case, we show that MDI’s performance is comparable to the present state of the art. We then move beyond the capabilities of current approaches and integrate gene expression, ChIP-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques – as well as to non-integrative approaches – demonstrate that MDI is very competitive, while also providing information that would be difficult or impossible to extract using other methods

CiteSeerX

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Kent Academic Repository

Detection of regulator genes and eQTLs in gene networks

Author: A Butte
A Chatr-Aryamontri
A Clauset
A Joshi
A Joshi
A Kundaje
AA Shabalin
AJ Enright
AJ Walhout
AS Dimas
B Schwanhausser
B Zhang
B Zhang
C Cenik
CO Daub
D Koller
DA Cusanovich
DM Greenawalt
E Bonnet
E Ravasz
E Segal
EC Neto
EC Neto
EC Neto
EE Schadt
EE Schadt
EE Schadt
EE Schadt
EE Schadt
EJ Foss
F Grubert
F Yue
FA Cubillos
FW Albert
G Hemani
G Nicholson
GD Smith
GH Golub
H Foroughi Asl
H Talukdar
HN Kadarmideen
J Millstein
J Qi
J Zhu
J Zhu
J Zhu
JE Aten
JF Ayroles
JJ Faith
JL Björkegren
JS Liu
K Basso
K Qu
KG Ardlie
L Wu
LA Hindorff
LH Hartwell
LS Chen
M Ashburner
M Civelek
M Georges
M Gerstein
M Medvedovic
M Schmidt
M Scutari
MA Schaub
MB Eisen
MD Ritchie
ME Goddard
MEJ Newman
MEJ Newman
MV Rockman
MV Rockman
N Friedman
N Friedman
N Friedman
N Laird
O Stegle
P Langfelder
P Langfelder
P Langfelder
P Lu
R Sharan
R Sharan
RB Brem
RW Williams
S Lee
S Roy
S Tavazoie
SI Lee
SM Waszak
SS Rao
T Lappalainen
T Michoel
TA Manolio
TF Mackay
The ENCODE
TS Furey
VG Cheung
W Cookson
W Zhang
Y Chen
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2016
Field of study

Genetic differences between individuals associated to quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated to differences in expression levels of nearby genes (they are "expression quantitative trait loci" or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and softwares to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico.Comment: minor revision with typos corrected; review article; 24 pages, 2 figure

arXiv.org e-Print Archive

Crossref

Recommended from our members

Statistical methods for multi-omic data integration

Author: Cabassi Alessandra
Publication venue: University of Cambridge
Publication date: 30/09/2020
Field of study

The thesis is focused on the development of new ways to integrate multiple ’omic datasets in the context of precision medicine. This type of analyses have the potential to help researchers deepen their understanding of biological mechanisms underlying disease. However, integrative studies pose several challenges, due to the typically widely differing characteristics of the ’omic layers in terms of number of predictors, type of data, and level of noise. In this work, we first tackle the problem of performing variable selection and building supervised models, while integrating multiple ’omic datasets of different type. It has been recently shown that applying classical logistic regression with elastic-net penalty to these datasets can lead to poor results. Therefore, we suggest a two-step approach to multi-omic logistic regression in which variable selection is performed on each layer separately and a predictive model is subsequently built on the ensemble of the selected variables. In the unsupervised setting, we first examine cluster of clusters analysis (COCA), an integrative clustering approach that combines information from multiple data sources. COCA has been widely applied in the context of tumour subtyping, but its properties have never been systematically explored before, and its robustness to the inclusion of noisy datasets is unclear. Then, we propose a new statistical method for the unsupervised integration of multi-omic data, called kernel learning integrative clustering (KLIC). This approach is based on the idea to frame the challenge of combining clustering structures as a multiple kernel learning problem, in which different datasets each provide a weighted contribution to the final clustering. Finally, we build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian mixture models. A key contribution of our work is the observation that PSMs can be used to define probabilistically-motivated kernel matrices that capture the clustering structure present in the data. This observation enables us to employ a range of kernel methods to obtain summary clusterings, and, if we have multiple PSMs, use standard methods for combining kernels in order to perform integrative clustering. We also show that one can embed PSMs within predictive kernel models in order to perform outcome-guided clustering

Apollo (Cambridge)

Transcriptional programs: modelling higher order structure in transcriptional control.

Author: Ott Sascha
Reid John E
Wernisch Lorenz
Publication venue: BMC Bioinformatics
Publication date: 01/07/2009
Field of study

BACKGROUND: Transcriptional regulation is an important part of regulatory control in eukaryotes. Even if binding motifs for transcription factors are known, the task of finding binding sites by scanning sequences is plagued by false positives. One way to improve the detection of binding sites from motifs is by taking cooperativity of transcription factor binding into account. We propose a non-parametric probabilistic model, similar to a document topic model, for detecting transcriptional programs, groups of cooperative transcription factors and co-regulated genes. The analysis results in transcriptional programs which generalise both transcriptional modules and TF-target gene incidence matrices and provide a higher-level summary of these structures. The method is independent of prior specification of training sets of genes, for example, via gene expression data. The analysis is based on known binding motifs. RESULTS: We applied our method to putative regulatory regions of 18,445 Mus musculus genes. We discovered just 68 transcriptional programs that effectively summarised the action of 149 transcription factors on these genes. Several of these programs were significantly enriched for known biological processes and signalling pathways. One transcriptional program has a significant overlap with a reference set of cell cycle specific transcription factors. CONCLUSION: Our method is able to pick out higher order structure from noisy sequence analyses. The transcriptional programs it identifies potentially represent common mechanisms of regulatory control across the genome. It simultaneously predicts which genes are co-regulated and which sets of transcription factors cooperate to achieve this co-regulation. The programs we discovered enable biologists to choose new genes and transcription factors to study in specific transcriptional regulatory systems.RIGHTS : This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may : copy, distribute, and display the work; make derivative works; or make commercial use of the work - under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are

Springer - Publisher Connector

PubMed Central

Warwick Research Archives Portal Repository

Birkbeck Institutional Research Online

Apollo (Cambridge)

Unsupervised learning of transcriptional regulatory networks via latent tree graphical models

Author: Anandkumar Animashree
Fraenkel Ernest
Gitter Anthony
Huang Furong
Valluvan Ragupathyraj
Publication venue
Publication date: 20/09/2016
Field of study

Gene expression is a readily-observed quantification of transcriptional activity and cellular state that enables the recovery of the relationships between regulators and their target genes. Reconstructing transcriptional regulatory networks from gene expression data is a problem that has attracted much attention, but previous work often makes the simplifying (but unrealistic) assumption that regulator activity is represented by mRNA levels. We use a latent tree graphical model to analyze gene expression without relying on transcription factor expression as a proxy for regulator activity. The latent tree model is a type of Markov random field that includes both observed gene variables and latent (hidden) variables, which factorize on a Markov tree. Through efficient unsupervised learning approaches, we determine which groups of genes are co-regulated by hidden regulators and the activity levels of those regulators. Post-processing annotates many of these discovered latent variables as specific transcription factors or groups of transcription factors. Other latent variables do not necessarily represent physical regulators but instead reveal hidden structure in the gene expression such as shared biological function. We apply the latent tree graphical model to a yeast stress response dataset. In addition to novel predictions, such as condition-specific binding of the transcription factor Msn4, our model recovers many known aspects of the yeast regulatory network. These include groups of co-regulated genes, condition-specific regulator activity, and combinatorial regulation among transcription factors. The latent tree graphical model is a general approach for analyzing gene expression data that requires no prior knowledge of which possible regulators exist, regulator activity, or where transcription factors physically bind

arXiv.org e-Print Archive

eScholarship - University of California

Caltech Authors

A semi-parametric Bayesian model for unsupervised differential co-expression analysis

Author: Freudenberg Johannes M
Medvedovic Mario
Sivaganesan Siva
Wagner Michael
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background Differential co-expression analysis is an emerging strategy for characterizing disease related dysregulation of gene expression regulatory networks. Given pre-defined sets of biological samples, such analysis aims at identifying genes that are co-expressed in one, but not in the other set of samples. Results We developed a novel probabilistic framework for jointly uncovering contexts (i.e. groups of samples) with specific co-expression patterns, and groups of genes with different co-expression patterns across such contexts. In contrast to current clustering and bi-clustering procedures, the implicit similarity measure in this model used for grouping biological samples is based on the clustering structure of genes within each sample and not on traditional measures of gene expression level similarities. Within this framework, biological samples with widely discordant expression patterns can be placed in the same context as long as the co-clustering structure of genes is concordant within these samples. To the best of our knowledge, this is the first method to date for unsupervised differential co-expression analysis in this generality. When applied to the problem of identifying molecular subtypes of breast cancer, our method identified reproducible patterns of differential co-expression across several independent expression datasets. Sample groupings induced by these patterns were highly informative of the disease outcome. Expression patterns of differentially co-expressed genes provided new insights into the complex nature of the ER<it>α </it>regulatory network. Conclusions We demonstrated that the use of the co-clustering structure as the similarity measure in the unsupervised analysis of sample gene expression profiles provides valuable information about expression regulatory networks.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

An integrative method to decode regulatory logics in gene transcription

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/10/2017
Field of study

abstract: Modeling of transcriptional regulatory networks (TRNs) has been increasingly used to dissect the nature of gene regulation. Inference of regulatory relationships among transcription factors (TFs) and genes, especially among multiple TFs, is still challenging. In this study, we introduced an integrative method, LogicTRN, to decode TF–TF interactions that form TF logics in regulating target genes. By combining cis-regulatory logics and transcriptional kinetics into one single model framework, LogicTRN can naturally integrate dynamic gene expression data and TF-DNA-binding signals in order to identify the TF logics and to reconstruct the underlying TRNs. We evaluated the newly developed methodology using simulation, comparison and application studies, and the results not only show their consistence with existing knowledge, but also demonstrate its ability to accurately reconstruct TRNs in biological complex systems.The final version of this article, as published in Nature Communications, can be viewed online at: http://www.nature.com/articles/s41467-017-01193-

ASU Digital Repository