100 research outputs found

Discovering transcriptional modules by Bayesian data integration

Author: Antoniak
Bar-Joseph
Bernard J. de la Cruz
Bähler
Cho
Dahl
Datta
David L. Wild
Eisen
Falcon
Ferguson
Fritsch
Gasch
Gerber
Geweke
Harbison
Ideker
Ihmels
Jim E. Griffin
Kundaje
Lee
Liu
Liu
Medvedovic
Medvedovic
Qin
Rasmussen
Rasmussen
Reid
Richard S. Savage
Savage
Segal
Segal
Teh
Teh
Wild
Yao
Yeung
Zoubin Ghahramani
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2010

Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent, and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs.

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Kent Academic Repository

CUED - Cambridge University Engineering Department

Detection of associations with rare and common SNPs for quantitative traits: a nonparametric Bayes-based approach

Author: B Li
BE Madsen
Brad G Kurowski
DJ Lunn
F Han
H Ishwaran
Hua He
J Sethuraman
LA Almasy
Lili Ding
Lisa J Martin
M Medvedovic
NJ Schork
S Morgenthaler
Tesfaye M Baye
TS Ferguson
Xue Zhang
Publication venue: BioMed Central
Publication date: 01/01/2011

We propose a nonparametric Bayes-based clustering algorithm to detect associations with rare and common single-nucleotide polymorphisms (SNPs) for quantitative traits. Unlike current methods, our approach identifies associations with rare genetic variants at the variant level, not the gene level. In this method, we use a Dirichlet process prior for the distribution of SNP-specific regression coefficients, conduct hierarchical clustering with a distance measure derived from posterior pairwise probabilities of two SNPs having the same regression coefficient, and explore data-driven approaches to select the number of clusters. SNPs falling inside the largest cluster have relatively low or close to zero estimates of regression coefficients and are considered not associated with the trait. SNPs falling outside the largest cluster have relatively high estimates of regression coefficients and are considered potential risk variants. Using the data from the Genetic Analysis Workshop 17, we successfully detected associations with both rare and common SNPs for a quantitative trait. We conclude that our method provides a novel and broadly applicable strategy for obtaining association results with a reasonably low proportion of false discoveries and that it can be routinely used in resequencing studies.
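
The distance measure mentioned in this abstract can be illustrated with a minimal sketch. It assumes that posterior draws of cluster labels are already available from some sampler. The function name `coclustering_distance` and the toy draws are hypothetical and do not come from the paper. The distance between two SNPs is taken as one minus the fraction of draws in which they share a label.

```python
from itertools import combinations


def coclustering_distance(samples):
    """Return D where D[i][j] = 1 - (fraction of draws with i and j together).

    `samples` is a list of posterior draws; each draw is a list holding one
    cluster label per SNP. Labels are only compared within a single draw.
    """
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for draw in samples:
        for i, j in combinations(range(n), 2):
            if draw[i] == draw[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    total = len(samples)
    return [
        [0.0 if i == j else 1.0 - counts[i][j] / total for j in range(n)]
        for i in range(n)
    ]


# Hypothetical posterior draws for four SNPs (made-up labels, not real data).
draws = [
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 5],
    [1, 1, 1, 1],
]
dist = coclustering_distance(draws)
```

A matrix like `dist` could then be passed to any standard hierarchical clustering routine. Which linkage and which rule for choosing the number of clusters the authors used is not stated in the abstract.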

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Detection of regulator genes and eQTLs in gene networks

Author: A Butte
A Chatr-Aryamontri
A Clauset
A Joshi
A Joshi
A Kundaje
AA Shabalin
AJ Enright
AJ Walhout
AS Dimas
B Schwanhausser
B Zhang
B Zhang
C Cenik
CO Daub
D Koller
DA Cusanovich
DM Greenawalt
E Bonnet
E Ravasz
E Segal
EC Neto
EC Neto
EC Neto
EE Schadt
EE Schadt
EE Schadt
EE Schadt
EE Schadt
EJ Foss
F Grubert
F Yue
FA Cubillos
FW Albert
G Hemani
G Nicholson
GD Smith
GH Golub
H Foroughi Asl
H Talukdar
HN Kadarmideen
J Millstein
J Qi
J Zhu
J Zhu
J Zhu
JE Aten
JF Ayroles
JJ Faith
JL Björkegren
JS Liu
K Basso
K Qu
KG Ardlie
L Wu
LA Hindorff
LH Hartwell
LS Chen
M Ashburner
M Civelek
M Georges
M Gerstein
M Medvedovic
M Schmidt
M Scutari
MA Schaub
MB Eisen
MD Ritchie
ME Goddard
MEJ Newman
MEJ Newman
MV Rockman
MV Rockman
N Friedman
N Friedman
N Friedman
N Laird
O Stegle
P Langfelder
P Langfelder
P Langfelder
P Lu
R Sharan
R Sharan
RB Brem
RW Williams
S Lee
S Roy
S Tavazoie
SI Lee
SM Waszak
SS Rao
T Lappalainen
T Michoel
TA Manolio
TF Mackay
The ENCODE
TS Furey
VG Cheung
W Cookson
W Zhang
Y Chen
Y Li
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/11/2016

Genetic differences between individuals that are associated with quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated with differences in expression levels of nearby genes (they are "expression quantitative trait loci", or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and software to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico.

arXiv.org e-Print Archive

Crossref

NeatMap - non-clustering heat map alternatives in R

Author: A Su
A Thalamuthu
C Chen
D Adler
D Baum
G Fink
G McLachlan
G Tseng
H Wickham
I Jolliffe
J Handl
J Kruskal
J Weinstein
M Brauer
M Eisen
M Hahsler
M Hibbs
M Medvedovic
M Schmid
O Alter
P Tamayo
R Development Core Team
S Raychaudhuri
S Tavazoie
Satwik Rajaram
Y Taguchi
Y Taguchi
Yoshi Oono
Z Qin
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: The clustered heat map is the most popular means of visualizing genomic data. It compactly displays a large amount of data in an intuitive format that facilitates the detection of hidden structures and relations in the data. However, it is hampered by its use of cluster analysis which does not always respect the intrinsic relations in the data, often requiring non-standardized reordering of rows/columns to be performed post-clustering. This sometimes leads to uninformative and/or misleading conclusions. Often it is more informative to use dimension-reduction algorithms (such as Principal Component Analysis and Multi-Dimensional Scaling) which respect the topology inherent in the data. Yet, despite their proven utility in the analysis of biological data, they are not as widely used. This is at least partially due to the lack of user-friendly visualization methods with the visceral impact of the heat map. Results: NeatMap is an R package designed to meet this need. NeatMap offers a variety of novel plots (in 2 and 3 dimensions) to be used in conjunction with these dimension-reduction techniques. Like the heat map, but unlike traditional displays of such results, it allows the entire dataset to be displayed while visualizing relations between elements. It also allows superimposition of cluster analysis results for mutual validation. NeatMap is shown to be more informative than the traditional heat map with the help of two well-known microarray datasets. Conclusions: NeatMap thus preserves many of the strengths of the clustered heat map while addressing some of its deficiencies. It is hoped that NeatMap will spur the adoption of non-clustering dimension-reduction algorithms.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Expression profiles of switch-like genes accurately classify tissue and infectious disease phenotypes in model-based classification

Background: Large-scale compilation of gene expression microarray datasets across diverse biological phenotypes provided a means of gathering a priori knowledge in the form of identification and annotation of bimodal genes in the human and mouse genomes. These switch-like genes consist of 15% of known human genes, and are enriched with genes coding for extracellular and membrane proteins. It is of interest to determine the prediction potential of bimodal genes for class discovery in large-scale datasets. Results: Use of a model-based clustering algorithm accurately classified more than 400 microarray samples into 19 different tissue types on the basis of bimodal gene expression. Bimodal expression patterns were also highly effective in differentiating between infectious diseases in model-based clustering of microarray data. Supervised classification with feature selection restricted to switch-like genes also recognized tissue specific and infectious disease specific signatures in independent test datasets reserved for validation. Determination of "on" and "off" states of switch-like genes in various tissues and diseases allowed for the identification of activated/deactivated pathways. Activated switch-like genes in neural, skeletal muscle and cardiac muscle tissue tend to have tissue-specific roles. A majority of activated genes in infectious disease are involved in processes related to the immune response. Conclusion: Switch-like bimodal gene sets capture genome-wide signatures from microarray data in health and infectious disease. A subset of bimodal genes coding for extracellular and membrane proteins are associated with tissue specificity, indicating a potential role for them as biomarkers provided that expression is altered in the onset of disease. Furthermore, we provide evidence that bimodal genes are involved in temporally and spatially active mechanisms including tissue-specific functions and response of the immune system to invading pathogens.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Semi-supervised learning for the identification of syn-expressed genes from fused microarray and in situ image data

Author: A Schliep
A Schliep
A Schliep
Alexander Schliep
B Edgar
C Niehrs
CLL Hendriks
D Tautz
EP Xing
G McLachlan
GJ McLachlan
H Ge
H Peng
H Peng
I Costa
I Lee
Ivan G Costa
J Bilmes
J Ernst
JY Pan
KY Yeung
KY Yeung
L Opitz
Lennart Opitz
M Ashburner
M Leptin
M Medvedovic
MB Eisen
MN Arbeitman
P Tomancak
P Tomancak
R Gonzalez
R Sokal
Roland Krause
SD Hooper
SK Ng
SVE Keränen
T Beissbarth
T Lange
V Stolc
W Pan
Y Luan
Z Bar-Joseph
Z Bar-Joseph
Z Lu
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Gene expression measurements during the development of the fly Drosophila melanogaster are routinely used to find functional modules of temporally co-expressed genes. Complementary large data sets of in situ RNA hybridization images for different stages of the fly embryo elucidate the spatial expression patterns. Results: Using a semi-supervised approach, constrained clustering with mixture models, we can find clusters of genes exhibiting spatio-temporal similarities in expression, or syn-expression. The temporal gene expression measurements are taken as primary data for which pairwise constraints are computed in an automated fashion from raw in situ images without the need for manual annotation. We investigate the influence of these pairwise constraints in the clustering and discuss the biological relevance of our results. Conclusion: Spatial information contributes to a detailed, biologically meaningful analysis of temporal gene expression data. Semi-supervised learning provides a flexible, robust and efficient framework for integrating data sources of differing quality and abundance.

Crossref

Springer - Publisher Connector

PubMed Central

MPG.PuRe

Query Large Scale Microarray Compendium Datasets Using a Model-Based Bayesian Approach with Variable Selection

Author: A Gelman
A Tanay
AA Margolin
AB Brinkman
AB Owen
AE Gelfand
AF Neuwald
AJ Butte
CJ Wolfe
D Ghosh
DE Bassett Jr
DJ Lockhart
FP Roth
G Getz
GJ McLachlan
H Salgado
J Qian
J Quackenbush
JJ Faith
JJ Faith
JS Liu
KY Yeung
M Medvedovic
M Schena
MA Hibbs
MB Eisen
MG Walker
Ming Hu
ML Urbanowski
Neil Hall
P Tamayo
PO Brown
Q Sheng
R Chen
S Kim
SC Madeira
SK Kim
T Dhollander
TF Smith
TH Tani
TR Hughes
VK Mootha
Y Cheng
Zhaohui S. Qin
ZS Qin
Publication venue: Public Library of Science
Publication date: 13/02/2009

In microarray gene expression data analysis, it is often of interest to identify genes that share similar expression profiles with a particular gene such as a key regulatory protein. Multiple studies have been conducted using various correlation measures to identify co-expressed genes. While working well for small datasets, the heterogeneity introduced from increased sample size inevitably reduces the sensitivity and specificity of these approaches. This is because most co-expression relationships do not extend to all experimental conditions. With the rapid increase in the size of microarray datasets, identifying functionally related genes from large and diverse microarray gene expression datasets is a key challenge. We develop a model-based gene expression query algorithm built under the Bayesian model selection framework. It is capable of detecting co-expression profiles under a subset of samples/experimental conditions. In addition, it allows linearly transformed expression patterns to be recognized and is robust against sporadic outliers in the data. Both features are critically important for increasing the power of identifying co-expressed genes in large scale gene expression datasets. Our simulation studies suggest that this method outperforms existing correlation coefficients or mutual information-based query tools. When we apply this new method to the Escherichia coli microarray compendium data, it identifies a majority of known regulons as well as novel potential target genes of numerous key transcription factors.
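
The motivation given in this abstract, that a co-expression relationship holding under only a subset of conditions is diluted when correlation is computed over all conditions, can be shown with a small sketch. This is not the authors' Bayesian query algorithm. It uses a plain Pearson correlation, and the `query` and `target` profiles are made-up values chosen for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical expression profiles over eight conditions (not real data).
query = [1.0, 2.0, 3.0, 4.0, 0.0, 5.0, 1.0, 4.0]
# In the first four conditions target = 10 - 2 * query (a linear transform);
# in the last four it follows a different relationship.
target = [8.0, 6.0, 4.0, 2.0, 0.0, 10.0, 2.0, 8.0]

r_subset = pearson(query[:4], target[:4])  # restricted to the first four
r_all = pearson(query, target)             # computed over all conditions
```

Here `r_subset` has magnitude 1 because the relationship is exactly linear within the subset, while `r_all` is much weaker. A query tool that scores only whole profiles would rank this pair low, which is the gap the abstract describes.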

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Capsular tension ring alone and combined with a foldable acrylic intraocular lens in posterior capsule opacification after phacoemulsification in dogs

Author: Adriana Morales
Alexandre Lima de Andrade
APPLE D.J.
BRAS I.D.
FINDL O.
Fábio Andrade Marinho
Fábio Luiz da Cunha Brito
GERARDI J.G.
GIFT B.W.
GOLDIM J.R.
HARA T.
HAZRA S.
Ivan Ricardo Martinez Pádua
José Luiz Laus
KAPPELHOF J.
KIM J.H.
KOHNEN T.
KUGELBERG M.
Luciano Fernandes da Conceição
MARQUES D.M.
MEDVEDOVIC M.
NAGAMOTO T.
NASISSE M.P.
NISHI O.
Paula Ferreira da Costa
WORMSTONE M.
YI N.Y.
Publication venue: 'FapUNIFESP (SciELO)'

Crossref

Deletion Hotspots in AMACR Promoter CpG Island Are cis-Regulatory Elements Controlling the Gene Expression in the Colon

Author: A Baron
A Efstratiadis
A Lin
A Nassar
A Rabinovich
A Trichopoulou
AJ Carnell
AM Levin
C Kumar-Sinha
ER Fearon
F Levi
GP Pfeifer
I Leav
I Leav
Irwin Leav
J Luo
J Xu
JA Mobley
K Cartharius
Lisa Stubbs
M Benito
M Esteller
M Zhou
MA Rubin
Mario Medvedovic
MC Stene
Monica P. Revelo
N Niho
R Deka
R Kuefer
Ranjan Deka
S Bakshi
S Ferdinandusse
S Rohrmann
S Rozen
S Wagner
S Zha
S Zha
SE Daugherty
SE Daugherty
Shuk-Mei Ho
T Langmann
V Ananthanarayanan
V Oron-Karni
VX Jin
X Shi
X Zhang
Xiang Zhang
YY Wang
Z Jiang
Z Jiang
Z Jiang
Z Jiang
Zhong Jiang
Publication venue: Public Library of Science
Publication date: 01/01/2009

Alpha-methylacyl-coenzyme A racemase (AMACR) regulates peroxisomal β-oxidation of phytol-derived, branched-chain fatty acids from red meat and dairy products, which are suspected risk factors for colon carcinoma (CCa). AMACR was first found overexpressed in prostate cancer but not in benign glands and is now an established diagnostic marker for prostate cancer. Aberrant expression of AMACR was recently reported in CCa; however, little is known about how this gene is abnormally activated in cancer. By using a panel of immunostained-laser-capture-microdissected clinical samples comprising the entire colon adenoma–carcinoma sequence, we show that deregulation of AMACR during colon carcinogenesis involves two nonrandom events, resulting in the mutually exclusive existence of double-deletion at CG3 and CG10 and deletion of CG12-16 in a newly identified CpG island within the core promoter of AMACR. The double-deletion at CG3 and CG10 was found to be a somatic lesion. It existed in histologically normal colonic glands and tubular adenomas with low AMACR expression and was absent in villous adenomas and all CCas expressing variable levels of AMACR. In contrast, deletion of CG12-16 was shown to be a constitutional allele with a frequency of 43% in a general population. Its prevalence reached 89% in moderately differentiated CCas strongly expressing AMACR but only existed at 14% in poorly differentiated CCas expressing little or no AMACR. The DNA sequences housing these deletions were found to be putative cis-regulatory elements for Sp1 at CG3 and CG10, and ZNF202 at CG12-16. Chromatin immunoprecipitation, siRNA knockdown, gel shift assay, ectopic expression, and promoter analyses supported the regulation by Sp1 and ZNF202 of AMACR gene expression in an opposite manner. Our findings identified key in vivo events and novel transcription factors responsible for AMACR regulation in CCas and suggested these AMACR deletions may have diagnostic/prognostic value for colon carcinogenesis.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Probabilistic machine learning and artificial intelligence.

Author: A Doucet
A Gelman
A Korattikara
A Krizhevsky
A O'Hagan
A Pfeffer
A Pfeffer
A Pfeffer
B Bakker
B De Finetti
B Fischer
B Milch
B Paige
C Freer
C Kemp
C Lu
C Shannon
C Thornton
CE Rasmussen
CE Rasmussen
CE Rasmussen
CM Bishop
CM Bishop
D Koller
D Koller
D Wingate
DE Wolstenholme
DJ Hand
DJ Lunn
DJC MacKay
DM Wolpert
DR Jones
ET Jaynes
F Wood
F Wood
G Hinton
GE Hinton
GF Marcus
H Kushner
H Robbins
I Sutskever
J Bergstra
J Hensman
J Snoek
JB Tenenbaum
JM Hernández-Lobato
JR Lloyd
K Doya
K Miller
KP Murphy
KS Van Horn
L Li
LR Rabiner
M Girolami
M Hoffman
M Jordan
M Medvedovic
M Schmidt
M Welling
MI Jordan
MP Deisenroth
N Goodman
N Hjort
N Houlsby
ND Goodman
ND Goodman
P Diaconis
P Hennig
P Marjoram
P Orbanz
P Poupart
P Sermanet
RB Grosse
RD King
RM Neal
RM Neal
RM Neal
RM Neal
RP Adams
RT Cox
S Deneve
S Russell
S Thrun
SJ Russell
TL Griffiths
TL Griffiths
TP Minka
TP Minka
TS Ferguson
V Mansinghka
WH Jefferys
Y Bengio
YW Teh
Z Ghahramani
Publication venue: 'Nature Publishing Group'
Publication date: 01/05/2015

How can a machine learn from experience? Probabilistic modelling provides a framework for understanding what learning is, and has therefore emerged as one of the principal theoretical and practical approaches for designing machines that learn from data acquired through experience. The probabilistic framework, which describes how to represent and manipulate uncertainty about models and predictions, has a central role in scientific data analysis, machine learning, robotics, cognitive science and artificial intelligence. This Review provides an introduction to this framework, and discusses some of the state-of-the-art advances in the field, namely, probabilistic programming, Bayesian optimization, data compression and automatic model discovery. The author acknowledges an EPSRC grant EP/I036575/1, the DARPA PPAML programme, a Google Focused Research Award for the Automatic Statistician and support from Microsoft Research. This is the author accepted manuscript. The final version is available from NPG at http://www.nature.com/nature/journal/v521/n7553/full/nature14541.html#abstract

Crossref

Apollo (Cambridge)