52 research outputs found

    The identification of informative genes from multiple datasets with increasing complexity

    Background In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of more complex biological systems complicate the modelling of regulatory networks that can represent and capture the interactions among genes. We believe that the use of multiple datasets derived from related biological systems leads to more robust models. Therefore, we developed a novel framework for modelling regulatory networks that involves training and evaluation on independent datasets. Our approach includes the following steps: (1) ordering the datasets based on their level of noise and informativeness; (2) selecting a Bayesian classifier with an appropriate level of complexity by evaluating predictive performance on independent datasets; (3) comparing the different gene selections and the influence of increasing model complexity; (4) performing functional analysis of the informative genes. Results In this paper, we identify the most appropriate model complexity using cross-validation and independent test-set validation for predicting gene expression in three published datasets related to myogenesis and muscle differentiation. Furthermore, we demonstrate that models trained on simpler datasets can be used to identify interactions among genes and to select the most informative ones. We also show that these models explain the myogenesis-related genes (genes of interest) significantly better than other genes (P < 0.004), since the improvement in their rankings is much more pronounced. Finally, after further evaluating our results on synthetic datasets, we show that our approach outperforms a concordance method by Lai et al. in identifying informative genes from multiple datasets with increasing complexity, while additionally modelling the interactions between genes. Conclusions We show that Bayesian networks derived from simpler controlled systems perform better than those trained on datasets from more complex biological systems. Further, we show that genes that are highly predictive and consistent across independent datasets, drawn from the pool of differentially expressed genes, are more likely to be fundamentally involved in the biological process under study. We conclude that networks trained on simpler controlled systems, such as in vitro experiments, can be used to model and capture interactions among genes in more complex datasets, such as in vivo experiments, where these interactions would otherwise be concealed by a multitude of other ongoing events.
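    The selection of model complexity in step (2) rests on a standard train/validate pattern. A minimal sketch of that pattern, assuming scikit-learn and randomly generated toy data, is shown below; GaussianNB stands in for the paper's Bayesian models, the number of retained genes stands in for model complexity, and the variable names (X_invitro, X_invivo, etc.) are illustrative placeholders rather than anything from the study.

```python
# Minimal sketch of step (2): pick a classifier complexity by cross-validation
# on a training ("in vitro") dataset, then confirm on an independent ("in vivo")
# dataset. GaussianNB and the gene counts are illustrative stand-ins.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X_invitro = rng.normal(size=(60, 200))   # 60 samples x 200 genes (toy data)
y_invitro = rng.integers(0, 2, size=60)  # differentiation-state labels
X_invivo = rng.normal(size=(40, 200))    # independent, "more complex" dataset
y_invivo = rng.integers(0, 2, size=40)

# Candidate "complexities": here, the number of top-variance genes kept.
for n_genes in (10, 50, 200):
    idx = np.argsort(X_invitro.var(axis=0))[-n_genes:]
    model = GaussianNB()
    cv = cross_val_score(model, X_invitro[:, idx], y_invitro, cv=5).mean()
    model.fit(X_invitro[:, idx], y_invitro)
    indep = model.score(X_invivo[:, idx], y_invivo)
    print(f"{n_genes:4d} genes: CV accuracy={cv:.2f}, independent accuracy={indep:.2f}")
```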

    Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis

    We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more “continuous,” as in isolation-by-distance models.
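    The matrix-factorization view described above can be made concrete with a few lines of linear algebra. The sketch below, assuming only NumPy and randomly generated genotypes, shows the PCA interpretation as a truncated SVD and uses a crude alternating update with soft-thresholding to mimic a sparsity constraint on the factors; it is an illustration of the unifying framework, not the SFA software of the paper.

```python
# Approximate a genotype matrix G (n individuals x p SNPs) by a product of two
# lower-rank matrices, G ~ L @ F. Truncated SVD recovers the PCA view; an L1
# soft-threshold on F mimics a sparse-factor constraint. Toy data throughout.
import numpy as np

rng = np.random.default_rng(1)
n, p, K = 100, 500, 3
G = rng.integers(0, 3, size=(n, p)).astype(float)   # genotypes coded 0/1/2
Gc = G - G.mean(axis=0)                              # centre each SNP

# PCA view: best rank-K approximation via truncated SVD.
U, s, Vt = np.linalg.svd(Gc, full_matrices=False)
L_pca, F_pca = U[:, :K] * s[:K], Vt[:K]

# Sparse view: alternate least squares for L, soft-threshold the factors F.
def soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

F = Vt[:K].copy()
for _ in range(20):
    L = Gc @ np.linalg.pinv(F)                # update individual loadings
    F = soft(np.linalg.pinv(L) @ Gc, 0.01)    # sparse factors on SNPs

print("PCA rank-K error:   ", np.linalg.norm(Gc - L_pca @ F_pca))
print("Sparse rank-K error:", np.linalg.norm(Gc - L @ F),
      " nonzero factor entries:", int((F != 0).sum()), "/", F.size)
```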

    Evolutionary approaches for the reverse-engineering of gene regulatory networks: A study on a biologically realistic dataset

    Background Inferring gene regulatory networks from data requires the development of algorithms devoted to structure extraction. When only static data are available, gene interactions may be modelled by a Bayesian network (BN) that represents the presence of direct interactions from regulators to regulees by conditional probability distributions. We used enhanced evolutionary algorithms to stochastically evolve a set of candidate BN structures and found the model that best fits the data without prior knowledge. Results We proposed various evolutionary strategies suitable for the task and tested our choices using simulated data drawn from a given bio-realistic network of 35 nodes, the so-called insulin network, which has been used in the literature for benchmarking. We assessed the inferred models against this reference to obtain statistical performance results. We then compared the performance of evolutionary algorithms using two kinds of recombination operators that operate at different scales in the graphs. We introduced a niching strategy that reinforces diversity throughout the population and avoids trapping of the algorithm in one local minimum in the early steps of learning. We show the limited effect of the mutation operator when niching is applied. Finally, we compared our best evolutionary approach with various well-known learning algorithms (MCMC, K2, greedy search, TPDA, MMHC) devoted to BN structure learning. Conclusion We studied the behaviour of an evolutionary approach enhanced by niching for the learning of gene regulatory networks with BNs. We show that this approach outperforms classical structure learning methods in elucidating the original model. These results were obtained for the learning of a bio-realistic network and, more importantly, on various small datasets. This is a suitable approach for learning transcriptional regulatory networks from real datasets without prior knowledge.
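    As a rough illustration of the search procedure (not the authors' implementation), the sketch below evolves BN structures encoded as upper-triangular adjacency matrices under a fixed node order, which guarantees acyclicity, scores them with a simple BIC-like family score on toy binary data, and applies a crowding-style niching penalty so that similar structures share credit. The node count, operators, and score are illustrative stand-ins for the insulin-network benchmark and operators discussed above.

```python
# Evolutionary BN structure search with a niching (crowding) penalty.
# Individuals are DAGs encoded as upper-triangular adjacency matrices under a
# fixed node order; fitness is a BIC-like score on toy binary expression data.
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
N_NODES, POP, GENS = 6, 30, 40
data = rng.integers(0, 2, size=(200, N_NODES))        # toy binary data

def family_score(child, parents):
    """BIC-like score for one node given its parent set (binary variables)."""
    keys = [tuple(row[parents]) for row in data] if parents else [()] * len(data)
    joint, marg = Counter(zip(keys, data[:, child])), Counter(keys)
    ll = sum(c * np.log((c + 1) / (marg[k] + 2)) for (k, _), c in joint.items())
    return ll - 0.5 * np.log(len(data)) * (2 ** len(parents))

def fitness(adj):
    return sum(family_score(j, list(np.flatnonzero(adj[:, j]))) for j in range(N_NODES))

def mutate(adj):
    child = adj.copy()
    i, j = sorted(rng.choice(N_NODES, size=2, replace=False))
    child[i, j] ^= 1                                   # flip one edge i -> j (i < j)
    return child

pop = [np.triu(rng.integers(0, 2, (N_NODES, N_NODES)), 1) for _ in range(POP)]
for _ in range(GENS):
    scores = np.array([fitness(a) for a in pop])
    # Niching: penalise structures that have many near-duplicates in the population.
    dists = np.array([[np.sum(a != b) for b in pop] for a in pop])
    shared = scores - 2.0 * np.sum(dists < 3, axis=1)
    # Tournament selection followed by mutation.
    new = []
    for _ in range(POP):
        a, b = rng.integers(0, POP, size=2)
        new.append(mutate(pop[a] if shared[a] > shared[b] else pop[b]))
    pop = new

best = max(pop, key=fitness)
print("Best structure score:", round(fitness(best), 1))
print(best)
```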

    Classification and Analysis of Regulatory Pathways Using Graph Property, Biochemical and Physicochemical Property, and Functional Property

    Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? Such a problem is closely related to the biological function of the pathway in cells and hence is quite fundamental and essential in systems biology and proteomics. This is also an extremely difficult and challenging problem due to its complexity. To address this problem, a novel approach was developed that can be used to predict query pathways among the following six functional categories: (i) “Metabolism”, (ii) “Genetic Information Processing”, (iii) “Environmental Information Processing”, (iv) “Cellular Processes”, (v) “Organismal Systems”, and (vi) “Human Diseases”. The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each pathway concerned was formulated as a 5570-D (dimensional) vector; (ii) each component of the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graph property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to perform the prediction. A cross-validation by the jackknife test on a benchmark dataset consisting of 146 regulatory pathways indicated that an overall success rate of 78.8% was achieved by our method in identifying query pathways among the above six classes, indicating that the outcome is quite promising and encouraging. To the best of our knowledge, the current study represents the first attempt to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area.
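    The evaluation protocol combines feature selection with leave-one-out testing. The sketch below, assuming NumPy and toy data, uses absolute correlation as a stand-in for the mutual-information terms of mRMR, greedily selects features by relevance minus redundancy, and estimates the success rate with a jackknife test using a 1-nearest-neighbour predictor; the dimensions and labels are illustrative, not the paper's 5570-D pathway vectors.

```python
# Greedy mRMR-style feature selection followed by a jackknife (leave-one-out)
# test with a 1-nearest-neighbour predictor over six toy classes.
import numpy as np

rng = np.random.default_rng(3)
n, d, n_select = 146, 60, 10
X = rng.normal(size=(n, d))
y = rng.integers(0, 6, size=n)                  # six pathway classes, toy labels

def abs_corr(a, b):
    return abs(np.corrcoef(a, b)[0, 1])

relevance = np.array([abs_corr(X[:, j], y) for j in range(d)])
selected = [int(np.argmax(relevance))]
while len(selected) < n_select:
    best_j, best_score = None, -np.inf
    for j in range(d):
        if j in selected:
            continue
        redundancy = np.mean([abs_corr(X[:, j], X[:, k]) for k in selected])
        score = relevance[j] - redundancy        # mRMR criterion
        if score > best_score:
            best_j, best_score = j, score
    selected.append(best_j)

Xs = X[:, selected]
correct = 0
for i in range(n):                               # jackknife: leave sample i out
    dists = np.linalg.norm(Xs - Xs[i], axis=1)
    dists[i] = np.inf
    correct += int(y[int(np.argmin(dists))] == y[i])
print(f"Jackknife success rate with {n_select} mRMR features: {correct / n:.2%}")
```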

    A review on probabilistic graphical models in evolutionary computation

    Thanks to their inherent properties, probabilistic graphical models are one of the prime candidates for machine learning and decision-making tasks, especially in uncertain domains. Their capabilities for representation, inference, and learning, if used effectively, can greatly help to build intelligent systems that act appropriately in different problem domains. Evolutionary computation is one such discipline that has employed probabilistic graphical models to improve the search for optimal solutions in complex problems. This paper shows how probabilistic graphical models have been used in evolutionary algorithms to improve their performance in solving complex problems. Specifically, we give a survey of probabilistic model-building evolutionary algorithms, called estimation of distribution algorithms, and compare different methods for probabilistic modeling in these algorithms.
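    The core idea of an estimation of distribution algorithm is to replace crossover and mutation with a learn-and-sample loop over a probabilistic model of promising solutions. A minimal univariate sketch (a UMDA/PBIL-style update on the toy OneMax objective) is given below; richer EDAs in the survey learn multivariate graphical models instead of independent bit probabilities.

```python
# Simplest estimation of distribution algorithm: fit a univariate probability
# model to the best solutions, then sample the next generation from it.
import numpy as np

rng = np.random.default_rng(4)
n_bits, pop_size, n_select, gens = 40, 100, 30, 50
p = np.full(n_bits, 0.5)                                 # model: P(bit_i = 1)

for _ in range(gens):
    pop = (rng.random((pop_size, n_bits)) < p).astype(int)  # sample from model
    fitness = pop.sum(axis=1)                                # OneMax objective
    elite = pop[np.argsort(fitness)[-n_select:]]             # truncation selection
    p = 0.9 * p + 0.1 * elite.mean(axis=0)                   # re-estimate model
    p = np.clip(p, 0.02, 0.98)                               # keep some diversity

print("Best fitness found:", int(fitness.max()), "of", n_bits)
```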

    Predicting Nearly as Well as the Best Pruning of a Planar Decision Graph

    We design efficient on-line algorithms that predict nearly as well as the best pruning of a planar decision graph. We assume that the graph has no cycles. As in previous work on decision trees, we implicitly maintain one weight for each of the prunings (exponentially many). The method works for a large class of algorithms that update their weights multiplicatively. It can also be used to design algorithms that predict nearly as well as the best convex combination of prunings.
    1 Introduction
    Decision trees are widely used in Machine Learning. Frequently a large tree is produced initially, and then this tree is pruned for the purpose of obtaining a better predictor. A pruning is produced by deleting some nodes and, with them, all their successors. Although there are exponentially many prunings, a recent method developed in coding theory [WST95] and machine learning [Bun92] makes it possible to (implicitly) maintain one weight per pruning. In particular, Helmbold and Schapire [HS97] use this m..
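    The underlying mechanism is the multiplicative-weight (exponentially weighted average) forecaster: one weight per expert, down-weighted exponentially in its loss. The sketch below shows this with a small explicit set of experts and square loss; the paper's contribution is maintaining such weights implicitly over the exponentially many prunings of a planar decision graph, which this explicit toy version does not attempt.

```python
# Exponentially weighted average forecaster over a small explicit set of experts.
import numpy as np

rng = np.random.default_rng(5)
n_experts, T, eta = 8, 500, 0.5
w = np.ones(n_experts)                        # one weight per expert
cum_loss, cum_expert = 0.0, np.zeros(n_experts)

for t in range(T):
    preds = rng.random(n_experts)             # each expert predicts in [0, 1]
    outcome = float(rng.integers(0, 2))       # true label
    forecast = np.dot(w, preds) / w.sum()     # weighted-average prediction
    losses = (preds - outcome) ** 2           # square loss per expert
    cum_loss += (forecast - outcome) ** 2
    cum_expert += losses
    w *= np.exp(-eta * losses)                # multiplicative weight update

print(f"Forecaster loss: {cum_loss:.1f}, best expert loss: {cum_expert.min():.1f}")
```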