Modeling and visualizing uncertainty in gene expression clusters using Dirichlet process mixtures
Although clustering methods have rapidly become one of the standard computational approaches in the microarray gene expression literature, little attention has been paid to uncertainty in the results obtained. Dirichlet process mixture (DPM) models provide a nonparametric Bayesian alternative to the bootstrap approach to modeling uncertainty in gene expression clustering. Most previously published applications of Bayesian model-based clustering methods have been to short time series data. In this paper, we present a case study of the application of nonparametric Bayesian clustering methods to the clustering of high-dimensional non-time-series gene expression data using full Gaussian covariances. We use the probability that two genes belong to the same cluster in a DPM model as a measure of the similarity of these gene expression profiles. Conversely, this probability can be used to define a dissimilarity measure, which, for the purposes of visualization, can be input to one of the standard linkage algorithms used for hierarchical clustering. Biologically plausible results are obtained from the Rosetta compendium of expression profiles which extend previously published cluster analyses of these data.
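The similarity measure described in this abstract can be illustrated with a minimal sketch. It assumes that posterior draws of cluster labels are already available from some DPM sampler; the input format, the function names, and the toy labels below are hypothetical and are not taken from the paper. The co-clustering probability is estimated as the fraction of draws in which two genes share a label, and one minus that probability is used as the dissimilarity.

```python
def coclustering_probabilities(samples):
    """Estimate P(gene i and gene j share a cluster) from posterior draws.

    samples[s][i] is the cluster label of gene i in draw s
    (a hypothetical input format, assumed for this sketch).
    """
    n_genes = len(samples[0])
    counts = [[0.0] * n_genes for _ in range(n_genes)]
    for labels in samples:
        for i in range(n_genes):
            for j in range(n_genes):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    n_draws = len(samples)
    return [[c / n_draws for c in row] for row in counts]


def dissimilarity(prob):
    """Turn co-clustering probabilities into dissimilarities (1 - p)."""
    return [[1.0 - p for p in row] for row in prob]


# Toy posterior draws for four genes (illustrative values only).
samples = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 1, 2, 2],
]
prob = coclustering_probabilities(samples)
dist = dissimilarity(prob)
print(prob[0][1], dist[0][3])
```

Label values are compared only within a draw, so the label switching between draws does not matter. The resulting matrix `dist` is symmetric with a zero diagonal and could be handed to any standard hierarchical linkage routine for visualization.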
Extreme learning machines for reverse engineering of gene regulatory networks from expression time series
The reconstruction of gene regulatory networks (GRNs) from gene expression profiles is of growing interest in bioinformatics for understanding the complex regulatory mechanisms in cellular systems. GRNs explicitly represent the cause-effect relationships of regulation among a group of genes, and their reconstruction remains a challenging computational problem. Several methods have been proposed, but most of them require different input sources to provide an acceptable prediction. Thus, it is a great challenge to reconstruct a GRN only from temporal gene-expression data. Results: The Extreme Learning Machine (ELM) is a supervised neural model that has gained interest in recent years because of its higher learning rate and better predictive performance than existing supervised models. This work proposes a novel approach to GRN reconstruction in which ELMs are used for modeling the relationships between gene expression time series. Artificial datasets generated with the well-known benchmark tool used in DREAM competitions were used. Real datasets with well-known GRNs underlying the time series were used for validation of this proposal. The impact of increasing the size of GRNs was analyzed in detail for the compared methods. The results obtained confirm the superiority of the ELM approach over very recent state-of-the-art methods under the same experimental conditions.
Affiliation: Rubiolo, Mariano. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Milone, Diego Humberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentina
Affiliation: Stegmayer, Georgina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional. Universidad Nacional del Litoral. Facultad de Ingeniería y Ciencias Hídricas. Instituto de Investigación en Señales, Sistemas e Inteligencia Computacional; Argentin
Bayesian variable selection and data integration for biological regulatory networks
A substantial focus of research in molecular biology is gene regulatory
networks: the set of transcription factors and target genes which control the
involvement of different biological processes in living cells. Previous
statistical approaches for identifying gene regulatory networks have used gene
expression data, ChIP binding data or promoter sequence data, but each of these
resources provides only partial information. We present a Bayesian hierarchical
model that integrates all three data types in a principled variable selection
framework. The gene expression data are modeled as a function of the unknown
gene regulatory network which has an informed prior distribution based upon
both ChIP binding and promoter sequence data. We also present a variable
weighting methodology for the principled balancing of multiple sources of prior
information. We apply our procedure to the discovery of gene regulatory
relationships in Saccharomyces cerevisiae (Yeast) for which we can use several
external sources of information to validate our results. Our inferred
relationships show greater biological relevance on the external validation
measures than previous data integration methods. Our model also estimates
synergistic and antagonistic interactions between transcription factors, many
of which are validated by previous studies. We also evaluate the results of
our procedure for weighting multiple sources of prior information.
Finally, we discuss our methodology in the context of previous approaches to
data integration and Bayesian variable selection.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) at http://dx.doi.org/10.1214/07-AOAS130 by the
Institute of Mathematical Statistics (http://www.imstat.org
Predicting Genetic Regulatory Response Using Classification
We present a novel classification-based method for learning to predict gene
regulatory response. Our approach is motivated by the hypothesis that in simple
organisms such as Saccharomyces cerevisiae, we can learn a decision rule for
predicting whether a gene is up- or down-regulated in a particular experiment
based on (1) the presence of binding site subsequences ("motifs") in the
gene's regulatory region and (2) the expression levels of regulators such as
transcription factors in the experiment ("parents"). Thus our learning task
integrates two qualitatively different data sources: genome-wide cDNA
microarray data across multiple perturbation and mutant experiments along with
motif profile data from regulatory sequences. We convert the regression task of
predicting real-valued gene expression measurement to a classification task of
predicting +1 and -1 labels, corresponding to up- and down-regulation beyond
the levels of biological and measurement noise in microarray measurements. The
learning algorithm employed is boosting with a margin-based generalization of
decision trees, alternating decision trees. This large-margin classifier is
sufficiently flexible to allow complex logical functions, yet sufficiently
simple to give insight into the combinatorial mechanisms of gene regulation. We
observe encouraging prediction accuracy on experiments based on the Gasch S.
cerevisiae dataset, and we show that we can accurately predict up- and
down-regulation on held-out experiments. Our method thus provides predictive
hypotheses, suggests biological experiments, and provides interpretable insight
into the structure of genetic regulatory networks.
Comment: 8 pages, 4 figures, presented at the Twelfth International Conference on
Intelligent Systems for Molecular Biology (ISMB 2004), supplemental website:
http://www.cs.columbia.edu/compbio/geneclas
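The conversion from regression to classification described in this abstract can be sketched as a simple thresholding step. The threshold value, the function name, and the choice to mark within-noise measurements as 0 (no call) are assumptions made for illustration; the abstract does not state how the noise level is set or how such measurements are handled.

```python
def regulation_label(log_ratio, noise_threshold):
    """Map a real-valued expression measurement to a class label.

    Returns +1 (up-regulated) or -1 (down-regulated) when the value lies
    beyond the assumed noise band, and 0 (no call) otherwise.
    """
    if log_ratio > noise_threshold:
        return 1
    if log_ratio < -noise_threshold:
        return -1
    return 0


# Hypothetical log expression ratios and an assumed noise threshold.
measurements = [1.8, -0.2, -2.4, 0.5]
labels = [regulation_label(m, noise_threshold=1.0) for m in measurements]
print(labels)
```

Only the examples labeled +1 or -1 would then be passed to a classifier in this sketch.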
Dynamic Epistasis under Varying Environmental Perturbations
Epistasis describes the phenomenon that mutations at different loci do not
have independent effects with regard to certain phenotypes. Understanding the
global epistatic landscape is vital for many genetic and evolutionary theories.
Current knowledge of epistatic dynamics under multiple conditions is limited
by the technological difficulties in experimentally screening epistatic
relations among genes. We explored this issue by applying flux balance analysis
to simulate epistatic landscapes under various environmental perturbations.
Specifically, we looked at gene-gene epistatic interactions, where the
mutations were assumed to occur in different genes. We predicted that epistasis
tends to become more positive from glucose-abundant to nutrient-limiting
conditions, indicating that selection might be less effective in removing
deleterious mutations in the latter. We also observed a stable core of
epistatic interactions in all tested conditions, as well as many epistatic
interactions unique to each condition. Interestingly, genes in the stable
epistatic interaction network are directly linked to most other genes whereas
genes with condition-specific epistasis form a scale-free network. Furthermore,
genes with stable epistasis tend to have similar evolutionary rates, whereas
this co-evolving relationship does not hold for genes with condition-specific
epistasis. Our findings provide a novel genome-wide picture of epistatic
dynamics under environmental perturbations.
Comment: 22 pages, 9 figure
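As a rough illustration of what positive and negative epistasis mean, the sketch below uses a common multiplicative definition: the double-mutant fitness minus the product of the single-mutant fitnesses, all relative to wild type. The abstract does not state which definition the authors used, so this formula, the function name, and the fitness values are assumptions; in a flux balance setting the fitness values would typically come from simulated growth rates.

```python
def epistasis(w_a, w_b, w_ab):
    """Multiplicative epistasis (assumed definition for this sketch).

    w_a, w_b: fitness of each single mutant relative to wild type.
    w_ab: fitness of the double mutant relative to wild type.
    Positive values mean the double mutant is fitter than expected
    under independent effects; negative values mean it is less fit.
    """
    return w_ab - w_a * w_b


# Hypothetical relative fitness values.
print(epistasis(0.5, 0.5, 0.25))   # independent effects
print(epistasis(0.5, 0.5, 0.5))    # positive epistasis
print(epistasis(0.5, 0.5, 0.125))  # negative epistasis
```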
Modeling Gene Networks in Saccharomyces cerevisiae
Detailed and innovative analysis of gene regulatory network structures may reveal novel insights into biological mechanisms. Here we study how the gene regulatory network in Saccharomyces cerevisiae can differ under aerobic and anaerobic conditions. To achieve this, we discretized the gene expression profiles and calculated the self-entropy of down- and upregulation of gene expression as well as the joint entropy. Based on these quantities, the uncertainty coefficient was calculated for each gene triplet, after which separate gene logic networks were constructed for the aerobic and anaerobic conditions. Four structural parameters (average degree, average clustering coefficient, average shortest path, and average betweenness) were used to compare the structure of the corresponding aerobic and anaerobic logic networks. Five genes were identified as putative key components of the two energy metabolisms. Furthermore, community analysis using the Newman fast algorithm revealed two significant communities for the aerobic network but only one for the anaerobic network. DAVID Gene Functional Classification suggests that, under aerobic conditions, one such community reflects the cell cycle and cell replication, while the other is linked to mitochondrial respiratory chain function.
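The entropy quantities named in this abstract can be sketched for the simpler pairwise case. The abstract computes the uncertainty coefficient per gene triplet, so the two-gene version below is a simplification, and the discretized profiles are made-up values. The coefficient is taken in its standard form U(X|Y) = (H(X) + H(Y) - H(X,Y)) / H(X), which is 1 when Y fully determines X and 0 when the two are independent.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a discretized profile."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def joint_entropy(x, y):
    """Joint entropy of two profiles measured over the same samples."""
    return entropy(list(zip(x, y)))


def uncertainty_coefficient(x, y):
    """U(X|Y): fraction of the uncertainty in x explained by y."""
    hx = entropy(x)
    if hx == 0:
        return 0.0  # x is constant, so there is nothing to explain
    return (hx + entropy(y) - joint_entropy(x, y)) / hx


# Hypothetical discretized profiles: +1 = upregulated, -1 = downregulated.
gene_a = [1, 1, -1, -1]
gene_b = [1, 1, -1, -1]   # identical to gene_a
gene_c = [1, -1, 1, -1]   # independent of gene_a in this toy sample
print(uncertainty_coefficient(gene_a, gene_b))
print(uncertainty_coefficient(gene_a, gene_c))
```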