Inferring dynamic genetic networks with low order independencies
In this paper, we propose a novel inference method for dynamic genetic
networks which makes it possible to cope with a number of time measurements n
much smaller than the number of genes p. The approach is based on the concept
of a low order conditional dependence graph, which we extend here to the case of
Dynamic Bayesian Networks. Most of our results are based on the theory of
graphical models associated with the Directed Acyclic Graphs (DAGs). In this
way, we define a minimal DAG G which describes exactly the full order
conditional dependencies given the past of the process. Then, to cope with the
large p and small n estimation case, we propose to approximate DAG G by
considering low order conditional independencies. We introduce partial qth
order conditional dependence DAGs G(q) and analyze their probabilistic
properties. In general, DAGs G(q) differ from DAG G but still reflect relevant
dependence facts for sparse networks such as genetic networks. By using this
approximation, we set out a non-Bayesian inference method and demonstrate the
effectiveness of this approach on both simulated and real data analysis. The
inference procedure is implemented in the R package 'G1DBN' freely available
from the CRAN archive.
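The idea of scoring each candidate edge with only low order conditioning can be illustrated with a minimal sketch. It is not the G1DBN implementation: it assumes a first order (q = 1) variant in which the edge from gene j at time t-1 to gene i at time t is scored by the smallest absolute partial correlation obtained when conditioning on one other gene k at time t-1. The function names (`pearson`, `partial_corr`, `first_order_scores`) and the simulated data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after conditioning on a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def first_order_scores(series):
    """series[g] holds the measurements of gene g over time (p >= 2 genes).

    Returns {(j, i): score} for each candidate edge X_j(t-1) -> X_i(t).
    The score is the minimum over k != j of
    |partial correlation of X_i(t) and X_j(t-1) given X_k(t-1)|,
    so an edge keeps a high score only if no single gene explains it away.
    """
    p = len(series)
    past = [s[:-1] for s in series]      # values at time t-1
    present = [s[1:] for s in series]    # values at time t
    scores = {}
    for i in range(p):
        for j in range(p):
            others = [k for k in range(p) if k != j]
            scores[(j, i)] = min(
                abs(partial_corr(present[i], past[j], past[k])) for k in others
            )
    return scores


# Simulated example (assumed data): gene 0 drives gene 1 with a lag of one step.
random.seed(0)
n = 200
x0 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
x1 = [0.0] + [0.9 * x0[t - 1] + random.gauss(0, 0.3) for t in range(1, n)]
scores = first_order_scores([x0, x1, x2])
print(scores[(0, 1)], scores[(2, 1)])
```

Each score involves only three variables at a time, which is why this kind of approximation remains usable when the number of time points is much smaller than the number of genes.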
Inference of Temporally Varying Bayesian Networks
When analysing gene expression time series data an often overlooked but
crucial aspect of the model is that the regulatory network structure may change
over time. Whilst some approaches have addressed this problem previously in the
literature, many are not well suited to the sequential nature of the data. Here
we present a method that allows us to infer regulatory network structures that
may vary between time points, utilising a set of hidden states that describe
the network structure at a given time point. To model the distribution of the
hidden states we have applied the Hierarchical Dirichlet Process Hidden Markov
Model, a nonparametric extension of the traditional Hidden Markov Model, that
does not require us to fix the number of hidden states in advance. We apply our
method to existing microarray expression data as well as demonstrating its
efficacy on simulated test data.
In Silico Gene Regulatory Network of the Maurer’s Cleft Pathway in Plasmodium falciparum
The Maurer’s clefts (MCs) are very important for the survival of Plasmodium falciparum within an infected cell as they are induced by the parasite itself in the erythrocyte for protein trafficking. The MCs form an interesting part of the parasite’s biology as they shed more light on how the parasite remodels the erythrocyte leading to host pathogenesis and death. Here, we predicted and analyzed the genetic regulatory network of genes identified to belong to the MCs using a regularized graphical Gaussian model. Our network shows four major activators, their corresponding target genes, and predicted binding sites. One of these master activators is the serine repeat antigen 5 (SERA5), predominantly expressed among the SERA multigene family of P. falciparum, which is one of the blood-stage malaria vaccine candidates. Our results provide more details about functional interactions and the regulation of the genes in the MCs’ pathway of P. falciparum.
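As a rough illustration of what a regularized graphical Gaussian model computes, the sketch below estimates partial correlations from a shrunken correlation matrix. The abstract does not say which regulariser was used, so the linear shrinkage toward the identity matrix, the fixed `shrinkage` value, and the function name `shrinkage_partial_correlations` are all assumptions made for the example.

```python
import numpy as np


def shrinkage_partial_correlations(data, shrinkage=0.1):
    """Partial correlations from a shrunken correlation matrix.

    data: array of shape (n_samples, n_genes).
    shrinkage: weight in [0, 1] pulling the correlation matrix toward the
        identity; fixed by hand here (an assumption), which keeps the matrix
        invertible even when there are fewer samples than genes.
    """
    corr = np.corrcoef(data, rowvar=False)
    p = corr.shape[0]
    shrunk = (1.0 - shrinkage) * corr + shrinkage * np.eye(p)
    precision = np.linalg.inv(shrunk)
    d = np.sqrt(np.diag(precision))
    # Partial correlation between i and j given all other variables.
    pcor = -precision / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    return pcor


# Simulated chain x -> y -> z (assumed data): x and z are only linked through y.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 0.8 * x + 0.6 * rng.normal(size=500)
z = 0.8 * y + 0.6 * rng.normal(size=500)
pcor = shrinkage_partial_correlations(np.column_stack([x, y, z]))
print(np.round(pcor, 2))
```

In this toy chain the partial correlation between x and z is much weaker than between x and y, which is the property such models use to separate direct from indirect associations. The resulting graph is undirected; assigning roles such as activator and target requires additional information.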
Transcriptome-based Gene Networks for Systems-level Analysis of Plant Gene Functions
Present day genomic technologies are evolving at an unprecedented rate, allowing interrogation of
cellular activities with increasing breadth and depth. However, we know very little about how the
genome functions and what the identified genes do. The lack of functional annotations of genes
greatly limits the post-analytical interpretation of new high throughput genomic datasets. For plant
biologists, the problem is much more severe. Less than 50% of all the identified genes in the model plant
Arabidopsis thaliana, and only about 20% of all genes in the crop model Oryza sativa have some
aspects of their functions assigned. Therefore, there is an urgent need to develop innovative
methods to predict and expand on the currently available functional annotations of plant genes.
With open-access catching the ‘pulse’ of modern day molecular research, an integration of the
copious amount of transcriptome datasets allows rapid prediction of gene functions in specific
biological contexts, which provides added evidence over traditional homology-based functional
inference. The main goal of this dissertation was to develop data analysis strategies and tools
broadly applicable in systems biology research.
Two user-friendly interactive web applications are presented: The Rice Regulatory
Network (RRN) captures an abiotic-stress conditioned gene regulatory network designed to
facilitate the identification of transcription factor targets during induction of various environmental
stresses. The Arabidopsis Seed Active Network (SANe) is a transcriptional regulatory network
that encapsulates various aspects of seed formation, including embryogenesis, endosperm
development and seed-coat formation. Further, an edge-set enrichment analysis algorithm is
proposed that uses network density as a parameter to estimate the gain or loss in correlation of
pathways between two conditionally independent coexpression networks.
Gaussian process regression bootstrapping: exploring the effects of uncertainty in time course data
Motivation: Although it is widely accepted that high-throughput biological data are typically highly noisy, the effects that this uncertainty has upon the conclusions we draw from these data are often overlooked. However, in order to assign any degree of confidence to our conclusions, we must quantify these effects. Bootstrap resampling is one method by which this may be achieved. Here, we present a parametric bootstrapping approach for time-course data, in which Gaussian process regression (GPR) is used to fit a probabilistic model from which replicates may then be drawn. This approach implicitly allows the time dependence of the data to be taken into account, and is applicable to a wide range of problems.
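The parametric bootstrap described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a squared-exponential kernel with hand-fixed hyperparameters (`length_scale`, `signal_var`, `noise_var`), whereas a real analysis would normally estimate them from the data. The function names `rbf_kernel` and `gpr_bootstrap` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between two sets of time points."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)


def gpr_bootstrap(t, y, n_replicates, length_scale=1.0, signal_var=1.0,
                  noise_var=0.1, seed=0):
    """Fit a GP to one time course and draw bootstrap replicates from it.

    Returns (posterior_mean, replicates), where replicates has shape
    (n_replicates, len(t)). Hyperparameters are fixed, not optimised
    (an assumption of this sketch).
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    y_mean = y.mean()                      # centre so a zero-mean prior is reasonable
    k = rbf_kernel(t, t, length_scale, signal_var)
    k_noisy = k + noise_var * np.eye(len(t))
    # Posterior of the latent function at the observed time points.
    mean = y_mean + k @ np.linalg.solve(k_noisy, y - y_mean)
    cov = k - k @ np.linalg.solve(k_noisy, k)
    # Replicates are noisy observations, so add the noise variance back.
    cov_rep = cov + noise_var * np.eye(len(t))
    cov_rep = 0.5 * (cov_rep + cov_rep.T)  # remove round-off asymmetry
    rng = np.random.default_rng(seed)
    replicates = rng.multivariate_normal(mean, cov_rep, size=n_replicates)
    return mean, replicates


# Example on an assumed toy time course.
t = np.linspace(0.0, 10.0, 12)
y = np.sin(t)
mean, reps = gpr_bootstrap(t, y, n_replicates=500)
print(reps.shape)
```

Because each replicate is drawn jointly across all time points, neighbouring values stay correlated, which is how the time dependence of the data is carried into the resampled series. Any downstream analysis can then be repeated on every replicate to see how stable its conclusions are.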
Statistical inference from large-scale genomic data
This thesis explores the potential of statistical inference methodologies in their applications in functional genomics. In essence, it summarises algorithmic findings in this field, providing step-by-step analytical methodologies for deciphering biological knowledge from large-scale genomic data, mainly microarray gene expression time series.
This thesis covers a range of topics in the investigation of complex multivariate genomic data. One focus involves using clustering as a method of inference and another is cluster validation to extract meaningful biological information from the data. Information gained from the application of these various techniques can then be used conjointly in the elucidation of gene regulatory networks, the ultimate goal of this type of analysis. First, a new tight clustering method for gene expression data is proposed to obtain tighter and potentially more informative gene clusters. Next, to fully utilise biological knowledge in clustering validation, a validity index is defined based on one of the most important ontologies within the Bioinformatics community, Gene Ontology. The method bridges a gap in current literature, in the sense that it takes into account not only the variations of Gene Ontology categories in biological specificities and their significance to the gene clusters, but also the complex structure of the Gene Ontology. Finally, Bayesian probability is applied to making inference from heterogeneous genomic data, integrated with previous efforts in this thesis, for the aim of large-scale gene network inference. The proposed system comes with a stochastic process to achieve robustness to noise, yet remains efficient enough for large-scale analysis.
Ultimately, the solutions presented in this thesis serve as building blocks of an intelligent system for interpreting large-scale genomic data and understanding the functional organisation of the genome.