253 research outputs found
Gene Regulatory Network Reconstruction Using Dynamic Bayesian Networks
High-content technologies such as DNA microarrays can provide a system-scale overview of how genes interact with each other in a network context. Various mathematical methods and computational approaches have been proposed to reconstruct gene regulatory networks (GRNs), including Boolean networks, information theory, differential equations and Bayesian networks. GRN reconstruction faces huge intrinsic challenges on both experimental and theoretical fronts, because the inputs and outputs of the molecular processes are unclear and the underlying principles are unknown or too complex.
In this work, we focused on improving the accuracy and speed of GRN reconstruction with a dynamic Bayesian network (DBN) based method. A commonly used structure-learning algorithm is based on REVEAL (Reverse Engineering Algorithm). However, this method has limitations when used to reconstruct GRNs. For instance, the two-stage temporal Bayes network (2TBN) cannot be recovered well by applying REVEAL; it has low accuracy and speed for high-dimensional networks with more than a hundred nodes; and it cannot reconstruct a network with 400 nodes at all. We implemented a DBN structure-learning algorithm that uses Friedman's score function in place of REVEAL, tested it on the reconstruction of both synthetic networks and real yeast networks, and compared it with REVEAL in the absence or presence of a preprocessed network generated by Zou and Conzen's algorithm. The new score metric improved the precision and recall of GRN reconstruction. Networks of gene interactions were reconstructed using the DBN approach and analyzed to identify the mechanism of chemical-induced reversible neurotoxicity in earthworms, with tools that curate relevant genes from the non-model organism's pathway to the model organism's pathway.
Constrained expectation-maximization (EM), dynamic analysis, linear quadratic tracking, and nonlinear constrained expectation-maximization (EM) for the analysis of genetic regulatory networks and signal transduction networks
Despite the immense progress made by molecular biology in cataloging and characterizing molecular elements of life and the success in genome sequencing, there have not been comparable advances in the functional study of complex phenotypes. This is because isolated study of one molecule, or one gene, at a time is not enough by itself to characterize the complex interactions in an organism and to explain the functions that arise out of these interactions. Mathematical modeling of biological systems is one way to meet the challenge. My research formulates the modeling of gene regulation as a control problem and applies systems and control theory to the identification, analysis, and optimal control of genetic regulatory networks. The major contributions of my work include biologically constrained estimation, dynamical analysis, and optimal control of genetic networks. In addition, parameter estimation of nonlinear models of biological networks is also studied, as a parameter estimation problem of a general nonlinear dynamical system. Results demonstrate the superior predictive power of biologically constrained state-space models, and that genetic networks can have differential dynamic properties when subjected to different environmental perturbations. Application of optimal control demonstrates the feasibility of regulating gene expression levels. In the difficult problem of parameter estimation, a generalized EM algorithm is deployed, and a set of explicit formulas based on the extended Kalman filter is derived. Application of the method to synthetic and real-world data shows promising results
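The state-space estimation mentioned above builds on Kalman filtering. As a rough illustration only, the sketch below runs a scalar linear Kalman filter, the building block that the extended Kalman filter generalizes to nonlinear models. The model form, the parameter values and the name `kalman_step` are assumptions made for this example; they are not the explicit formulas derived in the dissertation.

```python
def kalman_step(x, p, y, a=1.0, q=0.01, r=0.1):
    """One predict/update cycle for x[t] = a*x[t-1] + w, y[t] = x[t] + v.

    q and r are the (assumed) process and measurement noise variances.
    """
    x_pred = a * x                    # predicted state
    p_pred = a * a * p + q            # predicted error variance
    gain = p_pred / (p_pred + r)      # Kalman gain
    x_new = x_pred + gain * (y - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new

# Hypothetical measurements: a constant expression level of 1.0.
x_est, p_est = 0.0, 1.0
for y in [1.0] * 20:
    x_est, p_est = kalman_step(x_est, p_est, y)
```

In an EM setting, a filter of this kind would supply the state estimates in the E-step, while the M-step would update parameters such as `a`, `q` and `r`.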
Dynamical pathway analysis
Background: Although a great deal is known about one gene or protein and its functions under different environmental conditions, little information is available about the complex behaviour of biological networks subject to different environmental perturbations. Observing differential expressions of one or more genes between normal and abnormal cells has been a mainstream method of discovering pertinent genes in diseases and therefore valuable drug targets. However, to date, no such method exists for elucidating and quantifying the differential dynamical behaviour of genetic regulatory networks, which can have greater impact on phenotypes than individual genes.
Results: We propose to redress the deficiency by formulating the functional study of biological networks as a control problem of dynamical systems. We developed mathematical methods to study the stability, the controllability, and the steady-state behaviour, as well as the transient responses of biological networks under different environmental perturbations. We applied our framework to three real-world datasets: the SOS DNA repair network in E. coli under different dosages of radiation, the GSH redox cycle in mice lung exposed to either poisonous air or normal air, and the MAPK pathway in mammalian cell lines exposed to three types of HIV type I Vpr, a wild type and two mutant types; and we found that the three genetic networks exhibited fundamentally different dynamical properties in normal and abnormal cells.
Conclusion: Difference in stability, relative stability, degrees of controllability, and transient responses between normal and abnormal cells means considerable difference in dynamical behaviours and different functioning of cells. Therefore differential dynamical properties can be a valuable tool in biomedical research.
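The stability and controllability properties named in the abstract can be checked numerically once a network is written as a linear dynamical system. The sketch below assumes a discrete-time linear model x[t+1] = A x[t] + B u[t]; the abstract does not state the model form, and the matrices `A` and `B` are made-up illustrative values, not fitted to any of the three datasets.

```python
import numpy as np

def is_stable(A):
    """Asymptotic stability of x[t+1] = A x[t]: all eigenvalues strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)

def controllability_rank(A, B):
    """Rank of the Kalman controllability matrix [B, AB, ..., A^(n-1) B]."""
    n = A.shape[0]
    blocks = [B]
    for _ in range(n - 1):
        blocks.append(A @ blocks[-1])
    return int(np.linalg.matrix_rank(np.hstack(blocks)))

# Hypothetical two-gene network driven by one external perturbation.
A = np.array([[0.5, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])

stable = is_stable(A)
rank = controllability_rank(A, B)
```

The system is fully controllable when the rank equals the number of states. Comparing these quantities for models fitted under normal and perturbed conditions is one way to quantify the differential dynamical behaviour the abstract describes.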
Efficient and Robust Algorithms for Statistical Inference in Gene Regulatory Networks
Inferring gene regulatory networks (GRNs) is of profound importance in the field of computational
biology and bioinformatics. Understanding the gene-gene and gene- transcription factor (TF)
interactions has the potential of providing an insight into the complex biological processes
taking place in cells. High-throughput genomic and proteomic technologies have enabled the
collection of large amounts of data in order to quantify gene expression and map
DNA-protein interactions.
This dissertation investigates the problem of network component analysis (NCA), which estimates
the transcription factor activities (TFAs) and gene-TF interactions by making use of gene
expression and ChIP-chip data. Closed-form solutions are provided for estimation of the TF-gene
connectivity matrix, which yields an advantage over existing state-of-the-art methods in terms
of lower computational complexity and higher consistency. We present an iterative reweighted ℓ2
norm based algorithm to infer the network connectivity when the prior knowledge about the connections is
incomplete.
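The idea of iteratively reweighted ℓ2 estimation can be sketched as follows. This is a generic iteratively reweighted least-squares scheme that approximates an ℓ1 sparsity penalty by a sequence of weighted ℓ2 penalties; the objective, the function name `irls_sparse`, the parameter values and the toy data are assumptions for illustration and are not taken from the dissertation.

```python
import numpy as np

def irls_sparse(X, y, lam=0.8, n_iter=50, eps=1e-8):
    """Approximately minimize ||y - X w||^2 + lam * ||w||_1 by reweighted ridge solves."""
    w = np.ones(X.shape[1])
    gram, xty = X.T @ X, X.T @ y
    for _ in range(n_iter):
        # |w_i| is replaced by w_i^2 / (2 |w_i_old|), which gives a weighted l2 penalty.
        weights = 1.0 / (np.abs(w) + eps)
        w = np.linalg.solve(gram + 0.5 * lam * np.diag(weights), xty)
    return w

# Toy design with orthogonal columns; only the second regulator is truly connected.
X = np.array([[1.0,  1.0,  1.0],
              [1.0, -1.0,  1.0],
              [1.0,  1.0, -1.0],
              [1.0, -1.0, -1.0]])
y = 2.0 * X[:, 1]
w_hat = irls_sparse(X, y)
```

Each iteration is a closed-form linear solve. Entries that shrink toward zero receive ever larger weights and are effectively pruned, while the surviving coefficient is slightly shrunk by the penalty.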
We present an NCA algorithm which has the ability to counteract the presence of outliers in the gene expression data and is therefore more robust. Closed-form solutions are derived for the estimation of TFAs and TF-gene interactions and the resulting algorithm is comparable to the fastest algorithms proposed so far with the additional advantages of robustness to outliers and higher reliability in the TFA estimation.
Finally, we look at the inference of gene regulatory networks, which essentially reduces to the estimation of only the gene-gene interactions. Gene networks are known to be sparse, and therefore an inference algorithm is proposed which imposes a sparsity constraint while estimating the connectivity matrix. The online estimation lowers the computational complexity and provides superior performance in terms of accuracy and scalability.
This dissertation presents gene regulatory network inference algorithms which provide
computationally efficient solutions in some very crucial scenarios, offer advantages over
existing algorithms, and therefore provide a means to better understand the underlying
cellular network. Hence, it serves as a building block in the accurate estimation of gene
regulatory networks, which will pave the way for
finding cures to genetic diseases
Microarray Data Mining and Gene Regulatory Network Analysis
Microarray technology makes it feasible to obtain quantitative measurements of the expression of thousands of genes present in a biological sample simultaneously. Genome-wide expression data generated by this technology hold promise for uncovering implicit, previously unknown biological knowledge. In this study, several problems in microarray data mining were investigated, including feature (gene) selection, classifier gene identification, generation of a reference genetic interaction network for non-model organisms, and gene regulatory network reconstruction using time-series gene expression data. Most existing computational models used to infer gene regulatory networks suffer from either low accuracy or high computational complexity. To overcome these limitations, the following strategies were proposed to integrate bioinformatics data mining techniques with existing GRN inference algorithms, enabling the discovery of novel biological knowledge. An integrated statistical and machine learning (ISML) pipeline was developed for feature selection and classifier gene identification to address the curse of dimensionality as well as the huge search space. Using the selected classifier genes as seeds, a scale-up technique is applied to search through major databases of genetic interaction networks, metabolic pathways, etc.
By curating relevant genes and running BLAST on genomic sequences of non-model organisms against well-studied genetic model organisms, a reference gene regulatory network for less-studied organisms was built and used both as prior knowledge and for model validation in GRN reconstruction. Networks of gene interactions were inferred using a Dynamic Bayesian Network (DBN) approach and were analyzed to elucidate the dynamics caused by perturbations. Our proposed pipelines were applied to investigate molecular mechanisms of chemical-induced reversible neurotoxicity
Data analysis methods for copy number discovery and interpretation
Copy number variation (CNV) is an important type of genetic variation that can give rise to a wide variety of phenotypic traits. Differences in copy number are thought to play major roles in processes that involve dosage sensitive genes, providing beneficial, deleterious or neutral modifications to individual phenotypes. Copy number analysis has long been a standard in clinical cytogenetic laboratories. Gene deletions and duplications can often be linked with genetic syndromes such as the 7q11.23 deletion of Williams-Beuren syndrome, the 22q11 deletion of DiGeorge syndrome and the 17q11.2 duplication of Potocki-Lupski syndrome. Interestingly, copy number based genomic disorders often display reciprocal deletion / duplication syndromes, with the latter frequently exhibiting milder symptoms. Moreover, the study of chromosomal imbalances plays a key role in cancer research. The datasets used for the development of analysis methods during this project are generated as part of the cutting-edge translational project, Deciphering Developmental Disorders (DDD). This project, the DDD, is the first of its kind and will directly apply state of the art technologies, in the form of ultra-high resolution microarray and next generation sequencing (NGS), to real-time genetic clinical practice. It is a collaboration between the Wellcome Trust Sanger Institute (WTSI) and the National Health Service (NHS) involving the 24 regional genetic services across the UK and Ireland. Although the application of DNA microarrays for the detection of CNVs is well established, individual change point detection algorithms often display variable performances. The definition of an optimal set of parameters for achieving a certain level of performance is rarely straightforward, especially where data qualities vary ... [cont.]
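To make the notion of change point detection concrete, here is a minimal sketch that finds a single change point in a sequence of probe log ratios by least-squares segmentation. It is a generic textbook approach chosen for illustration; the function name `best_change_point` and the toy `log_ratios` are hypothetical, and the abstract does not say which algorithms the thesis evaluates.

```python
def best_change_point(values):
    """Return the split index k that minimizes the summed squared error of
    the two segments values[:k] and values[k:], each fitted by its own mean."""
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    best_k, best_cost = None, float("inf")
    for k in range(1, len(values)):
        cost = sse(values[:k]) + sse(values[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Hypothetical noise-free log2 ratios: five normal probes, then a one-copy loss.
log_ratios = [0.0] * 5 + [-1.0] * 5
split = best_change_point(log_ratios)
```

Real array data are noisy and may contain several change points, which is where parameters such as minimum segment length and significance thresholds enter, along with the tuning difficulty the abstract mentions.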
Graphical models for estimating dynamic networks
Determining dynamic networks from data is an active area of research, particularly in systems biology. Estimating the structure of a network amounts to determining the presence or absence of a relationship between the vertices of the graph. Graphical models define these relationships via conditional dependence. In Gaussian graphical models (GGMs), the vertices are assumed to follow a normal distribution. This has major advantages because of the computational tractability of GGMs. Standard GGMs, however, cannot be used to study large networks, i.e. when the number of observations is smaller than the number of vertices of the graph. Recently, penalized maximum likelihood estimators have been proposed to cope with such high-dimensional situations. We propose to use penalized GGMs in a number of different contexts: for structured dynamic models, for slowly changing dynamic models, and for models with a particular structure, such as a "small world" architecture. Each of these models can be applied in real, high-dimensional situations where the evolution of the network plays an important role. When the underlying process on the vertices consists of binary variables, ordinal variables, counts or otherwise non-normal data, we propose in this thesis to define a general non-Gaussian graphical model via a Gaussian copula. This copula transforms the data either directly via the marginal distribution function of the variables, or indirectly via latent normal variables. This approach is very successful, particularly because it can easily model variables of different types together in one graphical model. The problem of estimating a dynamic network becomes even harder when a certain portion of the vertices is not observed.
In such cases state-space models are typically used, but here we propose to use an extension of our penalized graphical model to estimate the latent part of the network
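The link between conditional dependence and the graph structure of a Gaussian graphical model can be illustrated with a small numerical sketch. It works from a known covariance matrix of a three-vertex chain (an assumption chosen so the answer is exact) and reads the edges off the precision matrix. It does not implement the penalized estimators proposed in the thesis, which are needed when the covariance must be estimated from fewer observations than vertices.

```python
import numpy as np

# Assumed covariance of a chain X0 - X1 - X2 with correlation rho between neighbours.
rho = 0.5
idx = np.arange(3)
cov = rho ** np.abs(idx[:, None] - idx[None, :])

# In a GGM, a zero in the precision (inverse covariance) matrix means the two
# vertices are conditionally independent given all the others, i.e. no edge.
precision = np.linalg.inv(cov)
d = np.sqrt(np.diag(precision))
partial_corr = -precision / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)

edges = [(i, j) for i in range(3) for j in range(i + 1, 3)
         if abs(partial_corr[i, j]) > 1e-8]
```

Here X0 and X2 are marginally correlated (covariance 0.25) but have no edge, because their partial correlation given X1 is zero. A penalized estimator aims to recover such zeros from noisy, high-dimensional data.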