BioSilicoSystems - A Multipronged Approach Towards Analysis and Representation of Biological Data (PhD Thesis)
The rising field of integrative bioinformatics provides vital methods to integrate, manage and analyze diverse data, allowing new and deeper insights into, and a clearer understanding of, intricate biological systems. The difficulty lies not only in facilitating the study of heterogeneous data within its biological context but also, more fundamentally, in how to represent the available knowledge and make it accessible. Moreover, adding valuable information and functions that encourage the user to discover the interesting relations hidden within the data is, in itself, a great challenge. Cumulative information can also provide greater biological insight than is possible with individual information sources. Furthermore, the rapidly growing number of databases and data types poses the challenge of integrating heterogeneous data types, especially in biology. This rapid increase in the volume and number of data resources drives the need for polymorphic views of the same data, which often overlap across multiple resources.

In this thesis, a multi-pronged approach is proposed that covers various methods for the analysis and representation of the diverse biological data present in different data sources. It is an effort to explain and emphasize the different concepts developed for the analysis of molecular data and to explain their biological significance. The hypotheses proposed are placed in the context of other results and findings published in the past. The approach also demonstrates different ways to integrate molecular data from various sources, and addresses the need for a comprehensive understanding and clear presentation of each concept or algorithm and its results using simple means and methods. The approach comprises different tools and methods spanning significant areas of bioinformatics research, such as data integration, data visualization, biological network construction and reconstruction, and alignment of biological pathways. Each tool takes a distinct approach to using molecular data in a different area of biological research and is built on the kernel of the thesis. Furthermore, these methods are combined with graphical representations that keep things simple and comprehensible and make the underlying biological complexity easier to understand. Moreover, the human eye is accustomed to, and more comfortable with, visual representations of facts.
Developing a framework for semi-automated rule-based modelling for neuroscience research
Dynamic modelling has significantly improved our understanding of the complex
molecular mechanisms underpinning neurobiological processes. The detailed
mechanistic insights these models offer depend on the availability of
a diverse range of experimental observations. Despite the huge increase in
biomolecular data generation from novel high-throughput technologies and
extensive research in bioinformatics and dynamical modelling, efficient creation
of accurate dynamical models remains highly challenging. To study this
problem, three perspectives are considered: comparison of modelling methods,
prioritisation of results and analysis of primary data sets. Firstly, I compare two
models of the DARPP-32 signalling network: a classically defined model with
ordinary differential equations (ODE) and its equivalent, defined using a novel
rule-based (RB) paradigm. The RB model recapitulates the results of the ODE
model, but offers a more expressive and flexible syntax that can efficiently handle
the “combinatorial complexity” commonly found in signalling networks,
and allows ready access to fine-grain details of the emerging system. RB modelling
is particularly well suited to encoding protein-centred features such as
domain information and post-translational modification sites. Secondly, I propose
a new pipeline for prioritisation of molecular species that arise during
model simulation using a recently developed algorithm based on multivariate
mutual information (CorEx) coupled with global sensitivity analysis (GSA) using
the RKappa package. To efficiently evaluate the importance of parameters,
Hilbert-Schmidt Independence Criterion (HSIC)-based indices are aggregated
into a weighted network that allows compact analysis of the model across conditions.
Finally, I describe an approach for the development of disease-specific
dynamical models using genes known to be associated with Attention Deficit
Hyperactivity Disorder (ADHD) as an exemplar. Candidate disease genes are
mapped to a selection of datasets that are potentially relevant to the modelling
process (e.g. interactions between proteins and domains, protein-domain and
kinase-substrate mappings) and these are jointly analysed using network clustering
and pathway enrichment analyses to evaluate their coverage and utility
in developing rule-based models.
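To make the HSIC-based indices mentioned in this abstract more concrete, the following is a minimal sketch of the standard biased empirical HSIC estimator with Gaussian kernels, applied to sampled parameters and model outputs. It does not use the RKappa package or CorEx. The function names (`gaussian_gram`, `hsic`, `hsic_network`), the fixed kernel bandwidth, and the choice of a parameter-by-output weight matrix as the "weighted network" are all assumptions made for illustration; the thesis may aggregate the indices differently.

```python
import numpy as np


def gaussian_gram(v, bandwidth):
    """Gram matrix of a Gaussian kernel for a 1-D sample v."""
    d = v[:, None] - v[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2)


def hsic(x, y, bandwidth=1.0):
    """Biased empirical HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K = gaussian_gram(x, bandwidth)
    L = gaussian_gram(y, bandwidth)
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def hsic_network(params, outputs, bandwidth=1.0):
    """Hypothetical aggregation: W[i, j] = HSIC(parameter i, output j).

    params has shape (n_samples, n_params), outputs (n_samples, n_outputs).
    """
    n_params, n_outputs = params.shape[1], outputs.shape[1]
    W = np.zeros((n_params, n_outputs))
    for i in range(n_params):
        for j in range(n_outputs):
            W[i, j] = hsic(params[:, i], outputs[:, j], bandwidth)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = rng.normal(size=(200, 2))
    # Toy "model": the output depends nonlinearly on parameter 0 only.
    outputs = (params[:, 0] ** 2)[:, None]
    print(hsic_network(params, outputs))
```

In this toy setting the weight for parameter 0 should come out larger than the weight for parameter 1, even though the dependence is nonlinear and would be missed by a plain correlation coefficient. In practice the bandwidth would usually be chosen from the data rather than fixed.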
Machine learning approach to reconstructing signalling pathways and interaction networks in biology
In this doctoral thesis, I present my research into applying machine learning techniques
for reconstructing species interaction networks in ecology, reconstructing molecular
signalling pathways and gene regulatory networks in systems biology, and inferring
parameters in ordinary differential equation (ODE) models of signalling pathways.
Together, the methods I have developed for these applications demonstrate the usefulness
of machine learning for reconstructing networks and inferring network parameters
from data.
The thesis consists of three parts. The first part is a detailed comparison of applying
static Bayesian networks, relevance vector machines, and linear regression with L1
regularisation (LASSO) to the problem of reconstructing species interaction networks
from species absence/presence data in ecology (Faisal et al., 2010). I describe how I
generated data from a stochastic population model to test the different methods and
how the simulation study led us to introduce spatial autocorrelation as an important
covariate. I also show how we used the results of the simulation study to apply the
methods to presence/absence data of bird species from the European Bird Atlas.
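As a rough illustration of the LASSO variant in this comparison, the sketch below regresses each variable on all the others with an L1 penalty and reads non-zero coefficients as candidate interactions. It is not the thesis implementation: the coordinate-descent solver, the function names (`soft_threshold`, `lasso_cd`, `lasso_network`), the penalty value, and the use of continuous toy data instead of presence/absence data are assumptions, and the spatial autocorrelation covariate is left out.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * max(abs(z) - lam, 0.0)


def lasso_cd(X, y, lam, n_sweeps=100):
    """Minimise (1 / 2n) * ||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()  # residual y - X @ beta (beta starts at 0)
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]          # remove column j's contribution
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]          # add the updated contribution
    return beta


def lasso_network(data, lam):
    """W[i, j] is the coefficient of variable j when predicting variable i."""
    Z = (data - data.mean(axis=0)) / data.std(axis=0)  # standardise columns
    p = Z.shape[1]
    W = np.zeros((p, p))
    for i in range(p):
        others = [j for j in range(p) if j != i]
        W[i, others] = lasso_cd(Z[:, others], Z[:, i], lam)
    return W


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a, b, d = rng.normal(size=(3, 500))
    c = a + 0.1 * rng.normal(size=500)  # c depends on a only
    print(np.round(lasso_network(np.column_stack([a, b, c, d]), lam=0.2), 2))
```

The penalty `lam` controls how sparse the recovered network is; larger values remove weaker edges. The resulting matrix is not symmetric in general, so a rule for combining W[i, j] and W[j, i] into one edge would still be needed.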
The second part of the thesis describes a time-varying, non-homogeneous dynamic
Bayesian network model for reconstructing signalling pathways and gene regulatory
networks, based on Lèbre et al. (2010). I show how my work has extended this model
to incorporate different types of hierarchical Bayesian information sharing priors and
different coupling strategies among nodes in the network. The introduction of these
priors reduces the inference uncertainty by putting a penalty on the number of structure
changes among network segments separated by inferred changepoints (Dondelinger
et al., 2010; Husmeier et al., 2010; Dondelinger et al., 2012b). Using both synthetic
and real data, I demonstrate that using information sharing priors leads to a better reconstruction
accuracy of the underlying gene regulatory networks, and I compare the
different priors and coupling strategies. I show the results of applying the model to
gene expression datasets from Drosophila melanogaster and Arabidopsis thaliana, as
well as to a synthetic biology gene expression dataset from Saccharomyces cerevisiae.
In each case, the underlying network is time-varying; for Drosophila melanogaster, as
a consequence of measuring gene expression during different developmental stages;
for Arabidopsis thaliana, as a consequence of measuring gene expression for circadian
clock genes under different conditions; and for the synthetic biology dataset, as
a consequence of changing the growth environment. I show that in addition to inferring
sensible network structures, the model also successfully predicts the locations of changepoints.
The third and final part of this thesis is concerned with parameter inference in
ODE models of biological systems. This problem is of interest to systems biology
researchers, as kinetic reaction parameters often cannot be measured, or can only be
estimated imprecisely from experimental data. Due to the cost of numerically solving
the ODE system after each parameter adaptation, this is a computationally challenging
problem. Gradient matching techniques circumvent this problem by directly fitting the
derivatives of the ODE to the slope of an interpolant. I present an inference procedure
for a model using nonparametric Bayesian statistics with Gaussian processes, based
on Calderhead et al. (2008). I show that the new inference procedure improves on
the original formulation in Calderhead et al. (2008) and I present the result of applying
it to ODE models of predator-prey interactions, a circadian clock gene, a signal
transduction pathway, and the JAK/STAT pathway.
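The gradient matching idea described in this abstract can be sketched on a toy problem as follows. This is a simplified illustration, not the procedure of the thesis or of Calderhead et al. (2008): the Gaussian process has fixed, hand-picked hyperparameters, the ODE parameters are found by ordinary least squares instead of Bayesian inference, and the logistic growth model, the noise-free data, and all function names are assumptions chosen to keep the example short.

```python
import numpy as np


def rbf(a, b, ell):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)


def gp_slopes(t, y, ell=1.0, noise_var=1e-6):
    """Derivative of the GP posterior mean, evaluated at the time points t."""
    K = rbf(t, t, ell)
    alpha = np.linalg.solve(K + noise_var * np.eye(len(t)), y - y.mean())
    # d/dt* k(t*, t_i) = -(t* - t_i) / ell^2 * k(t*, t_i)
    dK = -(t[:, None] - t[None, :]) / ell ** 2 * K
    return dK @ alpha


def fit_logistic_by_gradient_matching(t, x):
    """Fit dx/dt = r*x - (r/K)*x^2 to the GP slopes without solving the ODE."""
    slopes = gp_slopes(t, x)
    inner = slice(2, -2)  # slope estimates are least reliable at the edges
    A = np.column_stack([x, -x ** 2])[inner]
    theta, *_ = np.linalg.lstsq(A, slopes[inner], rcond=None)
    r = theta[0]
    capacity = theta[0] / theta[1]
    return r, capacity


if __name__ == "__main__":
    t = np.linspace(0.0, 10.0, 41)
    # Exact logistic solution with r = 1, K = 10, x(0) = 1 (noise-free toy data).
    x = 10.0 / (1.0 + 9.0 * np.exp(-t))
    print(fit_logistic_by_gradient_matching(t, x))
```

Because the right-hand side of this ODE is linear in (r, r/K), matching it to the interpolant's slopes reduces to a single linear least-squares problem, and the ODE is never integrated numerically. For models that are nonlinear in their parameters, the same mismatch between slopes and ODE derivatives would have to be minimised or sampled iteratively.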