1,261 research outputs found
Information processing in biology
To survive, organisms must respond appropriately to a variety of challenges posed by a dynamic and uncertain environment. The mechanisms underlying such responses can in general be framed as input-output devices which map environment states (inputs) to associated responses (outputs). In this light, it is appealing to attempt to model these systems using information theory, a well-developed mathematical framework for describing input-output systems.
Under the information-theoretic perspective, an organism’s behavior is fully characterized by the repertoire of its outputs under different environmental conditions. Due to natural selection, it is reasonable to assume this input-output mapping has been fine-tuned in such a way as to maximize the organism’s fitness. If that is the case, it should be possible to abstract away the mechanistic implementation details and obtain the general principles that lead to fitness under a certain environment. These can then be used inferentially both to generate hypotheses about the underlying implementation and to predict novel responses under external perturbations.
In this work I use information theory to address the question of how biological systems generate complex outputs using relatively simple mechanisms in a robust manner. In particular, I will examine how communication and distributed processing can lead to emergent phenomena which allow collective systems to respond in a much richer way than a single organism could.
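As a minimal sketch of the input-output view described in this abstract, the mutual information between environment states and responses can be computed from a joint probability table. The function name and the three example tables are illustrative assumptions and are not taken from the thesis.

```python
import math


def mutual_information(joint):
    """Mutual information I(X;Y) in bits for a joint probability table.

    joint[i][j] is P(X = i, Y = j); rows index environment states (inputs)
    and columns index responses (outputs).
    """
    px = [sum(row) for row in joint]          # marginal over inputs
    py = [sum(col) for col in zip(*joint)]    # marginal over outputs
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                       # 0 * log(0) is taken as 0
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi


# Hypothetical tables: two equally likely environment states, two responses.
perfect = [[0.5, 0.0], [0.0, 0.5]]      # response always matches the state
blind = [[0.25, 0.25], [0.25, 0.25]]    # response independent of the state
noisy = [[0.45, 0.05], [0.05, 0.45]]    # response matches 90% of the time

for name, table in [("perfect", perfect), ("blind", blind), ("noisy", noisy)]:
    print(name, mutual_information(table))
```

A response that tracks the environment perfectly carries one full bit about a binary state, an independent response carries none, and a noisy response falls in between.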
Graphical Models for Multivariate Time-Series
Gaussian graphical models have received much attention in recent years, due
to their flexibility and expressive power. In particular, much interest has
been devoted to graphical models for temporal data, or dynamical graphical
models, to understand the relations among variables evolving in time. While powerful
in modelling complex systems, such models suffer from computational
issues both in terms of convergence rates and memory requirements, and may
fail to detect temporal patterns when the information on the system is partial.
This thesis comprises two main contributions in the context of dynamical
graphical models, tackling these two aspects: the need for reliable and fast
optimisation methods, and the need for greater modelling power, so that the
model can be retrieved in practical applications. The first contribution consists of a
forward-backward splitting (FBS) procedure for Gaussian graphical modelling
of multivariate time-series which relies on recent theoretical studies ensuring
global convergence under mild assumptions. The FBS-based implementation
achieves, with fast convergence rates, optimal results with respect
to ground truth and standard methods for dynamical network inference. The
second main contribution focuses on the problem of latent factors, which influence
the system while remaining hidden or unobservable. This thesis proposes the novel
latent variable time-varying graphical lasso method, which is able to take into
account both temporal dynamics in the data and latent factors influencing
the system. This is fundamental for the practical use of graphical models,
where the information on the data is partial. Extensive validation of
the method on both synthetic and real applications shows the effectiveness of
considering latent factors to deal with incomplete information.
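The forward-backward splitting idea mentioned in this abstract can be sketched on a much simpler problem. The example below applies it to an l1-penalised least-squares objective, not to the time-series graphical model objective of the thesis. The function names, the toy design matrix and the step size are all assumptions made for illustration.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|, used as the backward step."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fbs_lasso(X, y, lam, step, n_iter=200):
    """Minimise (1/2n) * ||y - Xw||^2 + lam * ||w||_1 by forward-backward splitting."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        # Forward step: gradient of the smooth least-squares term.
        resid = [sum(X[i][j] * w[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Backward step: soft-thresholding, the prox of the l1 penalty.
        w = [soft_threshold(w[j] - step * grad[j], step * lam) for j in range(p)]
    return w


# Toy design with orthogonal columns, so that X^T X / n is the identity and
# the exact solution is the soft-thresholded vector X^T y / n = (2, 0.05, -1).
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [1.05, 0.95, 3.05, 2.95]
w = fbs_lasso(X, y, lam=0.1, step=0.5)
print(w)
```

The small middle coefficient is set exactly to zero by the backward step, which is the sparsity-inducing behaviour that lasso-type graphical models rely on.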
The effect of noise on dynamics and the influence of biochemical systems
Understanding a complex system requires integration and collective analysis of data from many
levels of organisation. Predictive modelling of biochemical systems is particularly challenging
because the data are plagued by noise operating at every level. Inevitably
we have to decide whether we can reliably infer the structure and dynamics of biochemical systems
from present data. Here we approach this problem on several fronts by analysing the interplay
between deterministic and stochastic dynamics in a broad collection of biochemical models.
In a classical mathematical model we first illustrate how this interplay can be described in
surprisingly simple terms; we furthermore demonstrate the advantages of a statistical point of view
also for more complex systems. We then investigate strategies for the integrated analysis of models
characterised by different organisational levels, and trace the propagation of noise through such
systems. We use this approach to uncover, for the first time, the dynamics of metabolic adaptation
of a plant pathogen throughout its life cycle and discuss the ecological implications.
Finally, we investigate how reliably we can infer model parameters of biochemical models.
We develop a novel sensitivity/inferability analysis framework that is generally applicable to a
large fraction of current mathematical models of biochemical systems. By using this framework to
quantify the effect of parametric variation on system dynamics, we provide practical guidelines as
to when and why certain parameters are easily estimated while others are much harder to infer. We
highlight the limitations on parameter inference due to model structure and qualitative dynamical
behaviour, and identify candidate elements of control in biochemical pathways most likely to be
subject to regulation.
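The interplay between deterministic and stochastic dynamics can be illustrated with a minimal sketch, assuming a simple birth-death process (production at constant rate k, degradation at rate g per molecule). This toy model, its parameter values and the function names are assumptions chosen for illustration and are not one of the models analysed in the thesis.

```python
import math
import random


def gillespie_birth_death(k, g, n0, t_end, rng):
    """Simulate 0 -> X (rate k) and X -> 0 (rate g * n) with Gillespie's algorithm."""
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        birth, death = k, g * n
        total = birth + death
        t += rng.expovariate(total)      # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its rate.
        n += 1 if rng.random() * total < birth else -1
        times.append(t)
        counts.append(n)
    return times, counts


def deterministic_mean(t, k, g, n0):
    """Solution of the rate equation dn/dt = k - g * n."""
    return k / g + (n0 - k / g) * math.exp(-g * t)


def time_average(times, counts, t_start, t_end):
    """Time-weighted mean copy number over [t_start, t_end]."""
    total = 0.0
    for i, n in enumerate(counts):
        left = max(times[i], t_start)
        right = min(times[i + 1] if i + 1 < len(times) else t_end, t_end)
        if right > left:
            total += n * (right - left)
    return total / (t_end - t_start)


rng = random.Random(0)
times, counts = gillespie_birth_death(k=10.0, g=0.1, n0=0, t_end=2000.0, rng=rng)
print("stochastic time average:", time_average(times, counts, 200.0, 2000.0))
print("deterministic steady state:", deterministic_mean(2000.0, 10.0, 0.1, 0))
```

The stochastic trajectory fluctuates around the deterministic steady state k / g. For this linear model the long-run average agrees with the rate equation, while individual trajectories do not.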
Machine learning approach to reconstructing signalling pathways and interaction networks in biology
In this doctoral thesis, I present my research into applying machine learning techniques
for reconstructing species interaction networks in ecology, reconstructing molecular
signalling pathways and gene regulatory networks in systems biology, and inferring
parameters in ordinary differential equation (ODE) models of signalling pathways.
Together, the methods I have developed for these applications demonstrate the usefulness
of machine learning for reconstructing networks and inferring network parameters
from data.
The thesis consists of three parts. The first part is a detailed comparison of applying
static Bayesian networks, relevance vector machines, and linear regression with L1
regularisation (LASSO) to the problem of reconstructing species interaction networks
from species absence/presence data in ecology (Faisal et al., 2010). I describe how I
generated data from a stochastic population model to test the different methods and
how the simulation study led us to introduce spatial autocorrelation as an important
covariate. I also show how we used the results of the simulation study to apply the
methods to presence/absence data of bird species from the European Bird Atlas.
The second part of the thesis describes a time-varying, non-homogeneous dynamic
Bayesian network model for reconstructing signalling pathways and gene regulatory
networks, based on Lèbre et al. (2010). I show how my work has extended this model
to incorporate different types of hierarchical Bayesian information sharing priors and
different coupling strategies among nodes in the network. The introduction of these
priors reduces the inference uncertainty by putting a penalty on the number of structure
changes among network segments separated by inferred changepoints (Dondelinger
et al., 2010; Husmeier et al., 2010; Dondelinger et al., 2012b). Using both synthetic
and real data, I demonstrate that using information sharing priors leads to better reconstruction
accuracy of the underlying gene regulatory networks, and I compare the
different priors and coupling strategies. I show the results of applying the model to
gene expression datasets from Drosophila melanogaster and Arabidopsis thaliana, as
well as to a synthetic biology gene expression dataset from Saccharomyces cerevisiae.
In each case, the underlying network is time-varying; for Drosophila melanogaster, as
a consequence of measuring gene expression during different developmental stages;
for Arabidopsis thaliana, as a consequence of measuring gene expression for circadian
clock genes under different conditions; and for the synthetic biology dataset, as
a consequence of changing the growth environment. I show that in addition to inferring
sensible network structures, the model also successfully predicts the locations of changepoints.
The third and final part of this thesis is concerned with parameter inference in
ODE models of biological systems. This problem is of interest to systems biology
researchers, as kinetic reaction parameters often cannot be measured, or can only be
estimated imprecisely from experimental data. Due to the cost of numerically solving
the ODE system after each parameter adaptation, this is a computationally challenging
problem. Gradient matching techniques circumvent this problem by directly fitting the
derivatives of the ODE to the slope of an interpolant. I present an inference procedure
for a model using nonparametric Bayesian statistics with Gaussian processes, based
on Calderhead et al. (2008). I show that the new inference procedure improves on
the original formulation in Calderhead et al. (2008) and I present the result of applying
it to ODE models of predator-prey interactions, a circadian clock gene, a signal
transduction pathway, and the JAK/STAT pathway.
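The gradient matching idea from the third part can be sketched in a few lines. This is a deliberately simplified illustration: it replaces the Gaussian process interpolant described above with central finite differences on noise-free data, and it uses a logistic growth ODE whose right-hand side is linear in its parameters. The model, the parameter values and the function names are assumptions and are not taken from the thesis.

```python
import math


def logistic_solution(t, r, K, x0):
    """Closed-form solution of dx/dt = r * x * (1 - x / K), used to make toy data."""
    return K / (1.0 + (K / x0 - 1.0) * math.exp(-r * t))


def gradient_matching_logistic(ts, xs):
    """Estimate (r, K) by matching ODE derivatives to data slopes, without solving the ODE."""
    # Slopes at interior points via central differences (a stand-in for the
    # derivative of a smooth interpolant).
    slopes = [(xs[i + 1] - xs[i - 1]) / (ts[i + 1] - ts[i - 1])
              for i in range(1, len(ts) - 1)]
    mid = xs[1:-1]
    # Least squares for slope = a * x + b * x^2, where a = r and b = -r / K.
    s11 = sum(x ** 2 for x in mid)
    s12 = sum(x ** 3 for x in mid)
    s22 = sum(x ** 4 for x in mid)
    r1 = sum(x * s for x, s in zip(mid, slopes))
    r2 = sum(x ** 2 * s for x, s in zip(mid, slopes))
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, -a / b


ts = [0.1 * i for i in range(101)]
xs = [logistic_solution(t, r=0.8, K=10.0, x0=0.5) for t in ts]
r_hat, K_hat = gradient_matching_logistic(ts, xs)
print("estimated r:", r_hat, "estimated K:", K_hat)
```

Because the ODE is never integrated numerically, each candidate parameter value costs only a regression on the slopes. This is the computational saving that motivates gradient matching.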
Untangling hotel industry’s inefficiency: An SFA approach applied to a renowned Portuguese hotel chain
The present paper explores the technical efficiency of four hotels from Teixeira Duarte Group, a renowned Portuguese hotel chain. An efficiency ranking of these four hotel units, located in Portugal, is established using Stochastic Frontier Analysis. This methodology makes it possible to discriminate between measurement error and systematic inefficiency in the estimation process, and so to investigate the main causes of inefficiency. Several suggestions for improving efficiency are offered for each hotel studied.
Differential geometric MCMC methods and applications
This thesis presents novel Markov chain Monte Carlo methodology that exploits the natural representation of a statistical model as a Riemannian manifold. The methods developed provide generalisations of the Metropolis-adjusted Langevin algorithm and the Hybrid Monte Carlo algorithm for Bayesian statistical inference, and resolve many shortcomings of existing Monte Carlo algorithms when sampling from target densities that may be high dimensional and exhibit strong correlation structure. The performance of these Riemannian manifold Markov chain Monte Carlo algorithms is rigorously assessed by performing Bayesian inference on logistic regression models, log-Gaussian Cox point process models, stochastic volatility models, and both parameter-level and model-level inference of dynamical systems described by nonlinear differential equations.
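For orientation, the sketch below shows the standard Metropolis-adjusted Langevin algorithm (MALA) that the thesis generalises. It is the plain Euclidean version with a fixed step size, not the Riemannian manifold variant. The strongly correlated two-dimensional Gaussian target, the step size and all names are assumptions chosen for illustration.

```python
import math
import random

RHO = 0.9  # correlation of the illustrative 2-D Gaussian target


def log_p(x):
    """Log density (up to a constant) of a zero-mean, unit-variance Gaussian with correlation RHO."""
    return -(x[0] ** 2 - 2.0 * RHO * x[0] * x[1] + x[1] ** 2) / (2.0 * (1.0 - RHO ** 2))


def grad_log_p(x):
    c = 1.0 / (1.0 - RHO ** 2)
    return [-c * (x[0] - RHO * x[1]), -c * (x[1] - RHO * x[0])]


def mala(log_p, grad_log_p, x0, eps, n_samples, rng):
    """Plain MALA: Langevin drift proposal followed by a Metropolis-Hastings correction."""
    def log_q(to, frm, grad_frm):
        # Log density (up to a constant) of proposing `to` from `frm`.
        return -sum((t - f - 0.5 * eps ** 2 * g) ** 2
                    for t, f, g in zip(to, frm, grad_frm)) / (2.0 * eps ** 2)

    x, lp, g = list(x0), log_p(x0), grad_log_p(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        prop = [xi + 0.5 * eps ** 2 * gi + eps * rng.gauss(0.0, 1.0)
                for xi, gi in zip(x, g)]
        lp_prop, g_prop = log_p(prop), grad_log_p(prop)
        log_alpha = lp_prop + log_q(x, prop, g_prop) - lp - log_q(prop, x, g)
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, lp, g = prop, lp_prop, g_prop
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_samples


samples, rate = mala(log_p, grad_log_p, [0.0, 0.0], eps=0.3, n_samples=30000,
                     rng=random.Random(1))
print("acceptance rate:", rate)
```

With a single isotropic step size, the step must be small enough for the narrow direction of the target, which slows exploration along the wide direction. This is the kind of shortcoming on correlated targets that the abstract refers to.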
Investigating hybrids of evolution and learning for real-parameter optimization
In recent years, increasingly advanced techniques have been developed for
hybridizing evolution and learning, so that more applications
can benefit from this progress. One example of these advanced techniques is the
Learnable Evolution Model (LEM), which adopts learning as a guide for the general evolutionary
search. Despite this trend and the progress in LEM, there are still many ideas and
attempts which deserve further investigation and testing. For this purpose, this thesis
develops a number of new algorithms that combine more learning algorithms
with evolution in different ways. With these developments, we expect to understand the
effects of, and relations between, evolution and learning, and also to achieve better performance
in solving complex problems.
The machine learning algorithms combined with the standard Genetic Algorithm (GA)
are the supervised learning method k-nearest-neighbors (KNN), the Entropy-Based Discretization
(ED) method, and the decision tree learning algorithm ID3. We test these algorithms
on various real-parameter function optimization problems, especially the functions
from the CEC 2005 special session on real-parameter function optimization. Additionally, a
medical cancer chemotherapy treatment problem is solved in this thesis by some of our
hybrid algorithms.
The performance of these algorithms is compared with that of standard genetic algorithms
and other well-known contemporary evolution and learning hybrid algorithms. These
include the Covariance Matrix Adaptation Evolution Strategies (CMAES), and variants of
the Estimation of Distribution Algorithms (EDA).
Some important results have been derived from our experiments on these algorithms.
Among them, we found that even some very simple learning methods, hybridized
properly with an evolutionary procedure, can provide significant performance improvement; and
when more complex learning algorithms are incorporated with evolution, the resulting algorithms
are very promising and compete very well against state-of-the-art hybrid algorithms,
both on well-defined real-parameter function optimization problems and on a practical
evaluation-expensive problem.
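One way a simple learner can guide evolutionary search is sketched below. This is a hypothetical scheme written for illustration and is not one of the thesis's algorithms: a k-nearest-neighbors vote over the current population screens offspring before they are evaluated, inside an elitist real-coded GA. The sphere test function, the operators and all parameter values are assumptions.

```python
import random


def sphere(x):
    """Toy objective to minimise; its optimum is 0 at the origin."""
    return sum(v * v for v in x)


def knn_predict_good(x, labelled, k=3):
    """Majority vote of the k nearest labelled points (True means 'good')."""
    nearest = sorted(
        labelled,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])),
    )[:k]
    return 2 * sum(1 for _, good in nearest if good) > k


def knn_guided_ga(f, dim, pop_size, generations, rng):
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    scored = sorted(((f(x), x) for x in pop), key=lambda s: s[0])
    history = [scored[0][0]]                 # best fitness per generation
    half = pop_size // 2
    for _ in range(generations):
        # Label the better half of the population 'good', the rest 'bad'.
        labelled = [(x, i < half) for i, (_, x) in enumerate(scored)]
        children = []
        for _ in range(3 * pop_size):
            p1, p2 = rng.sample(scored[:half], 2)
            # Arithmetic crossover followed by Gaussian mutation.
            child = [(a + b) / 2.0 + rng.gauss(0.0, 0.3)
                     for a, b in zip(p1[1], p2[1])]
            # Only offspring the learner predicts to be 'good' are evaluated.
            if knn_predict_good(child, labelled):
                children.append(child)
            if len(children) == pop_size:
                break
        # Elitist replacement: keep the best pop_size of parents and children.
        merged = scored + [(f(c), c) for c in children]
        scored = sorted(merged, key=lambda s: s[0])[:pop_size]
        history.append(scored[0][0])
    return scored[0][1], history


best, history = knn_guided_ga(sphere, dim=3, pop_size=20, generations=40,
                              rng=random.Random(0))
print("initial best:", history[0], "final best:", history[-1])
```

Screening offspring with a cheap learner saves objective evaluations, which matters most when each evaluation is expensive.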