1,352 research outputs found
Non-equilibrium phase transitions in biomolecular signal transduction
We study a mechanism for reliable switching in biomolecular
signal-transduction cascades. Steady bistable states are created by system-size
cooperative effects in populations of proteins, in spite of the fact that the
phosphorylation-state transitions of any molecule, by means of which the switch
is implemented, are highly stochastic. The emergence of switching is a
nonequilibrium phase transition in an energetically driven, dissipative system
described by a master equation. We use operator and functional integral methods
from reaction-diffusion theory to solve for the phase structure, noise
spectrum, and escape trajectories and first-passage times of a class of minimal
models of switches, showing how all critical properties for switch behavior can
be computed within a unified framework.
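
As an illustration of how system-size cooperative effects can produce bistability from stochastic single-molecule transitions, the sketch below samples trajectories of a master equation with the Gillespie algorithm. It is not the operator or functional integral method the abstract describes. The model (a quadratic positive-feedback phosphorylation rate), the function name `gillespie_switch`, and all rate constants are assumptions chosen only so that the corresponding deterministic rate equation has two stable states.

```python
import random


def gillespie_switch(n_total, n0, t_end, rng,
                     k_basal=0.05, k_feedback=4.0, k_off=1.0):
    """Simulate the number n of phosphorylated proteins out of n_total.

    Hypothetical rates, chosen only for illustration:
      phosphorylation:   (n_total - n) * (k_basal + k_feedback * (n / n_total) ** 2)
      dephosphorylation: n * k_off
    """
    t, n = 0.0, n0
    trajectory = [(t, n)]
    while True:
        frac = n / n_total
        up = (n_total - n) * (k_basal + k_feedback * frac ** 2)
        down = n * k_off
        total = up + down
        if total == 0.0:
            break
        # Waiting time to the next reaction is exponential in the total rate.
        t += rng.expovariate(total)
        if t >= t_end:
            break
        # Pick which reaction fires, in proportion to its rate.
        n += 1 if rng.random() * total < up else -1
        trajectory.append((t, n))
    return trajectory


rng = random.Random(42)
low = gillespie_switch(200, 5, 50.0, rng)     # start mostly unphosphorylated
high = gillespie_switch(200, 150, 50.0, rng)  # start mostly phosphorylated
print("final fraction from low start: ", low[-1][1] / 200)
print("final fraction from high start:", high[-1][1] / 200)
```

Running the simulation from different initial conditions, and for different values of `n_total`, gives a rough feel for how long the population stays in each state before noise carries it across to the other one.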
Leveraging expression and network data for protein function prediction
2012 Summer. Includes bibliographical references. Protein function prediction is one of the prominent problems in bioinformatics today. Protein annotation is slowly falling behind as more and more genomes are being sequenced. Experimental methods are expensive and time consuming, which leaves computational methods to fill the gap. While computational methods are still not accurate enough to be used without human supervision, this is the goal. The Gene Ontology (GO) is a collection of terms that are the standard for protein function annotations. Because of the structure of GO, protein function prediction is a hierarchical multi-label classification problem. The classification method used in this thesis is GOstruct, which performs structured predictions that take into account all GO terms. GOstruct has been shown to work well, but there are still improvements to be made. In this thesis, I work to improve predictions by building new kernels from the data that are used by GOstruct. To do this, I find key representations of the data that help define what kernels perform best on the variety of data types. I apply this methodology to function prediction in two model organisms, Saccharomyces cerevisiae and Mus musculus, and find better methods for interpreting the data.
Human protein function prediction: application of machine learning for integration of heterogeneous data sources
Experimental characterisation of protein cellular function can be prohibitively expensive and
take years to complete. To address this problem, this thesis focuses on the development of computational
approaches to predict function from sequence. For sequences with well characterised
close relatives, annotation is trivial; orphans or distant homologues present a greater challenge.
The use of a feature based method employing ensemble support vector machines to predict individual
Gene Ontology classes is investigated. It is found that different combinations of feature
inputs are required to recognise different functions. Although the approach is applicable to any
human protein sequence, it is restricted to broadly descriptive functions. The method is well
suited to prioritisation of candidate functions for novel proteins rather than to make highly accurate
class assignments.
Signatures of common function can be derived from different biological characteristics; interactions
and binding events as well as expression behaviour. To investigate the hypothesis that
common function can be derived from expression information, public domain human microarray
datasets are assembled. The questions of how best to integrate these datasets and derive
features that are useful in function prediction are addressed. Both co-expression and abundance
information is represented between and within experiments and investigated for correlation with
function. It is found that features derived from expression data serve as a weak but significant
signal for recognising functions. This signal is stronger for biological processes than molecular
function categories and independent of homology information.
The protein domain has historically been regarded as a modular evolutionary unit of protein function.
The occurrence of domains that can be linked by ancestral fusion events serves as a signal
for domain-domain interactions. To exploit this information for function prediction, novel domain
architecture and fused architecture scores are developed. Architecture scores rather than
single domain scores correlate more strongly with function, and both architecture and fusion
scores correlate more strongly with molecular functions than biological processes.
The final study details the development of a novel heterogeneous function prediction approach
designed to target the annotation of both homologous and non-homologous proteins. Support
vector regression is used to combine pair-wise sequence features with expression scores and
domain architecture scores to rank protein pairs in terms of their functional similarities. The
target of the regression models represents the continuum of protein function space empirically
derived from the Gene Ontology molecular function and biological process graphs. The merit
and performance of the approach are demonstrated using homologous and non-homologous test
datasets, and the approach significantly improves upon classical nearest neighbour annotation
transfer by sequence methods. The final model represents a method that achieves a compromise between
high specificity and sensitivity for all human proteins regardless of their homology status. It is
expected that this strategy will allow for more comprehensive and accurate annotations of the
human proteome.
White matter differences between healthy young ApoE4 carriers and non-carriers identified with tractography and support vector machines.
The apolipoprotein E4 (ApoE4) is an established risk factor for Alzheimer's disease (AD). Previous work has shown that this allele is associated with functional (fMRI) changes as well as structural grey matter (GM) changes in healthy young, middle-aged and older subjects. Here, we assess the diffusion characteristics and the white matter (WM) tracts of healthy young (20-38 years) ApoE4 carriers and non-carriers. No significant differences in diffusion indices were found between young carriers (ApoE4+) and non-carriers (ApoE4-). There were also no significant differences between the groups in terms of normalised GM or WM volume. A feature selection algorithm (ReliefF) was used to select the most salient voxels from the diffusion data for subsequent classification with support vector machines (SVMs). SVMs were capable of classifying ApoE4 carrier and non-carrier groups with an extremely high level of accuracy. The top 500 voxels selected by ReliefF were then used as seeds for tractography which identified a WM network that included regions of the parietal lobe, the cingulum bundle and the dorsolateral frontal lobe. There was a non-significant decrease in volume of this WM network in the ApoE4 carrier group. Our results indicate that there are subtle WM differences between healthy young ApoE4 carriers and non-carriers and that the WM network identified may be particularly vulnerable to further degeneration in ApoE4 carriers as they enter middle and old age.
Multi-Scale Fluctuations in Non-Equilibrium Systems: Statistical Physics and Biological Application
Understanding how fluctuations continuously propagate across spatial scales is fundamental for our understanding of inanimate matter. This is exemplified by self-similar fluctuations in critical phenomena and the propagation of energy fluctuations described by the Kolmogorov-Law in turbulence. Our understanding is based on powerful theoretical frameworks that integrate fluctuations on intermediary scales, as in renormalisation group or coupled mode theory. In striking contrast to typical inanimate systems, living matter is typically organised into a hierarchy of processes on a discrete set of spatial scales: from biochemical processes embedded in dynamic subcellular compartments to cells giving rise to tissues. Therefore, the understanding of living matter requires novel theories that predict the interplay of fluctuations on multiple scales of biological organisation and the ensuing emergent degrees of freedom.
In this thesis, we derive a general theory of the multi-scale propagation of fluctuations in non-equilibrium systems and show that such processes underlie the regulation of cellular behaviour. Specifically, we draw on paradigmatic systems comprising stochastic many-particle systems undergoing dynamic compartmentalisation.
We first derive a theory for emergent degrees of freedom in open systems, where the total mass is not conserved. We show that the compartment dynamics give rise to the localisation of probability densities in phase space resembling quasi-particle behaviour. This emergent quasi-particle exhibits fundamentally different response kinetics and steady states compared to systems lacking compartment dynamics. In order to investigate a potential biological function of such quasi-particle dynamics, we then apply this theory to the regulation of cell death. We derive a model describing the subcellular processes that regulate cell death and show that the quasi-particle dynamics gives rise to a kinetic low-pass filter which suppresses the response of the cell to fast fluctuations in cellular stress signals. We test our predictions experimentally by quantifying cell death in cell cultures subject to stress stimuli varying in strength and duration.
In closed systems, where the total mass is conserved, the effect of dynamic compartmentalisation depends on details of the kinetics on the scale of the stochastic many-particle dynamics. Using a second quantisation approach, we derive a commutator relation between the kinetic operators and the change in total entropy. Drawing on this, we show that the compartment dynamics alters the total entropy if the kinetics of the stochastic many-particle dynamics violate detailed balance. We apply this mechanism to the activation of cellular immune responses to RNA-virus infections. We show that dynamic compartmentalisation in closed systems gives rise to giant density fluctuations. This facilitates the emergence of gelation under conditions that violate theoretical gelation criteria in the absence of compartment dynamics. We show that such multi-scale gelation of protein complexes on the membranes of dynamic mitochondria governs the innate immune response.
Taken together, we provide a general theory describing the multi-scale propagation of fluctuations in biological systems. Our work pioneers the development of a statistical physics of such systems and highlights emergent degrees of freedom spanning different scales of biological organisation. By demonstrating that cells manipulate how fluctuations propagate across these scales, our work motivates a rethinking of how the behaviour of cells is regulated.
The impact of temporal sampling resolution on parameter inference for biological transport models
Imaging data has become widely available to study biological systems at
various scales, for example the motile behaviour of bacteria or the transport
of mRNA, and it has the potential to transform our understanding of key
transport mechanisms. Often these imaging studies require us to compare
biological species or mutants, and to do this we need to quantitatively
characterise their behaviour. Mathematical models offer a quantitative
description of a system that enables us to perform this comparison, but to
relate these mechanistic mathematical models to imaging data, we need to
estimate the parameters of the models. In this work, we study the impact of
collecting data at different temporal resolutions on parameter inference for
biological transport models by performing exact inference for simple velocity
jump process models in a Bayesian framework. This issue is prominent in a host
of studies because the majority of imaging technologies place constraints on
the frequency with which images can be collected, and the discrete nature of
observations can introduce errors into parameter estimates. In this work, we
avoid such errors by formulating the velocity jump process model within a
hidden states framework. This allows us to obtain estimates of the
reorientation rate and noise amplitude for noisy observations of a simple
velocity jump process. We demonstrate the sensitivity of these estimates to
temporal variations in the sampling resolution and extent of measurement noise.
We use our methodology to provide experimental guidelines for researchers
aiming to characterise motile behaviour that can be described by a velocity
jump process. In particular, we consider how experimental constraints resulting
in a trade-off between temporal sampling resolution and observation noise may
affect parameter estimates. Comment: Published in PLOS Computational Biolog
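
To make the sampling-resolution issue concrete, the following sketch simulates a one-dimensional velocity jump process and observes it at several sampling intervals. It is only an illustration under stated assumptions: the two-direction model, the function names, the parameter values, and the naive sign-change estimator are all hypothetical and are not the exact Bayesian hidden-states inference described in the abstract.

```python
import random


def simulate_velocity_jump(rate, t_end, rng):
    """Reorientation times on [0, t_end] for a Poisson process of given rate."""
    times = []
    t = rng.expovariate(rate)
    while t < t_end:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def position_at(t, switch_times, speed):
    """Exact position at time t, starting at x=0 and moving in the + direction."""
    x, last, direction = 0.0, 0.0, 1.0
    for s in switch_times:
        if s > t:
            break
        x += direction * speed * (s - last)
        last, direction = s, -direction
    return x + direction * speed * (t - last)


def observe(switch_times, speed, t_end, dt, noise_sd, rng):
    """Positions sampled every dt, with additive Gaussian measurement noise."""
    n = round(t_end / dt)
    return [position_at(k * dt, switch_times, speed) + rng.gauss(0.0, noise_sd)
            for k in range(n + 1)]


def naive_rate_estimate(observations, dt):
    """Sign changes between successive displacements, per unit time.

    A deliberately naive estimator (an assumption for illustration only).
    """
    steps = [b - a for a, b in zip(observations, observations[1:])]
    flips = sum(1 for a, b in zip(steps, steps[1:]) if a * b < 0)
    return flips / (dt * (len(steps) - 1))


rng = random.Random(0)
switches = simulate_velocity_jump(rate=1.0, t_end=200.0, rng=rng)
for dt in (0.05, 0.5, 2.0):
    obs = observe(switches, speed=1.0, t_end=200.0, dt=dt, noise_sd=0.0, rng=rng)
    print(dt, naive_rate_estimate(obs, dt))
```

With noise-free observations, the naive estimate can never exceed `1 / dt`, so coarse sampling caps the reorientation rate it can report. Setting `noise_sd` above zero introduces spurious sign changes at fine sampling, which is the kind of trade-off the abstract refers to.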
Graph Theoretic and Pearson Correlation-Based Discovery of Network Biomarkers for Cancer
Two graph theoretic concepts, cliques and bipartite graphs, are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes work together, forming a cluster or clique-like structure, to initiate a cancer. After initiation, the disease signal passes to the next group of genes related to the second stage of the cancer, which can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among the genes between two disease stages. To test this hypothesis, gene expression values for three cancers, breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD) and glioblastoma multiforme (GBM), are used for analysis. First, a co-expression gene network is generated from highly correlated gene pairs with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Then, by combining these cliques, three different biomarker modules are developed: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The list of biomarker genes discovered from these network modules is validated as the essential genes for causing a cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists to design wet lab experiments for further elucidating the complex mechanism of cancer.
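
The first two steps of this pipeline (thresholding Pearson correlations at 0.9, then extracting cliques) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the gene names and expression values are made up, the helper names are hypothetical, and the basic Bron-Kerbosch recursion is used here to enumerate maximal cliques only, whereas the abstract mentions cliques of all sizes.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_network(expression, threshold=0.9):
    """Adjacency sets linking gene pairs whose correlation is >= threshold."""
    graph = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if pearson(expression[g1], expression[g2]) >= threshold:
            graph[g1].add(g2)
            graph[g2].add(g1)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques


# Toy expression profiles (hypothetical values, five samples per gene).
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 4.0, 6.2, 8.1, 9.9],
    "geneC": [0.9, 2.2, 2.9, 4.1, 5.2],
    "geneD": [5.0, 1.0, 4.0, 2.0, 3.0],
}
network = coexpression_network(expression)
cliques = maximal_cliques(network)
print(sorted(sorted(c) for c in cliques))
```

In this toy data, geneA, geneB and geneC are nearly collinear and form one clique, while geneD correlates with none of them above the threshold and remains isolated.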
Integrative Data Analytic Framework to Enhance Cancer Precision Medicine
With the advancement of high-throughput biotechnologies, we increasingly
accumulate biomedical data about diseases, especially cancer. There is a need
for computational models and methods to sift through, integrate, and extract
new knowledge from the diverse available data to improve the mechanistic
understanding of diseases and patient care. To uncover molecular mechanisms and
drug indications for specific cancer types, we develop an integrative framework
able to harness a wide range of diverse molecular and pan-cancer data. We show
that our approach outperforms competing methods and can identify new
associations. Furthermore, through the joint integration of data sources, our
framework can also uncover links between cancer types and molecular entities
for which no prior knowledge is available. Our new framework is flexible and
can be easily reformulated to study any biomedical problem. Comment: 18 page