
    Non-equilibrium phase transitions in biomolecular signal transduction

    We study a mechanism for reliable switching in biomolecular signal-transduction cascades. Steady bistable states are created by system-size cooperative effects in populations of proteins, even though the phosphorylation-state transitions of any single molecule, by which the switch is implemented, are highly stochastic. The emergence of switching is a nonequilibrium phase transition in an energetically driven, dissipative system described by a master equation. We use operator and functional integral methods from reaction-diffusion theory to solve for the phase structure, noise spectrum, escape trajectories and first-passage times of a class of minimal models of switches, showing how all critical properties for switch behavior can be computed within a unified framework.
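
As a minimal illustration of how a master equation can produce two stable states, the sketch computes the exact stationary distribution of a one-variable birth-death process. It uses the Schlögl model (A + 2X ⇌ 3X, B ⇌ X) as a stand-in; this is an assumption, not the phosphorylation-switch model or the operator and functional-integral methods of the paper, and the constants C1 to C4, N1 and N2 are illustrative values chosen only so that the model is bistable. For a one-step process, zero net probability flux between neighbouring states gives p(n+1) = p(n) b(n) / d(n+1).

```python
import math

# Illustrative Schlögl-model constants (assumption, chosen to make the model bistable).
C1, C2, C3, C4 = 3e-7, 1e-4, 1e-3, 3.5
N1, N2 = 1e5, 2e5  # fixed reservoir sizes of species A and B


def birth(n):
    """Propensity of n -> n+1 (A + 2X -> 3X and B -> X)."""
    return C1 / 2 * N1 * n * (n - 1) + C3 * N2


def death(n):
    """Propensity of n -> n-1 (3X -> A + 2X and X -> B)."""
    return C2 / 6 * n * (n - 1) * (n - 2) + C4 * n


def stationary_distribution(n_max):
    """Exact stationary solution of the one-step master equation, truncated at n_max.

    Uses p(n+1) = p(n) * birth(n) / death(n+1), accumulated in log space to avoid overflow.
    """
    log_p = [0.0]
    for n in range(n_max):
        log_p.append(log_p[-1] + math.log(birth(n)) - math.log(death(n + 1)))
    top = max(log_p)
    weights = [math.exp(v - top) for v in log_p]
    total = sum(weights)
    return [w / total for w in weights]


def local_maxima(p):
    """Interior states n where p(n) is a local maximum (the switch states)."""
    return [n for n in range(1, len(p) - 1) if p[n - 1] < p[n] >= p[n + 1]]


p = stationary_distribution(1000)
print(local_maxima(p))  # one low-copy-number mode and one high-copy-number mode
```

The two local maxima of p(n) correspond to the "off" and "on" states; the minimum between them is the barrier that sets escape and first-passage times.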

    Leveraging expression and network data for protein function prediction

    2012 Summer. Includes bibliographical references. Protein function prediction is one of the prominent problems in bioinformatics today. Protein annotation is steadily falling behind as more and more genomes are sequenced. Experimental methods are expensive and time-consuming, which leaves computational methods to fill the gap. Computational methods are not yet accurate enough to be used without human supervision, but that is the goal. The Gene Ontology (GO) is a collection of terms that is the standard for protein function annotation. Because of the structure of GO, protein function prediction is a hierarchical multi-label classification problem. The classification method used in this thesis is GOstruct, which performs structured prediction that takes into account all GO terms. GOstruct has been shown to work well, but there is still room for improvement. In this thesis, I work to improve predictions by building new kernels from the data used by GOstruct. To do this, I identify key representations of the data that help determine which kernels perform best on the variety of data types. I applied this methodology to function prediction in two model organisms, Saccharomyces cerevisiae and Mus musculus, and found better methods for interpreting the data.
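
GOstruct consumes kernels defined over proteins. One common way to build such a kernel from a data source, sketched here under our own assumptions (the feature values, cosine normalization and equal-weight averaging are illustrative choices, not necessarily those of the thesis), is to take a linear kernel on per-protein feature vectors, normalize it so every protein has unit self-similarity, and average the kernels from different sources.

```python
import math


def linear_kernel(rows):
    """Gram matrix of dot products; one row (feature vector) per protein."""
    return [[sum(a * b for a, b in zip(x, y)) for y in rows] for x in rows]


def normalize(k):
    """Cosine normalization: K'(i, j) = K(i, j) / sqrt(K(i, i) * K(j, j))."""
    d = [math.sqrt(k[i][i]) for i in range(len(k))]
    return [[k[i][j] / (d[i] * d[j]) for j in range(len(k))] for i in range(len(k))]


def combine(*kernels):
    """Equal-weight average of kernels over the same proteins (still a valid kernel)."""
    n = len(kernels[0])
    return [[sum(k[i][j] for k in kernels) / len(kernels) for j in range(n)]
            for i in range(n)]


# Hypothetical data for three proteins from two sources.
expression = [[1.0, 2.0, 0.5], [0.9, 2.1, 0.4], [3.0, 0.1, 1.0]]  # expression profiles
interactions = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]          # binary partner indicators

K = combine(normalize(linear_kernel(expression)),
            normalize(linear_kernel(interactions)))
print(K)
```

Normalizing before averaging keeps a source with large raw values from dominating, and because sums of positive semi-definite matrices remain positive semi-definite, the combined matrix is still a valid kernel.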

    Human protein function prediction: application of machine learning for integration of heterogeneous data sources

    Experimental characterisation of protein cellular function can be prohibitively expensive and take years to complete. To address this problem, this thesis focuses on the development of computational approaches to predict function from sequence. For sequences with well-characterised close relatives, annotation is trivial; orphans or distant homologues present a greater challenge. The use of a feature-based method employing ensemble support vector machines to predict individual Gene Ontology classes is investigated. It is found that different combinations of feature inputs are required to recognise different functions. Although the approach is applicable to any human protein sequence, it is restricted to broadly descriptive functions. The method is well suited to prioritising candidate functions for novel proteins rather than to making highly accurate class assignments. Signatures of common function can be derived from different biological characteristics: interactions and binding events as well as expression behaviour. To investigate the hypothesis that common function can be derived from expression information, public-domain human microarray datasets are assembled. The questions of how best to integrate these datasets and how to derive features that are useful in function prediction are addressed. Both co-expression and abundance information are represented between and within experiments and investigated for correlation with function. It is found that features derived from expression data serve as a weak but significant signal for recognising functions. This signal is stronger for biological processes than for molecular function categories and is independent of homology information. The protein domain has historically been described as a modular evolutionary unit of protein function. The occurrence of domains that can be linked by ancestral fusion events serves as a signal for domain-domain interactions. 
To exploit this information for function prediction, novel domain architecture and fused architecture scores are developed. Architecture scores correlate more strongly with function than single-domain scores do, and both architecture and fusion scores correlate more strongly with molecular functions than with biological processes. The final study details the development of a novel heterogeneous function prediction approach designed to target the annotation of both homologous and non-homologous proteins. Support vector regression is used to combine pairwise sequence features with expression scores and domain architecture scores to rank protein pairs by their functional similarity. The target of the regression models represents the continuum of protein function space, empirically derived from the Gene Ontology molecular function and biological process graphs. The merit and performance of the approach are demonstrated on homologous and non-homologous test datasets, and the approach significantly improves upon classical nearest-neighbour annotation transfer by sequence. The final model achieves a compromise between high specificity and high sensitivity for all human proteins, regardless of their homology status. It is expected that this strategy will allow for more comprehensive and accurate annotation of the human proteome.
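
The regression target is described as a continuum of function space derived from the GO graphs. One simple way to turn GO annotations into a continuous pairwise target, assumed here purely for illustration and not claimed to be the thesis's measure, is the Jaccard index of two proteins' annotation sets after each set is closed under its ancestors (the GO "true path rule"). The toy DAG and its term names are hypothetical.

```python
# Hypothetical toy fragment of a GO-like DAG: term -> set of parent terms.
PARENTS = {
    "root": set(),
    "binding": {"root"},
    "catalytic": {"root"},
    "protein_binding": {"binding"},
    "dna_binding": {"binding"},
    "kinase": {"catalytic"},
    "protein_kinase": {"kinase", "protein_binding"},
}


def ancestors(term):
    """The term itself plus every ancestor reachable through PARENTS."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def closure(terms):
    """Propagate a protein's annotations up the DAG (true path rule)."""
    out = set()
    for t in terms:
        out |= ancestors(t)
    return out


def functional_similarity(terms_a, terms_b):
    """Jaccard index of ancestor-closed annotation sets, in [0, 1]."""
    a, b = closure(terms_a), closure(terms_b)
    if not (a | b):
        return 0.0
    return len(a & b) / len(a | b)


print(functional_similarity({"protein_kinase"}, {"kinase"}))
print(functional_similarity({"kinase"}, {"dna_binding"}))
```

Closing under ancestors makes proteins that share only broad parent terms score above zero but below proteins that share specific terms, giving a graded target rather than a binary match.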

    White matter differences between healthy young ApoE4 carriers and non-carriers identified with tractography and support vector machines.

    The apolipoprotein E4 (ApoE4) allele is an established risk factor for Alzheimer's disease (AD). Previous work has shown that this allele is associated with functional (fMRI) changes as well as structural grey matter (GM) changes in healthy young, middle-aged and older subjects. Here, we assess the diffusion characteristics and the white matter (WM) tracts of healthy young (20-38 years) ApoE4 carriers and non-carriers. No significant differences in diffusion indices were found between young carriers (ApoE4+) and non-carriers (ApoE4-). There were also no significant differences between the groups in terms of normalised GM or WM volume. A feature selection algorithm (ReliefF) was used to select the most salient voxels from the diffusion data for subsequent classification with support vector machines (SVMs). SVMs were capable of classifying ApoE4 carrier and non-carrier groups with an extremely high level of accuracy. The top 500 voxels selected by ReliefF were then used as seeds for tractography, which identified a WM network that included regions of the parietal lobe, the cingulum bundle and the dorsolateral frontal lobe. There was a non-significant decrease in volume of this WM network in the ApoE4 carrier group. Our results indicate that there are subtle WM differences between healthy young ApoE4 carriers and non-carriers and that the WM network identified may be particularly vulnerable to further degeneration in ApoE4 carriers as they enter middle and old age.
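
ReliefF scores each feature by how much it separates a sample from its nearest neighbours of the other class while agreeing with its nearest neighbours of the same class. The sketch implements the simpler original Relief (one nearest hit and one nearest miss, two classes, every sample visited once) rather than full ReliefF, which averages over k neighbours and handles multiple classes. The toy data are hypothetical stand-ins for voxel features, and `select_top` is a hypothetical helper; each class needs at least two samples.

```python
def relief(X, y):
    """Original two-class Relief: one weight per feature (higher = more relevant)."""
    n, d = len(X), len(X[0])
    # Feature ranges, used to scale per-feature differences to [0, 1].
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0 for f in range(d)]

    def diff(f, a, b):
        return abs(X[a][f] - X[b][f]) / span[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        hit = min((j for j in range(n) if j != i and y[j] == y[i]), key=lambda j: dist(i, j))
        miss = min((j for j in range(n) if y[j] != y[i]), key=lambda j: dist(i, j))
        for f in range(d):
            # Reward features that differ from the nearest miss,
            # penalize features that differ from the nearest hit.
            w[f] += (diff(f, i, miss) - diff(f, i, hit)) / n
    return w


def select_top(weights, k):
    """Indices of the k highest-weighted features (e.g. voxels)."""
    return sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)[:k]


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.5], [0.2, 0.1], [0.15, 0.9], [0.9, 0.4], [0.8, 0.8], [0.85, 0.2]]
y = [0, 0, 0, 1, 1, 1]
weights = relief(X, y)
print(weights, select_top(weights, 1))
```

In the study the 500 top-ranked voxels were retained; on real data `select_top(weights, 500)` would play that role before the selected features are passed to an SVM.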

    Multi-Scale Fluctuations in Non-Equilibrium Systems: Statistical Physics and Biological Application

    Understanding how fluctuations propagate continuously across spatial scales is fundamental to our understanding of inanimate matter. This is exemplified by self-similar fluctuations in critical phenomena and by the propagation of energy fluctuations described by Kolmogorov's law in turbulence. This understanding rests on powerful theoretical frameworks that integrate fluctuations on intermediate scales, such as the renormalisation group or coupled mode theory. In striking contrast to inanimate systems, living matter is typically organised into a hierarchy of processes on a discrete set of spatial scales: from biochemical processes embedded in dynamic subcellular compartments to cells giving rise to tissues. Understanding living matter therefore requires new theories that predict the interplay of fluctuations on multiple scales of biological organisation and the ensuing emergent degrees of freedom. In this thesis, we derive a general theory of the multi-scale propagation of fluctuations in non-equilibrium systems and show that such processes underlie the regulation of cellular behaviour. Specifically, we draw on paradigmatic systems comprising stochastic many-particle systems undergoing dynamic compartmentalisation. We first derive a theory for emergent degrees of freedom in open systems, where the total mass is not conserved. We show that the compartment dynamics give rise to the localisation of probability densities in phase space, resembling quasi-particle behaviour. This emergent quasi-particle exhibits fundamentally different response kinetics and steady states compared with systems lacking compartment dynamics. To investigate a potential biological function of such quasi-particle dynamics, we then apply this theory to the regulation of cell death. 
We derive a model describing the subcellular processes that regulate cell death and show that the quasi-particle dynamics give rise to a kinetic low-pass filter, which suppresses the response of the cell to fast fluctuations in cellular stress signals. We test our predictions experimentally by quantifying cell death in cell cultures subjected to stress stimuli varying in strength and duration. In closed systems, where the total mass is conserved, the effect of dynamic compartmentalisation depends on details of the kinetics at the scale of the stochastic many-particle dynamics. Using a second-quantisation approach, we derive a commutator relation between the kinetic operators and the change in total entropy. Drawing on this, we show that the compartment dynamics alter the total entropy if the kinetics of the stochastic many-particle dynamics violate detailed balance. We apply this mechanism to the activation of cellular immune responses to RNA-virus infections. We show that dynamic compartmentalisation in closed systems gives rise to giant density fluctuations. This facilitates gelation under conditions that violate theoretical gelation criteria in the absence of compartment dynamics. We show that such multi-scale gelation of protein complexes on the membranes of dynamic mitochondria governs the innate immune response. Taken together, we provide a general theory describing the multi-scale propagation of fluctuations in biological systems. Our work pioneers the development of a statistical physics of such systems and highlights emergent degrees of freedom spanning different scales of biological organisation. By demonstrating that cells manipulate how fluctuations propagate across these scales, our work motivates a rethinking of how the behaviour of cells is regulated.
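
A kinetic low-pass filter means that brief stress pulses barely move the cell's internal state while sustained ones do. A generic first-order linear filter, dx/dt = (s(t) - x)/τ, shows this behaviour and is used here only as a stand-in assumption, not the thesis's quasi-particle model; the time constant τ (`tau`) is arbitrary. For a unit pulse of duration T, the exact peak response is 1 - exp(-T/τ).

```python
import math


def peak_response(pulse_duration, tau, dt=1e-3):
    """Forward-Euler integration of dx/dt = (s(t) - x) / tau for a unit pulse
    s(t) = 1 on [0, pulse_duration) and 0 afterwards; returns the largest x reached."""
    x, peak = 0.0, 0.0
    steps = int(round((pulse_duration + 5 * tau) / dt))
    for i in range(steps):
        s = 1.0 if i * dt < pulse_duration else 0.0
        x += dt * (s - x) / tau
        peak = max(peak, x)
    return peak


# Compare short and long pulses against the analytical peak 1 - exp(-T / tau).
for duration in (0.1, 1.0, 5.0):
    exact = 1 - math.exp(-duration / 1.0)
    print(duration, round(peak_response(duration, tau=1.0), 3), round(exact, 3))
```

Pulses much shorter than τ produce a peak of roughly T/τ, so fast fluctuations in the input are strongly attenuated while slow, sustained signals pass almost unchanged.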

    The impact of temporal sampling resolution on parameter inference for biological transport models

    Imaging data has become widely available for studying biological systems at various scales, for example the motile behaviour of bacteria or the transport of mRNA, and it has the potential to transform our understanding of key transport mechanisms. Often these imaging studies require us to compare biological species or mutants, and to do this we need to quantitatively characterise their behaviour. Mathematical models offer a quantitative description of a system that enables us to perform this comparison, but to relate these mechanistic mathematical models to imaging data, we need to estimate the parameters of the models. In this work, we study the impact of collecting data at different temporal resolutions on parameter inference for biological transport models by performing exact inference for simple velocity jump process models in a Bayesian framework. This issue is prominent in a host of studies because the majority of imaging technologies place constraints on the frequency with which images can be collected, and the discrete nature of observations can introduce errors into parameter estimates. We avoid such errors by formulating the velocity jump process model within a hidden states framework. This allows us to obtain estimates of the reorientation rate and noise amplitude for noisy observations of a simple velocity jump process. We demonstrate the sensitivity of these estimates to temporal variations in the sampling resolution and extent of measurement noise. We use our methodology to provide experimental guidelines for researchers aiming to characterise motile behaviour that can be described by a velocity jump process. In particular, we consider how experimental constraints resulting in a trade-off between temporal sampling resolution and observation noise may affect parameter estimates. Comment: Published in PLOS Computational Biology
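
Why discrete sampling biases a naive reorientation-rate estimate can be seen in a simplified, noise-free 1D velocity jump process in which the velocity flips between +v and -v at rate λ. Two samples Δ apart show different directions with probability (1 - e^(-2λΔ))/2, because flips occurring between samples can cancel. Counting observed sign changes per unit time therefore underestimates λ as Δ grows, whereas inverting that formula recovers it. This is an illustration under those assumptions only; the paper instead treats noisy observations with a hidden-states model and Bayesian inference.

```python
import math


def observed_flip_probability(rate, dt):
    """P(velocity sign differs between two samples dt apart) for a 1D velocity
    jump process whose velocity flips (+v <-> -v) at the given rate."""
    return 0.5 * (1.0 - math.exp(-2.0 * rate * dt))


def naive_rate(rate, dt):
    """Rate estimated by counting observed sign changes per unit time."""
    return observed_flip_probability(rate, dt) / dt


def corrected_rate(flip_fraction, dt):
    """Invert the flip probability to recover the rate (requires flip_fraction < 0.5)."""
    return -math.log(1.0 - 2.0 * flip_fraction) / (2.0 * dt)


# True rate 1.0: the naive estimate degrades as the sampling interval grows.
for dt in (0.01, 0.1, 0.5, 1.0):
    q = observed_flip_probability(1.0, dt)
    print(dt, round(naive_rate(1.0, dt), 3), round(corrected_rate(q, dt), 3))
```

The correction relies on the exact flip probability of a noise-free two-state process; once measurement noise blurs the observed direction, it no longer applies, which is the situation the hidden-states formulation addresses.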

    Graph Theoretic and Pearson Correlation-Based Discovery of Network Biomarkers for Cancer

    Two graph-theoretic concepts, cliques and bipartite graphs, are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes works together by forming a cluster or clique-like structure to initiate a cancer. After initiation, the disease signal passes to the next group of genes, related to the second stage of the cancer, which can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among the genes between two disease stages. To test this hypothesis, gene expression values for three cancers are analysed: breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD) and glioblastoma multiforme (GBM). First, a gene co-expression network is built from highly correlated gene pairs (Pearson correlation coefficient ≥ 0.9). Second, clique structures of all sizes are isolated from the co-expression network. Then, by combining these cliques, three different biomarker modules are developed: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The biomarker genes discovered from these network modules are validated as genes essential to causing cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists design wet-lab experiments to further elucidate the complex mechanisms of cancer.
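
The first two steps, thresholding Pearson correlations at 0.9 and extracting cliques, can be sketched in plain Python. Maximal cliques are found with the Bron-Kerbosch algorithm with pivoting; that choice of algorithm, the gene names and the expression values are assumptions made for illustration. Genes with constant expression make the correlation undefined and would need to be filtered out first.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpression_graph(expr, threshold=0.9):
    """Adjacency sets: two genes are linked when their correlation is >= threshold."""
    adj = {g: set() for g in expr}
    for g, h in combinations(expr, 2):
        if pearson(expr[g], expr[h]) >= threshold:
            adj[g].add(h)
            adj[h].add(g)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; yields every maximal clique as a set."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        pivot = max(p | x, key=lambda u: len(adj[u] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


# Hypothetical expression profiles (genes x samples).
expr = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 10.5],
    "G3": [1.1, 2.0, 3.2, 3.9, 5.1],
    "G4": [5, 4, 3, 2, 1],
    "G5": [4.9, 4.1, 3.0, 2.2, 0.9],
    "G6": [1, 3, 2, 5, 1],
}
print([sorted(c) for c in maximal_cliques(coexpression_graph(expr))])
```

Note that a threshold of ≥ 0.9 keeps only positive co-expression; strongly anti-correlated genes such as G1 and G4 stay unlinked.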

    Integrative Data Analytic Framework to Enhance Cancer Precision Medicine

    With the advancement of high-throughput biotechnologies, we increasingly accumulate biomedical data about diseases, especially cancer. There is a need for computational models and methods to sift through, integrate, and extract new knowledge from the diverse available data to improve the mechanistic understanding of diseases and patient care. To uncover molecular mechanisms and drug indications for specific cancer types, we develop an integrative framework able to harness a wide range of diverse molecular and pan-cancer data. We show that our approach outperforms competing methods and can identify new associations. Furthermore, through the joint integration of data sources, our framework can also uncover links between cancer types and molecular entities for which no prior knowledge is available. Our new framework is flexible and can easily be reformulated to study any biomedical problem. Comment: 18 pages