108 research outputs found

    Parameter inference for discretely observed stochastic kinetic models using stochastic gradient descent

    Background: Stochastic effects can be important for the behavior of processes involving small population numbers, so the study of stochastic models has become an important topic in the burgeoning field of computational systems biology. However, analysis techniques for stochastic models have tended to lag behind their deterministic cousins due to the heavier computational demands of the statistical approaches for fitting the models to experimental data. There is a continuing need for more effective and efficient algorithms. In this article we focus on the parameter inference problem for stochastic kinetic models of biochemical reactions given discrete time-course observations of either some or all of the molecular species. Results: We propose an algorithm for inference of kinetic rate parameters based upon maximum likelihood using stochastic gradient descent (SGD). We derive a general formula for the gradient of the likelihood function given discrete time-course observations. The formula applies to any explicit functional form of the kinetic rate laws, such as mass-action or Michaelis-Menten. Our algorithm estimates the gradient of the likelihood function by reversible jump Markov chain Monte Carlo (RJMCMC) sampling, and a gradient descent method is then employed to obtain the maximum likelihood estimate of the parameter values. Furthermore, we utilize flux balance analysis and show how to automatically construct reversible jump samplers for arbitrary biochemical reaction models. We provide RJMCMC sampling algorithms for both fully observed and partially observed time-course observation data. Our methods are illustrated with two examples: a birth-death model and an auto-regulatory gene network. We find good agreement of the inferred parameters with the actual parameters in both models. Conclusions: The SGD method proposed in the paper presents a general framework for inferring parameters of stochastic kinetic models. The method is computationally efficient and is effective for both partially and fully observed systems. Automatic construction of reversible jump samplers and a general formulation of the likelihood gradient function make our method applicable to a wide range of stochastic models. Furthermore, our derivations can be useful for other purposes, such as using the gradient information for parametric sensitivity analysis or using the reversible jump samplers for full Bayesian inference. The software implementing the algorithms is publicly available at http://cbcl.ics.uci.edu/sg
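
    To make the phrase "discrete time-course observations" concrete, here is a minimal sketch of a birth-death model simulated with the Gillespie algorithm and recorded only at fixed observation times. It illustrates the kind of data such an inference method would consume; it is not the authors' SGD or RJMCMC code. The function name, the parameter names, and the choice of a constant birth rate with a mass-action death rate are all assumptions made for illustration.

```python
import random


def simulate_birth_death(x0, birth_rate, death_rate, n_obs, dt_obs, seed=0):
    """Simulate a birth-death process and record it at discrete times.

    Assumed reactions (illustrative, not taken from the paper):
      birth: 0 -> X   with propensity birth_rate
      death: X -> 0   with propensity death_rate * x
    Returns a list of (time, count) pairs at times 0, dt_obs, 2*dt_obs, ...
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    observations = []
    while len(observations) < n_obs:
        total = birth_rate + death_rate * x
        # Waiting time to the next reaction is exponential with rate `total`.
        wait = rng.expovariate(total) if total > 0 else float("inf")
        # The state is constant on [t, t + wait), so record every
        # observation time that falls inside that interval.
        while len(observations) < n_obs and len(observations) * dt_obs < t + wait:
            observations.append((len(observations) * dt_obs, x))
        if len(observations) == n_obs:
            break
        t += wait
        # Choose which reaction fired, in proportion to its propensity.
        if rng.random() * total < birth_rate:
            x += 1
        else:
            x -= 1
    return observations


if __name__ == "__main__":
    for time, count in simulate_birth_death(10, 5.0, 0.5, n_obs=6, dt_obs=1.0):
        print(time, count)
```

    Only the sampled counts are kept; the reaction events between observation times are discarded, and it is this hidden path that a sampler would need to reconstruct when estimating the likelihood gradient.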

    Integrative multicellular biological modeling: a case study of 3D epidermal development using GPU algorithms

    Background: Simulation of sophisticated biological models requires considerable computational power. These models typically integrate together numerous biological phenomena such as spatially-explicit heterogeneous cells, cell-cell interactions, cell-environment interactions and intracellular gene networks. The recent advent of programming for graphical processing units (GPU) opens up the possibility of developing more integrative, detailed and predictive biological models while at the same time decreasing the computational cost to simulate those models. Results: We construct a 3D model of epidermal development and provide a set of GPU algorithms that executes significantly faster than sequential central processing unit (CPU) code. We provide a parallel implementation of the subcellular element method for individual cells residing in a lattice-free spatial environment. Each cell in our epidermal model includes an internal gene network, which integrates cellular interaction of Notch signaling together with environmental interaction of basement membrane adhesion, to specify cellular state and behaviors such as growth and division. We take a pedagogical approach to describing how modeling methods are efficiently implemented on the GPU including memory layout of data structures and functional decomposition. We discuss various programmatic issues and provide a set of design guidelines for GPU programming that are instructive to avoid common pitfalls as well as to extract performance from the GPU architecture. Conclusions: We demonstrate that GPU algorithms represent a significant technological advance for the simulation of complex biological models. We further demonstrate with our epidermal model that the integration of multiple complex modeling methods for heterogeneous multicellular biological processes is both feasible and computationally tractable using this new technology. We hope that the provided algorithms and source code will be a starting point for modelers to develop their own GPU implementations, and encourage others to implement their modeling methods on the GPU and to make that code available to the wider community.

    Data structures and compression algorithms for high-throughput sequencing technologies

    Background: High-throughput sequencing (HTS) technologies play important roles in the life sciences by allowing the rapid parallel sequencing of very large numbers of relatively short nucleotide sequences, in applications ranging from genome sequencing and resequencing to digital microarrays and ChIP-Seq experiments. As experiments scale up, HTS technologies create new bioinformatics challenges for the storage and sharing of HTS data. Results: We develop data structures and compression algorithms for HTS data. A processing stage maps short sequences to a reference genome or a large table of sequences. Then the integers representing the short sequence absolute or relative addresses, their length, and the substitutions they may contain are compressed and stored using various entropy coding algorithms, including both old and new fixed codes (e.g. Golomb, Elias Gamma, MOV) and variable codes (e.g. Huffman). The general methodology is illustrated and applied to several HTS data sets. Results show that the information contained in HTS files can be compressed by a factor of 10 or more, depending on the statistical properties of the data sets and various other choices and constraints. Our algorithms fare well against general purpose compression programs such as gzip, bzip2 and 7zip; timing results show that our algorithms are consistently faster than the best general purpose compression programs. Conclusions: It is not likely that exactly one encoding strategy will be optimal for all types of HTS data. Different experimental conditions are going to generate various data distributions whereby one encoding strategy can be more effective than another. We have implemented some of our encoding algorithms into the software package GenCompress which is available upon request from the authors. With the advent of HTS technology and increasingly new experimental protocols for using the technology, sequence databases are expected to continue rising in size. The methodology we have proposed is general, and these advanced compression techniques should allow researchers to manage and share their HTS data in a more timely fashion.
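
    As an illustration of one of the fixed codes named in the abstract, the sketch below applies Elias gamma coding to the gaps between sorted read positions (relative addresses). It is a generic textbook version, not the GenCompress implementation. The function names and the sample positions are hypothetical, and the code assumes positions are integers of at least 1 in strictly increasing order, so that every gap is at least 1.

```python
def elias_gamma_encode(n):
    """Encode a positive integer as an Elias gamma bit string."""
    if n < 1:
        raise ValueError("Elias gamma codes are defined for integers >= 1")
    binary = bin(n)[2:]
    # Prefix with (length - 1) zeros so the decoder knows how many bits follow.
    return "0" * (len(binary) - 1) + binary


def elias_gamma_decode(bits):
    """Decode a concatenation of Elias gamma codes (assumed well-formed)."""
    values = []
    i = 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


def encode_positions(positions):
    """Gamma-encode the gaps between strictly increasing positions (>= 1)."""
    previous = 0
    chunks = []
    for position in positions:
        chunks.append(elias_gamma_encode(position - previous))
        previous = position
    return "".join(chunks)


def decode_positions(bits):
    """Invert encode_positions by accumulating the decoded gaps."""
    positions = []
    total = 0
    for gap in elias_gamma_decode(bits):
        total += gap
        positions.append(total)
    return positions


if __name__ == "__main__":
    encoded = encode_positions([3, 7, 8, 20])
    print(encoded, decode_positions(encoded))
```

    Small gaps get short codes, so this kind of scheme pays off when reads are densely packed along the reference. That is one way the statistical properties of a data set can favor one encoding over another.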

    Patterns of Mesenchymal Condensation in a Multiscale, Discrete Stochastic Model

    Cells of the embryonic vertebrate limb in high-density culture undergo chondrogenic pattern formation, which results in the production of regularly spaced “islands” of cartilage similar to the cartilage primordia of the developing limb skeleton. The first step in this process, in vitro and in vivo, is the generation of “cell condensations,” in which the precartilage cells become more tightly packed at the sites at which cartilage will form. In this paper we describe a discrete, stochastic model for the behavior of limb bud precartilage mesenchymal cells in vitro. The model uses a biologically motivated reaction–diffusion process and cell-matrix adhesion (haptotaxis) as the bases of chondrogenic pattern formation, whereby the biochemically distinct condensing cells, as well as the size, number, and arrangement of the multicellular condensations, are generated in a self-organizing fashion. Improving on an earlier lattice-gas representation of the same process, it is multiscale (i.e., cell and molecular dynamics occur on distinct scales), and the cells are represented as spatially extended objects that can change their shape. The authors calibrate the model using experimental data and study sensitivity to changes in key parameters. The simulations have disclosed two distinct dynamic regimes for pattern self-organization involving transient or stationary inductive patterns of morphogens. The authors discuss these modes of pattern formation in relation to available experimental evidence for the in vitro system, as well as their implications for understanding limb skeletal patterning during embryonic development.

    Comparative genetics of Enterococcus faecalis intestinal tissue isolates before and after surgery in a rat model of colon anastomosis.

    We have recently demonstrated that collagenolytic Enterococcus faecalis plays a key and causative role in the pathogenesis of anastomotic leak, an uncommon but potentially lethal complication characterized by disruption of the intestinal wound following segmental removal of the colon (resection) and its reconnection (anastomosis). Here we hypothesized that comparative genetic analysis of E. faecalis isolates present at the anastomotic wound site before and after surgery would provide insight into the mechanisms by which collagenolytic strains are selected for and predominate at sites of anastomotic disruption. Whole genome optical mapping of four pairs of isolates from rat colonic tissue obtained following surgical resection (herein named "pre-op" isolates) and then 6 days later from the anastomotic site (herein named "post-op" isolates) demonstrated that the isolates with higher collagenolytic activity formed a distinct cluster. In order to perform analysis at a deeper level, a single pair of E. faecalis isolates (16A pre-op and 16A post-op) was selected for whole genome sequencing and assembled using a hybrid assembly algorithm. Comparative genomics demonstrated absence of multiple gene clusters, notably a pathogenicity island in the post-op isolate. No differences were found in the fsr-gelE-sprE genes (EF1817-1822) responsible for regulation and production of collagenolytic activity. Analysis of unique genes among the 16A pre-op and post-op isolates revealed the predominance of transporter systems-related genes in the pre-op isolate and phage-related and hydrolytic enzyme-encoding genes in the post-op isolate. Despite genetic differences observed between pre-op and post-op isolates, the precise genetic determinants responsible for their differential expression of collagenolytic activity remain unknown.

    ReCoil - an algorithm for compression of extremely large datasets of dna data

    The growing volume of generated DNA sequencing data makes the problem of its long-term storage increasingly important. In this work we present ReCoil, an I/O-efficient external memory algorithm designed for compression of very large collections of short-read DNA data. Typically each position of a DNA sequence is covered by multiple reads of a short-read dataset, and our algorithm makes use of the resulting redundancy to achieve a high compression rate.

    Multiple organism algorithm for finding ultraconserved elements

    Background: Ultraconserved elements are nucleotide or protein sequences with 100% identity (no mismatches, insertions, or deletions) in the same organism or between two or more organisms. Studies indicate that these conserved regions are associated with micro RNAs, mRNA processing, development and transcription regulation. The identification and characterization of these elements among genomes is necessary for the further understanding of their functionality. Results: We describe an algorithm and provide freely available software which can find all of the ultraconserved sequences between genomes of multiple organisms. Our algorithm takes a combinatorial approach that finds all sequences without requiring the genomes to be aligned. The algorithm is significantly faster than BLAST and is designed to handle very large genomes efficiently. We ran our algorithm on several large comparative analyses to evaluate its effectiveness; one compared 17 vertebrate genomes where we find 123 ultraconserved elements longer than 40 bps shared by all of the organisms, and another compared the human body louse, Pediculus humanus humanus, against itself and select insects to find thousands of non-coding, potentially functional sequences. Conclusion: Whole genome comparative analysis for multiple organisms is both feasible and desirable in our search for biological knowledge. We argue that bioinformatic programs should be forward thinking by assuming analysis on multiple (and possibly large) genomes in the design and implementation of algorithms. Our algorithm shows how a compromise design with a trade-off of disk space versus memory space allows for efficient computation while only requiring modest computer resources, and at the same time providing benefits not available with other software.
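
    The idea of finding exactly matching sequences without aligning the genomes can be illustrated with a simple k-mer intersection. This is only a sketch of the general alignment-free approach, not the algorithm described in the abstract, which is designed for very large genomes and trades disk space against memory. The function name and the toy sequences are hypothetical, and the sketch ignores reverse complements and does not merge overlapping k-mers into longer elements.

```python
def shared_kmers(sequences, k):
    """Return the set of length-k substrings present in every sequence."""
    common = None
    for seq in sequences:
        # Collect every window of length k from this sequence.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        # Keep only the k-mers that all sequences seen so far have in common.
        common = kmers if common is None else common & kmers
    return common if common is not None else set()


if __name__ == "__main__":
    toy_genomes = ["ACGTACGT", "TTACGTAA", "GGACGTCC"]
    print(shared_kmers(toy_genomes, 4))
```

    Holding every k-mer of a genome in memory does not scale to vertebrate genomes, which is presumably why a practical tool needs the kind of disk-versus-memory compromise the abstract mentions.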

    Fecal microbiota transplant rescues mice from sepsis due to multi-drug resistant healthcare pathogens by restoring systemic immunity

    Death due to sepsis remains a persistent threat to critically ill patients confined to the intensive care unit and is characterized by colonization with multi-drug-resistant healthcare-associated pathogens. Here we report that sepsis in mice caused by a defined four-member pathogen community isolated from a patient with lethal sepsis is associated with the systemic suppression of key elements of the host transcriptome required for pathogen clearance and decreased butyrate expression. More specifically, these pathogens directly suppress interferon regulatory factor 3. Fecal microbiota transplant (FMT) reverses the course of otherwise lethal sepsis by enhancing pathogen clearance via the restoration of host immunity in an interferon regulatory factor 3-dependent manner. This protective effect is linked to the expansion of butyrate-producing Bacteroidetes. Taken together, these results suggest that fecal microbiota transplantation may be a treatment option in sepsis associated with immunosuppression.