9 research outputs found

    Techniques for automated parameter estimation in computational models of probabilistic systems

    The main contribution of this dissertation is the design of two new algorithms for automatically synthesizing values of numerical parameters of computational models of complex stochastic systems such that the resulting model meets user-specified behavioral specifications. These algorithms are designed to operate on probabilistic systems, that is, systems that, in general, behave differently under identical conditions. The algorithms combine formal verification and mathematical optimization to explore a model's parameter space. The problem of determining whether a model instantiated with a given set of parameter values satisfies the desired specification is first defined using formal verification terminology, and then reformulated in terms of statistical hypothesis testing. Parameter space exploration involves determining the outcome of the hypothesis testing query for each parameter point, and is guided using simulated annealing. The first algorithm uses the sequential probability ratio test (SPRT) to solve the hypothesis testing problems, whereas the second uses an approach based on Bayesian statistical model checking (BSMC). The SPRT-based parameter synthesis algorithm was used to validate that a given model of glucose-insulin metabolism can represent diabetic behavior, by synthesizing values of three parameters that ensure the glucose-insulin subsystem spends at least 20 minutes in a diabetic scenario. The BSMC-based algorithm was used to discover the values of parameters in a physiological model of the acute inflammatory response that guarantee a set of desired clinical outcomes. These two applications demonstrate how our algorithms use formal verification, statistical hypothesis testing, and mathematical optimization to automatically synthesize parameters of complex probabilistic models so that they meet user-specified behavioral properties.
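
    As a rough illustration of this search loop, the sketch below pairs Wald's SPRT with a simulated-annealing walk over the parameter space. It is a minimal sketch, not the dissertation's implementation: run_model and satisfies are toy stand-ins for the actual stochastic model and behavioral specification, and every threshold and proposal scale is an arbitrary assumption.

```python
import math
import random

def run_model(params):
    """Toy stand-in for the stochastic model: a biased random walk whose
    drift is params[0]. Replace with the real simulator."""
    x, trajectory = 0.0, []
    for _ in range(100):
        x += params[0] + random.gauss(0, 1)
        trajectory.append(x)
    return trajectory

def satisfies(trajectory):
    """Toy stand-in for the user-specified behavioral property."""
    return trajectory[-1] > 0

def sprt(params, theta=0.9, delta=0.05, alpha=0.01, beta=0.01, max_runs=10000):
    """Wald's sequential probability ratio test: decide whether the model
    satisfies the property with probability above theta (True) or below
    it (False), with error bounds alpha, beta and indifference width delta."""
    p0 = min(theta + delta, 1 - 1e-6)     # H0: satisfaction prob >= p0
    p1 = max(theta - delta, 1e-6)         # H1: satisfaction prob <= p1
    upper = math.log((1 - beta) / alpha)  # cross -> accept H1 (violated)
    lower = math.log(beta / (1 - alpha))  # cross -> accept H0 (satisfied)
    llr = 0.0
    for _ in range(max_runs):
        x = satisfies(run_model(params))
        llr += math.log((p1 if x else 1 - p1) / (p0 if x else 1 - p0))
        if llr >= upper:
            return False
        if llr <= lower:
            return True
    return llr <= 0  # undecided after max_runs: lean toward the closer bound

def synthesize(initial, n_iter=200, temp0=1.0):
    """Simulated annealing guided by a crude satisfaction estimate, with
    the SPRT giving the formal accept decision at each visited point."""
    def estimate(params, n=50):
        return sum(satisfies(run_model(params)) for _ in range(n)) / n

    current, cur = list(initial), estimate(initial)
    for i in range(n_iter):
        temp = max(temp0 * (1 - i / n_iter), 1e-9)
        cand = [p + random.gauss(0, 0.1) for p in current]
        score = estimate(cand)
        if score >= cur or random.random() < math.exp((score - cur) / temp):
            current, cur = cand, score
        if sprt(current):
            return current  # formally meets the specification
    return None
```

    Under this toy setup, synthesize([-1.0]) walks the drift parameter upward until the SPRT accepts a satisfying point; running the SPRT at every iteration is wasteful and a real implementation would invoke it more sparingly.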

    Rigidity analysis of protein biological assemblies and periodic crystal structures

    Background: We initiate in silico rigidity-theoretical studies of biological assemblies and small crystals for protein structures. The goal is to determine if, and how, the interactions among neighboring cells and subchains affect the flexibility of a molecule in its crystallized state. We use experimental X-ray crystallography data from the Protein Data Bank (PDB). The analysis relies on an efficient graph-based algorithm. Computational experiments were performed using new protein rigidity analysis tools available in the new release of our KINARI-Web server http://kinari.cs.umass.edu. Results: We provide two types of results: on biological assemblies and on crystals. We found that when only isolated subchains are considered, structural and functional information may be missed. Indeed, the rigidity of biological assemblies sometimes depends on the count and placement of hydrogen bonds and other interactions among the individual subchains of the biological unit. Similarly, the rigidity of small crystals may be affected by interactions between atoms belonging to different unit cells. We analyzed a dataset of approximately 300 proteins, from which we generated 982 crystals (some of which are biological assemblies), and identified two types of behavior. (a) Some crystals and/or biological assemblies aggregate into rigid bodies that span multiple unit cells/asymmetric units. Some of them form substantially larger rigid clusters in the crystal/biological assembly form, while in other cases the aggregation has a smaller effect, limited to the interface between the units. (b) In other cases, the rigidity properties of the asymmetric units are retained, because the rigid bodies did not combine. We also identified two interesting cases where rigidity analysis may be correlated with the functional behavior of the protein. This type of information, identified here for the first time, depends critically on the ability to create crystals and biological assemblies, and would not have been observed from the asymmetric unit alone. For the Ribonuclease A protein (PDB file 5RSA), which is functionally active in the crystallized form, we found that the individual protein and its crystal form retain the same flexibility parameters across the two states. In contrast, a derivative of Ribonuclease A (PDB file 9RSA) has no functional activity, and the protein is very rigid in both the asymmetric and crystalline forms. For the vaccinia virus D13 scaffolding protein (PDB file 3SAQ), which has two biological assemblies, we observed a striking asymmetry in the rigid cluster decomposition of one of them, which seems implausible given its symmetry. Upon careful investigation, we tracked the cause to a placement decision made by the Reduce software concerning the hydrogen atoms, which affects the distribution of certain hydrogen bonds. The surprising result is that the presence or absence of a very few, but critical, hydrogen bonds can drastically affect the rigid cluster decomposition of the biological assembly. Conclusion: The rigidity analysis of a single asymmetric unit may not accurately reflect the protein's behavior in the tightly packed crystal environment. Using our KINARI software, we demonstrated that additional functional and rigidity information can be gained by analyzing a protein's biological assembly and/or crystal structure. However, performing a larger-scale study would be computationally expensive (due to the size of the molecules involved). Overcoming this limitation will require novel mathematical and computational extensions to our software.
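
    The aggregation effect described above can be pictured with a small sketch: given rigid cluster decompositions of the individual units and a list of interactions crossing unit boundaries, merge coupled clusters and report which merged bodies span more than one unit. This is only an illustration with hypothetical data structures; it treats any single inter-unit bond as rigidifying (which over-merges), whereas KINARI's pebble-game algorithm actually counts degrees of freedom.

```python
class UnionFind:
    """Minimal union-find for merging rigid clusters coupled by
    inter-unit interactions."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def clusters_spanning_units(unit_clusters, inter_unit_bonds):
    """unit_clusters: {(unit_id, cluster_id): set of atom ids}, one entry
    per rigid cluster of each asymmetric unit; atom ids are (unit_id, name).
    inter_unit_bonds: [(atom_a, atom_b), ...], hydrogen bonds or other
    interactions crossing unit boundaries. Returns groups of clusters
    that merge into bodies spanning more than one unit."""
    atom_to_cluster = {atom: key
                       for key, atoms in unit_clusters.items()
                       for atom in atoms}
    uf = UnionFind()
    for a, b in inter_unit_bonds:
        if a in atom_to_cluster and b in atom_to_cluster:
            uf.union(atom_to_cluster[a], atom_to_cluster[b])

    merged = {}
    for key in unit_clusters:
        merged.setdefault(uf.find(key), []).append(key)
    return [group for group in merged.values()
            if len({unit for unit, _ in group}) > 1]
```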

    Automatic B cell lymphoma detection using flow cytometry data

    Background: Flow cytometry has been widely used for the diagnosis of various hematopoietic diseases. Although there have been advances in the number of biomarkers that can be analyzed simultaneously and in technologies that enable fast performance, the diagnostic data are still interpreted by a manual gating strategy. The process is labor-intensive, time-consuming, and subject to human error. Results: We used 80 sets of flow cytometry data from 44 healthy donors, 21 patients with chronic lymphocytic leukemia (CLL), and 15 patients with follicular lymphoma (FL). Approximately 15% of the data from each group were used to build the profiles. Our approach successfully identified 36/37 healthy donor cases, 18/18 CLL cases, and 12/13 FL cases. Conclusions: This proof-of-concept study demonstrated that an automated diagnosis of CLL and FL can be obtained by examining the cell capture rates of a test case using a computational method based on the multi-profile detection algorithm. The testing phase of our system is efficient and can facilitate the diagnosis of B-lymphocyte neoplasms.
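
    A minimal sketch of the capture-rate idea, under simplifying assumptions that are not the paper's algorithm: each case is a set of cells represented as points in marker space, each group's profile is a pool of template cells built from the ~15% training fraction, and a test cell counts as "captured" if it lies within an arbitrary radius of some profile cell. The case is assigned to the group whose profile captures the largest fraction of its cells.

```python
import numpy as np

def capture_rate(test_cells, profile_cells, radius):
    """Fraction of test cells within `radius` (Euclidean distance in
    marker space) of at least one profile cell. Inputs are
    (n_cells, n_markers) arrays."""
    d = np.linalg.norm(test_cells[:, None, :] - profile_cells[None, :, :],
                       axis=2)                     # (n_test, n_profile)
    return float((d.min(axis=1) <= radius).mean())

def classify(test_cells, profiles, radius=0.5):
    """profiles: e.g. {'healthy': ..., 'CLL': ..., 'FL': ...}, each an
    (n_cells, n_markers) array. Returns the best label and all rates."""
    rates = {label: capture_rate(test_cells, cells, radius)
             for label, cells in profiles.items()}
    return max(rates, key=rates.get), rates
```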

    Algorithms for Transcriptome Quantification and Reconstruction from RNA-Seq Data

    Massively parallel whole transcriptome sequencing, with its ability to generate full transcriptome data at the single-transcript level, provides a powerful tool with multiple interrelated applications, including transcriptome reconstruction and gene/isoform expression estimation, also known as transcriptome quantification. As a result, whole transcriptome sequencing has become the technology of choice for performing transcriptome analysis, rapidly replacing array-based technologies. The most commonly used transcriptome sequencing protocol, referred to as RNA-Seq, generates short (single or paired) sequencing tags from the ends of randomly generated cDNA fragments. The RNA-Seq protocol reduces sequencing cost and significantly increases data throughput, but makes it computationally challenging to reconstruct full-length transcripts and accurately estimate their abundances across all cell types. We focus on two main problems in transcriptome data analysis: transcriptome reconstruction and quantification. Transcriptome reconstruction, also referred to as novel isoform discovery, is the problem of reconstructing the transcript sequences from the sequencing data. Reconstruction can be done de novo, or it can be assisted by existing genome and transcriptome annotations. Transcriptome quantification refers to the problem of estimating the expression level of each transcript. We present genome-guided and annotation-guided transcriptome reconstruction methods, as well as methods for transcript and gene expression level estimation. Empirical results on both synthetic and real RNA-Seq datasets show that the proposed methods improve transcriptome quantification and reconstruction accuracy compared to previous methods.
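
    Quantification from ambiguous alignments is classically handled with expectation-maximization; the sketch below shows that textbook EM scheme rather than the specific estimators developed in the dissertation, and it assumes a binary read-to-transcript compatibility matrix in which every read aligns to at least one transcript.

```python
import numpy as np

def em_quantify(compat, eff_len, n_iter=100):
    """Textbook EM for transcript abundances.
    compat:  (n_reads, n_transcripts) 0/1 matrix; compat[r, t] = 1 if read
             r aligns to transcript t (real methods also weight alignments).
    eff_len: effective transcript lengths.
    Returns relative abundances theta summing to 1."""
    n_reads, n_tx = compat.shape
    theta = np.full(n_tx, 1.0 / n_tx)               # uniform start
    for _ in range(n_iter):
        # E-step: fractionally assign each read among compatible transcripts
        w = compat * (theta / eff_len)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: re-estimate abundances from expected read counts
        counts = w.sum(axis=0)
        theta = counts / counts.sum()
    return theta

# two transcripts, three reads; the middle read is ambiguous
compat = np.array([[1, 0], [1, 1], [0, 1]], dtype=float)
print(em_quantify(compat, eff_len=np.array([1000.0, 2000.0])))
```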

    Towards Accurate Modeling of Noncovalent Interactions for Protein Rigidity Analysis

    Background: Protein rigidity analysis is an efficient computational method for extracting flexibility information from static, X-ray crystallography protein data. Atoms and bonds are modeled as a mechanical structure and analyzed with a fast graph-based algorithm, producing a decomposition of the flexible molecule into interconnected rigid clusters. The result depends critically on noncovalent atomic interactions, primarily on how hydrogen bonds and hydrophobic interactions are computed and modeled. Ongoing research points to the stringent need for benchmarking rigidity analysis software systems, towards the goal of increasing their accuracy and validating their results, both against each other and against biologically relevant (functional) parameters. We propose two new methods for modeling hydrogen bonds and hydrophobic interactions that more accurately reflect a mechanical model, without being computationally more intensive. We evaluate them using a novel scoring method, based on the B-cubed score from the information retrieval literature, which measures how well two cluster decompositions match. Results: To evaluate the modeling accuracy of KINARI, our pebble-game rigidity analysis system, we use a benchmark data set of 20 proteins, each with multiple distinct conformations deposited in the Protein Data Bank. Cluster decompositions for them were previously determined with the RigidFinder method from Gerstein's lab and validated against experimental data. When KINARI's default tuning parameters are used, an improvement of the B-cubed score over a crude baseline is observed in 30% of this data. With our new modeling options, improvements were observed in over 70% of the proteins in this data set. We investigate the sensitivity of the cluster decomposition score with case studies on pyruvate phosphate dikinase and calmodulin. Conclusion: To substantially improve the accuracy of protein rigidity analysis systems, thorough benchmarking must be performed on all current systems and future extensions. We measured the gain in performance by comparing different modeling methods for noncovalent interactions, and showed that new criteria for modeling hydrogen bonds and hydrophobic interactions can significantly improve the results. The two new methods proposed here have been implemented and made publicly available in the current version of KINARI (v1.3), together with the benchmarking tools, which can be downloaded from our software's website, http://kinari.cs.umass.edu.
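
    The B-cubed comparison of two cluster decompositions works per atom: precision asks what fraction of the atoms sharing an atom's cluster in the candidate decomposition also share its cluster in the reference, and recall swaps the two roles. Below is a minimal sketch following the information-retrieval definition (not necessarily KINARI's exact scoring variant), assuming both decompositions label the same atom set.

```python
def bcubed(candidate, reference):
    """B-cubed precision, recall, and F1 between two decompositions,
    each a dict mapping atom id -> rigid cluster label."""
    def avg_overlap(a_map, b_map):
        total = 0.0
        for atom in a_map:
            a = {x for x in a_map if a_map[x] == a_map[atom]}
            b = {x for x in b_map if b_map[x] == b_map[atom]}
            total += len(a & b) / len(a)
        return total / len(a_map)

    p = avg_overlap(candidate, reference)
    r = avg_overlap(reference, candidate)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

# identical decompositions score (1.0, 1.0, 1.0); splitting a rigid
# cluster lowers recall, while merging two clusters lowers precision
print(bcubed({"CA": 0, "CB": 0, "CG": 1}, {"CA": 0, "CB": 1, "CG": 1}))
```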

    Temporal and Causal Inference with Longitudinal Multi-omics Microbiome Data

    Microbiomes are communities of microbes inhabiting an environmental niche. Thanks to next-generation sequencing technologies, it is now possible to study microbial communities, their impact on the host environment, and their role in specific diseases and health. Technology has also triggered the increased generation of multi-omics microbiome data, including metatranscriptomics (a quantitative survey of the complete metatranscriptome of the microbial community), metabolomics (a quantitative profile of the entire set of metabolites present in the microbiome's environmental niche), and host transcriptomics (the gene expression profile of the host). Consequently, a major challenge in microbiome data analysis is the integration of multi-omics data sets and the construction of unified models. Finally, since microbiomes are inherently dynamic, longitudinal studies are critical to fully understand the complex interactions that take place within these communities. Although the analysis of longitudinal microbiome data has been attempted, existing approaches do not probe interactions between taxa, do not offer holistic analyses, and do not investigate causal relationships. In this work we propose approaches to address all of the above challenges: novel analysis pipelines that analyze multi-omic longitudinal microbiome data and infer temporal and causal relationships between the different entities involved. As a first step, we showed how to deal with longitudinal metagenomic data sets by building a pipeline, PRIMAL, which takes microbial abundance data as input and outputs a dynamic Bayesian network model that is highly predictive, suggests significant interactions between the different microbes, and proposes important connections from clinical variables. A significant innovation of our work is its ability to deal with differential rates of the internal biological processes in different individuals. Second, we showed how to analyze longitudinal multi-omic microbiome datasets. Our pipeline, PALM, significantly extends the previous state of the art by allowing for the integration of longitudinal metatranscriptomics, host transcriptomics, and metabolomics data in addition to longitudinal metagenomics data. PALM achieves predictive power comparable to the PRIMAL pipeline while discovering a web of interactions between the entities of far greater complexity. An important innovation of PALM is the use of a multi-omic Skeleton framework that incorporates prior knowledge into the learning of the models. Another major innovation of this work is a suite of validation methods, both in silico and in vitro, enhancing the utility and validity of PALM. Finally, we propose a suite of novel methods (unrolling and de-confounding), called METALICA, consisting of tools and techniques that make it possible to uncover significant details about the nature of microbial interactions. We also show methods to validate such interactions using ground-truth databases. The proposed methods were tested on an IBD multi-omics dataset.
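
    To make the dynamic Bayesian network idea concrete, here is an illustrative two-slice sketch: each entity (taxon, transcript, metabolite, or clinical variable) at time t+1 is regressed on all entities at time t with an L1 penalty, and nonzero coefficients are read off as directed cross-slice edges. This is not the PRIMAL/PALM learning algorithm: it assumes the subjects' time series have already been temporally aligned (the differential-rate handling that PRIMAL provides is exactly what this skips), and the function name and penalty weight are arbitrary.

```python
import numpy as np
from sklearn.linear_model import Lasso

def learn_dbn_edges(series, names, alpha=0.05):
    """series: list of (T_i, n_entities) arrays, one per subject, with
    rows as aligned time points and columns ordered as in `names`.
    Returns directed edges (parent at t) -> (child at t+1) with weights."""
    X = np.vstack([s[:-1] for s in series])   # slice t
    Y = np.vstack([s[1:] for s in series])    # slice t+1
    edges = []
    for j, child in enumerate(names):
        model = Lasso(alpha=alpha).fit(X, Y[:, j])
        edges += [(names[i], child, c)
                  for i, c in enumerate(model.coef_) if abs(c) > 1e-8]
    return edges
```

    The sparsity penalty stands in for the structural constraints (such as the prior-knowledge skeleton) that a real pipeline would use to keep the network interpretable.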