
Exact Learning of RNA Energy Parameters From Structure

Authors: Aminisharifabad Mohammad; Chitsaz Hamidreza
Publication date: 15/10/2013

We consider the problem of exact learning of the parameters of a linear RNA energy model from secondary structure data. A necessary and sufficient condition for learnability of the parameters is derived, based on computing the convex hull of the union of translated Newton polytopes of the input sequences. The set of learned energy parameters is characterized as the convex cone generated by the normal vectors to those facets of the resulting polytope that are incident to the origin. In practice, the sufficient condition may not be satisfied by the entire training data set; hence, it is often desirable to compute a maximal subset of the training data for which the sufficient condition is satisfied. We show that this problem is NP-hard in general for a feature space of arbitrary dimension. Using a randomized greedy algorithm, we select a subset of the RNA STRAND v2.0 database that satisfies the sufficient condition for a model that counts A-U, C-G, and G-U base pairs separately. The set of learned energy parameters includes the experimentally measured energies of A-U, C-G, and G-U pairs; hence, our parameter set is in agreement with the Turner parameters.
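
To make the learnability condition concrete, here is a minimal Python sketch for a two-feature model that counts only A-U and C-G pairs (a simplification of the three-feature model above). It assumes that, for every training sequence, the difference vectors d = f(s) − f(s*) have already been listed, where f counts base pairs, s* is the known structure, and s ranges over alternative structures. Translating each sequence's feature vectors by −f(s*) puts the known structure at the origin, and a nonzero weight vector w with w · d ≥ 0 for every d (so no alternative has lower energy E = w · f) exists exactly when all translated points lie in a closed half-plane through the origin. Enumerating alternatives explicitly is only feasible for toy inputs; the paper works with Newton polytopes instead. The names `halfplane_weight` and `greedy_learnable_subset` are hypothetical, and the greedy routine is a generic randomized greedy pass rather than the authors' exact algorithm.

```python
import math
import random


def halfplane_weight(diffs, eps=1e-12):
    """Return a unit vector w with w . d >= 0 (up to rounding) for every 2D
    difference vector d = f(s) - f(s_known), or None if no nonzero w exists."""
    angles = sorted(math.atan2(y, x) for x, y in diffs
                    if abs(x) > eps or abs(y) > eps)
    if not angles:
        return (1.0, 0.0)  # all alternatives tie with the known structure
    n = len(angles)
    # Find the widest empty angular gap between consecutive directions.
    best_gap, best_i = -1.0, 0
    for i in range(n):
        nxt = angles[i + 1] if i + 1 < n else angles[0] + 2 * math.pi
        if nxt - angles[i] > best_gap:
            best_gap, best_i = nxt - angles[i], i
    if best_gap < math.pi - eps:
        return None  # directions surround the origin: no valid half-plane
    # The occupied directions form an arc of width <= pi; aim w at its middle.
    start = angles[(best_i + 1) % n]
    width = 2 * math.pi - best_gap
    theta = start + width / 2
    return (math.cos(theta), math.sin(theta))


def greedy_learnable_subset(per_seq_diffs, seed=0):
    """Randomized greedy pass: visit sequences in random order and keep one
    only if the pooled difference vectors still admit a nonzero w."""
    order = list(range(len(per_seq_diffs)))
    random.Random(seed).shuffle(order)
    kept, pooled = [], []
    for idx in order:
        if halfplane_weight(pooled + list(per_seq_diffs[idx])) is not None:
            kept.append(idx)
            pooled.extend(per_seq_diffs[idx])
    return sorted(kept)
```

The returned w is one member of the cone of valid parameters, not the whole cone. For example, `halfplane_weight([(1, 0), (0, 1), (-1, -1)])` returns `None` because those three directions surround the origin, so the greedy pass can keep only one of two sequences contributing `[(1, 0), (0, 1)]` and `[(-1, -1)]`.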

arXiv.org e-Print Archive

CiteSeerX

The RNA Newton Polytope and Learnability of Energy Parameters

Authors: Chitsaz Hamidreza; Forouzmand Elmirasadat
Publication date: 08/01/2013

Despite nearly four decades of research on RNA secondary structure and RNA-RNA interaction prediction, the accuracy of state-of-the-art algorithms is still far from satisfactory. Researchers have proposed increasingly complex energy models and improved parameter estimation methods in the hope of giving their methods enough power to solve the problem. Disappointingly, the result has been only modest improvements that fall short of expectations. Even recent massively featured machine learning approaches have not been able to break this barrier. In this paper, we introduce the notion of learnability of the parameters of an energy model as a measure of its inherent capability. We say that the parameters of an energy model are learnable if and only if there exists at least one set of such parameters that renders every RNA structure known to date the minimum free energy structure. We derive a necessary condition for learnability and give a dynamic programming algorithm to assess it. Our algorithm computes the convex hull of the feature vectors of all feasible structures in the ensemble of a given input sequence. Interestingly, that convex hull coincides with the Newton polytope of the partition function as a polynomial in the energy parameters. We demonstrate the application of our theory on a simple energy model consisting of a weighted count of A-U and C-G base pairs. Our results show that this simple energy model satisfies the necessary condition for less than one third of the input unpseudoknotted sequence-structure pairs chosen from the RNA STRAND v2.0 database. For another one third, the necessary condition is barely violated, which suggests that augmenting this simple energy model with more features, such as the Turner loops, may solve the problem. The necessary condition is severely violated for 8% of the pairs, which provides a small set of hard cases that require further investigation.
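
The per-sequence part of this computation can be sketched in Python for the A-U/C-G counting model. The sketch assumes a Nussinov-style decomposition of non-crossing (unpseudoknotted) structures with at least three unpaired bases in every hairpin (`MIN_HAIRPIN`, an assumed value), and it collects the set of achievable feature vectors (n_AU, n_CG) by dynamic programming. In two dimensions there are only polynomially many such vectors, so the convex hull can simply be taken afterwards with Andrew's monotone chain. This is a simplified stand-in, not the authors' polytope algorithm, and all names are hypothetical. Since a nonzero linear energy over a full-dimensional polytope is minimized only on its boundary, a known structure whose feature vector lies strictly inside the hull cannot be the minimum free energy structure under any nonzero weights. That is the kind of necessary condition the abstract describes.

```python
from functools import lru_cache

# Feature contributed by each canonical pair: (A-U count, C-G count).
PAIR_FEATURE = {("A", "U"): (1, 0), ("U", "A"): (1, 0),
                ("C", "G"): (0, 1), ("G", "C"): (0, 1)}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases in a hairpin


def feature_vectors(seq):
    """All (n_AU, n_CG) vectors realized by non-crossing structures of seq."""
    @lru_cache(maxsize=None)
    def span(i, j):
        if i > j:
            return frozenset({(0, 0)})        # empty interval: empty structure
        out = set(span(i + 1, j))             # case 1: base i is unpaired
        for k in range(i + MIN_HAIRPIN + 1, j + 1):
            e = PAIR_FEATURE.get((seq[i], seq[k]))
            if e is None:
                continue                      # (i, k) is not an A-U or C-G pair
            for a in span(i + 1, k - 1):      # case 2: i pairs with k
                for b in span(k + 1, j):
                    out.add((a[0] + b[0] + e[0], a[1] + b[1] + e[1]))
        return frozenset(out)

    return set(span(0, len(seq) - 1))


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def strictly_inside(p, hull):
    """True if p lies in the interior of the counter-clockwise polygon hull."""
    if len(hull) < 3:
        return False  # a point or segment has no interior in 2D
    return all(cross(hull[i], hull[(i + 1) % len(hull)], p) > 0
               for i in range(len(hull)))
```

For example, `feature_vectors("GGAAACC")` yields `{(0, 0), (0, 1), (0, 2)}`: no pairs, one G-C pair, or two nested G-C pairs. Recursion depth grows with sequence length, so this toy version is meant for short sequences only.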

arXiv.org e-Print Archive

CiteSeerX

DeepCare: A Deep Dynamic Memory Model for Predictive Medicine

Authors: A Graves; AB Jensen; BB Granger; J Futoma; JM Corbin; JS Mathias; K Orphanou; PB Jensen; R Henriques; S Hochreiter; SJ Henly; T Tran; T Tran; Y LeCun
Publication date: 01/01/2016

Personalized predictive medicine necessitates modeling patient illness and care processes, which inherently have long-term temporal dependencies. Healthcare observations, recorded in electronic medical records, are episodic and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural network that reads medical records, stores previous illness history, infers current illness states, and predicts future medical outcomes. At the data level, DeepCare represents care episodes as vectors and models patient health state trajectories through explicit memory of historical records. Built on Long Short-Term Memory (LSTM), DeepCare introduces time parameterizations that handle irregularly timed events by moderating the forgetting and consolidation of memory cells. DeepCare also incorporates medical interventions that change the course of illness and shape future medical risk. Moving up to the health state level, historical and present health states are aggregated through multiscale temporal pooling before passing through a neural network that estimates future outcomes. We demonstrate the efficacy of DeepCare for disease progression modeling, intervention recommendation, and future risk prediction. On two important cohorts with heavy social and economic burden (diabetes and mental health), the results show improved modeling and risk prediction accuracy.
Comment: Accepted at JBI under the new name: "Predicting healthcare trajectories from medical records: A deep learning approach"
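
The time parameterization described above can be illustrated with a single step of a scalar LSTM whose forget gate is multiplied by a decay factor that shrinks as the gap Δt since the previous visit grows. This Python sketch assumes the decay 1 / log(e + Δt), one common monotone choice, and hypothetical gate weights in `w`. It is not DeepCare's exact parameterization, and it omits the intervention and pooling components.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def time_decay(dt):
    # Assumed decay: equals 1 at dt = 0 and shrinks as the time gap grows.
    return 1.0 / math.log(math.e + dt)


def time_aware_lstm_step(x, h, c, dt, w):
    """One step of a scalar LSTM whose forget gate is scaled by time_decay(dt).
    `w` maps gate name -> (input weight, recurrent weight, bias); hypothetical."""
    def gate(name):
        wx, wh, b = w[name]
        return wx * x + wh * h + b

    f = sigmoid(gate("forget")) * time_decay(dt)  # long gaps forget more
    i = sigmoid(gate("input"))
    o = sigmoid(gate("output"))
    g = math.tanh(gate("cell"))
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return h_new, c_new
```

With identical inputs and a positive previous cell state, a longer gap `dt` leaves a smaller carried-over memory `c_new`, which is the intended effect of moderating forgetting by elapsed time.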

arXiv.org e-Print Archive

Deakin Research Online

Crossref

Modeling RNA, protein, and synthetic molecules using coarse-grained and all-atom representations

Author: Bell David Russell
Publication date: 31/01/2019

The aim of computational chemistry is to depict and understand the dynamics and interactions of molecular systems. In addition to deepening understanding in the physical and life sciences, this insight has important applications in therapeutic design and materials science. In computational chemistry, molecules can be modeled in a number of representations depending on the molecular system and phenomena of interest. In this work, both simplified coarse-grained representations and all-atom representations are used to model RNA, cucurbituril host-guest chemistry, and the binding of cadmium selenide quantum dots to the Src homology 3 domain. For RNA, a coarse-grained model termed RACER (RnA CoarsE-gRained) was developed to accurately predict RNA structure and folding free energy. After optimization against statistical potentials, RACER predicted the structures of 14 RNAs with an average root mean square deviation (RMSD) of 4.15 Å from the experimental structures. Further, RACER captured the sequence-specific variation in folding free energy for a set of 6 RNA hairpins and 5 RNA duplexes, with an R² of 0.96 relative to experiment. The binding free energies of a cucurbituril host with 14 guests were computed using a polarizable force field and two free energy techniques: the Bennett acceptance ratio and orthogonal space random walk. The polarizable force field captured binding accurately; unexpectedly, however, the orthogonal space random walk method converged slowly, although still at lower computational cost than the Bennett acceptance ratio. Lastly, the nanotoxicity effects of trioctylphosphine oxide-coated cadmium selenide quantum dots were investigated using the model Src homology 3 protein domain in complex with its native proline-rich motif ligand. With increasing quantum dot concentration, the quantum dots increasingly prefer to bind the proline-rich motif active site, inhibiting Src homology 3 function.
Biomedical Engineering
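
RMSD values such as the 4.15 Å quoted above are normally reported after the predicted and experimental structures have been optimally superimposed over matched atoms. The Python sketch below shows only the final formula, RMSD = sqrt((1/N) Σ |a_i − b_i|²). It assumes the coordinates are already aligned and listed in the same atom order, and it omits the superposition step (for example, the Kabsch algorithm). The function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) coordinates that are
    assumed to be matched atom-by-atom and already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

Skipping superposition would inflate the value for structures that differ only by a rigid rotation or translation, which is why alignment normally comes first.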

Texas ScholarWorks

RFold: RNA Secondary Structure Prediction with Decoupled Optimization

Authors: Gao Zhangyang; Li Stan Z.; Tan Cheng
Publication date: 27/04/2023

The secondary structure of ribonucleic acid (RNA) is more stable and accessible in the cell than its tertiary structure, making it essential for functional prediction. Although deep learning has shown promising results in this field, current methods suffer from poor generalization and high complexity. In this work, we present RFold, a simple yet effective method for end-to-end RNA secondary structure prediction. RFold introduces a decoupled optimization process that decomposes the vanilla constraint satisfaction problem into row-wise and column-wise optimization, simplifying the solving process while guaranteeing the validity of the output. Moreover, RFold adopts attention maps as informative representations instead of hand-crafted features. Extensive experiments demonstrate that RFold achieves competitive performance with roughly eight times faster inference than the state-of-the-art method. The code and Colab demo are available at http://github.com/A4Bio/RFold.
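
The row-wise and column-wise decoupling can be illustrated on a plain score matrix: keep a pair (i, j) only when j is the best partner in row i and i is the best partner in column j, which guarantees that each base ends up in at most one pair. This Python sketch is a generic illustration of that idea, not RFold's code. It assumes a symmetric score matrix (for example, one derived from attention maps), masks non-canonical pairs (A-U, G-C, and G-U wobble are allowed) and pairs enclosing no more than `min_loop` bases, and applies a positive-score cutoff. All names and thresholds are hypothetical.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}


def row_col_argmax(scores, seq, min_loop=3):
    """Return base pairs (i, j), i < j, where j is the best partner in row i
    AND i is the best partner in column j of a symmetric score matrix."""
    n = len(seq)
    neg_inf = float("-inf")
    # Mask entries that cannot form a valid pair (non-canonical or loop too short).
    masked = [[scores[i][j]
               if (seq[i], seq[j]) in CANONICAL and abs(i - j) > min_loop
               else neg_inf
               for j in range(n)] for i in range(n)]
    row_best = [max(range(n), key=lambda j: masked[i][j]) for i in range(n)]
    col_best = [max(range(n), key=lambda i: masked[i][j]) for j in range(n)]
    pairs = []
    for i in range(n):
        j = row_best[i]
        # Keep mutual row/column winners with a positive score (assumed cutoff).
        if i < j and col_best[j] == i and masked[i][j] > 0:
            pairs.append((i, j))
    return pairs
```

Because the masked matrix is symmetric, the row winner and column winner of every base coincide, so no base can appear in two selected pairs. Note that this check alone does not exclude pseudoknots.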

arXiv.org e-Print Archive

The RNA Newton Polytope and Learnability of Energy Parameters

Author: Forouzmand Elmirasadat
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2014

Computational RNA secondary structure prediction has been a topic of much research interest for several decades. Despite all the progress made in the field, even state-of-the-art algorithms do not provide satisfactory results, and the accuracy of all existing tools is limited. Very complex energy models, different parameter estimation methods, and recent machine learning approaches have not been the answer to this problem. We believe that the first step toward high-quality results is to use an energy model with the potential to predict accurate output. Hence, a systematic way to analyze the suitability of an energy model is necessary. We introduced the notion of learnability to measure this suitability. A learnable energy model has at least one set of parameters that renders the known structure of every RNA to date the minimum free energy structure, which means 100% accuracy. We also found a necessary condition for a model to be learnable and implemented a dynamic programming algorithm to assess this condition for a set of RNAs. This algorithm computes the convex hull of all possible feature vectors for a sequence. When the partition function is viewed as a polynomial, this convex hull is also its Newton polytope. To the best of our knowledge, this is the first systematic approach for evaluating the inherent capability of an energy model.
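
The correspondence between the convex hull and the Newton polytope follows directly once energies are linear in the features. Assuming E(s) = Σ_k w_k f_k(s) for the structures S(σ) of a sequence σ, with RT the gas constant times temperature and the substitution x_k = e^{−w_k/RT} (notation chosen here for illustration, not taken from the thesis), the partition function becomes a polynomial whose exponent vectors are exactly the feature vectors:

```latex
\begin{aligned}
Z(\mathbf{x}) &= \sum_{s \in \mathcal{S}(\sigma)} e^{-E(s)/RT}
  = \sum_{s \in \mathcal{S}(\sigma)} \prod_{k=1}^{d} e^{-w_k f_k(s)/RT}
  = \sum_{s \in \mathcal{S}(\sigma)} \prod_{k=1}^{d} x_k^{f_k(s)}
  = \sum_{\alpha} c_\alpha\, \mathbf{x}^{\alpha}, \\
c_\alpha &= \bigl|\{\, s \in \mathcal{S}(\sigma) : f(s) = \alpha \,\}\bigr|, \\
\operatorname{Newt}(Z) &= \operatorname{conv}\{\alpha : c_\alpha > 0\}
  = \operatorname{conv}\{\, f(s) : s \in \mathcal{S}(\sigma) \,\}.
\end{aligned}
```

Because every coefficient c_α counts structures, no terms cancel, so the monomials present in Z are exactly the achievable feature vectors.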

Digital Commons@Wayne State University

Integrating NMR and simulations reveals motions in the UUCG tetraloop

Authors: Bottaro Sandro; Lindorff-Larsen Kresten; Nichols Parker J.; Parrinello Michele; Vögeli Beat
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2020

Copenhagen University Research Information System

Espaloma-0.3.0: Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond

Authors: Chodera John D.; Henry Mike; MacDermott-Opeskin Hugo; Pulido Iván; Takaba Kenichiro; Wang Yuanqing
Publication date: 13/07/2023

Molecular mechanics (MM) force fields, the models that characterize the energy landscape of molecular systems via simple pairwise and polynomial terms, have traditionally relied on human expert-curated, inflexible, and poorly extensible discrete chemical parameter assignment rules, namely atom or valence types. Recently, there has been significant interest in using graph neural networks to replace this process, enabling the parametrization scheme to be learned in an end-to-end differentiable manner directly from quantum chemical calculations or condensed-phase data. In this paper, we extend the Espaloma end-to-end differentiable force field construction approach by incorporating both energy and force fitting to quantum chemical data directly into the training process. Building on the OpenMM SPICE dataset, we curate a dataset covering chemical spaces highly relevant to biomolecular modeling, including small molecules, proteins, and RNA. The resulting force field, espaloma 0.3.0, self-consistently parametrizes these diverse biomolecular species, accurately predicts quantum chemical energies and forces, and maintains stable quantum chemical energy-minimized geometries. Surprisingly, this simple approach produces highly accurate protein-ligand binding free energies when self-consistently parametrizing protein and ligand. This approach, capable of fitting new force fields to large quantum chemical datasets in one GPU-day, shows significant promise as a path toward systematically more accurate force fields that can be easily extended to new chemical domains of interest.
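
Fitting energies and forces together amounts to a loss with an energy term and a force term, where predicted forces are the negative gradient of the predicted energy with respect to atomic positions. The Python sketch below illustrates such an objective on a toy one-dimensional harmonic bond. The bond model, `force_weight`, and the function names are assumptions for illustration, and the graph neural network that Espaloma uses to assign parameters is omitted entirely.

```python
def bond_energy_and_forces(x1, x2, k, r0):
    """Toy 1D harmonic bond E = 0.5*k*(r - r0)**2 with r = |x2 - x1|.
    Returns E and the forces (-dE/dx1, -dE/dx2)."""
    d = x2 - x1
    r = abs(d)
    sign = 1.0 if d >= 0 else -1.0
    energy = 0.5 * k * (r - r0) ** 2
    f2 = -k * (r - r0) * sign   # force on atom 2
    return energy, (-f2, f2)    # equal and opposite force on atom 1


def energy_force_loss(params, positions, e_ref, f_ref, force_weight=1.0):
    """Squared energy error plus weighted squared force error."""
    k, r0 = params
    e_pred, f_pred = bond_energy_and_forces(positions[0], positions[1], k, r0)
    energy_term = (e_pred - e_ref) ** 2
    force_term = sum((fp - fr) ** 2 for fp, fr in zip(f_pred, f_ref))
    return energy_term + force_weight * force_term
```

The force term supplies one extra target per atomic coordinate for every reference geometry, which is one reason force fitting is attractive when training data come from quantum chemical calculations.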

arXiv.org e-Print Archive

Computational prediction, experiment design and statistical validations of non-coding regulatory RNA

Author: Cho Hyejin
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2015

Non-coding regulatory RNAs (ncRNAs) regulate a host of gene functions in prokaryotes, e.g., transcription and translation regulation, RNA processing and modification, and mRNA stability. Some ncRNAs have been identified experimentally, but many are yet to be found. ncRNAs can be classified as either cis- or trans-acting. cis-ncRNAs perfectly complement their target genes and are usually encoded on the antisense strands of the targets. In contrast, trans-ncRNAs regulate their target genes through short and often imperfect base pairing with the targets, and are usually encoded elsewhere in the genome. A whole-genome thermodynamic analysis can be performed to identify all imperfect but stable base pairings between all annotated genes and some genomic regions encoding ncRNAs from the same species. However, these base-pairing regions are short and variable in size, and their melting temperatures vary greatly between perfectly and imperfectly matched targets, so it is difficult to predict trans-acting ncRNAs solely from thermodynamic analysis. Therefore, we also consider known ncRNA structures to improve our predictions. We find that Hfq-binding ncRNAs, which require the Hfq protein to function, share three common structural properties. We predict these ncRNAs in E. coli and Agrobacterium tumefaciens using a novel, systematic 5-step approach based on thermodynamic analyses and the known structural properties of this class of ncRNAs. Whole-genome tiling microarrays are used to validate our predictions. We describe how the microarrays were designed, created, and validated for E. coli MG1655 and Agrobacterium tumefaciens C58. We match our new ncRNA predictions with known ncRNAs, calculate correlation coefficients between each ncRNA candidate and its predicted targets as measured by the whole-genome tiling microarrays, and confirm the results with 3 other ncRNA identification software tools.
We also perform a gene ontology network analysis to reveal the associations of ncRNA candidates and their predicted targets. Our novel 5-step prediction method is generally applicable to other prokaryote species and may help advance ncRNA research in prokaryotes
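
The correlation step can be sketched with the Pearson correlation coefficient between a candidate's expression profile and each predicted target's profile across microarray samples. The abstract does not say which coefficient was used, so Pearson is an assumption here, as are the function names. Targets are ranked by absolute correlation because a repressing ncRNA would be expected to correlate negatively with its target.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two profiles of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("constant profile has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def rank_targets(candidate_profile, target_profiles):
    """Sort (target, r) pairs by |r|, strongest association first."""
    scores = {name: pearson(candidate_profile, profile)
              for name, profile in target_profiles.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Ranking by |r| keeps both activated and repressed targets near the top; a sign-aware ranking would be the alternative if only repression were of interest.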

Digital Repository @ Iowa State University (ISU)