Exact Learning of RNA Energy Parameters From Structure
We consider the problem of exact learning of parameters of a linear RNA
energy model from secondary structure data. A necessary and sufficient
condition for learnability of parameters is derived, which is based on
computing the convex hull of the union of translated Newton polytopes of input
sequences. The set of learned energy parameters is characterized as the convex
cone generated by the normal vectors to those facets of the resulting polytope
that are incident to the origin. In practice, the sufficient condition may not
be satisfied by the entire training data set; hence, computing a maximal subset
of training data for which the sufficient condition is satisfied is often
desired. We show that this problem is NP-hard in general for a feature space of
arbitrary dimension. Using a randomized greedy algorithm, we select a
subset of the RNA STRAND v2.0 database that satisfies the sufficient condition for
a model that counts A-U, C-G, and G-U base pairs separately. The set of learned energy
parameters includes experimentally measured energies of A-U, C-G, and G-U
pairs; hence, our parameter set is in agreement with the Turner parameters.
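The cone characterization above can be illustrated with a minimal Python sketch. It assumes a linear model in which the energy of a structure is the dot product of a parameter vector with its base-pair counts, and it checks whether a candidate parameter vector makes the observed structure a minimum free energy structure among a given list of alternatives. The function names, the feature ordering (A-U, C-G, G-U), and all numeric values are hypothetical and are not taken from the paper.

```python
def energy(theta, features):
    """Linear energy: dot product of parameters and feature counts."""
    return sum(t * f for t, f in zip(theta, features))


def is_consistent(theta, observed, alternatives):
    """True if the observed structure has minimal energy under theta.

    observed and each alternative are feature vectors of hypothetical
    (A-U, C-G, G-U) base-pair counts for structures of one sequence.
    """
    e_obs = energy(theta, observed)
    return all(e_obs <= energy(theta, alt) for alt in alternatives)


# Hypothetical feature vectors; not data from RNA STRAND.
observed = (2, 3, 1)
alternatives = [(0, 0, 0), (2, 2, 0), (1, 3, 1), (3, 1, 2)]

print(is_consistent((-1.0, -2.0, -0.5), observed, alternatives))  # True
print(is_consistent((-1.0, 1.0, -0.5), observed, alternatives))   # False
```

A parameter vector belongs to the learned set only if this check passes for every sequence-structure pair in the training subset, with the alternatives ranging over all feasible structures of each sequence.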
The RNA Newton Polytope and Learnability of Energy Parameters
Despite nearly two score years of research on RNA secondary structure and RNA-RNA
interaction prediction, the accuracy of state-of-the-art algorithms is
still far from satisfactory. Researchers have proposed increasingly complex
energy models and improved parameter estimation methods in anticipation of
endowing their methods with enough power to solve the problem. The output has
disappointingly been only modest improvements, not matching the expectations.
Even recent massively featured machine learning approaches were not able to
break the barrier. In this paper, we introduce the notion of learnability of
the parameters of an energy model as a measure of its inherent capability. We
say that the parameters of an energy model are learnable iff there exists at
least one set of such parameters that renders every known RNA structure to date
the minimum free energy structure. We derive a necessary condition for the
learnability and give a dynamic programming algorithm to assess it. Our
algorithm computes the convex hull of the feature vectors of all feasible
structures in the ensemble of a given input sequence. Interestingly, that
convex hull coincides with the Newton polytope of the partition function as a
polynomial in energy parameters. We demonstrate the application of our theory
to a simple energy model consisting of a weighted count of A-U and C-G base
pairs. Our results show that this simple energy model satisfies the necessary
condition for fewer than one third of the input unpseudoknotted
sequence-structure pairs chosen from the RNA STRAND v2.0 database. For another
one third, the necessary condition is barely violated, which suggests that
augmenting this simple energy model with more features such as the Turner loops
may solve the problem. The necessary condition is severely violated for 8%,
which provides a small set of hard cases that require further investigation.
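The convex hull of feature vectors described above can be sketched for the two-feature model (counts of A-U and C-G pairs). The following is a toy illustration under stated assumptions, not the paper's algorithm: it stores the raw set of feature vectors in the dynamic program rather than hulls, assumes a minimum hairpin loop of three unpaired bases, and uses hypothetical names (`feature_vectors`, `convex_hull`, `MIN_LOOP`).

```python
from functools import lru_cache

# Feature index per allowed pair: 0 counts A-U pairs, 1 counts C-G pairs.
PAIR_FEATURE = {("A", "U"): 0, ("U", "A"): 0, ("C", "G"): 1, ("G", "C"): 1}
MIN_LOOP = 3  # assumed minimum number of unpaired bases in a hairpin


def feature_vectors(seq):
    """Set of (n_AU, n_CG) counts over all pseudoknot-free structures."""
    @lru_cache(maxsize=None)
    def f(i, j):
        if j - i < 1:  # empty interval or a single base
            return frozenset({(0, 0)})
        result = set(f(i + 1, j))  # case: base i is unpaired
        for k in range(i + MIN_LOOP + 1, j + 1):  # case: i pairs with k
            idx = PAIR_FEATURE.get((seq[i], seq[k]))
            if idx is None:
                continue
            for a in f(i + 1, k - 1):
                for b in f(k + 1, j):
                    v = [a[0] + b[0], a[1] + b[1]]
                    v[idx] += 1
                    result.add(tuple(v))
        return frozenset(result)

    return f(0, len(seq) - 1)


def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


vectors = feature_vectors("AGAAACU")
print(convex_hull(vectors))  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

For a given parameter vector, the minimum free energy structure of a linear model always has its feature vector on the boundary of this hull, so an observed structure whose feature vector lies strictly inside the hull cannot be made optimal by any choice of parameters.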
DeepCare: A Deep Dynamic Memory Model for Predictive Medicine
Personalized predictive medicine necessitates the modeling of patient illness
and care processes, which inherently have long-term temporal dependencies.
Healthcare observations, recorded in electronic medical records, are episodic
and irregular in time. We introduce DeepCare, an end-to-end deep dynamic neural
network that reads medical records, stores previous illness history, infers
current illness states and predicts future medical outcomes. At the data level,
DeepCare represents care episodes as vectors in space and models patient health
state trajectories through explicit memory of historical records. Built on Long
Short-Term Memory (LSTM), DeepCare introduces time parameterizations to handle
irregularly timed events by moderating the forgetting and consolidation of memory
cells. DeepCare also incorporates medical interventions that change the course
of illness and shape future medical risk. Moving up to the health state level,
historical and present health states are then aggregated through multiscale
temporal pooling, before passing through a neural network that estimates future
outcomes. We demonstrate the efficacy of DeepCare for disease progression
modeling, intervention recommendation, and future risk prediction. On two
important cohorts with heavy social and economic burden -- diabetes and mental
health -- the results show improved modeling and risk prediction accuracy.
Comment: Accepted at JBI under the new name: "Predicting healthcare
trajectories from medical records: A deep learning approach"
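The idea of moderating forgetting by elapsed time can be sketched for a single scalar memory cell. This is a minimal illustration, not DeepCare's implementation: the decay function `1 / log(e + delta_t)`, the function names, and the scalar gates are assumptions chosen for brevity, whereas a real LSTM uses learned weight matrices and vector-valued gates.

```python
import math


def time_decay(delta_t):
    """Assumed decay factor: 1 at delta_t = 0, decreasing as the gap grows."""
    return 1.0 / math.log(math.e + delta_t)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def update_memory(c_prev, forget_logit, input_logit, candidate, delta_t):
    """One memory update with a time-moderated forget gate.

    c_prev:       previous cell memory (scalar for illustration)
    forget_logit: pre-activation of the forget gate
    input_logit:  pre-activation of the input gate
    candidate:    pre-activation of the candidate memory
    delta_t:      time elapsed since the previous episode (any fixed unit)
    """
    f = time_decay(delta_t) * sigmoid(forget_logit)  # longer gap, more forgetting
    i = sigmoid(input_logit)
    return f * c_prev + i * math.tanh(candidate)


# The same previous memory is retained less after a long gap than a short one.
short_gap = update_memory(1.0, 0.0, 0.0, 0.0, delta_t=1.0)
long_gap = update_memory(1.0, 0.0, 0.0, 0.0, delta_t=30.0)
print(short_gap > long_gap)  # True
```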
Modeling RNA, protein, and synthetic molecules using coarse-grained and all-atom representations
The aim of computational chemistry is to depict and understand the dynamics and interactions of molecular systems. In addition to increased comprehension in the physical and life sciences, this insight yields important applications to therapeutic design and materials science. In computational chemistry, molecules can be modeled in a number of representations depending on the molecular system and phenomena of interest. In this work, both simplified, coarse-grained representations and all-atom representations are used to model the interactions of RNA, cucurbituril host-guest chemistry, and cadmium selenide quantum dot binding to the Src homology 3 domain.
For RNA, a coarse-grained model termed RACER (RnA CoarsE-gRained) was developed to accurately predict RNA structure and folding free energy. After optimization to statistical potentials, RACER accurately predicted the structures of 14 RNAs with an average 4.15 Å root mean square deviation (RMSD) to the experimental structure. Further, RACER captured the sequence-specific variation in folding free energy for a set of 6 RNA hairpins and 5 RNA duplexes, with an R² correlation of 0.96 to experiment.
The binding free energies of a cucurbituril host with 14 guests were computed using a polarizable force field and the free energy techniques of Bennett acceptance ratio and the orthogonal space random walk. The polarizable force field captured binding accurately, yet unexpectedly, the orthogonal space random walk method converged slowly, albeit still at reduced computational expense relative to the Bennett acceptance ratio.
Lastly, the nanotoxicity effects of trioctylphosphine oxide coated cadmium selenide quantum dots are investigated with the model Src homology 3 protein domain in complex with its native proline rich motif ligand. With increasing quantum dot concentration, there is an increasing preference for the quantum dots to bind to the proline rich motif active site, inhibiting Src homology 3 function.
Biomedical Engineerin
RFold: RNA Secondary Structure Prediction with Decoupled Optimization
The secondary structure of ribonucleic acid (RNA) is more stable and
accessible in the cell than its tertiary structure, making it essential for
functional prediction. Although deep learning has shown promising results in
this field, current methods suffer from poor generalization and high
complexity. In this work, we present RFold, a simple yet effective method for RNA
secondary structure prediction in an end-to-end manner. RFold introduces a
decoupled optimization process that decomposes the vanilla constraint
satisfaction problem into row-wise and column-wise optimization, simplifying
the solving process while guaranteeing the validity of the output. Moreover,
RFold adopts attention maps as informative representations instead of designing
hand-crafted features. Extensive experiments demonstrate that RFold achieves
competitive performance and about eight times faster inference than
the state-of-the-art method. The code and Colab demo are available at
http://github.com/A4Bio/RFold
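One way to picture a row-wise and column-wise decomposition is the sketch below. It is an illustrative assumption, not code from the RFold repository: a pair (i, j) is kept only when j is the best-scoring partner in row i and i is the best-scoring partner in column j, which guarantees that each base pairs with at most one other base. The function name `decode_pairs` and the `threshold` parameter are hypothetical.

```python
def decode_pairs(scores, threshold=0.5):
    """Decode base pairs from a square score matrix.

    scores:    list of lists, scores[i][j] is the pairing score of bases i, j
    threshold: hypothetical minimum score for a pair to be accepted
    Returns a sorted list of pairs (i, j) with i < j.
    """
    n = len(scores)
    # Row-wise step: best partner j for each base i.
    row_best = [max(range(n), key=lambda j: scores[i][j]) for i in range(n)]
    # Column-wise step: best partner i for each base j.
    col_best = [max(range(n), key=lambda i: scores[i][j]) for j in range(n)]
    pairs = []
    for i in range(n):
        j = row_best[i]
        # Keep the pair only if both steps agree and the score is high enough.
        if i < j and col_best[j] == i and scores[i][j] > threshold:
            pairs.append((i, j))
    return pairs


scores = [
    [0.0, 0.1, 0.2, 0.9],
    [0.1, 0.0, 0.8, 0.3],
    [0.2, 0.8, 0.0, 0.1],
    [0.9, 0.3, 0.1, 0.0],
]
print(decode_pairs(scores))  # [(0, 3), (1, 2)]
```

Because each row and each column contributes a single candidate, the output is a valid one-partner-per-base assignment without any iterative constraint solving.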
The RNA Newton Polytope and Learnability of Energy Parameters
Computational RNA secondary structure prediction has been a topic of much research interest for several decades. Despite all the progress made in the field, even state-of-the-art algorithms do not provide satisfying results, and the accuracy of the output is limited for all existing tools. Very complex energy models, different parameter estimation methods, and recent machine learning approaches have not solved this problem. We believe that the first step toward high-quality results is to use an energy model that has the potential to predict accurate output. Hence, it is necessary to have a systematic way to analyze the suitability of an energy model. We introduce the notion of learnability to measure this suitability. A learnable energy model has at least one set of parameters that renders every known RNA structure to date the minimum free energy structure, which means 100% accuracy. We also derive a necessary condition for a model to be learnable and implement a dynamic programming algorithm to assess this condition for a set of RNAs. This algorithm computes the convex hull of all possible feature vectors for a sequence. When the partition function is viewed as a polynomial in the energy parameters, this convex hull is also the Newton polytope of the partition function. To the best of our knowledge, this is the first systematic approach for evaluating the inherent capability of an energy model.
Espaloma-0.3.0: Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond
Molecular mechanics (MM) force fields -- the models that characterize the
energy landscape of molecular systems via simple pairwise and polynomial terms
-- have traditionally relied on human expert-curated, inflexible, and poorly
extensible discrete chemical parameter assignment rules, namely atom or valence
types. Recently, there has been significant interest in using graph neural
networks to replace this process, while enabling the parametrization scheme to
be learned in an end-to-end differentiable manner directly from quantum
chemical calculations or condensed-phase data. In this paper, we extend the
Espaloma end-to-end differentiable force field construction approach by
incorporating both energy and force fitting directly to quantum chemical data
into the training process. Building on the OpenMM SPICE dataset, we curate a
dataset containing chemical spaces highly relevant to the broad interest of
biomolecular modeling, covering small molecules, proteins, and RNA. The
resulting force field, espaloma 0.3.0, self-consistently parametrizes these
diverse biomolecular species, accurately predicts quantum chemical energies and
forces, and maintains stable quantum chemical energy-minimized geometries.
Surprisingly, this simple approach produces highly accurate protein-ligand
binding free energies when self-consistently parametrizing protein and ligand.
This approach -- capable of fitting new force fields to large quantum chemical
datasets in one GPU-day -- shows significant promise as a path forward for
building systematically more accurate force fields that can be easily extended
to new chemical domains of interest.
Computational prediction, experiment design and statistical validations of non-coding regulatory RNA
Non-coding regulatory RNAs (ncRNAs) regulate a host of gene functions in prokaryotes, e.g., transcription and translation regulation, RNA processing and modification, and mRNA stability. Some ncRNAs have been identified experimentally, but many are yet to be found. ncRNAs can be classified as either cis- or trans-acting. cis-ncRNAs perfectly complement their target genes and are usually encoded on the anti-sense strands of the targets. In contrast, trans-ncRNAs regulate their target genes through short and often imperfect base-pairings with the targets, and are usually encoded elsewhere on the genome.
A whole-genome thermodynamic analysis can be performed to identify all imperfect but stable base-pairings between all annotated genes and some genomic regions encoding ncRNAs from the same species. However, these base-pairing regions are short and variable in size, and their melting temperatures vary greatly between perfectly and imperfectly matched targets. It is difficult to predict trans-acting ncRNAs based solely on the thermodynamic analysis. Therefore, we also have to consider known ncRNA structures to improve our predictions. We find that Hfq-binding ncRNAs, which require Hfq protein to function, share three common structural properties. We predict these special ncRNAs in E. coli and Agrobacterium tumefaciens according to a systematic, novel 5-step approach based on thermodynamic analyses as well as known structural properties of this class of ncRNAs.
Whole genome tiling microarrays are chosen to validate our predictions. We describe how the microarrays have been designed, created, and validated for E. coli MG1655 and Agrobacterium tumefaciens C58. We match our new ncRNA prediction results with known ncRNAs, calculate correlation coefficient values between each ncRNA candidate and its predicted targets as measured by the whole-genome tiling microarrays, and confirm the results with 3 other ncRNA identification software tools. We also perform a gene ontology network analysis to reveal the associations of ncRNA candidates and their predicted targets. Our novel 5-step prediction method is generally applicable to other prokaryote species and may help advance ncRNA research in prokaryotes.