349 research outputs found

    Modeling and predicting all-α transmembrane proteins including helix–helix pairing

    Get PDF
    AbstractModeling and predicting the structure of proteins is one of the most important challenges of computational biology. Exact physical models are too complex to provide feasible prediction tools and other ab initio methods only use local and probabilistic information to fold a given sequence. We show in this paper that all-α transmembrane protein secondary and super-secondary structures can be modeled with a multi-tape S-attributed grammar. An efficient structure prediction algorithm using both local and global constraints is designed and evaluated. Comparison with existing methods shows that the prediction rates as well as the definition level are sensibly increased. Furthermore this approach can be generalized to more complex proteins

    GPU accelerated RNA-RNA interaction program, A

    Get PDF
    2021 Spring.Includes bibliographical references.RNA-RNA interaction (RRI) is important in processes like gene regulation, and is known to play roles in diseases including cancer and Alzheimer's. Large RRI computations run for days, weeks or even months, because the algorithms have time and space complexity of, respectively, O(N3M3) and O(N2M2), for sequences length N and M, and there is a need for high-throughput RRI tools. GPU parallelization of such algorithms is a challenge. We first show that the most computationally expensive part of base pair maximization (BPM) algorithms comprises O(N3) instances of upper banded tropical matrix products. We develop the first GPU library for this attaining close to theoretical machine peak (TMP). We next optimize other (fifth degree polynomial) terms in the computation and develop the first GPU implementation of the complete BPMax algorithm. We attain 12% of GPU TMP, a significant speedup over the original parallel CPU implementation, which attains less than 1% of CPU TMP. We also perform a large scale study of three small viral RNAs, hypothesized to be relevant to COVID-19

    DynaProg for Scala

    Get PDF
    Dynamic programming is an algorithmic technique to solve problems that follow the Bellman’s principle: optimal solutions depends on optimal sub-problem solutions. The core idea behind dynamic programming is to memoize intermediate results into matrices to avoid multiple computations. Solving a dynamic programming problem consists of two phases: filling one or more matrices with intermediate solutions for sub-problems and recomposing how the final result was constructed (backtracking). In textbooks, problems are usually described in terms of recurrence relations between matrices elements. Expressing dynamic programming problems in terms of recursive formulae involving matrix indices might be difficult, if often error prone, and the notation does not capture the essence of the underlying problem (for example aligning two sequences). Moreover, writing correct and efficient parallel implementation requires different competencies and often a significant amount of time. In this project, we present DynaProg, a language embedded in Scala (DSL) to address dynamic programming problems on heterogeneous platforms. DynaProg allows the programmer to write concise programs based on ADP [1], using a pair of parsing grammar and algebra; these program can then be executed either on CPU or on GPU. We evaluate the performance of our implementation against existing work and our own hand-optimized baseline implementations for both the CPU and GPU versions. Experimental results show that plain Scala has a large overhead and is recommended to be used with small sequences (≤1024) whereas the generated GPU version is comparable with existing implementations: matrix chain multiplication has the same performance as our hand-optimized version (142% of the execution time of [2]) for a sequence of 4096 matrices, Smith-Waterman is twice slower than [3] on a pair of sequences of 6144 elements, and RNA folding is on par with [4] (95% running time) for sequences of 4096 elements. [1] Robert Giegerich and Carsten Meyer. Algebraic Dynamic Programming. [2] Chao-Chin Wu, Jenn-Yang Ke, Heshan Lin and Wu Chun Feng. Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism. [3] Edans Flavius de O. Sandes, Alba Cristina M. A. de Melo. Smith-Waterman alignment of huge sequences with GPU in linear space. [4] Guillaume Rizk and Dominique Lavenier. GPU accelerated RNA folding algorithm

    Thermodynamics of RNA structures by Wang–Landau sampling

    Get PDF
    Motivation: Thermodynamics-based dynamic programming RNA secondary structure algorithms have been of immense importance in molecular biology, where applications range from the detection of novel selenoproteins using expressed sequence tag (EST) data, to the determination of microRNA genes and their targets. Dynamic programming algorithms have been developed to compute the minimum free energy secondary structure and partition function of a given RNA sequence, the minimum free-energy and partition function for the hybridization of two RNA molecules, etc. However, the applicability of dynamic programming methods depends on disallowing certain types of interactions (pseudoknots, zig-zags, etc.), as their inclusion renders structure prediction an nondeterministic polynomial time (NP)-complete problem. Nevertheless, such interactions have been observed in X-ray structures

    Parallelization of dynamic programming recurrences in computational biology

    Get PDF
    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays: FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15-130x faster than a modern dual core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms

    A probabilistic model for the evolution of RNA structure

    Get PDF
    BACKGROUND: For the purposes of finding and aligning noncoding RNA gene- and cis-regulatory elements in multiple-genome datasets, it is useful to be able to derive multi-sequence stochastic grammars (and hence multiple alignment algorithms) systematically, starting from hypotheses about the various kinds of random mutation event and their rates. RESULTS: Here, we consider a highly simplified evolutionary model for RNA, called "The TKF91 Structure Tree" (following Thorne, Kishino and Felsenstein's 1991 model of sequence evolution with indels), which we have implemented for pairwise alignment as proof of principle for such an approach. The model, its strengths and its weaknesses are discussed with reference to four examples of functional ncRNA sequences: a riboswitch (guanine), a zipcode (nanos), a splicing factor (U4) and a ribozyme (RNase P). As shown by our visualisations of posterior probability matrices, the selected examples illustrate three different signatures of natural selection that are highly characteristic of ncRNA: (i) co-ordinated basepair substitutions, (ii) co-ordinated basepair indels and (iii) whole-stem indels. CONCLUSIONS: Although all three types of mutation "event" are built into our model, events of type (i) and (ii) are found to be better modeled than events of type (iii). Nevertheless, we hypothesise from the model's performance on pairwise alignments that it would form an adequate basis for a prototype multiple alignment and genefinding tool

    Can Clustal-style progressive pairwise alignment of multiple sequences be used in RNA secondary structure prediction?

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides.</p> <p>Results</p> <p>The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure.</p> <p>Conclusion</p> <p>We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.</p

    Developing and applying heterogeneous phylogenetic models with XRate

    Get PDF
    Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would have previously been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package: http://biowiki.org/DART .Comment: 34 pages, 3 figures, glossary of XRate model terminolog

    Modelling biomolecules through atomistic graphs: theory, implementation, and applications

    Get PDF
    Describing biological molecules through computational models enjoys ever-growing popularity. Never before has access to computational resources been easier for scientists across the natural sciences. The need for accurate, efficient, and robust modelling tools is therefore irrefutable. This, in turn, calls for highly interdisciplinary research, which the thesis presented here is a product of. Through the successful marriage of techniques from mathematical graph theory, theoretical insights from chemistry and biology, and the tools of modern computer science, we are able to computationally construct accurate depictions of biomolecules as atomistic graphs, in which individual atoms become nodes and chemical bonds/interactions are represented by weighted edges. When combined with methods from graph theory and network science, this approach has previously been shown to successfully reveal various properties of proteins, such as dynamics, rigidity, multi-scale organisation, allostery, and protein-protein interactions, and is well poised to set new standards in terms of computational feasibility, multi-scale resolution (from atoms to domains) and time-scales (from nanoseconds to milliseconds). Therefore, building on previous work in our research group spanning over 15 years and to further encourage and facilitate research into this growing field, this thesis's main contribution is to provide a formalised foundation for the construction of atomistic graphs. The most crucial aspect of constructing atomistic graphs of large biomolecules compared to small molecules is the necessity to include a variety of different types of bonds and interactions, because larger biomolecules attain their unique structural layout mainly through weaker interactions, e.g. hydrogen bonds, the hydrophobic effect or π-π interactions. Whilst most interaction types are well-studied and have readily available methodology which can be used to construct atomistic graphs, this is not the case for hydrophobic interactions. To fill this gap, the work presented herein includes novel methodology for encoding the hydrophobic effect in atomistic graphs, that accounts for the many-body effect and non-additivity. Then, a standalone software package for constructing atomistic graphs from structural data is presented. Herein lies the heart of this thesis: the combination of a variety of methodologies for a range of bond/interaction types, as well as an implementation that is deterministic, easy-to-use and efficient. Finally, some promising avenues for utilising atomistic graphs in combination with graph theoretical tools such as Markov Stability as well as other approaches such as Multilayer Networks to study various properties of biomolecules are presented.Open Acces
    corecore