Search CORE


Modeling and predicting all-α transmembrane proteins including helix–helix pairing

Author: Steyaert Jean-Marc; Waldispühl Jérôme
Publication venue: Elsevier B.V.

Modeling and predicting the structure of proteins is one of the most important challenges of computational biology. Exact physical models are too complex to provide feasible prediction tools, and other ab initio methods use only local and probabilistic information to fold a given sequence. We show in this paper that the secondary and super-secondary structures of all-α transmembrane proteins can be modeled with a multi-tape S-attributed grammar. An efficient structure prediction algorithm using both local and global constraints is designed and evaluated. Comparison with existing methods shows that both the prediction rates and the level of detail of the predictions are noticeably increased. Furthermore, this approach can be generalized to more complex proteins.

Elsevier - Publisher Connector

A GPU-accelerated RNA-RNA interaction program

Author: Gildemaster Brandon
Publication venue: Colorado State University. Libraries
Publication date: 01/01/2021

2021 Spring. Includes bibliographical references. RNA-RNA interaction (RRI) is important in processes like gene regulation, and is known to play roles in diseases including cancer and Alzheimer's. Large RRI computations run for days, weeks or even months, because the algorithms have time and space complexity of O(N³M³) and O(N²M²), respectively, for sequences of lengths N and M, and there is a need for high-throughput RRI tools. GPU parallelization of such algorithms is a challenge. We first show that the most computationally expensive part of base pair maximization (BPM) algorithms comprises O(N³) instances of upper banded tropical matrix products. We develop the first GPU library for this operation, attaining close to theoretical machine peak (TMP). We next optimize the other (fifth-degree polynomial) terms in the computation and develop the first GPU implementation of the complete BPMax algorithm. We attain 12% of GPU TMP, a significant speedup over the original parallel CPU implementation, which attains less than 1% of CPU TMP. We also perform a large-scale study of three small viral RNAs hypothesized to be relevant to COVID-19.
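The kernel named above, the tropical (max-plus) matrix product, replaces multiplication with addition and addition with max. The following is a minimal pure-Python sketch, assuming square upper-triangular matrices stored as lists of lists with float("-inf") below the diagonal; it ignores the band limit and is not the GPU library described in the thesis. The function name maxplus_upper is hypothetical.

```python
NEG_INF = float("-inf")  # tropical "zero": identity for max, absorbing for +

def maxplus_upper(a, b):
    """Max-plus product C[i][j] = max_k (A[i][k] + B[k][j]) of upper-triangular n x n matrices.

    Entries below the diagonal are NEG_INF. Since A[i][k] is NEG_INF for k < i and
    B[k][j] is NEG_INF for k > j, only k in [i, j] can contribute, and C stays
    upper-triangular.
    """
    n = len(a)
    c = [[NEG_INF] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            c[i][j] = max(a[i][k] + b[k][j] for k in range(i, j + 1))
    return c
```

Restricting k to [i, j] is what makes the upper-triangular case cheaper than a dense product; a banded variant would narrow that range further.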

Mountain Scholar (Digital Collections of Colorado and Wyoming)

DynaProg for Scala

Author: Coppey Thierry
Publication date: 01/11/2013

Dynamic programming is an algorithmic technique for solving problems that follow Bellman's principle: optimal solutions depend on optimal solutions to sub-problems. The core idea behind dynamic programming is to memoize intermediate results in matrices to avoid recomputing them. Solving a dynamic programming problem consists of two phases: filling one or more matrices with intermediate solutions for sub-problems, and recovering how the final result was constructed (backtracking). In textbooks, problems are usually described in terms of recurrence relations between matrix elements. Expressing dynamic programming problems as recursive formulae over matrix indices can be difficult and is often error-prone, and the notation does not capture the essence of the underlying problem (for example, aligning two sequences). Moreover, writing correct and efficient parallel implementations requires different competencies and often a significant amount of time. In this project, we present DynaProg, a domain-specific language (DSL) embedded in Scala for addressing dynamic programming problems on heterogeneous platforms. DynaProg allows the programmer to write concise programs based on ADP [1], using a pair consisting of a parsing grammar and an algebra; these programs can then be executed either on the CPU or on the GPU. We evaluate the performance of our implementation against existing work and against our own hand-optimized baseline implementations for both the CPU and GPU versions. Experimental results show that plain Scala has a large overhead and is recommended only for small sequences (≤1024), whereas the generated GPU version is comparable with existing implementations: matrix chain multiplication has the same performance as our hand-optimized version (142% of the execution time of [2]) for a sequence of 4096 matrices, Smith-Waterman is twice as slow as [3] on a pair of sequences of 6144 elements, and RNA folding is on par with [4] (95% of the running time) for sequences of 4096 elements.
[1] Robert Giegerich and Carsten Meyer. Algebraic Dynamic Programming.
[2] Chao-Chin Wu, Jenn-Yang Ke, Heshan Lin and Wu Chun Feng. Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism.
[3] Edans Flavius de O. Sandes and Alba Cristina M. A. de Melo. Smith-Waterman alignment of huge sequences with GPU in linear space.
[4] Guillaume Rizk and Dominique Lavenier. GPU accelerated RNA folding algorithm.
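The two phases described above (matrix filling, then backtracking) can be illustrated on one of the benchmarks, matrix chain multiplication. This is a plain-Python sketch of the textbook recurrence, not DynaProg's Scala DSL or its grammar/algebra API; the function name matrix_chain and the "M0 x M1" output notation are hypothetical.

```python
def matrix_chain(dims):
    """Cheapest way to multiply a chain of matrices; matrix i has shape dims[i] x dims[i+1].

    Returns (minimum number of scalar multiplications, parenthesization).
    """
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    # Phase 1 (fill): solve sub-chains in order of increasing length, so every
    # cell only reads cells for strictly shorter sub-chains.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float("inf")
            for k in range(i, j):
                c = cost[i][k] + cost[k + 1][j] + dims[i] * dims[k + 1] * dims[j + 1]
                if c < cost[i][j]:
                    cost[i][j] = c
                    split[i][j] = k

    # Phase 2 (backtrack): rebuild the optimal parenthesization from the split points.
    def build(i, j):
        if i == j:
            return f"M{i}"
        k = split[i][j]
        return f"({build(i, k)} x {build(k + 1, j)})"

    return cost[0][n - 1], build(0, n - 1)
```

For shapes 10x30, 30x5 and 5x60, `matrix_chain([10, 30, 5, 60])` returns `(4500, "((M0 x M1) x M2)")`: multiplying the first pair first costs 1500 + 3000 scalar multiplications, versus 9000 + 18000 for the other order.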

Infoscience - École polytechnique fédérale de Lausanne

Thermodynamics of RNA structures by Wang–Landau sampling

Author: Abrashams; Bekaert; Bernhart; Bradley; B ck; Cheah; Chen; Clote; Danilova; Dimitrov; Dirks; Eddy; F. Lou; Flamm; Griffiths-Jones; Hofacker; Kirkpatrick; Knudsen; Kou; Lim; Lyngsø; Mandal; Markham; Metzler; Nussinov; Omer; Ortiz; P. Clote; Reeder; Reinisch; REN; Rivas; Tabaska; Tucker; van Batenburg; Wang; Weinger; Wuchty; Xayaphoummine; Zhang; Zhao; Zuker
Publication venue: Oxford University Press
Publication date: 01/01/2010

Motivation: Thermodynamics-based dynamic programming RNA secondary structure algorithms have been of immense importance in molecular biology, where applications range from the detection of novel selenoproteins using expressed sequence tag (EST) data to the determination of microRNA genes and their targets. Dynamic programming algorithms have been developed to compute the minimum free energy secondary structure and partition function of a given RNA sequence, the minimum free energy and partition function for the hybridization of two RNA molecules, etc. However, the applicability of dynamic programming methods depends on disallowing certain types of interactions (pseudoknots, zig-zags, etc.), as their inclusion renders structure prediction a nondeterministic polynomial time (NP)-complete problem. Nevertheless, such interactions have been observed in X-ray structures.
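Why disallowing pseudoknots keeps dynamic programming tractable can be seen in a partition function recursion: once base i pairs with k, the inside and the rest of the interval are independent subproblems. The sketch below assumes a toy energy model (every allowed base pair contributes a constant pair_energy, hairpins enclose at least 3 unpaired bases, Watson-Crick plus G-U pairs); it is not the nearest-neighbor energy model used in practice, nor the Wang-Landau method of this paper. The function name partition_function is hypothetical.

```python
import math

# Assumed pairing rules: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def partition_function(seq, pair_energy=-1.0, kT=1.0, min_loop=3):
    """Partition function over pseudoknot-free structures of seq under a toy energy model.

    Every base pair contributes pair_energy; a hairpin must enclose at least
    min_loop unpaired bases.
    """
    n = len(seq)
    w = math.exp(-pair_energy / kT)  # Boltzmann weight of a single base pair
    Z = [[1.0] * n for _ in range(n)]

    def z(i, j):
        # An empty interval (j < i) has exactly one structure: the empty one.
        return Z[i][j] if i <= j else 1.0

    for length in range(1, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            total = z(i + 1, j)  # case 1: base i is unpaired
            # case 2: base i pairs with some k; non-crossing means the inside
            # (i+1..k-1) and the rest (k+1..j) are independent subproblems
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    total += w * z(i + 1, k - 1) * z(k + 1, j)
            Z[i][j] = total
    return z(0, n - 1)
```

With pair_energy=0.0 every structure has weight 1, so the result simply counts structures: "GAAAC" has two (open, or G paired with C).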

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

PubMed Central

HAL-Polytechnique

HAL-Rennes 1

Parallelization of dynamic programming recurrences in computational biology

Author: Jacob Arpith
Publication venue: Washington University Open Scholarship
Publication date: 01/01/2010

The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years, the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future, compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are 15 to 130 times faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 Intel Core 2 Duo processors running at 3 GHz. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors.
Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
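The parallelism exploited in the Nussinov accelerators comes from the structure of the recurrence. The minimal Python sketch below fills the base-pair-maximization table one anti-diagonal at a time, which makes visible that cells on the same diagonal do not depend on one another. The pairing rules (Watson-Crick plus G-U) and the minimum hairpin size of 3 are assumptions, the function name nussinov is hypothetical, and nothing here reflects the FPGA or polyhedral designs themselves.

```python
# Assumed pairing rules: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximum number of non-crossing base pairs in seq (Nussinov recurrence)."""
    n = len(seq)
    if n == 0:
        return 0
    N = [[0] * n for _ in range(n)]
    # Wavefront order: every cell on anti-diagonal d = j - i depends only on
    # cells with a smaller d, so all cells of one diagonal are independent
    # and could be computed in parallel.
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            best = N[i + 1][j]  # base i unpaired
            if d > min_loop and (seq[i], seq[j]) in PAIRS:
                best = max(best, N[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into [i..k] and [k+1..j]
                best = max(best, N[i][k] + N[k + 1][j])
            N[i][j] = best
    return N[0][n - 1]
```

For example, "GGGAAAUCC" folds into a three-pair stem (G-C, G-C, G-U) closing an AAA hairpin, so `nussinov("GGGAAAUCC")` returns 3.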

Washington University St. Louis: Open Scholarship

A probabilistic model for the evolution of RNA structure

Author: Holmes Ian
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: For the purposes of finding and aligning noncoding RNA gene- and cis-regulatory elements in multiple-genome datasets, it is useful to be able to derive multi-sequence stochastic grammars (and hence multiple alignment algorithms) systematically, starting from hypotheses about the various kinds of random mutation event and their rates. RESULTS: Here, we consider a highly simplified evolutionary model for RNA, called "The TKF91 Structure Tree" (following Thorne, Kishino and Felsenstein's 1991 model of sequence evolution with indels), which we have implemented for pairwise alignment as a proof of principle for such an approach. The model, its strengths and its weaknesses are discussed with reference to four examples of functional ncRNA sequences: a riboswitch (guanine), a zipcode (nanos), a splicing factor (U4) and a ribozyme (RNase P). As shown by our visualisations of posterior probability matrices, the selected examples illustrate three different signatures of natural selection that are highly characteristic of ncRNA: (i) co-ordinated basepair substitutions, (ii) co-ordinated basepair indels and (iii) whole-stem indels. CONCLUSIONS: Although all three types of mutation "event" are built into our model, events of types (i) and (ii) are found to be better modeled than events of type (iii). Nevertheless, we hypothesise from the model's performance on pairwise alignments that it would form an adequate basis for a prototype multiple alignment and genefinding tool.
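The structure-tree model extends the TKF91 sequence model, whose insertion rate λ and deletion rate μ (with λ < μ) determine a few closed-form probabilities used in pair-alignment transitions. The sketch below computes only those sequence-level quantities, in the alpha/beta/gamma notation common in the later TKF91 literature (an assumption here); it is not the paper's structure-tree model, and the function name tkf91_params is hypothetical.

```python
import math

def tkf91_params(lam, mu, t):
    """TKF91 link probabilities at divergence time t (sketch; requires 0 < lam < mu, t > 0).

    alpha: probability that an ancestral residue survives to time t.
    beta:  geometric parameter for the number of residues inserted after a
           surviving residue (or at the immortal link).
    gamma: probability that a deleted ancestral residue still leaves at
           least one inserted descendant.
    """
    if not (0 < lam < mu) or t <= 0:
        raise ValueError("requires 0 < lam < mu and t > 0")
    alpha = math.exp(-mu * t)
    e = math.exp((lam - mu) * t)
    beta = lam * (1 - e) / (mu - lam * e)
    gamma = 1 - mu * beta / (lam * (1 - alpha))
    return alpha, beta, gamma
```

As t grows, beta approaches λ/μ, the ratio that also sets the geometric equilibrium sequence-length distribution in TKF91, which is a quick sanity check on the formulas.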

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Oxford University Research Archive

Can Clustal-style progressive pairwise alignment of multiple sequences be used in RNA secondary structure prediction?

Author: Amelia B Bellamy-Royds; B Masoumi; D Gautheret; D Mathews; D Sankoff; DH Mathews; G Pavesi; G Storz; IL Hofacker; J Felsenstein; J Gorodkin; JD Thompson; JH Havgaard; JJ Cannone; KJ Doshi; M Anwar; M Sprinzl; M Zuker; Marcel Turcotte; P Rice; PP Gardner; R Gutell; RD Dowell
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides. Results: The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure. Conclusion: We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.
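Because the guide tree fixes the order in which sequences and profiles are aligned, it helps to see how such a tree can be built. One common construction is UPGMA clustering on pairwise distances; this is a generic sketch, not necessarily one of the approaches the paper evaluates. It assumes a symmetric distance matrix is already available (for example, derived from pairwise alignment scores), and the function name upgma is hypothetical.

```python
def upgma(names, dist):
    """Return a nested-tuple UPGMA guide tree; dist is a symmetric matrix of pairwise distances."""
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair; ties go to the first key found
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)
        # Drop every distance that involves one of the merged clusters
        old = d
        d = {k: v for k, v in old.items() if a not in k and b not in k}
        for c in clusters:
            da = old[(min(a, c), max(a, c))]
            db = old[(min(b, c), max(b, c))]
            # UPGMA: leaf-count-weighted average of the two merged clusters' distances
            d[(c, next_id)] = (size_a * da + size_b * db) / (size_a + size_b)
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree
```

The nesting of the returned tuples is the alignment order: innermost pairs would be aligned first, and their profiles merged as the tree is walked outward. Changing the distances changes that order, which is the sensitivity the abstract reports.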

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Developing and applying heterogeneous phylogenetic models with XRate

Author: A Heger; A Siepel; A Varadarajan; AJ Drummond; B Knudsen; Christos A. Ouzounis; D Ayres; DB Searls; E Birney; G Lunter; GSC Slater; Ian Holmes; IM Meyer; J Felsenstein; J Goecks; J Watts; JS Pedersen; L Stein; M Garber; M Hasegawa; M Kimura; M Zuker; ME Skinner; N Saitou; O Penn; Oscar Westesson; PS Klosterman; RK Bradley; SR Eddy; TH Jukes; WJ Kent; Z Yang
Publication venue: Public Library of Science (PLoS)
Publication date: 16/02/2012

Modeling sequence evolution on phylogenetic trees is a useful technique in computational biology. Especially powerful are models which take account of the heterogeneous nature of sequence evolution according to the "grammar" of the encoded gene features. However, beyond a modest level of model complexity, manual coding of models becomes prohibitively labor-intensive. We demonstrate, via a set of case studies, the new built-in model-prototyping capabilities of XRate (macros and Scheme extensions). These features allow rapid implementation of phylogenetic models which would previously have been far more labor-intensive. XRate's new capabilities for lineage-specific models, ancestral sequence reconstruction, and improved annotation output are also discussed. XRate's flexible model-specification capabilities and computational efficiency make it well-suited to developing and prototyping phylogenetic grammar models. XRate is available as part of the DART software package (http://biowiki.org/DART). Comment: 34 pages, 3 figures, glossary of XRate model terminology.

arXiv.org e-Print Archive

Crossref

PubMed Central

FigShare

Modelling biomolecules through atomistic graphs: theory, implementation, and applications

Author: Song Florian J
Publication venue: Chemistry, Imperial College London
Publication date: 01/08/2022

Describing biological molecules through computational models enjoys ever-growing popularity. Never before has access to computational resources been easier for scientists across the natural sciences. The need for accurate, efficient, and robust modelling tools is therefore irrefutable. This, in turn, calls for highly interdisciplinary research, of which the thesis presented here is a product. Through the successful marriage of techniques from mathematical graph theory, theoretical insights from chemistry and biology, and the tools of modern computer science, we are able to computationally construct accurate depictions of biomolecules as atomistic graphs, in which individual atoms become nodes and chemical bonds/interactions are represented by weighted edges. When combined with methods from graph theory and network science, this approach has previously been shown to successfully reveal various properties of proteins, such as dynamics, rigidity, multi-scale organisation, allostery, and protein-protein interactions, and is well poised to set new standards in terms of computational feasibility, multi-scale resolution (from atoms to domains) and time-scales (from nanoseconds to milliseconds). Therefore, building on over 15 years of previous work in our research group, and to further encourage and facilitate research into this growing field, this thesis's main contribution is to provide a formalised foundation for the construction of atomistic graphs. The most crucial aspect of constructing atomistic graphs of large biomolecules, compared to small molecules, is the necessity to include a variety of different types of bonds and interactions, because larger biomolecules attain their unique structural layout mainly through weaker interactions, e.g. hydrogen bonds, the hydrophobic effect or π-π interactions.
Whilst most interaction types are well-studied and have readily available methodology which can be used to construct atomistic graphs, this is not the case for hydrophobic interactions. To fill this gap, the work presented herein includes novel methodology for encoding the hydrophobic effect in atomistic graphs, which accounts for the many-body effect and non-additivity. Then, a standalone software package for constructing atomistic graphs from structural data is presented. Herein lies the heart of this thesis: the combination of a variety of methodologies for a range of bond/interaction types, as well as an implementation that is deterministic, easy-to-use and efficient. Finally, some promising avenues for utilising atomistic graphs in combination with graph theoretical tools such as Markov Stability, as well as other approaches such as Multilayer Networks, to study various properties of biomolecules are presented.

Spiral - Imperial College Digital Repository