Towards high performance computing for molecular structure prediction using IBM Cell Broadband Engine - an implementation perspective

Author: Krishnan SPT
Liang Sim Sze
Veeravalli Bharadwaj
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: RNA structure prediction is a computationally complex task, especially when pseudoknots are included. The problem is well studied in the literature, and most solutions rely on highly coupled Dynamic Programming (DP). The problem's scale and complexity become unmanageable as sequence size increases, which makes the case for parallelization. Parallelization can be achieved on networked platforms (clusters, grids, etc.) as well as on modern multi-core chips. Methods: In this paper, we exploit the parallelism capabilities of the IBM Cell Broadband Engine to parallelize an existing DP algorithm for RNA secondary structure prediction. We design three implementation strategies that exploit the inherent code, data and/or hybrid parallelism, referred to as C-Par, D-Par and H-Par, and analyze their performance. Our approach introduces parallelism in critical sections of the algorithm. We ran our experiments on a Sony PlayStation 3 (PS3), which is based on the IBM Cell chip. Results: Our results suggest that introducing parallelism into the DP algorithm allows it to handle longer sequences that would otherwise consume a large amount of time on single-core computers. The results further demonstrate the speed-up gained by exploiting the inherent parallelism in the problem, and highlight the advantages of multi-core platforms for designing more sophisticated methods to handle fairly long RNA sequences. Conclusion: The speed-up reported here is promising, especially for long sequences. To the best of our knowledge from the literature surveyed, this work is probably the first of its kind to use the IBM Cell Broadband Engine (a heterogeneous multi-core chip) to implement a DP algorithm.
The results also encourage the use of multi-core platforms for designing more sophisticated methods to predict the secondary structure of fairly long RNA sequences.
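To illustrate why this kind of DP lends itself to data parallelism, here is a minimal sketch of a Nussinov-style base-pair maximization without pseudoknots. This is an assumption made for illustration, not the algorithm or the C-Par/D-Par/H-Par strategies from the paper. The names `can_pair`, `max_base_pairs` and `min_loop` are hypothetical. The table is filled one anti-diagonal (fixed `span`) at a time, and every cell on the same anti-diagonal depends only on shorter spans, so those cells could in principle be computed concurrently.

```python
# Nussinov-style DP sketch (illustrative only, no pseudoknots, no energy model).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True for Watson-Crick and G-U wobble pairs."""
    return (a, b) in PAIRS


def max_base_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs, with at least `min_loop`
    unpaired bases enclosed by every pair."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    # Outer loop walks anti-diagonals; cells with the same span are independent.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if can_pair(seq[k], seq[j]):
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

For example, `max_base_pairs("GGGAAAUCC")` finds the three stacked pairs that close the `AAA` hairpin loop.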

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

ScholarBank@NUS

Towards high performance computing for molecular structure prediction using IBM Cell Broadband Engine - an implementation perspective

Author: Beck
Behrens
Bianconi
Blasco
Blasco
Corma
Corma
Franke
Greegor
Grubert
Hagen
Herrmann
Iengo
Klaas
Liu
Maschmeyer
Maschmeyer
Waychunas
Wei
Publication venue: BioMed Central
Publication date: 01/01/2000

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

ScholarBank@NUS

Sparsification of RNA structure prediction including pseudoknots

Author: Backofen Rolf
Möhl Mathias
Sahinalp S Cenk
Salari Raheleh
Will Sebastian
Publication venue: BioMed Central
Publication date: 01/01/2010

Abstract. Background: Although many RNA molecules contain pseudoknots, computational prediction of pseudoknotted RNA structure is still in its infancy because of the high running time and space consumption implied by the dynamic programming formulations of the problem. Results: In this paper, we introduce sparsification to significantly speed up the dynamic programming approaches for pseudoknotted RNA structure prediction, which also lowers the space requirements. Although sparsification has been applied to a number of RNA-related structure prediction problems in the past few years, we provide the first application of sparsification to pseudoknotted RNA structure prediction specifically, and to handling gapped fragments more generally, which has a much more complex recursive structure than other problems to which sparsification has been applied. We analyse how to sparsify four pseudoknot structure prediction algorithms, among them the most general method available (the Rivas-Eddy algorithm) and the fastest one (the Reeder-Giegerich algorithm). In all algorithms the number of "candidate" substructures to be considered is reduced. Conclusions: Our experimental results on the sparsified Reeder-Giegerich algorithm suggest a linear speedup over the unsparsified implementation.

Crossref

DSpace@MIT

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

Author: Giegerich Robert
Reeder Jens
Publication venue: BioMed Central
Publication date: 01/01/2004

BACKGROUND: The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n^6) time and O(n^4) space algorithm by Rivas and Eddy is currently the best available program. RESULTS: We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n^4) time and O(n^2) space to predict the energetically optimal structure of an RNA sequence, possibly containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. CONCLUSIONS: RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm.
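As a rough worked example of what these bounds imply, take a hypothetical sequence length of n = 100 and ignore constant factors, which the asymptotic bounds do not specify:

```latex
\begin{align*}
\text{time:}\quad  & \frac{n^{6}}{n^{4}} = n^{2} = 10^{4}
    && (10^{12} \text{ vs. } 10^{8} \text{ elementary steps}) \\
\text{space:}\quad & \frac{n^{4}}{n^{2}} = n^{2} = 10^{4}
    && (10^{8} \text{ vs. } 10^{4} \text{ table entries})
\end{align*}
```

Under this simplification, both time and space shrink by a factor of n^2 relative to the more general algorithm, at the price of the restricted pseudoknot class.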

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Publications at Bielefeld University

SimulFold: Simultaneously Inferring RNA Structures Including Pseudoknots, Alignments, and Trees Using a Bayesian MCMC Framework

Author: Meyer Irmtraud M
Miklós István
Publication venue: Public Library of Science
Publication date: 01/08/2007

Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses

Public Library of Science (PLOS)

SZTAKI Publication Repository

Directory of Open Access Journals

PubMed Central

MDC Repository

Molecular structural and functional analysis of the transcriptional control of the genes DDX3X and DDX3Y in the male germline

Author: Rauschendorf Marc-Alexander
Publication date: 01/01/2011

The Y-chromosomal gene DDX3Y and its X-homologous gene DDX3X encode two RNA helicases of the DEAD-box family, both of which are functionally active in different phases of human spermatogenesis (Ditton et al., 2004). Deletions of the "Azoospermia Factor a" (AZFa) region in Yq11, where DDX3Y is located, lead to a complete loss of male germ cells, the "Sertoli cell-only" (SCO) syndrome. The amino acid sequences encoded by the two genes are highly conserved (92.4%), which points to functional selection of both gene copies. The promoter and 5′UTR sequences, by contrast, have undergone marked chromosome-specific sequence changes since recombination between the mammalian gonosomes ceased. These changes have led to a complex testis-specific transcriptional and translational control of both genes. As a result of this allele-specific sequence evolution, different core promoter modules have been established for germline-specific expression control. Some sequence motifs also give first indications of different chromatin structures in the two promoter domains. By combining comparative genomics in six mammalian species (human, chimpanzee, rhesus monkey, common marmoset, cattle, mouse) for both genes with targeted experiments, both allele-specific and species-specific cis-regulatory modules were identified. Nine conserved sequence blocks were mapped in the human DDX3(X/Y) promoter regions. One such sequence block is the Y-specific MSY2 minisatellite (Bao et al., 2000). An MSY2 or homologous MSY2-X base sequence was identified upstream of the transcription units of DDX3Y and DDX3X in all species. However, amplification of the MSY2 sequence occurred only in primates and only in the germ-cell-specific promoter domain of DDX3Y.
In the nine human X-Y sequence blocks, 24 X-Y conserved transcription factor binding sites (TFBS) were identified. Particularly striking is a SOX5 TFBS, X-Y conserved in all species examined, that is located in the MSY2 and MSY2-X sequences. In total, 30 X-Y conserved TFBS were mapped in the human promoter sequences. Most of the corresponding TFs are expressed in testis tissue. Six shared TF modules were identified, one of which is located at a homologous position. The conserved TFBS and shared TF modules thus point to a certain degree of co-regulation of the two genes in the male germline. In addition, several allele-specific repetitive TFBS and TFBS clusters were mapped, which evidently mediate gene-specific transcriptional control. The combination of in silico analyses and targeted luciferase reporter gene assays thus proves to be a successful strategy for identifying essential cis-control elements for DDX3(X/Y) germline expression. These cis-modules are therefore good sequence motifs for molecular genetic mutation analysis in infertile men with suspected dysfunction of DDX3(X/Y) expression.

Heidelberger Dokumentenserver

Efficient Cache-oblivious String Algorithms for Bioinformatics

Author: Chowdhury Rezaul Alam
Le Hai-Son
Ramachandran Vijaya

We present theoretical and experimental results on cache-efficient and parallel algorithms for three well-studied string problems in bioinformatics: (1) Pairwise alignment: optimal pairwise global sequence alignment using an affine gap penalty; (2) Median: optimal alignment of three sequences using an affine gap penalty; (3) RNA secondary structure prediction: maximizing the number of base pairs in an RNA secondary structure with simple pseudoknots. For each of these three problems we present cache-oblivious algorithms that match the best-known time complexity, match or improve the best-known space complexity, and improve significantly on the cache efficiency of earlier algorithms. We also show that these algorithms are easily parallelizable, and we analyze their parallel performance. We present experimental results showing that all three cache-oblivious algorithms run faster on current desktop machines than the best previous algorithm for the problem. For the first two problems we compare our code to publicly available software written by others, and for the last problem our comparison is to our implementation of Akutsu's algorithm. We also include empirical results showing good performance for the parallel implementations of our algorithms for the first two problems. Our methods are applicable to several other dynamic programs for string problems in bioinformatics, including local alignment, generalized global alignment with intermittent similarities, multiple sequence alignment under several scoring functions such as the 'sum-of-pairs' objective function, and RNA secondary structure prediction with simple pseudoknots using energy functions based on adjacent base pairs.
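For readers unfamiliar with the first problem, the following is a minimal sketch of global alignment scoring with an affine gap penalty, using the standard three-table (Gotoh-style) recurrence. It is the plain quadratic-space formulation, not the cache-oblivious algorithm the abstract describes. The function name `affine_global_score` and the scoring parameters are illustrative assumptions, with a gap of length L costing `gap_open + (L - 1) * gap_extend`.

```python
# Plain quadratic-space affine-gap global alignment score (illustrative sketch,
# not the cache-oblivious algorithm described in the abstract).
NEG_INF = float("-inf")


def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Best global alignment score of strings a and b under affine gaps."""
    n, m = len(a), len(b)
    # M: last column aligns a[i-1] with b[j-1]
    # X: last column aligns a[i-1] with a gap
    # Y: last column aligns a gap with b[j-1]
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a gap pays gap_open; continuing one pays gap_extend.
            X[i][j] = max(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = max(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return max(M[n][m], X[n][m], Y[n][m])
```

With the default parameters, aligning `"AAGGAA"` against `"AAAA"` scores four matches and one gap of length two. The row-by-row fill order used here is the straightforward one; a cache-oblivious variant would instead recurse on sub-blocks of the tables.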

CiteSeerX