8 research outputs found

    Towards high performance computing for molecular structure prediction using IBM Cell Broadband Engine - an implementation perspective

    Background: RNA structure prediction is a computationally demanding task, especially when pseudoknots are considered. The problem is well studied in the literature, and most solutions rely on tightly coupled Dynamic Programming (DP). The problem size and complexity grow rapidly as sequence length increases, which makes the case for parallelization. Parallelization can be achieved on networked platforms (clusters, grids, etc.) as well as on modern multi-core chips. Methods: In this paper, we exploit the parallel capabilities of the IBM Cell Broadband Engine to parallelize an existing DP algorithm for RNA secondary structure prediction. We design three implementation strategies that exploit the inherent code, data and hybrid parallelism, referred to as C-Par, D-Par and H-Par respectively, and analyze their performance. Our approach attempts to introduce parallelism in critical sections of the algorithm. We ran our experiments on a Sony PlayStation 3 (PS3), which is based on the IBM Cell chip. Results: Our results suggest that introducing parallelism into the DP algorithm lets it handle longer sequences that would otherwise take a large amount of time on single-core computers. The results further demonstrate the speed-up gained by exploiting the parallelism inherent in the problem, and highlight the advantages of multi-core platforms for designing more sophisticated methods for fairly long RNA sequences. Conclusion: The speed-up reported here is promising, especially for long sequences. To the best of our knowledge from the literature survey, this work is probably the first of its kind to use the IBM Cell Broadband Engine (a heterogeneous multi-core chip) to implement a DP algorithm. The results also encourage the use of multi-core platforms to design more sophisticated methods for predicting the secondary structure of fairly long RNA sequences.
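
The kind of parallelism the abstract above refers to can be illustrated with a minimal sketch. It is not the paper's algorithm or its Cell implementation: it assumes the simple base-pair-maximization recurrence (no pseudoknots, no energy model, no minimum loop length), and the names `fill_cell` and `max_base_pairs` are hypothetical. The point it shows is that all cells with the same span `j - i` depend only on shorter spans, so each anti-diagonal of the table can be filled concurrently. Python threads are used only to make that independence visible; they are not expected to give a real speed-up here.

```python
from concurrent.futures import ThreadPoolExecutor

# Watson-Crick and G-U wobble pairs (a simplifying assumption for this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fill_cell(seq, table, i, j):
    """Best pair count for seq[i..j]; reads only cells with a shorter span."""
    # Case 1: position i or position j is left unpaired.
    best = max(table[i + 1][j], table[i][j - 1])
    # Case 2: i pairs with j, enclosing the interval i+1..j-1.
    if (seq[i], seq[j]) in PAIRS:
        inner = table[i + 1][j - 1] if j - i > 1 else 0
        best = max(best, inner + 1)
    # Case 3: the interval splits into two independent parts.
    for k in range(i + 1, j):
        best = max(best, table[i][k] + table[k + 1][j])
    return best


def max_base_pairs(seq, workers=4):
    """Maximum number of nested base pairs, filled one anti-diagonal at a time."""
    n = len(seq)
    if n == 0:
        return 0
    table = [[0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for span in range(1, n):
            starts = range(n - span)
            # Cells of equal span are mutually independent, so they can run concurrently.
            results = pool.map(
                lambda i, s=span: fill_cell(seq, table, i, i + s), starts
            )
            # Results are written back only to the current diagonal,
            # which no task of this round reads.
            for i, value in zip(starts, results):
                table[i][i + span] = value
    return table[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAAUCC"))
```

On a platform with real parallel workers, the loop over `span` stays sequential while the work inside each diagonal is distributed; the shrinking diagonal length near the end of the fill is one reason speed-up tends to be better for long sequences.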

    Sparsification of RNA structure prediction including pseudoknots

    Background: Although many RNA molecules contain pseudoknots, computational prediction of pseudoknotted RNA structure is still in its infancy because of the high running time and space consumption of the dynamic programming formulations of the problem. Results: In this paper, we introduce sparsification to significantly speed up the dynamic programming approaches for pseudoknotted RNA structure prediction, which also lowers the space requirements. Although sparsification has been applied to a number of RNA-related structure prediction problems in the past few years, we provide the first application of sparsification to pseudoknotted RNA structure prediction specifically, and to handling gapped fragments more generally, a problem with a much more complex recursive structure than those to which sparsification has previously been applied. We analyse how to sparsify four pseudoknot structure prediction algorithms, among them the most general method available (the Rivas-Eddy algorithm) and the fastest one (the Reeder-Giegerich algorithm). In all algorithms the number of "candidate" substructures to be considered is reduced. Conclusions: Our experimental results on the sparsified Reeder-Giegerich algorithm suggest a linear speedup over the unsparsified implementation.

    Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

    BACKGROUND: The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n^6) time and O(n^4) space algorithm by Rivas and Eddy is currently the best available program. RESULTS: We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n^4) time and O(n^2) space to predict the energetically optimal structure of an RNA sequence, possibly containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. CONCLUSIONS: RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm.

    SimulFold: Simultaneously Inferring RNA Structures Including Pseudoknots, Alignments, and Trees Using a Bayesian MCMC Framework

    Computational methods for predicting evolutionarily conserved rather than thermodynamic RNA structures have recently attracted increased interest. These methods are indispensable not only for elucidating the regulatory roles of known RNA transcripts, but also for predicting RNA genes. It has been notoriously difficult to devise them to make the best use of the available data and to predict high-quality RNA structures that may also contain pseudoknots. We introduce a novel theoretical framework for co-estimating an RNA secondary structure including pseudoknots, a multiple sequence alignment, and an evolutionary tree, given several RNA input sequences. We also present an implementation of the framework in a new computer program, called SimulFold, which employs a Bayesian Markov chain Monte Carlo method to sample from the joint posterior distribution of RNA structures, alignments, and trees. We use the new framework to predict RNA structures, and comprehensively evaluate the quality of our predictions by comparing our results to those of several other programs. We also present preliminary data that show SimulFold's potential as an alignment and phylogeny prediction method. SimulFold overcomes many conceptual limitations that current RNA structure prediction methods face, introduces several new theoretical techniques, and generates high-quality predictions of conserved RNA structures that may include pseudoknots. It is thus likely to have a strong impact, both on the field of RNA structure prediction and on a wide range of data analyses

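    Molecular structural and functional analysis of the transcriptional control of the genes DDX3X and DDX3Y in the male germline (Molekulare Struktur- und Funktionsanalyse der Transkriptionskontrolle der Gene DDX3X und DDX3Y in der männlichen Keimbahn)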

    The Y-chromosomal gene DDX3Y and its X-homologous gene DDX3X encode two RNA helicases of the DEAD-box family, both of which are functionally active in different phases of human spermatogenesis (Ditton et al., 2004). Deletions of the "Azoospermia Factor a" (AZFa) region in Yq11, where DDX3Y is located, lead to a complete loss of male germ cells, the "Sertoli cell-only" (SCO) syndrome. The coding sequences of both genes show high conservation (92.4%) of their amino acid sequences, which points to functional selection of both gene copies. The promoter and 5' UTR sequences, in contrast, have undergone marked chromosome-specific sequence changes since recombination between the mammalian gonosomes ceased. These changes have led to the emergence of a complex testis-specific transcriptional and translational control of both genes. As a result of the allele-specific sequence evolution, different core promoter modules have been established for germline-specific expression control. Some sequence motifs also give first indications of different chromatin structures in the two promoter domains. By combining comparative genomics in six mammalian species (human, chimpanzee, rhesus macaque, common marmoset, cattle, mouse) for both genes with targeted experiments, both allele-specific and species-specific cis-regulatory modules were identified. Nine conserved sequence blocks were mapped in the human DDX3(X/Y) promoter regions. One such sequence block is the Y-specific MSY2 minisatellite (Bao et al., 2000). An MSY2 or homologous MSY2-X base sequence was identified upstream of the transcription units of DDX3Y and DDX3X in all species. However, amplification of the MSY2 sequence occurred only in primates and only in the germ-cell-specific promoter domain of DDX3Y.
    In the nine human X-Y sequence blocks, 24 X-Y conserved transcription factor binding sites (TFBS) were identified. Particularly striking is a SOX5 TFBS, X-Y conserved in all species examined, that is located in the MSY2 and MSY2-X sequences. In total, 30 X-Y conserved TFBS were mapped in the human promoter sequences. The majority of the corresponding TFs are expressed in testis tissue. Six shared TF modules were identified, one of which is located at a homologous position. The conserved TFBS and shared TF modules thus point to a certain degree of co-regulation of the two genes in the male germline. In addition, several allele-specific repetitive TFBS and TFBS clusters were mapped, which apparently mediate gene-specific transcriptional control. The combination of in silico analyses and targeted luciferase reporter gene assays thus proves to be a successful strategy for identifying essential cis control elements for DDX3(X/Y) germline expression. These cis modules are therefore good sequence motifs for molecular genetic mutation analysis in infertile men with suspected dysfunction of DDX3(X/Y) expression.

    Efficient Cache-oblivious String Algorithms for Bioinformatics

    We present theoretical and experimental results on cache-efficient and parallel algorithms for some well-studied string problems in bioinformatics: (1) pairwise alignment: optimal pairwise global sequence alignment using affine gap penalty; (2) median: optimal alignment of three sequences using affine gap penalty; (3) RNA secondary structure prediction: maximizing the number of base pairs in RNA secondary structure with simple pseudoknots. For each of these three problems we present cache-oblivious algorithms that match the best-known time complexity, match or improve the best-known space complexity, and improve significantly over the cache-efficiency of earlier algorithms. We also show that these algorithms are easily parallelizable, and we analyze their parallel performance. We present experimental results that show that all three cache-oblivious algorithms run faster on current desktop machines than the best previous algorithm for the problem. For the first two problems we compare our code to publicly available software written by others, and for the last problem our comparison is to our implementation of Akutsu’s algorithm. We also include empirical results showing good performance for the parallel implementations of our algorithms for the first two problems. Our methods are applicable to several other dynamic programs for string problems in bioinformatics, including local alignment, generalized global alignment with intermittent similarities, multiple sequence alignment under several scoring functions such as the ‘sum-of-pairs’ objective function, and RNA secondary structure prediction with simple pseudoknots using energy functions based on adjacent base pairs.
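
For readers unfamiliar with the first problem in the abstract above, here is a minimal sketch of global pairwise alignment scoring with an affine gap penalty. It is the textbook three-table formulation (usually attributed to Gotoh) in plain quadratic space, not the cache-oblivious algorithm the abstract describes. The function name `affine_global_score` and the default scoring values are assumptions chosen for illustration; a gap of length L is assumed to cost `gap_open + L * gap_extend`.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=1, mismatch=-1, gap_open=-2, gap_extend=-1):
    """Optimal global alignment score of a and b under an affine gap penalty."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]
    # X: a[i-1] aligned to a gap (gap in b)
    # Y: b[j-1] aligned to a gap (gap in a)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    # Borders: one gap covering the whole prefix.
    for i in range(1, n + 1):
        X[i][0] = gap_open + i * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + j * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a new gap pays gap_open once; extending pays only gap_extend.
            X[i][j] = max(
                M[i - 1][j] + gap_open + gap_extend,
                X[i - 1][j] + gap_extend,
                Y[i - 1][j] + gap_open + gap_extend,
            )
            Y[i][j] = max(
                M[i][j - 1] + gap_open + gap_extend,
                Y[i][j - 1] + gap_extend,
                X[i][j - 1] + gap_open + gap_extend,
            )
    return max(M[n][m], X[n][m], Y[n][m])


if __name__ == "__main__":
    print(affine_global_score("ACGT", "AT"))
```

Each cell reads only its left, upper and upper-left neighbours, which is the regular dependency pattern that recursive, cache-oblivious decompositions of such tables take advantage of.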