    Towards high performance computing for molecular structure prediction using IBM Cell Broadband Engine - an implementation perspective

    Abstract. Background: RNA structure prediction is a computationally complex task, especially when pseudoknots are allowed. The problem is well studied in the literature, and most solutions rely on tightly coupled dynamic programming (DP) algorithms. The scale and complexity of the problem grow rapidly with sequence length, which makes a case for parallelization. Parallelism can be obtained from networked platforms (clusters, grids, etc.) as well as from modern multi-core chips. Methods: In this paper, we exploit the parallelism of the IBM Cell Broadband Engine to parallelize an existing DP algorithm for RNA secondary structure prediction. We design three implementation strategies that exploit the inherent code, data and hybrid parallelism, referred to as C-Par, D-Par and H-Par respectively, and analyze their performance. Our approach introduces parallelism in the critical sections of the algorithm. We ran our experiments on a Sony PlayStation 3 (PS3), which is based on the IBM Cell chip. Results: Our results suggest that parallelizing the DP algorithm allows it to handle longer sequences that would otherwise take a long time on single-core computers. The results further demonstrate the speed-up gained by exploiting the inherent parallelism of the problem and highlight the advantages of multi-core platforms for designing more sophisticated methods to handle long RNA sequences. Conclusion: The reported speed-up is promising, especially for long sequences. To the best of our knowledge, this work is likely the first to use the IBM Cell Broadband Engine (a heterogeneous multi-core chip) to implement a DP algorithm. The results also encourage the use of multi-core platforms for designing more sophisticated methods to predict the secondary structure of long RNA sequences.
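
The abstract does not name the DP algorithm that was parallelized. As a minimal sketch, assuming the simple Nussinov base-pair-maximization recurrence (a swapped-in example, not necessarily the authors' algorithm), the code below shows the data parallelism such a DP offers: every cell on a diagonal d = j - i depends only on cells from shorter diagonals, so the cells of one diagonal can be handed to different workers. A thread pool stands in for the Cell's cores; `nussinov_wavefront`, `score_cell` and `min_loop` are hypothetical names, and because of CPython's global interpreter lock this illustrates the dependence structure rather than delivering a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

# Allowed pairs: Watson-Crick plus G-U wobble (a common simplification).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def score_cell(seq, table, i, j, min_loop):
    """Maximum number of base pairs in seq[i..j] (Nussinov recurrence).

    Reads only cells on diagonals shorter than j - i, which is what makes
    all cells of one diagonal independent of each other.
    """
    best = max(table[i + 1][j], table[i][j - 1])  # i or j left unpaired
    if j - i - 1 >= min_loop and (seq[i], seq[j]) in PAIRS:
        best = max(best, table[i + 1][j - 1] + 1)  # i pairs with j
    for k in range(i + 1, j):  # split into two adjacent substructures
        best = max(best, table[i][k] + table[k + 1][j])
    return best


def nussinov_wavefront(seq, min_loop=3, workers=4):
    """Fill the DP table one diagonal at a time, in parallel within a diagonal."""
    n = len(seq)
    if n == 0:
        return 0
    table = [[0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for d in range(1, n):  # d = j - i
            cells = [(i, i + d) for i in range(n - d)]
            scores = list(pool.map(
                lambda ij: score_cell(seq, table, ij[0], ij[1], min_loop), cells
            ))
            # Write back only after the whole diagonal has been computed.
            for (i, j), s in zip(cells, scores):
                table[i][j] = s
    return table[0][n - 1]


print(nussinov_wavefront("GGGAAACCC"))  # 3
```

The outer loop over diagonals stays sequential; only the work inside one diagonal is distributed, which is the usual "wavefront" pattern for this family of DP recurrences.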

    Polyhedral optimizations of RNA-RNA interaction computations

    2017 Fall. Includes bibliographical references. Studying RNA-RNA interaction has led to major successes in the treatment of some cancers, including colon, breast and pancreatic cancer, by suppressing the expression of genes involved in the development of these diseases. However, programs that model these interactions are computationally and memory intensive, with O(N^4) space and O(N^6) time complexity. Moreover, the full applications are complicated and involve many mutually recursive data variables. We address the problem of speeding up a surrogate kernel, named OSPSQ, that captures the main dependence pattern found in two widely used RNA-RNA interaction applications, IRIS and piRNA. The structure of the OSPSQ kernel fits the constraints of the polyhedral model perfectly; the polyhedral model is a well-developed technology for optimizing codes from many specialized domains. However, current state-of-the-art automatic polyhedral tools do not significantly improve on the baseline implementation of OSPSQ. With simple techniques such as loop permutation and skewing, we achieve an average speed-up of 17x sequentially and 31x in parallel on a standard modern multi-core platform (Intel Broadwell, E5-1650v4). This performance represents 75% and 88% of the attainable single-core and multi-core L1 bandwidth, respectively. For further performance improvement, we describe how to tile all six dimensions and formulate the associated memory trade-off. In future work, we plan to implement these tiling strategies, explore the performance of the code for various tile sizes and optimize the whole piRNA application.
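
The OSPSQ kernel itself is not reproduced in the abstract. As a hedged illustration of the loop skewing it mentions, the sketch below uses a hypothetical two-dimensional recurrence A[i][j] = A[i-1][j] + A[i][j-1] (not OSPSQ). In the original (i, j) order the inner loop carries a dependence on A[i][j-1]; after skewing to the time coordinate t = i + j, all iterations with the same t read only values from t - 1, so the inner loop could run in parallel. The function names are illustrative only.

```python
def original_order(n):
    """Row-major evaluation: the inner j loop depends on A[i][j-1]."""
    A = [[1] * n for _ in range(n)]  # row 0 and column 0 act as boundary values
    for i in range(1, n):
        for j in range(1, n):
            A[i][j] = A[i - 1][j] + A[i][j - 1]
    return A


def skewed_order(n):
    """Skewed evaluation with t = i + j: iterations sharing t are independent."""
    A = [[1] * n for _ in range(n)]
    for t in range(2, 2 * n - 1):  # outer, sequential wavefront loop
        # Inner loop over the legal range of i for this t; no iteration reads
        # a value written by another iteration of the same t.
        for i in range(max(1, t - n + 1), min(n - 1, t - 1) + 1):
            j = t - i
            A[i][j] = A[i - 1][j] + A[i][j - 1]
    return A


print(skewed_order(4)[3][3])  # 20
```

Both orders compute the same table. The transformation is legal because every dependence vector, here (1, 0) and (0, 1), still points strictly forward in t after skewing; checking that condition over all loop dimensions is what polyhedral tools automate.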

    A multiple layer model to compare RNA secondary structures

    International audience. We formally introduce a new data structure, called MiGaL for "Multiple Graph Layers", that is composed of several graphs linked together by relations of abstraction/refinement. The structure is useful for representing information that can be described at different levels of abstraction, each level corresponding to a graph. We then propose an algorithm for comparing two MiGaLs. The algorithm performs a step-by-step comparison starting with the most "abstract" level, and the result of the comparison at each step is communicated to the next step using a special colouring scheme. MiGaLs are a very natural model for comparing RNA secondary structures, which can be viewed at different levels of detail: from the sequence of nucleotides, each either unpaired or paired with another to form part of a helix, up to the network of multiple loops that is believed to be the most conserved part of RNAs with similar functions. We therefore show how MiGaLs can be used to compare two RNAs of any size very efficiently at different levels of detail.
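
The abstract does not give the comparison algorithm in detail. The sketch below is a toy, hypothetical reading of the idea, not the authors' method: each layer is an ordered list of labelled nodes (graph edges are ignored), each finer node records the coarser node it refines, nodes at one level are matched by a longest common subsequence of labels, and the colour given to each matched pair restricts which nodes may be matched at the next, finer level. `Layer`, `lcs_pairs`, `compare_migals` and the example labels are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Layer:
    labels: list[str]  # node labels in 5'->3' order
    parent: list[int] = field(default_factory=list)  # coarser node refined (unused at top level)


def lcs_pairs(a, b):
    """Index pairs of one longest common subsequence of label lists a and b."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            L[i][j] = L[i + 1][j + 1] + 1 if a[i] == b[j] else max(L[i + 1][j], L[i][j + 1])
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i, j = i + 1, j + 1
        elif L[i + 1][j] >= L[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def compare_migals(x, y):
    """Number of matched nodes per level, from most abstract (index 0) to finest."""
    col_x = [0] * len(x[0].labels)  # top level: a single colour class
    col_y = [0] * len(y[0].labels)
    scores = []
    for level, (lx, ly) in enumerate(zip(x, y)):
        if level > 0:  # each node inherits the colour of the node it refines
            col_x = [col_x[p] for p in lx.parent]
            col_y = [col_y[p] for p in ly.parent]
        next_x, next_y = [None] * len(lx.labels), [None] * len(ly.labels)
        fresh = 0
        # Only nodes of the same colour may be matched; None means "unmatched above".
        for c in sorted((set(col_x) & set(col_y)) - {None}):
            ix = [k for k, v in enumerate(col_x) if v == c]
            iy = [k for k, v in enumerate(col_y) if v == c]
            for a, b in lcs_pairs([lx.labels[k] for k in ix], [ly.labels[k] for k in iy]):
                next_x[ix[a]] = next_y[iy[b]] = fresh  # a matched pair shares a new colour
                fresh += 1
        scores.append(fresh)
        col_x, col_y = next_x, next_y
    return scores


x = [Layer(["helix", "hairpin"]), Layer(list("GGAAACC"), [0, 0, 1, 1, 1, 0, 0])]
y = [Layer(["helix", "hairpin"]), Layer(list("GGAAUCC"), [0, 0, 1, 1, 1, 0, 0])]
print(compare_migals(x, y))  # [2, 6]
```

Restricting each finer comparison to one colour class keeps the subproblems small, which is one plausible reason a coarse-to-fine scheme can be efficient on long RNAs.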

    Elastic properties of proteins: insight on the folding process and evolutionary selection of native structures

    We carry out a theoretical study of the vibrational and relaxation properties of naturally occurring proteins in order to characterize both their folding and their equilibrium thermodynamics. Using a suitable model, we provide a full characterization of the vibrational spectrum and eigenmodes at various temperatures, exploiting only knowledge of the protein's native structure. We show that the rate at which perturbations decay at the folding transition correlates well with experimental folding rates; this is validated on a list of about 30 two-state folders. Furthermore, a qualitative analysis of residue mean-square displacements (shown to reproduce crystallographic data accurately) provides a reliable and statistically accurate method to identify crucial folding sites and contacts. This strategy is validated against clinical data for HIV-1 protease. Finally, we compare the vibrational spectra and eigenmodes of natural proteins with those of randomly generated compact structures and regular random graphs. The comparison reveals a distinctively enhanced flexibility of natural structures, accompanied by slow relaxation times at the folding temperature. The fact that these properties are intimately connected to the presence and assembly of secondary motifs hints at special criteria adopted by evolution in the selection of viable folds. Comment: RevTeX, 17 pages, 13 EPS figures.
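
The abstract does not specify its model. A widely used example of deriving vibrational eigenmodes and residue mean-square fluctuations from the native structure alone is the Gaussian network model (GNM), sketched below as a swapped-in illustration rather than the authors' method. It assumes one node per residue (for example C-alpha coordinates), a contact cutoff of 7 Å and a single uniform spring constant, so fluctuations are returned only up to the overall factor 3k_BT/γ.

```python
import numpy as np


def kirchhoff(coords, cutoff=7.0):
    """GNM connectivity (Kirchhoff) matrix from residue coordinates of shape (N, 3)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    gamma = -contact.astype(float)  # -1 for every pair of residues in contact
    np.fill_diagonal(gamma, contact.sum(axis=1))  # number of contacts on the diagonal
    return gamma


def gnm_modes(coords, cutoff=7.0):
    """Eigenvalues, eigenvectors and relative mean-square fluctuations."""
    gamma = kirchhoff(coords, cutoff)
    vals, vecs = np.linalg.eigh(gamma)  # ascending; vals[0] ~ 0 is the trivial mode
    nonzero = vals > 1e-8
    # Pseudo-inverse from the non-trivial modes; its diagonal is proportional
    # to each residue's mean-square fluctuation.
    inv = (vecs[:, nonzero] / vals[nonzero]) @ vecs[:, nonzero].T
    return vals, vecs, np.diag(inv)


# Hypothetical test structure: a straight chain of 10 residues spaced 3.8 Å apart.
coords = np.column_stack([3.8 * np.arange(10), np.zeros(10), np.zeros(10)])
vals, vecs, msf = gnm_modes(coords)
print(np.round(msf, 3))
```

The slowest non-trivial modes (smallest non-zero eigenvalues) describe the most collective, low-frequency motions; on the toy chain the chain ends show the largest fluctuations, as expected for weakly constrained residues.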

    Parallelization of dynamic programming recurrences in computational biology

    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications that analyze them. In particular, over the last five years the DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future, compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays (FPGAs). We advocate a high-level synthesis approach that uses the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We propose a novel technique within the polyhedral model that optimizes for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also propose a method to switch dynamically between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm and build a family of accelerators that trade resources for parallelism and are 15-130x faster than a modern dual-core CPU implementation. A Zuker RNA folding accelerator that we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 Intel Core 2 Duo processors running at 3 GHz. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on FPGAs and graphics processors of a similar generation. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms.
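
As an example of the recurrence-equation abstraction, the Nussinov algorithm named above is usually written as follows in its textbook form (the exact variant implemented in the accelerators is not stated in the abstract). Here N(i, j) is the maximum number of base pairs in the subsequence x_i ... x_j, and δ(i, j) = 1 if x_i and x_j can pair (in common variants, also requiring a minimum hairpin-loop length), otherwise 0.

```latex
N(i, j) = \max
\begin{cases}
N(i+1, j) \\
N(i, j-1) \\
N(i+1, j-1) + \delta(i, j) \\
\max_{i < k < j} \bigl[ N(i, k) + N(k+1, j) \bigr]
\end{cases}
\qquad N(i, i) = N(i, i-1) = 0
```

Every term on the right-hand side refers to a strictly shorter interval, so all cells with the same j - i are mutually independent; polyhedral analysis of this dependence pattern is what allows such computations to be scheduled in parallel or pipelined.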

    New Advances in NGS Technologies

    In next-generation sequencing (NGS) methods, an individual's DNA molecule is broken into many small fragments that make up the so-called sequencing library. These fragments serve as templates for the synthesis of numerous complementary fragments, called reads. Every small piece of the original DNA is copied many times, in a variable number of reads. Depending on the desired accuracy, the system can be set to achieve a certain level of coverage, i.e., the average number of reads covering each position of the sequence. A coverage of 30X is already sufficient for the routine diagnosis of most Mendelian diseases. All the sequences are then transferred to a computer and aligned to a reference sequence available in international databases. In this way, the reads can be reassembled like a puzzle to obtain the sequence of a single gene or of the whole genome. The NGS machines available today are very flexible: an NGS sequencer can be used for different types of applications: (1) whole-genome sequencing (WGS), analysis of an individual's entire genome; (2) whole-exome sequencing (WES), analysis of all of an individual's coding genes; (3) targeted sequencing, analysis of a set of genes or a single gene; (4) transcriptome analysis, analysis of all the RNA produced by specific cells.
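
A coverage target can be turned into a sequencing budget with the standard Lander-Waterman relation C = L·N/G (read length L, number of reads N, genome length G). The sketch below is an idealized illustration: it assumes uniformly distributed reads, and the 3.1 Gb genome size and 150 bp read length in the example are assumed values, not figures from the text. Under the same Poisson idealization, the expected fraction of bases covered by at least one read is 1 - e^(-C).

```python
import math


def mean_coverage(read_length, n_reads, genome_length):
    """Lander-Waterman mean coverage C = L * N / G."""
    return read_length * n_reads / genome_length


def reads_needed(target_coverage, genome_length, read_length):
    """Smallest integer N with L * N / G >= target (integer arguments assumed)."""
    return -(-target_coverage * genome_length // read_length)  # ceiling division


def fraction_covered(coverage):
    """Expected fraction of bases hit by at least one read, assuming Poisson sampling."""
    return 1.0 - math.exp(-coverage)


n = reads_needed(30, 3_100_000_000, 150)
print(n)                                     # 620000000
print(mean_coverage(150, n, 3_100_000_000))  # 30.0
```

In practice reads are not spread uniformly, so real projects add a margin on top of this idealized count.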

    Looking at the nudibranch family Myrrhinidae (Gastropoda, Heterobranchia) from a mitochondrial ‘2D folding structure’ point of view

    Integrative taxonomy is an evolving field of multidisciplinary studies often utilised to clarify phylogenetic relationships that were poorly understood in the past. The systematics of many taxa have been resolved by combining data from different research approaches, i.e., molecular, ecological, behavioural, morphological and chemical. Regarding molecular analysis, there is currently a search for new genetic markers that could be diagnostic at different taxonomic levels and that can be added to the canonical ones. In marine Heterobranchia, the most widely used mitochondrial markers, COI and 16S, are usually analysed by comparing their primary sequences. The 16S rRNA molecule can be folded into a 2D secondary structure that has been poorly exploited in past studies of heterobranchs, even though 2D molecular analyses are a source of potentially diagnostic characters. A comparison of the results of phylogenetic analyses of a concatenated dataset (the nuclear H3 and the mitochondrial COI and 16S markers; 30 species belonging to eight accepted genera) with those of 2D folding-structure analyses of the 16S rRNA from the type species of the investigated genera demonstrated the diagnostic power of this RNA molecule in revealing the systematics of four genera belonging to the family Myrrhinidae (Gastropoda, Heterobranchia). The “molecular morphological” approach to the 16S rRNA proved to be a powerful tool for delimitation at both the species and genus levels, and a useful way of recovering information that is usually lost in phylogenetic analyses. While the validity of the genera Godiva, Hermissenda and Phyllodesmium is confirmed, a new genus, Nemesis gen. nov., is necessary and is introduced for Dondice banyulensis; the monospecific genus Nanuca is here synonymised with Dondice, with Nanuca sebastiani transferred to Dondice as Dondice sebastiani comb. nov.
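
The abstract compares 16S rRNA 2D folding structures across genera without detailing how the structures are encoded or compared. As a hedged illustration of one common approach (a generic technique, not the authors' protocol), the sketch below parses dot-bracket notation into a set of base pairs and computes the base-pair distance, i.e. the number of pairs present in exactly one of two structures of equal length. The short structures in the example are invented, and plain parentheses cannot represent pseudoknots.

```python
def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.add((stack.pop(), j))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def base_pair_distance(s1, s2):
    """Number of base pairs found in exactly one of two equal-length structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have the same length")
    return len(pairs_from_dot_bracket(s1) ^ pairs_from_dot_bracket(s2))


print(base_pair_distance("((((....))))", "(((......)))"))  # 1
```

Real 16S rRNA comparisons usually work on structures of different lengths, which requires an alignment step (or a tree-edit distance) before pair-level comparison; the equal-length check here keeps the sketch minimal.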