703 research outputs found

    Polyhedral optimizations of RNA-RNA interaction computations

    2017 Fall. Includes bibliographical references. Studying RNA-RNA interaction has led to major successes in the treatment of some cancers, including colon, breast and pancreatic cancer, by suppressing the gene expression involved in the development of these diseases. The problem with programs that compute these interactions is that they are computationally and memory intensive: O(N^4) space and O(N^6) time complexity. Moreover, the entire application is complicated and involves many mutually recursive data variables. We address the problem of speeding up a surrogate kernel (named OSPSQ) that captures the main dependence pattern found in two widely used RNA-RNA interaction applications, IRIS and piRNA. The structure of the OSPSQ kernel perfectly fits the constraints of the polyhedral model, a well-developed technology for optimizing codes that belong to many specialized domains. However, the current state-of-the-art automatic polyhedral tools do not significantly improve the performance of the baseline implementation of OSPSQ. With simple techniques like loop permutation and skewing, we achieve an average of 17x sequential and 31x parallel speedup on a standard modern multi-core platform (Intel Broadwell, E5-1650v4). This performance represents 75% and 88% of attainable single-core and multi-core L1 bandwidth. For further performance improvement, we describe how to tile all six dimensions and also formulate the associated memory trade-off. In the future, we plan to implement these tiling strategies, explore the performance of the code for various tile sizes and optimize the whole piRNA application.

    Detecting and comparing non-coding RNAs in the high-throughput era.

    In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and, more generally, any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino acids. Moreover, for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation, which impedes comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies, dealing especially with two extremely important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data.

    Comparison and calibration of transcriptome data from RNA-Seq and tiling arrays

    Background: Tiling arrays have been the tool of choice for probing an organism's transcriptome without prior assumptions about the transcribed regions, but RNA-Seq is becoming a viable alternative as the costs of sequencing continue to decrease. Understanding the relative merits of these technologies will help researchers select the appropriate technology for their needs. Results: Here, we compare these two platforms using a matched sample of poly(A)-enriched RNA isolated from the second larval stage of C. elegans. We find that the raw signals from these two technologies are reasonably well correlated but that RNA-Seq outperforms tiling arrays in several respects, notably in exon boundary detection and dynamic range of expression. By exploring the accuracy of sequencing as a function of depth of coverage, we found that about 4 million reads are required to match the sensitivity of two tiling array replicates. The effects of cross-hybridization were analyzed using a "nearest neighbor" classifier applied to array probes; we describe a method for determining potential "black list" regions whose signals are unreliable. Finally, we propose a strategy for using RNA-Seq data as a gold standard set to calibrate tiling array data. All tiling array and RNA-Seq data sets have been submitted to the modENCODE Data Coordinating Center. Conclusions: Tiling arrays effectively detect transcript expression levels at a low cost for many species while RNA-Seq provides greater accuracy in several regards. Researchers will need to carefully select the technology appropriate to the biological investigations they are undertaking. It will also be important to reconsider a comparison such as ours as sequencing technologies continue to evolve.

    Genomic data mining for the computational prediction of small non-coding RNA genes

    The objective of this research is to develop a novel computational prediction algorithm for non-coding RNA (ncRNA) genes using features computable for any genomic sequence without the need for comparative analysis. Existing comparative-based methods require the knowledge of closely related organisms in order to search for sequence and structural similarities. This approach imposes constraints on the type of ncRNAs, the organism, and the regions where the ncRNAs can be found. We have developed a novel approach for ncRNA gene prediction without the limitations of current comparative-based methods. Our work has established an ncRNA database required for subsequent feature and genomic analysis. Furthermore, we have identified significant features from folding-, structural-, and ensemble-based statistics for use in ncRNA prediction. We have also examined higher-order gene structures, namely operons, to discover potential insights into how ncRNAs are transcribed. Automatic identification of ncRNAs on a genome-wide scale is immensely powerful when incorporated into a pipeline for large-scale genome annotation. This work will contribute to a more comprehensive annotation of ncRNA genes in microbial genomes to meet the demands of functional and regulatory genomic studies. Ph.D. Committee Chair: Dr. G. Tong Zhou; Committee Member: Dr. Arthur Koblasz; Committee Member: Dr. Eberhard Voit; Committee Member: Dr. Xiaoli Ma; Committee Member: Dr. Ying X

    The Relevance of Ribonuclease III in Pathogenic Bacteria

    Dissertation presented to obtain the Ph.D. degree in Biology. Ribonucleases (RNases) are key factors in the control of all biological processes, since they modulate the stability of RNA transcripts, allowing rapid changes in gene expression. Some RNases are up-regulated under stress situations and are involved in virulence processes in pathogenic microorganisms. RNases also control the levels of regulatory RNAs, which play very important roles in cell physiology.(...

    An information-bearing seed for nucleating algorithmic self-assembly

    Self-assembly creates natural mineral, chemical, and biological structures of great complexity. Often, the same starting materials have the potential to form an infinite variety of distinct structures; information in a seed molecule can determine which form is grown as well as where and when. These phenomena can be exploited to program the growth of complex supramolecular structures, as demonstrated by the algorithmic self-assembly of DNA tiles. However, the lack of effective seeds has limited the reliability and yield of algorithmic crystals. Here, we present a programmable DNA origami seed that can display up to 32 distinct binding sites and demonstrate the use of seeds to nucleate three types of algorithmic crystals. In the simplest case, the starting materials are a set of tiles that can form crystalline ribbons of any width; the seed directs assembly of a chosen width with >90% yield. Increased structural diversity is obtained by using tiles that copy a binary string from layer to layer; the seed specifies the initial string and triggers growth under near-optimal conditions where the bit copying error rate is [...] 17 kb of sequence information. In sum, this work demonstrates how DNA origami seeds enable the easy, high-yield, low-error-rate growth of algorithmic crystals as a route toward programmable bottom-up fabrication.

    Accelerating the BPMax algorithm for RNA-RNA interaction

    2021 Summer. Includes bibliographical references. RNA-RNA interactions (RRIs) are essential in many biological processes, including gene transcription, translation, and localization. They play a critical role in diseases such as cancer and Alzheimer's. RNA-RNA interaction algorithms use dynamic programming to predict the secondary structure and suffer from very high computation time. Their high complexity (Θ(N^3 M^3) in time and Θ(N^2 M^2) in space) makes parallelization both essential and challenging. RRI programs are developed and optimized by hand most of the time, which is prone to human error and costly to develop and maintain. This thesis presents the parallelization of an RRI program, BPMax, on a single shared memory CPU platform. From a mathematical specification of the dynamic programming algorithm, we generate highly optimized code that achieves over 100× speedup over the baseline program that uses a standard 'diagonal-by-diagonal' execution order. We achieve 100 GFLOPS, which is about a fourth of our platform's peak theoretical single-precision performance for max-plus computation. The main kernel in the algorithm, whose complexity is Θ(N^3 M^3), attains 186 GFLOPS. We do this with a polyhedral code generation tool, AlphaZ, which takes user-specified mapping directives and automatically generates optimized C code that enhances parallelism and locality. AlphaZ allows the user to explore various schedules, memory maps, parallelization approaches, and tiling of the most dominant part of the computation.
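
The 'diagonal-by-diagonal' execution order named in this abstract can be illustrated on a much smaller problem. The sketch below is not BPMax and is not taken from the thesis. It assumes a simpler, single-strand base-pair maximization (a Nussinov-style recurrence) purely for illustration, and the names `can_pair` and `max_base_pairs` are hypothetical. The inner reduction over `k` is a max-plus operation: a maximum over sums of previously computed table entries.

```python
# Illustrative only: single-strand base-pair maximization, filled
# diagonal by diagonal. This is NOT the BPMax recurrence.

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def can_pair(a, b):
    """Return True if the two bases may pair (Watson-Crick or G-U wobble)."""
    return (a, b) in PAIRS


def max_base_pairs(seq):
    """Maximum number of nested base pairs in seq (no minimum loop length)."""
    n = len(seq)
    if n == 0:
        return 0
    # table[i][j] holds the best score for the subsequence seq[i..j].
    table = [[0] * n for _ in range(n)]
    # d is the diagonal index (j - i). Every entry on diagonal d depends
    # only on entries from shorter diagonals, so diagonals are filled in order.
    for d in range(1, n):
        for i in range(n - d):
            j = i + d
            inner = table[i + 1][j - 1] if d >= 2 else 0
            best = inner + (1 if can_pair(seq[i], seq[j]) else 0)
            # Max-plus reduction over all split points k.
            for k in range(i, j):
                best = max(best, table[i][k] + table[k + 1][j])
            table[i][j] = best
    return table[0][n - 1]
```

For example, `max_base_pairs("GGGAAACCC")` pairs each G with a C. Entries on the same diagonal are independent of one another, which is what makes this order a common baseline for parallel implementations.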

    Lock-free Parallel Dynamic Programming

    We show a method for parallelizing top-down dynamic programs in a straightforward way by a careful choice of a lock-free shared hash table implementation and randomization of the order in which the dynamic program computes its subproblems. This generic approach is applied to dynamic programs for knapsack, shortest paths, and RNA structure alignment, as well as to a state-of-the-art solution for minimizing the maximum number of open stacks. Experimental results are provided on three different modern multicore architectures which show that this parallelization is effective and reasonably scalable. In particular, we obtain over 10 times speedup for 32 threads on the open stacks problem.
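
The idea described in this abstract (a shared memo table plus randomized subproblem order) can be sketched for 0/1 knapsack. This is a rough illustration under stated assumptions, not the authors' implementation: a plain Python `dict` stands in for the lock-free hash table, every worker is assumed to solve the whole problem from the root, and the name `knapsack_parallel` and its parameters are hypothetical. Under CPython's global interpreter lock the threads will not give a real speedup, so the sketch shows only the structure of the approach.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def knapsack_parallel(weights, values, capacity, workers=4, seed=0):
    """Top-down 0/1 knapsack with a memo table shared by all workers."""
    memo = {}  # stand-in for a lock-free hash table (an assumption)

    def solve(i, cap, rng):
        if i == len(weights):
            return 0
        key = (i, cap)
        hit = memo.get(key)
        if hit is not None:
            return hit
        # Two subproblems: skip item i, or take it if it fits.
        branches = [lambda: solve(i + 1, cap, rng)]
        if weights[i] <= cap:
            branches.append(
                lambda: values[i] + solve(i + 1, cap - weights[i], rng)
            )
        # Randomize the visiting order so workers diverge and fill
        # different parts of the shared table.
        rng.shuffle(branches)
        best = max(branch() for branch in branches)
        # Duplicate writes are harmless: every worker computes the same value.
        memo[key] = best
        return best

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(solve, 0, capacity, random.Random(seed + t))
            for t in range(workers)
        ]
        results = [f.result() for f in futures]
    return results[0]
```

Because each subproblem has a single deterministic answer, two workers racing on the same key can only write the same value, which is why the table needs no locking for correctness in this sketch.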

    Prediction of novel microRNA genes in cancer-associated genomic regions—a combined computational and experimental approach

    The majority of existing computational tools rely on sequence homology and/or structural similarity to identify novel microRNA (miRNA) genes. Recently, supervised algorithms have been applied to this problem, taking into account sequence, structure and comparative genomics information. In most of these studies, miRNA gene predictions are rarely supported by experimental evidence and prediction accuracy remains uncertain. In this work we present a new computational tool (SSCprofiler) utilizing a probabilistic method based on Profile Hidden Markov Models to predict novel miRNA precursors. Via the simultaneous integration of biological features such as sequence, structure and conservation, SSCprofiler achieves 88.95% sensitivity and 84.16% specificity on a large set of human miRNA genes. The trained classifier is used to identify novel miRNA gene candidates located within cancer-associated genomic regions and rank the resulting predictions using expression information from a full genome tiling array. Finally, four of the top scoring predictions are verified experimentally using northern blot analysis. Our work combines both analytical and experimental techniques to show that SSCprofiler is a highly accurate tool which can be used to identify novel miRNA gene candidates in the human genome. SSCprofiler is freely available as a web service at http://www.imbb.forth.gr/SSCprofiler.html
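
As a reminder of how the two reported measures are defined, the following minimal sketch computes sensitivity and specificity from confusion-matrix counts. The function name and the counts in the usage note are hypothetical and are not taken from the SSCprofiler evaluation.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = TP / (TP + FN): fraction of true miRNAs that are detected.
    specificity = TN / (TN + FP): fraction of negatives correctly rejected.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    return tp / (tp + fn), tn / (tn + fp)
```

With hypothetical counts of 90 true positives, 10 false negatives, 80 true negatives and 20 false positives, the function returns a sensitivity of 0.9 and a specificity of 0.8.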