1,185 research outputs found

    Geometric combinatorics and computational molecular biology: branching polytopes for RNA sequences

    Full text link
    Questions in computational molecular biology generate various discrete optimization problems, such as DNA sequence alignment and RNA secondary structure prediction. However, the optimal solutions are fundamentally dependent on the parameters used in the objective functions. The goal of a parametric analysis is to elucidate such dependencies, especially as they pertain to the accuracy and robustness of the optimal solutions. Techniques from geometric combinatorics, including polytopes and their normal fans, have been used previously to give parametric analyses of simple models for DNA sequence alignment and RNA branching configurations. Here, we present a new computational framework, and proof-of-principle results, which give the first complete parametric analysis of the branching portion of the nearest neighbor thermodynamic model for secondary structure prediction for real RNA sequences.Comment: 17 pages, 8 figure

    Louse (Insecta : Phthiraptera) mitochondrial 12S rRNA secondary structure is highly variable

    Get PDF
    Lice are ectoparasitic insects hosted by birds and mammals. Mitochondrial 12S rRNA sequences obtained from lice show considerable length variation and are very difficult to align. We show that the louse 12S rRNA domain III secondary structure displays considerable variation compared to other insects, in both the shape and number of stems and loops. Phylogenetic trees constructed from tree edit distances between louse 12S rRNA structures do not closely resemble trees constructed from sequence data, suggesting that at least some of this structural variation has arisen independently in different louse lineages. Taken together with previous work on mitochondrial gene order and elevated rates of substitution in louse mitochondrial sequences, the structural variation in louse 12S rRNA confirms the highly distinctive nature of molecular evolution in these insects

    Parallelization of dynamic programming recurrences in computational biology

    Get PDF
    The rapid growth of biosequence databases over the last decade has led to a performance bottleneck in the applications analyzing them. In particular, over the last five years DNA sequencing capacity of next-generation sequencers has been doubling every six months as costs have plummeted. The data produced by these sequencers is overwhelming traditional compute systems. We believe that in the future compute performance, not sequencing, will become the bottleneck in advancing genome science. In this work, we investigate novel computing platforms to accelerate dynamic programming algorithms, which are popular in bioinformatics workloads. We study algorithm-specific hardware architectures that exploit fine-grained parallelism in dynamic programming kernels using field-programmable gate arrays: FPGAs). We advocate a high-level synthesis approach, using the recurrence equation abstraction to represent dynamic programming and polyhedral analysis to exploit parallelism. We suggest a novel technique within the polyhedral model to optimize for throughput by pipelining independent computations on an array. This design technique improves on the state of the art, which builds latency-optimal arrays. We also suggest a method to dynamically switch between a family of designs using FPGA reconfiguration to achieve a significant performance boost. We have used polyhedral methods to parallelize the Nussinov RNA folding algorithm to build a family of accelerators that can trade resources for parallelism and are between 15-130x faster than a modern dual core CPU implementation. A Zuker RNA folding accelerator we built on a single workstation with four Xilinx Virtex 4 FPGAs outperforms 198 3 GHz Intel Core 2 Duo processors. Furthermore, our design running on a single FPGA is an order of magnitude faster than competing implementations on similar-generation FPGAs and graphics processors. Our work is a step toward the goal of automated synthesis of hardware accelerators for dynamic programming algorithms

    The Mathematics of Phylogenomics

    Get PDF
    The grand challenges in biology today are being shaped by powerful high-throughput technologies that have revealed the genomes of many organisms, global expression patterns of genes and detailed information about variation within populations. We are therefore able to ask, for the first time, fundamental questions about the evolution of genomes, the structure of genes and their regulation, and the connections between genotypes and phenotypes of individuals. The answers to these questions are all predicated on progress in a variety of computational, statistical, and mathematical fields. The rapid growth in the characterization of genomes has led to the advancement of a new discipline called Phylogenomics. This discipline results from the combination of two major fields in the life sciences: Genomics, i.e., the study of the function and structure of genes and genomes; and Molecular Phylogenetics, i.e., the study of the hierarchical evolutionary relationships among organisms and their genomes. The objective of this article is to offer mathematicians a first introduction to this emerging field, and to discuss specific mathematical problems and developments arising from phylogenomics.Comment: 41 pages, 4 figure

    An exact mathematical programming approach to multiple RNA sequence-structure alignment

    Get PDF
    One of the main tasks in computational biology is the computation of alignments of genomic sequences to reveal their commonalities. In case of DNA or protein sequences, sequence information alone is usually sufficient to compute reliable alignments. RNA molecules, however, build spatial conformations—the secondary structure—that are more conserved than the actual sequence. Hence, computing reliable alignments of RNA molecules has to take into account the secondary structure. We present a novel framework for the computation of exact multiple sequence-structure alignments: We give a graph- theoretic representation of the sequence-structure alignment problem and phrase it as an integer linear program. We identify a class of constraints that make the problem easier to solve and relax the original integer linear program in a Lagrangian manner. Experiments on a recently published benchmark show that our algorithms has a comparable performance than more costly dynamic programming algorithms, and outperforms all other approaches in terms of solution quality with an increasing number of input sequences

    Algorithm engineering for optimal alignment of protein structure distance matrices

    Get PDF
    Protein structural alignment is an important problem in computational biology. In this paper, we present first successes on provably optimal pairwise alignment of protein inter-residue distance matrices, using the popular Dali scoring function. We introduce the structural alignment problem formally, which enables us to express a variety of scoring functions used in previous work as special cases in a unified framework. Further, we propose the first mathematical model for computing optimal structural alignments based on dense inter-residue distance matrices. We therefore reformulate the problem as a special graph problem and give a tight integer linear programming model. We then present algorithm engineering techniques to handle the huge integer linear programs of real-life distance matrix alignment problems. Applying these techniques, we can compute provably optimal Dali alignments for the very first time

    Asymmetric Genome Organization in an RNA Virus Revealed via Graph-Theoretical Analysis of Tomographic Data

    Get PDF
    Cryo-electron microscopy permits 3-D structures of viral pathogens to be determined in remarkable detail. In particular, the protein containers encapsulating viral genomes have been determined to high resolution using symmetry averaging techniques that exploit the icosahedral architecture seen in many viruses. By contrast, structure determination of asymmetric components remains a challenge, and novel analysis methods are required to reveal such features and characterize their functional roles during infection. Motivated by the important, cooperative roles of viral genomes in the assembly of single-stranded RNA viruses, we have developed a new analysis method that reveals the asymmetric structural organization of viral genomes in proximity to the capsid in such viruses. The method uses geometric constraints on genome organization, formulated based on knowledge of icosahedrally-averaged reconstructions and the roles of the RNA-capsid protein contacts, to analyse cryo-electron tomographic data. We apply this method to the low-resolution tomographic data of a model virus and infer the unique asymmetric organization of its genome in contact with the protein shell of the capsid. This opens unprecedented opportunities to analyse viral genomes, revealing conserved structural features and mechanisms that can be targeted in antiviral drug desig
    • 

    corecore