    Sigma-2: Multiple sequence alignment of non-coding DNA via an evolutionary model

    Abstract

    Background: Most multiple sequence alignment programs expect that all or most of their input is known to be homologous, and penalise insertions and deletions. This is not a reasonable assumption for non-coding DNA, which is much less strongly conserved than protein-coding genes. Arguing that the goal of sequence alignment should be the detection of homology and not similarity, we incorporate an evolutionary model into a previously published multiple sequence alignment program for non-coding DNA, Sigma, as a sensitive likelihood-based way to assess the significance of alignments. Version 1 of Sigma was successful in eliminating spurious alignments but exhibited relatively poor sensitivity on synthetic data. Sigma 1 used a p-value (the probability under the "null hypothesis" of non-homology) to assess the significance of alignments, and, optionally, a background model that captured short-range genomic correlations. Sigma version 2, described here, retains these features, but calculates the p-value using a sophisticated evolutionary model, and also allows for a transition matrix for different substitution rates from and to different nucleotides. Our evolutionary model takes separate account of mutation and fixation, and can be extended to allow for locally differing functional constraints on sequence.

    Results: We demonstrate that, on real and synthetic data, Sigma-2 significantly outperforms other programs in specificity to genuine homology (that is, it minimises alignment of spuriously similar regions that do not have a common ancestry), while it is now as sensitive as the best current programs.

    Conclusions: Comparing these results with an extrapolation of the best results from other available programs, we suggest that conservation rates in intergenic DNA are often significantly over-estimated. It is increasingly important to align non-coding DNA correctly, in regulatory genomics and in the context of whole-genome alignment, and Sigma-2 is an important step in that direction.

    A polyhedral approach to sequence alignment problems

    We study two problems in sequence alignment from both a theoretical and a practical point of view. For the first time in sequence alignment, we use tools from combinatorial optimization to develop branch-and-cut algorithms that solve these problems efficiently. The Generalized Maximum Trace formulation captures several forms of multiple sequence alignment problems in a common framework, among them the original formulation of Maximum Trace. The Structural Maximum Trace Problem captures the comparison of RNA molecules on the basis of their primary sequence and their secondary structure. For both problems we derive a characterization in terms of graphs, which we use to reformulate the problems as integer linear programs. We then study the polytopes (the convex hulls of all feasible solutions) associated with the integer linear programs for both problems. For each polytope we derive several classes of facet-defining inequalities and show that for some of these classes the corresponding separation problem can be solved in polynomial time. This leads to a polynomial-time algorithm for pairwise sequence alignment that is not based on dynamic programming. Moreover, for multiple sequences the branch-and-cut algorithms for both sequence alignment problems are able to solve to optimality instances that are beyond the range of present dynamic programming approaches.

    Sequence Comparison in Historical Linguistics

    B
