    Inverse parametric sequence alignment

    We consider the inverse parametric sequence alignment problem, where a sequence alignment is given and the task is to determine parameter values such that the given alignment is optimal at that parameter setting. We describe an $O(mn \log n)$-time algorithm for inverse global alignment without gap penalties and an $O(mn \log n)$-time algorithm for global alignment with gap penalties, where $m$ and $n$ ($n \le m$) are the lengths of the input strings. Finally, we discuss the local alignment problem and future work.
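
    As a rough illustration of the problem statement only (not the algorithm described above), the sketch below checks, for a hypothetical grid of parameter values, whether a given global alignment scores as well as the Needleman–Wunsch optimum. It assumes a scoring scheme with the match score fixed at +1 and two free parameters, a mismatch penalty and a linear per-column gap penalty; the names `nw_score`, `features` and `inverse_grid` are illustrative. A grid scan only samples parameter space, whereas the inverse problem asks for parameter values exactly.

```python
from typing import List, Tuple


def nw_score(s: str, t: str, mismatch: float, gap: float) -> float:
    """Optimal global score (assumed scheme): +1 match, -mismatch, -gap per gap column."""
    prev = [-gap * j for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [-gap * i] + [0.0] * len(t)
        for j in range(1, len(t) + 1):
            sub = 1.0 if s[i - 1] == t[j - 1] else -mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] - gap, cur[j - 1] - gap)
        prev = cur
    return prev[-1]


def features(a: str, b: str) -> Tuple[int, int, int]:
    """Count (matches, mismatches, gap columns) of an alignment written with '-'."""
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, gaps


def inverse_grid(a: str, b: str,
                 grid: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the (mismatch, gap) points in grid at which alignment (a, b) is optimal."""
    s, t = a.replace("-", ""), b.replace("-", "")
    ma, mi, g = features(a, b)
    optimal = []
    for mismatch, gap in grid:
        given = ma - mismatch * mi - gap * g  # score of the given alignment
        if abs(given - nw_score(s, t, mismatch, gap)) < 1e-9:
            optimal.append((mismatch, gap))
    return optimal
```

    For the toy pair `AC`/`CA`, the ungapped alignment (two mismatches) is optimal only where mismatches are cheap relative to gaps, so even a few sample points separate settings where it is optimal from settings where it is not.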

    Viterbi Sequences and Polytopes

    A Viterbi path of length n of a discrete Markov chain is a sequence of n+1 states that has the greatest probability of occurring in the Markov chain. We divide the space of all Markov chains into Viterbi regions in which two Markov chains are in the same region if they have the same set of Viterbi paths. The Viterbi paths of regions of positive measure are called Viterbi sequences. Our main results are (1) each Viterbi sequence can be divided into a prefix, a periodic interior, and a suffix, and (2) as n increases to infinity (and the number of states remains fixed), the number of Viterbi regions remains bounded. The Viterbi regions correspond to the vertices of a Newton polytope of a polynomial whose terms are the probabilities of sequences of length n. We characterize Viterbi sequences and polytopes for two- and three-state Markov chains. Comment: 15 pages, 2 figures, to appear in Journal of Symbolic Computation.
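
    The definition above is what the classic Viterbi dynamic program computes. A minimal sketch follows, assuming every starting state is equally likely, so the initial-state factor is dropped (it rescales all paths equally); `P` is a hypothetical row-stochastic transition matrix and `viterbi_path` an illustrative name. Ties are broken toward the lowest-numbered state, so when several Viterbi paths exist only one is returned. It does not reproduce the paper's polytope-based characterization of Viterbi regions.

```python
import math
from typing import List, Tuple


def viterbi_path(P: List[List[float]], n: int) -> Tuple[List[int], float]:
    """Most probable sequence of n+1 states; returns (path, product of transitions)."""
    k = len(P)
    # best[j]: log-probability of the best path so far ending in state j
    # (uniform start assumed, so every state begins at log 1 = 0)
    best = [0.0] * k
    back: List[List[int]] = []
    for _ in range(n):
        new, ptr = [], []
        for j in range(k):
            cands = [best[i] + (math.log(P[i][j]) if P[i][j] > 0 else -math.inf)
                     for i in range(k)]
            i_star = max(range(k), key=cands.__getitem__)  # first maximum wins ties
            new.append(cands[i_star])
            ptr.append(i_star)
        best = new
        back.append(ptr)
    last = max(range(k), key=best.__getitem__)
    path = [last]
    for ptr in reversed(back):  # follow back-pointers from the final state
        path.append(ptr[path[-1]])
    path.reverse()
    return path, math.exp(best[last])
```

    With a strongly alternating chain such as `[[0.1, 0.9], [0.9, 0.1]]`, the returned path alternates between the two states, a simple instance of the periodic behavior described above.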

    Parametric Inference for Biological Sequence Analysis

    One of the major successes in computational biology has been the unification, using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems associated with different statistical models. This paper introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models. Comment: 15 pages, 4 figures. See also companion paper "Tropical Geometry of Statistical Models" (q-bio.QM/0311009).
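
    A minimal sketch of polytope propagation on a toy model, assuming global alignment of two strings scored as +1 per match, -alpha per mismatch and -beta per gap column (a hypothetical parametrization, not the paper's pair HMM). Because 2*matches + 2*mismatches + gaps = m + n, each alignment is summarized by the 2-D point (mismatches, gaps). The recursion has the shape of Needleman–Wunsch, but each cell translates its predecessors' polygons (a Minkowski sum with a single point, the geometric "product") and takes the convex hull of their union (the geometric "sum"). All names are illustrative.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (mismatches, gap columns)


def cross(o: Point, a: Point, b: Point) -> int:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices, dropping collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def shift(poly: List[Point], d: Point) -> List[Point]:
    # Minkowski sum of a polygon with the single point d, i.e. a translation
    return [(x + d[0], y + d[1]) for x, y in poly]


def alignment_polytope(s: str, t: str) -> List[Point]:
    """Vertices of the Newton polytope of all global alignments of s and t."""
    m, n = len(s), len(t)
    P: List[List[List[Point]]] = [[[] for _ in range(n + 1)] for _ in range(m + 1)]
    P[0][0] = [(0, 0)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            cands: List[Point] = []
            if i > 0 and j > 0:  # align s[i-1] with t[j-1]
                cands += shift(P[i - 1][j - 1], (0 if s[i - 1] == t[j - 1] else 1, 0))
            if i > 0:  # s[i-1] against a gap
                cands += shift(P[i - 1][j], (0, 1))
            if j > 0:  # t[j-1] against a gap
                cands += shift(P[i][j - 1], (0, 1))
            P[i][j] = convex_hull(cands)  # hull of the union
    return P[m][n]


def best_vertex(poly: List[Point], m: int, n: int, alpha: float, beta: float) -> Point:
    """Vertex maximizing matches - alpha*mismatches - beta*gaps."""
    def score(v: Point) -> float:
        mis, gaps = v
        matches = (m + n - gaps) / 2 - mis
        return matches - alpha * mis - beta * gaps
    return max(poly, key=score)
```

    Evaluating `best_vertex` at different (alpha, beta) reads off which alignment summary is optimal there without re-running the alignment, which is the kind of parametric question the polytope answers.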

    Parametric Alignment of Drosophila Genomes

    The classic algorithms of Needleman–Wunsch and Smith–Waterman find a maximum a posteriori probability alignment for a pair hidden Markov model (PHMM). In order to process large genomes that have undergone complex genome rearrangements, almost all existing whole genome alignment methods apply fast heuristics to divide genomes into small pieces which are suitable for Needleman–Wunsch alignment. In these alignment methods, it is standard practice to fix the parameters and to produce a single alignment for subsequent analysis by biologists. Our main result is the construction of a whole genome parametric alignment of Drosophila melanogaster and Drosophila pseudoobscura. Parametric alignment resolves the issue of robustness to changes in parameters by finding all optimal alignments for all possible parameters in a PHMM. Our alignment draws on existing heuristics for dividing whole genomes into small pieces for alignment, and it relies on advances we have made in computing convex polytopes that allow us to parametrically align non-coding regions using biologically realistic models. We demonstrate the utility of our parametric alignment for biological inference by showing that cis-regulatory elements are more conserved between Drosophila melanogaster and Drosophila pseudoobscura than previously thought. We also show how whole genome parametric alignment can be used to quantitatively assess the dependence of branch length estimates on alignment parameters. The alignment polytopes, software, and supplementary material can be downloaded at http://bio.math.berkeley.edu/parametric/. Comment: 19 pages, 3 figures.
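
    For contrast with parametric alignment, the sketch below runs a single fixed-parameter Needleman–Wunsch alignment and reports the optimal alignment's match, mismatch and gap counts. It assumes a simple linear scoring scheme as a stand-in for a PHMM: +1 per match, `-mismatch` per mismatch and `-gap` per gap column; `nw_align` is an illustrative name, and ties are broken toward the diagonal move. Sweeping a parameter by hand with it only samples a few settings, whereas a parametric alignment covers every setting.

```python
from typing import Tuple


def nw_align(s: str, t: str, mismatch: float,
             gap: float) -> Tuple[float, Tuple[int, int, int]]:
    """Returns (optimal score, (matches, mismatches, gap columns)) of one optimal alignment."""
    m, n = len(s), len(t)
    S = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = S[i - 1][0] - gap  # cumulative, so traceback comparisons are exact
    for j in range(1, n + 1):
        S[0][j] = S[0][j - 1] - gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 1.0 if s[i - 1] == t[j - 1] else -mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub, S[i - 1][j] - gap, S[i][j - 1] - gap)
    # Traceback, preferring the diagonal, then a gap in t, then a gap in s.
    i, j = m, n
    matches = mismatches = gaps = 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = s[i - 1] == t[j - 1]
            sub = 1.0 if same else -mismatch
            if S[i][j] == S[i - 1][j - 1] + sub:
                if same:
                    matches += 1
                else:
                    mismatches += 1
                i, j = i - 1, j - 1
                continue
        gaps += 1
        if i > 0 and S[i][j] == S[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    return S[m][n], (matches, mismatches, gaps)
```

    A sweep such as `for gap in (0.25, 0.5, 1.0): print(gap, nw_align("AC", "CA", 0.0, gap))` shows the reported optimum switching from a gapped alignment to the ungapped one as gaps become more expensive.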

    Parameters for accurate genome alignment

    Background: Genome sequence alignments form the basis of much research. Genome alignment depends on various mundane but critical choices, such as how to mask repeats and which score parameters to use. Surprisingly, there has been no large-scale assessment of these choices using real genomic data. Moreover, rigorous procedures to control the rate of spurious alignment have not been employed.
    Results: We have assessed 495 combinations of score parameters for alignment of animal, plant, and fungal genomes. As our gold standard of accuracy, we used genome alignments implied by multiple alignments of proteins and of structural RNAs. We found the HOXD scoring schemes underlying alignments in the UCSC genome database to be far from optimal, and we suggest better parameters. Higher values of the X-drop parameter are not always better. E-values accurately indicate the rate of spurious alignment, but only if tandem repeats are masked in a non-standard way. Finally, we show that γ-centroid (probabilistic) alignment can find highly reliable subsets of aligned bases.
    Conclusions: These results enable more accurate genome alignment, with reliability measures for local alignments and for individual aligned bases. This study was made possible by our new software, LAST, which can align vertebrate genomes in a few hours (http://last.cbrc.jp/).
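
    The X-drop parameter mentioned above controls when an alignment extension gives up: extension stops once the running score falls more than X below the best score seen so far, and the alignment is trimmed back to that best point. A minimal ungapped sketch, assuming hypothetical +1/-1 match/mismatch scores (LAST's actual scoring schemes and gapped extension are not modeled); `xdrop_extend` is an illustrative name.

```python
from typing import Tuple


def xdrop_extend(s: str, t: str, i: int, j: int, x: int,
                 match: int = 1, mismatch: int = -1) -> Tuple[int, int]:
    """Ungapped rightward extension from s[i], t[j]; returns (best score, best length)."""
    score = best = best_len = 0
    k = 0
    while i + k < len(s) and j + k < len(t):
        score += match if s[i + k] == t[j + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k  # new maximum: remember where it occurred
        elif best - score > x:
            break  # fell more than X below the best: stop extending
    return best, best_len
```

    On `ACGTTACGT` versus `ACGAAACGT`, X = 1 stops after the first three matches, while X = 2 tolerates the two-mismatch dip and extends to the end of the strings, so the choice of X changes which alignment is reported.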

    Simultaneous Identification of Duplications, Losses, and Lateral Gene Transfers

    Inverse Parametric Sequence Alignment

    We consider the inverse parametric sequence alignment problem, where a sequence alignment is given and the task is to determine parameter values such that the given alignment is optimal at that parameter setting. We describe an $O(mn \log m)$-time algorithm for inverse global alignment without gaps and an $O(mn \log^2 m)$-time algorithm for global alignment with gaps, where $m$ and $n$ ($n \le m$) are the lengths of the input strings. We then discuss algorithms for local alignment.
