    Sorting signed circular permutations by super short reversals

    We consider the problem of sorting a circular permutation by reversals of length at most 2, a problem that finds application in comparative genomics. Polynomial-time solutions for the unsigned version of this problem are known, but the signed version remained open. In this paper, we present the first polynomial-time solution for the signed version of this problem. Moreover, we perform an experiment for inferring distances and phylogenies for published Yersinia genomes and compare the results with the phylogenies presented in previous works.
    Funding: FAPESP (2013/08293-7, 2014/04718-6); CAPES; CNPq (306730/2012-0, 477692/2012-5, 483370/2013-4).
    11th International Symposium on Bioinformatics Research and Applications

    Sorting Circular Permutations by Super Short Reversals

    We consider the problem of sorting a circular permutation by super short reversals (i.e., reversals of length at most 2), a problem that finds application in comparative genomics. Polynomial-time solutions to the unsigned version of this problem are known, but the signed version remained open. In this paper, we present the first polynomial-time solution to the signed version of this problem. Moreover, we perform experiments for inferring phylogenies of two different groups of bacterial species and compare our results with the phylogenies presented in previous works. Finally, to facilitate phylogenetic studies based on the methods studied in this paper, we present a web tool for rearrangement-based phylogenetic inference using short operations, such as super short reversals.
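
    To make the operation concrete, here is a minimal Python sketch of super short reversals on a signed circular permutation, together with a brute-force breadth-first search for the distance. It is not the polynomial-time algorithm of the paper: it explores every reachable permutation and is only usable for a handful of genes. It assumes that positions on the circle are fixed, that the last and first positions are adjacent, and that the target is the identity (1, 2, ..., n) in those fixed positions; the paper's exact convention for circular equivalence may differ. The function names are hypothetical.

```python
from collections import deque


def super_short_reversals(perm):
    """Yield every signed circular permutation one super short reversal away.

    Assumption (hypothetical convention): positions are fixed, and a
    length-2 reversal may wrap around, since positions n-1 and 0 are
    adjacent on the circle.
    """
    n = len(perm)
    # Length-1 reversal: flip the sign of a single element.
    for i in range(n):
        nxt = list(perm)
        nxt[i] = -nxt[i]
        yield tuple(nxt)
    # Length-2 reversal: swap two circularly adjacent elements and flip both signs.
    if n >= 2:
        for i in range(n):
            j = (i + 1) % n
            nxt = list(perm)
            nxt[i], nxt[j] = -perm[j], -perm[i]
            yield tuple(nxt)


def ssr_distance(perm):
    """Exact distance by breadth-first search; exponential, tiny inputs only."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for nxt in super_short_reversals(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None  # not expected: sign flips and adjacent swaps reach every state
```

    With these conventions, ssr_distance([-4, 2, 3, -1]) returns 1, because a single length-2 reversal across the wrap-around point turns the pair (-1, -4) at positions 3 and 0 into (4, 1).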

    Sorting Signed Circular Permutations by Super Short Reversals

    We consider the problem of sorting a circular permutation by reversals of length at most 2, a problem that finds application in comparative genomics. Polynomial-time solutions for the unsigned version of this problem are known, but the signed version remained open. In this paper, we present the first polynomial-time solution for the signed version of this problem. Moreover, we perform an experiment for inferring distances and phylogenies for published Yersinia genomes and compare the results with the phylogenies presented in previous works.

    Sorting signed permutations by short operations

    Background: During evolution, global mutations may alter the order and the orientation of the genes in a genome. Such mutations are referred to as rearrangement events, or simply operations. In unichromosomal genomes, the most common operations are reversals, which reverse the order and orientation of a sequence of genes, and transpositions, which switch the locations of two contiguous portions of a genome. The problem of computing the minimum sequence of operations that transforms one genome into another, which is equivalent to the problem of sorting a permutation into the identity permutation, is a well-studied problem that finds application in comparative genomics. There are a number of works concerning this problem in the literature, but they generally do not take into account the length of the operations (i.e., the number of genes affected by the operations). Since it has been observed that short operations are prevalent in the evolution of some species, algorithms that efficiently solve this problem in the special case of short operations are of interest.
    Results: In this paper, we investigate the problem of sorting a signed permutation by short operations. More precisely, we study four flavors of this problem: (i) sorting a signed permutation by reversals of length at most 2; (ii) sorting a signed permutation by reversals of length at most 3; (iii) sorting a signed permutation by reversals and transpositions of length at most 2; and (iv) sorting a signed permutation by reversals and transpositions of length at most 3. We present polynomial-time solutions for problems (i) and (iii), a 5-approximation for problem (ii), and a 3-approximation for problem (iv). Moreover, we show that the expected approximation ratio of the 5-approximation algorithm is not greater than 3 for random signed permutations with more than 12 elements. Finally, we present experimental results showing that the approximation ratios of the approximation algorithms cannot be smaller than 3. In particular, this means that the approximation ratio of the 3-approximation algorithm is tight.
    Funding: CAPES; FAPESP (2013/08293-7, 2014/04718-6); CNPq (303947/2008-0, 306730/2012-0, 477692/2012-5, 483370/2013-4).
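
    The four problem variants differ only in the allowed moves. The following Python sketch enumerates short reversals and short transpositions of a signed linear permutation and finds the exact distance by exhaustive breadth-first search. It assumes that the length of a transposition is the total number of elements in its two swapped blocks and that transpositions leave signs unchanged; the function names are hypothetical. It is an illustration for very small inputs, not the paper's polynomial-time or approximation algorithms, though it could serve as a reference for checking such algorithms on tiny instances.

```python
from collections import deque


def short_reversals(perm, k):
    """Reversals of length 1..k: reverse a segment and flip the signs in it."""
    n = len(perm)
    for i in range(n):
        for length in range(1, k + 1):
            j = i + length
            if j > n:
                break
            seg = tuple(-x for x in reversed(perm[i:j]))
            yield perm[:i] + seg + perm[j:]


def short_transpositions(perm, k):
    """Transpositions of length 2..k: swap adjacent blocks perm[i:j] and
    perm[j:l] with l - i <= k (assumed definition); signs are unchanged."""
    n = len(perm)
    for i in range(n):
        for l in range(i + 2, min(i + k, n) + 1):
            for j in range(i + 1, l):
                yield perm[:i] + perm[j:l] + perm[i:j] + perm[l:]


def short_distance(perm, k, with_transpositions=False):
    """Exact distance to the identity by breadth-first search (tiny inputs)."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        moves = list(short_reversals(cur, k))
        if with_transpositions:
            moves += list(short_transpositions(cur, k))
        for nxt in moves:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return None
```

    For instance, short_distance((2, 1), 2) is 3 when only reversals are allowed, whereas short_distance((2, 1), 2, with_transpositions=True) is 1, since a transposition swaps the two elements without changing their signs.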

    An algebraic model for inversion and deletion in bacterial genome rearrangement

    Reversals are a major contributor to variation among bacterial genomes, with studies suggesting that reversals involving small numbers of regions are more likely than larger reversals. Deletions may arise in bacterial genomes through the same biological mechanism as reversals, and hence a model that incorporates both is desirable. However, while reversal distances between genomes have been well studied, there has yet to be a model that accounts for the combination of deletions and short reversals. To account for both of these operations, we introduce an algebraic model that utilises partial permutations. This leads to an algorithm for calculating the minimum distance to the most recent common ancestor of two bacterial genomes evolving by short reversals and deletions. The algebraic model makes the existing short reversal models more complete and realistic by including deletions, and also introduces new algebraic tools into evolutionary distance problems.
    Comment: 19 pages, 10 figures

    Models and Algorithms for Sorting Permutations with Tandem Duplication and Random Loss

    A central topic of evolutionary biology is the inference of phylogeny, i.e., the evolutionary history of species. A powerful tool for inferring such phylogenetic relationships is the arrangement of the genes in mitochondrial genomes. The rationale is that these gene arrangements are subject to different types of mutations in the course of evolution. Hence, a high similarity in the gene arrangement between two species indicates a close evolutionary relation. Metazoan mitochondrial gene arrangements are particularly well suited for such phylogenetic studies, as they are available for a wide range of species and their gene content is almost invariant and usually free of duplicates. Given these properties, gene arrangements of mitochondrial genomes are modeled by permutations in which each element represents a gene, i.e., a specific genetic sequence. The mutations that shape the gene arrangement of genomes are then represented by operations that rearrange elements in permutations, so-called genome rearrangements, and thereby bridge the gap between evolutionary biology and optimization. Many problems of phylogeny inference can be formulated as challenging combinatorial optimization problems, which makes this research area especially interesting for computer scientists. The most prominent examples of such optimization problems are the sorting problem and the distance problem. While the sorting problem requires a minimum-length sequence of rearrangements that transforms one given permutation into another, i.e., it aims for a hypothetical scenario of gene order evolution, the distance problem asks only for the length of such a sequence. This minimum length is called the distance and is used as a (dis)similarity measure quantifying evolutionary relatedness. Most evolutionary changes occurring in gene arrangements of mitochondrial genomes can be explained by the tandem duplication random loss (TDRL) genome rearrangement model.
    A TDRL consists of a duplication of a consecutive set of genes in tandem, followed by the random loss of one copy of each duplicated gene. In spite of the importance of the TDRL genome rearrangement in mitochondrial evolution, its combinatorial properties have rarely been studied. In addition, models of genome rearrangements that include all types of rearrangement relevant for mitochondrial genomes, i.e., inversions, transpositions, inverse transpositions, and TDRLs, while admitting computational tractability are rare. Nevertheless, especially for metazoan gene arrangements, the TDRL rearrangement should be considered for the reconstruction of phylogeny. Realizing that a better understanding of the TDRL model is indispensable for the study of mitochondrial gene arrangements, the central theme of this thesis is to broaden the horizon of TDRL genome rearrangements with respect to mitochondrial genome evolution. For this purpose, this thesis provides combinatorial properties of the TDRL model and its variants as well as efficient methods for a plausible reconstruction of rearrangement scenarios between gene arrangements. The proposed methods consider all types of genome rearrangements that predominantly occur during mitochondrial evolution. More precisely, the main points contained in this thesis are as follows: The distance problem and the sorting problem for the TDRL model are further examined with respect to circular permutations, a formal concept that reflects the circular structure of mitochondrial genomes. As a result, a closed formula for the distance is provided. Recently, evidence has been found for a variant of the TDRL rearrangement model in which the duplicated set of genes is additionally inverted. Initiating the algorithmic study of this new rearrangement model on a certain type of permutations, the thesis proposes a closed formula solving the distance problem as well as a quasilinear-time algorithm that solves the corresponding sorting problem.
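
    The two steps of a TDRL can be written out directly. In this Python sketch the segment boundaries and the choice of surviving copy for each gene are explicit parameters (a hypothetical interface chosen for illustration); in the model itself, which copy is lost is random. Genes are assumed to be distinct, as in the duplicate-free permutations described above.

```python
def tdrl(perm, start, end, keep_first):
    """Apply one tandem duplication random loss to the segment perm[start:end].

    keep_first (hypothetical parameter) maps each gene of the segment to True
    (keep the copy in the first duplicate) or False (keep the second copy).
    """
    segment = list(perm[start:end])
    length = len(segment)
    # Step 1: tandem duplication of the consecutive segment.
    duplicated = list(perm[:start]) + segment + segment + list(perm[end:])
    # Step 2: loss of exactly one copy of each duplicated gene.
    result = []
    for pos, gene in enumerate(duplicated):
        if start <= pos < start + length:  # first copy of the segment
            if keep_first[gene]:
                result.append(gene)
        elif start + length <= pos < start + 2 * length:  # second copy
            if not keep_first[gene]:
                result.append(gene)
        else:  # genes outside the duplicated segment are untouched
            result.append(gene)
    return result
```

    Because each gene keeps exactly one copy, the segment ends up as the genes kept from the first copy followed by those kept from the second copy, each group in its original relative order; for example, tdrl([1, 2, 3, 4, 5], 1, 4, {2: True, 3: False, 4: True}) yields [1, 2, 4, 3, 5].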
    The assumption that only one type of genome rearrangement has occurred during the evolution of certain gene arrangements is most likely unrealistic; e.g., at least three types of rearrangements on top of the TDRL rearrangement have to be considered for the evolution of metazoan mitochondrial genomes. Therefore, three different biologically motivated constraints are taken into account in this thesis in order to produce plausible evolutionary rearrangement scenarios. The first constraint is extending the considered set of genome rearrangements to the model that covers all four common types of mitochondrial genome rearrangements. For this 4-type model, a sharp lower bound and several close additive upper bounds on the distance are developed. As a byproduct, a polynomial-time approximation algorithm for the corresponding sorting problem is provided; it is guaranteed to compute pairwise rearrangement scenarios that deviate from a minimum-length scenario by at most two rearrangement operations. The second biologically motivated constraint is the relative frequency of the different types of rearrangements occurring during evolution. This frequency is modeled by a weighting scheme on the 4-type model in which every rearrangement is weighted according to its type. The resulting NP-hard sorting problem is then solved by means of a polynomial-size integer linear program. The third biologically motivated constraint is that certain subsets of genes are often found in close proximity in the gene arrangements of many different species. This observation is reflected by requiring rearrangement scenarios to preserve certain groups of genes, which are modeled by common intervals of permutations. In order to solve the sorting problem that considers all three types of biologically motivated constraints, the exact dynamic programming algorithm CREx2 is proposed. CREx2 has a linear runtime for a large class of problem instances.
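
    The notion of a common interval can be illustrated by brute force. The Python sketch below assumes unsigned gene identifiers and reports only intervals of at least two genes (both choices, like the function name, are hypothetical); it checks every interval of p and tests whether its genes occupy consecutive positions in q. It only illustrates the definition and is not the method used inside CREx2.

```python
def common_intervals(p, q):
    """Return every common interval (at least two genes) of permutations p and q.

    A set of genes is a common interval if it occupies consecutive positions
    in both p and q. Brute force over all O(n^2) intervals of p.
    """
    pos_in_q = {gene: idx for idx, gene in enumerate(q)}
    found = []
    for i in range(len(p)):
        lo = hi = pos_in_q[p[i]]
        for j in range(i + 1, len(p)):
            # Track the span that p[i..j] occupies in q incrementally.
            lo = min(lo, pos_in_q[p[j]])
            hi = max(hi, pos_in_q[p[j]])
            # The genes are consecutive in q exactly when the span has no gaps.
            if hi - lo == j - i:
                found.append(frozenset(p[i:j + 1]))
    return found
```

    For p = [1, 2, 3, 4] and q = [2, 1, 4, 3], the common intervals are {1, 2}, {3, 4}, and {1, 2, 3, 4}; a rearrangement scenario that preserves them must never separate 1 from 2 or 3 from 4.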
    For the remaining instances, two versions of CREx2 are provided: the first version provides exact solutions but has an exponential runtime in the worst case, and the second version provides approximate solutions efficiently. CREx2 is evaluated in an empirical study on simulated artificial and real biological mitochondrial gene arrangements.

    Algorithmic approaches for genome rearrangement: a review


    Genome Rearrangement Problems

    Various global rearrangements of permutations, such as reversals and transpositions, have recently become of interest because of their applications in computational molecular biology. A reversal is an operation that reverses the order of a substring of a permutation. A transposition is an operation that swaps two adjacent substrings of a permutation. The problem of determining the smallest number of reversals required to transform a given permutation into the identity permutation is called sorting by reversals. Similar problems can be defined for transpositions and other global rearrangements. Related to sorting by reversals is the problem of establishing the reversal diameter. The reversal diameter of S_n (the symmetric group on n elements) is the maximum number of reversals required to sort a permutation of length n. Of course, diameter problems can be posed for other global rearrangements. These various problems are of interest because the permutations can be used to represent sequences of genes in chromosomes, and the global rearrangements then represent evolutionary events. As a result, we call these problems genome rearrangement problems. Genome rearrangement problems seem to be unlike previously studied algorithmic problems on sequences, so new methods have had to be developed to deal with them. These methods predominantly employ graphs to model permutation structure. However, even using these methods, often a genome rearrangement problem has no obvious polynomial-time algorithm, and in some cases can be shown to be NP-hard. For example, the problem of sorting by reversals is NP-hard, whereas the computational complexity of sorting by transpositions is open. For problems like these, it is natural to seek polynomial-time approximation algorithms that achieve an approximation guarantee.
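
    The definitions of reversal, transposition, and diameter translate into a short brute-force computation. This Python sketch (hypothetical function names, unsigned permutations) computes the diameter of S_n under either operation by breadth-first search over all n! permutations, so it is only feasible for small n.

```python
from collections import deque


def reversals(p):
    """All permutations one reversal away: reverse the substring p[i:j]."""
    n = len(p)
    for i in range(n):
        for j in range(i + 2, n + 1):  # length-1 reversals are no-ops
            yield p[:i] + p[i:j][::-1] + p[j:]


def transpositions(p):
    """All permutations one transposition away: swap adjacent substrings p[i:j], p[j:k]."""
    n = len(p)
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n + 1):
                yield p[:i] + p[j:k] + p[i:j] + p[k:]


def diameter(n, moves):
    """Maximum distance to the identity over S_n, by BFS from the identity.

    Searching from the identity is valid because the inverse of a reversal
    (or transposition) is again a reversal (or transposition).
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        cur = queue.popleft()
        for nxt in moves(cur):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return max(dist.values())
```

    For n = 4 it reports a reversal diameter of 3, consistent with the fact that any permutation of length n can be sorted with at most n - 1 reversals by moving each element into place in turn.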
    In this thesis, we study several genome rearrangement problems as interesting and challenging algorithmic problems in their own right, including some problems for which the global rearrangement has no immediate biological equivalent. For example, we define a block-interchange to be a rearrangement that swaps any two substrings of the permutation. We examine, in particular, how the graph-theoretic models relate to the genome rearrangement problems that we study. The major new results contained in this thesis are as follows: We present a 3/2-approximation algorithm for sorting by reversals. This is the best known approximation algorithm for the problem, and improves upon the 7/4 approximation bound of the previous best algorithm. We give a polynomial-time algorithm for a significant special case of sorting by reversals, thereby disproving a conjecture of Kececioglu and Sankoff, who had suggested that this special case was likely to be NP-hard. We analyse the structure of the so-called cycle graph of a permutation in the context of sorting by transpositions, and thereby gain a deeper insight into this problem. Among the consequences are a tighter lower bound for the problem, a simpler 3/2-approximation algorithm than had previously been described, and algorithms that, in empirical tests, almost always find the exact transposition distance of random permutations. We introduce a natural generalisation of sorting by transpositions called sorting by block-interchanges, and present a polynomial-time algorithm for this problem. We initiate the study of analogous problems on strings over a fixed-size alphabet. We establish upper and lower bounds and diameter results for the problems over a binary alphabet. We also prove that the problems analogous to sorting by reversals and sorting by block-interchanges are NP-hard. (Abstract shortened by ProQuest.)
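
    A block-interchange and a brute-force distance for it can be sketched in Python as follows. The function names and the index convention (blocks p[i:j] and p[k:l] with i < j <= k < l) are assumptions for illustration; a transposition is the special case j == k. The exhaustive search is exponential and only suitable for checking tiny examples, whereas the thesis presents a polynomial-time algorithm.

```python
from collections import deque


def block_interchange(p, i, j, k, l):
    """Swap the substrings p[i:j] and p[k:l], where i < j <= k < l.

    When j == k the two blocks are adjacent and this is a transposition.
    """
    assert 0 <= i < j <= k < l <= len(p)
    return p[:i] + p[k:l] + p[j:k] + p[i:j] + p[l:]


def block_interchange_distance(p):
    """Exact distance to the sorted order by BFS; exponential, tiny inputs only."""
    start = tuple(p)
    target = tuple(sorted(start))
    n = len(start)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        # Enumerate every pair of non-overlapping blocks cur[i:j], cur[k:l].
        for i in range(n):
            for j in range(i + 1, n + 1):
                for k in range(j, n):
                    for l in range(k + 1, n + 1):
                        nxt = block_interchange(cur, i, j, k, l)
                        if nxt not in dist:
                            dist[nxt] = dist[cur] + 1
                            queue.append(nxt)
    return None
```

    For example, block_interchange_distance((3, 2, 1)) is 1, since swapping the non-adjacent blocks (3) and (1) sorts the permutation in one step, while sorting by transpositions needs two steps for the same permutation.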