412 research outputs found

    A fast algorithm for the multiple genome rearrangement problem with weighted reversals and transpositions

    Background: Due to recent progress in genome sequencing, more and more data for phylogenetic reconstruction based on rearrangement distances between genomes become available. However, this phylogenetic reconstruction is a very challenging task. For the most simple distance measures (the breakpoint distance and the reversal distance), the problem is NP-hard even if one considers only three genomes. Results: In this paper, we present a new heuristic algorithm that directly constructs a phylogenetic tree w.r.t. the weighted reversal and transposition distance. Experimental results on previously published datasets show that constructing phylogenetic trees in this way results in better trees than constructing the trees w.r.t. the reversal distance, and recalculating the weight of the trees with the weighted reversal and transposition distance. An implementation of the algorithm can be obtained from the authors. Conclusion: The possibility of creating phylogenetic trees directly w.r.t. the weighted reversal and transposition distance results in biologically more realistic scenarios. Our algorithm can solve today's most challenging biological datasets in a reasonable amount of time.

    Phylogenetic reconstruction from transpositions

    Background: Because of the advent of high-throughput sequencing and the consequent reduction in the cost of sequencing, many organisms have been completely sequenced and most of their genes identified. It thus has become possible to represent whole genomes as ordered lists of gene identifiers and to study the rearrangement of these entities through computational means. As a result, genome rearrangement data has attracted increasing attention from both biologists and computer scientists as a new type of data for phylogenetic analysis. The main events of genome rearrangements include inversions, transpositions and transversions. To date, GRAPPA and MGR are the most accurate methods for rearrangement phylogeny, both assuming inversion as the only event. However, due to the complexity of computing transposition distance, it is very difficult to analyze datasets when transpositions are dominant. Results: We extend GRAPPA to handle transpositions. The new method is named GRAPPA-TP, with two major extensions: a heuristic method to estimate transposition distance, and a new transposition median solver for three genomes. Although GRAPPA-TP uses a greedy approach to compute the transposition distance, it is very accurate when genomes are relatively close. The new GRAPPA-TP is available from http://phylo.cse.sc.edu/ Conclusion: Our extensive testing using simulated datasets shows that GRAPPA-TP is very accurate in terms of ancestor genome inference and phylogenetic reconstruction. Simulation results also suggest that model match is critical in genome rearrangement analysis: it is not accurate to simulate transpositions with other events including inversions.

    BeamGA Median: A Hybrid Heuristic Search Approach

    The median problem is widely used to derive the most reasonable rearrangement phylogenetic tree for many species. More specifically, the problem is concerned with finding a permutation that minimizes the sum of distances between itself and a set of three signed permutations. Genomes with an equal number of genes but different gene orders can be represented as permutations. In this paper, an algorithm, namely BeamGA median, is proposed that uses a heuristic search approach (local beam) as an initialization step to generate a number of solutions and then applies a Genetic Algorithm (GA) to refine the solutions, aiming to achieve a better median with the smallest possible reversal distance from the three original permutations. In this approach, any genome rearrangement distance can be applied. In this paper, we use the reversal distance. To the best of our knowledge, the proposed approach has not been applied before to the median problem. Our approach takes the true biological evolution scenario into account by applying the concept of common intervals during the GA optimization process. This allows us to imitate true biological behavior and to speed up the convergence of the genetic approach. We were able to handle permutations with a large number of genes, within an acceptable time and with the same or better accuracy compared to existing algorithms.
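
The median objective described in this abstract can be made concrete on a toy instance. The following is a minimal sketch, not the BeamGA method: it assumes genomes are tuples of signed integers, computes exact reversal distances by breadth-first search over all signed permutations, and then tries every candidate median. The function names are hypothetical, and the exhaustive search is only feasible for very small numbers of genes.

```python
from collections import deque
from itertools import permutations, product


def reversals(perm):
    """Yield every signed permutation reachable from perm by one reversal."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            # Reverse the segment perm[i..j] and flip the sign of each gene.
            segment = tuple(-g for g in reversed(perm[i:j + 1]))
            yield perm[:i] + segment + perm[j + 1:]


def reversal_distances(source):
    """Breadth-first search giving the exact reversal distance from source
    to every signed permutation of the same genes (toy sizes only)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for nxt in reversals(current):
            if nxt not in dist:
                dist[nxt] = dist[current] + 1
                queue.append(nxt)
    return dist


def brute_force_median(genomes):
    """Return (median, score), where score is the minimum summed reversal
    distance between the candidate and the input genomes."""
    n = len(genomes[0])
    # A reversal is its own inverse, so distances are symmetric and one
    # distance table per input genome is enough.
    tables = [reversal_distances(g) for g in genomes]
    best, best_score = None, None
    for order in permutations(range(1, n + 1)):
        for signs in product((1, -1), repeat=n):
            candidate = tuple(s * g for s, g in zip(signs, order))
            score = sum(table[candidate] for table in tables)
            if best_score is None or score < best_score:
                best, best_score = candidate, score
    return best, best_score
```

For the inputs `(1, 2, 3)`, `(1, 2, 3)` and `(-1, 2, 3)`, `brute_force_median` returns `(1, 2, 3)` with score 1. The search space grows as 2^n · n!, which is why heuristics such as the one in the abstract are needed for realistic genome sizes.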

    Gene order rearrangement methods for the reconstruction of phylogeny

    The study of phylogeny, i.e. the evolutionary history of species, is a central problem in biology and a key for understanding characteristics of contemporary species. Many problems in this area can be formulated as combinatorial optimisation problems which makes it particularly interesting for computer scientists. The reconstruction of the phylogeny of species can be based on various kinds of data, e.g. morphological properties or characteristics of the genetic information of the species. Maximum parsimony is a popular and widely used method for phylogenetic reconstruction aiming for an explanation of the observed data requiring the least evolutionary changes. A certain property of the genetic information gained much interest for the reconstruction of phylogeny in recent time: the organisation of the genomes of species, i.e. the arrangement of the genes on the chromosomes. But the idea to reconstruct phylogenetic information from gene arrangements has a long history. In Dobzhansky and Sturtevant (1938) it was already pointed out that “a comparison of the different gene arrangements in the same chromosome may, in certain cases, throw light on the historical relationships of these structures, and consequently on the history of the species as a whole”. This kind of data is promising for the study of deep evolutionary relationships because gene arrangements are believed to evolve slowly (Rokas and Holland, 2000). This seems to be the case especially for mitochondrial genomes which are available for a wide range of species (Boore, 1999). The development of methods for the reconstruction of phylogeny from gene arrangement data has made considerable progress during the last years. Prominent examples are the computation of parsimonious evolutionary scenarios, i.e. 
a shortest sequence of rearrangements transforming one arrangement of genes into another or the length of such a minimal scenario (Hannenhalli and Pevzner, 1995b; Sankoff, 1992; Watterson et al., 1982); the reconstruction of parsimonious phylogenetic trees from gene arrangement data (Bader et al., 2008; Bernt et al., 2007b; Bourque and Pevzner, 2002; Moret et al., 2002a); or the computation of the similarities of gene arrangements (Bergeron et al., 2008a; Heber et al., 2009). 1 Introduction The central theme of this work is to provide efficient algorithms for modified versions of fundamental genome rearrangement problems using more plausible rearrangement models. Two types of modified rearrangement models are explored. The first type is to restrict the set of allowed rearrangements as follows. It can be observed that certain groups of genes are preserved during evolution. This may be caused by functional constraints which prevented the destruction (Lathe et al., 2000; Sémon and Duret, 2006; Xie et al., 2003), certain properties of the rearrangements which shaped the gene orders (Eisen et al., 2000; Sankoff, 2002; Tillier and Collins, 2000), or just because no destructive rearrangement happened since the speciation of the gene orders. It can be assumed that gene groups, found in all studied gene orders, are not acquired independently. Accordingly, these gene groups should be preserved in plausible reconstructions of the course of evolution, in particular the gene groups should be present in the reconstructed putative ancestral gene orders. This can be achieved by restricting the set of rearrangements, which are allowed for the reconstruction, to those which preserve the gene groups of the given gene orders. Since it is difficult to determine functionally what a gene group is, it has been proposed to consider common combinatorial structures of the gene orders as gene groups (Marcotte et al., 1999; Overbeek et al., 1999). 
The second considered modification of the rearrangement model is extending the set of allowed rearrangement types. Different types of rearrangement operations have shuffled the gene orders during evolution. The same set of rearrangement operations should be used for the reconstruction; otherwise, distorted or even wrong phylogenetic conclusions may be obtained in the worst case. Both possibilities have been considered for certain rearrangement problems before. Restricted sets of allowed rearrangements have been used successfully for the computation of parsimonious rearrangement scenarios consisting of inversions only where the gene groups are identified as common intervals (Bérard et al., 2007; Figeac and Varré, 2004). Extending the set of allowed rearrangement operations is a delicate task. On the one hand it is unknown which rearrangements have to be regarded because this is part of the phylogeny to be discovered. On the other hand, efficient exact rearrangement methods including several operations are still rare, in particular when transpositions should be included. For example, the problem to compute shortest rearrangement scenarios including transpositions is still of unknown computational complexity. Currently, only efficient approximation algorithms are known (e.g. Bader and Ohlebusch, 2007; Elias and Hartman, 2006). Two problems have been studied with respect to one or even both of these possibilities in the scope of this work. The first one is the inversion median problem. Given the gene orders of some taxa, this problem asks for potential ancestral gene orders such that the corresponding inversion scenario is parsimonious, i.e. has a minimum length. Solving this problem is an essential component of algorithms for computing phylogenetic trees from gene arrangements (Bourque and Pevzner, 2002; Moret et al., 2002a, 2001). The unconstrained inversion median problem is NP-hard (Caprara, 2003). 
In Chapter 3 the inversion median problem is studied under the additional constraint to preserve gene groups of the input gene orders. Common intervals, i.e. sets of genes that appear consecutively in the gene orders, are used for modelling gene groups. The problem of finding such ancestral gene orders is called the preserving inversion median problem. Already the problem of finding a shortest inversion scenario for two gene orders is NP-hard (Figeac and Varré, 2004). Mitochondrial gene orders are a rich source for phylogenetic investigations because they are known for more than 1 000 species. Four rearrangement operations are reported in the literature to be relevant for the study of mitochondrial gene order evolution (Boore, 1999): inversions, transpositions, inverse transpositions, and tandem duplication random loss (TDRL). Efficient methods for a plausible reconstruction of genome rearrangements for mitochondrial gene orders using all four operations are presented in Chapter 4. An important rearrangement operation, in particular for the study of mitochondrial gene orders, is the tandem duplication random loss operation (e.g. Boore, 2000; Mauro et al., 2006). This rearrangement duplicates a part of a gene order followed by the random loss of one of the redundant copies of each gene. The gene order is rearranged depending on which copy is lost. This rearrangement should be considered when reconstructing phylogeny from gene order data. But the properties of this rearrangement operation have rarely been studied (Bouvel and Rossin, 2009; Chaudhuri et al., 2006). The combinatorial properties of the TDRL operation are studied in Chapter 5. The enumeration and counting of sorting TDRLs, that is TDRL operations reducing the distance, is studied in particular. Closed formulas for computing the number of sorting TDRLs and methods for the enumeration are presented. Furthermore, TDRLs are one of the operations considered in Chapter 4. 
An interesting property of this rearrangement, distinguishing it from other rearrangements, is its asymmetry. That is, the effects of a single TDRL can (in most cases) not be reversed with a single TDRL. The use of this property for phylogeny reconstruction is studied in Section 4.3. This thesis is structured as follows. The existing approaches obeying similar types of modified rearrangement models as well as important concepts and computational methods to related problems are reviewed in Chapter 2. The combinatorial structures of gene orders that have been proposed for identifying gene groups, in particular common intervals, as well as the computational approaches for their computation are reviewed in Section 2.2. Approaches for computing parsimonious pairwise rearrangement scenarios are outlined in Section 2.3. Methods for the computation of genome rearrangement scenarios obeying biologically motivated constraints, as introduced above, are detailed in Section 2.4. The approaches for the inversion median problem are covered in Section 2.5. Methods for the reconstruction of phylogenetic trees from gene arrangement data are briefly outlined in Section 2.6. Chapter 3 introduces the new algorithms CIP, ECIP, and TCIP for solving the preserving inversion median problem. The efficiency of the algorithm is empirically studied for simulated as well as mitochondrial data. The description of algorithms CIP and ECIP is based on Bernt et al. (2006b). TCIP has been described in Bernt et al. (2007a, 2008b). But the theoretical foundation of TCIP is extended significantly within this work in order to allow for more than three input permutations. Gene order rearrangement methods that have been developed for the reconstruction of the phylogeny of mitochondrial gene orders are presented in the fourth chapter. The presented algorithm CREx computes rearrangement scenarios for pairs of gene orders. 
CREx regards the four types of rearrangement operations which are important for mitochondrial gene orders. Based on CREx the algorithm TreeREx for assigning rearrangement events to a given tree is developed. The quality of the CREx reconstructions is analysed in a large empirical study for simulated gene orders. The results of TreeREx are analysed for several mitochondrial data sets. Algorithms CREx and TreeREx have been published in Bernt et al. (2008a, 2007c). The analysis of the mitochondrial gene orders of Echinodermata was included in Perseke et al. (2008). Additionally, a new and simple method is presented to explore the potential of the CREx method. The new method is applied to the complete mitochondrial data set. The problem of enumerating and counting sorting TDRLs is studied in Chapter 5. The theoretical results are covered to a large extent by Bernt et al. (2009b). The missing combinatorial explanation for some of the presented formulas is given here for the first time. To this end, a new method for the enumeration and counting of sorting TDRLs has been developed (Bernt et al., 2009a).
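
The tandem duplication random loss operation and its asymmetry, as described in this abstract, can be illustrated with a small sketch. This is not code from the thesis: it assumes a gene order is a list of distinct gene identifiers, and the function names and the `keep_in_first` parameter are hypothetical. The "random" loss is passed in explicitly so that the outcome is reproducible.

```python
from itertools import combinations


def apply_tdrl(order, start, end, keep_in_first):
    """Apply one TDRL to the segment order[start:end].

    The segment is duplicated in tandem. Genes in keep_in_first survive in
    the first copy and all other segment genes survive in the second copy,
    so each gene keeps exactly one copy.
    """
    order = list(order)
    segment = order[start:end]
    unknown = set(keep_in_first) - set(segment)
    if unknown:
        raise ValueError(f"genes outside the duplicated segment: {unknown}")
    first_copy = [g for g in segment if g in keep_in_first]
    second_copy = [g for g in segment if g not in keep_in_first]
    return order[:start] + first_copy + second_copy + order[end:]


def single_tdrl_results(order):
    """Enumerate every gene order reachable with one TDRL.

    Duplicating the whole order is enough here: a TDRL on a shorter segment
    gives the same result as a whole-order TDRL that keeps the prefix in the
    first copy and the suffix in the second copy.
    """
    order = list(order)
    results = []
    for size in range(len(order) + 1):
        for kept in combinations(order, size):
            results.append(apply_tdrl(order, 0, len(order), set(kept)))
    return results
```

With these definitions, `apply_tdrl([1, 2, 3, 4], 0, 4, {2, 4})` gives `[2, 4, 1, 3]`, while `[1, 2, 3, 4]` does not appear in `single_tdrl_results([2, 4, 1, 3])`. That is one concrete case of the asymmetry mentioned above: a single TDRL cannot undo this rearrangement.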

    The ABCs of MGR with DCJ

    We study the small phylogeny problem in the space of multichromosomal genomes under the double cut and join metric. This is similar to the existing MGR (multiple genome rearrangements) approach but it allows, in addition to inversion and reciprocal translocation, operations of transposition and block interchange. Empirically, with chloroplast and mammalian data sets, it finds solutions as good as or better than MGR when the latter operations are prohibited. Permitting these operations allows quantitatively better solutions where part of the reconstructed ancestral genomes may be included in circular chromosomes. We discuss the biological likelihood of transpositions and block interchanges in the mammalian data

    Heuristics for the inversion median problem

    Background: The study of genome rearrangements has become a mainstay of phylogenetics and comparative genomics. Fundamental in such a study is the median problem: given three gene arrangements, find a fourth that minimizes the sum of the evolutionary distances between itself and the given three. Many exact algorithms and heuristics have been developed for the inversion median problem, of which the best known is MGR. Results: We present a unifying framework for median heuristics, which enables us to clarify existing strategies and to place them in a partial ordering. Analysis of this framework leads to a new insight: the best strategies continue to refer to the input data rather than just to updated estimates. Using this insight, we develop a new heuristic for inversion medians that uses input data to the end of its computation and leverages our previous work with DCJ medians. Finally, we present the results of extensive experimentation showing that our new heuristic outperforms all others in accuracy and, especially, in running time: the heuristic typically returns solutions within 1 % of optimal and runs in seconds to minutes even on genomes with 25’000 genes; in contrast, MGR can take days on instances of 200 genes and cannot be used beyond 1’000 genes. Conclusions: Finding good rearrangement medians, in particular inversion medians, had long been regarded as the computational bottleneck in whole-genome studies. Our new heuristic for inversion medians, ASM, which dominates all others in our framework, puts that issue to rest by providing near-optimal solutions within seconds to minutes on even the largest genomes.

    Applications of heuristic search on phylogeny reconstruction problems

    Phylogenies or evolutionary trees for a given family of species show the evolutionary relationships between these species. The leaves denote the given species, the internal nodes denote their common ancestors and the edges denote the genetic relationships. Species can be identified by their whole genomes and the evolutionary relations between species can be measured by the number of rearrangement events (i.e. mutations) that transform one genome into another. One approach to infer phylogeny from genomic data is by solving median genome problems for three genomes, or the genome rearrangement problem for pairs of genomes, while trying to minimize the total evolutionary distance among the given species. In this thesis, we have developed and implemented two search-based algorithms for the phylogeny reconstruction problem based on solving median genome problems for circular genomes of the same length without gene duplication. In order to show the applicability and effectiveness of our algorithms, we have tested them with randomly generated instances and two real data sets: mitochondrial genomes of Metazoa and chloroplast genomes of Campanulaceae.

    Symmetry classes of alternating sign matrices in the nineteen-vertex model

    The nineteen-vertex model on a periodic lattice with an anti-diagonal twist is investigated. Its inhomogeneous transfer matrix is shown to have a simple eigenvalue, with the corresponding eigenstate displaying intriguing combinatorial features. Similar results were previously found for the same model with a diagonal twist. The eigenstate for the anti-diagonal twist is explicitly constructed using the quantum separation of variables technique. A number of sum rules and special components are computed and expressed in terms of Kuperberg's determinants for partition functions of the inhomogeneous six-vertex model. The computations of some components of the special eigenstate for the diagonal twist are also presented. In the homogeneous limit, the special eigenstates become eigenvectors of the Hamiltonians of the integrable spin-one XXZ chain with twisted boundary conditions. Their sum rules and special components for both twists are expressed in terms of generating functions arising in the weighted enumeration of various symmetry classes of alternating sign matrices (ASMs). These include half-turn symmetric ASMs, quarter-turn symmetric ASMs, vertically symmetric ASMs, vertically and horizontally perverse ASMs and double U-turn ASMs. As side results, new determinant and pfaffian formulas for the weighted enumeration of various symmetry classes of alternating sign matrices are obtained. Comment: 61 pages, 13 figures