Search CORE

Qucosa - Publikationsserver der Universität Leipzig

Gene order rearrangement methods for the reconstruction of phylogeny

Author: Bernt Matthias
Publication venue
Publication date: 29/01/2010
Field of study

The study of phylogeny, i.e. the evolutionary history of species, is a central problem in biology and a key for understanding characteristics of contemporary species. Many problems in this area can be formulated as combinatorial optimisation problems which makes it particularly interesting for computer scientists. The reconstruction of the phylogeny of species can be based on various kinds of data, e.g. morphological properties or characteristics of the genetic information of the species. Maximum parsimony is a popular and widely used method for phylogenetic reconstruction aiming for an explanation of the observed data requiring the least evolutionary changes. A certain property of the genetic information gained much interest for the reconstruction of phylogeny in recent time: the organisation of the genomes of species, i.e. the arrangement of the genes on the chromosomes. But the idea to reconstruct phylogenetic information from gene arrangements has a long history. In Dobzhansky and Sturtevant (1938) it was already pointed out that “a comparison of the different gene arrangements in the same chromosome may, in certain cases, throw light on the historical relationships of these structures, and consequently on the history of the species as a whole”. This kind of data is promising for the study of deep evolutionary relationships because gene arrangements are believed to evolve slowly (Rokas and Holland, 2000). This seems to be the case especially for mitochondrial genomes which are available for a wide range of species (Boore, 1999). The development of methods for the reconstruction of phylogeny from gene arrangement data has made considerable progress during the last years. Prominent examples are the computation of parsimonious evolutionary scenarios, i.e. a shortest sequence of rearrangements transforming one arrangement of genes into another or the length of such a minimal scenario (Hannenhalli and Pevzner, 1995b; Sankoff, 1992; Watterson et al., 1982); the reconstruction of parsimonious phylogenetic trees from gene arrangement data (Bader et al., 2008; Bernt et al., 2007b; Bourque and Pevzner, 2002; Moret et al., 2002a); or the computation of the similarities of gene arrangements (Bergeron et al., 2008a; Heber et al., 2009). 1 1 Introduction The central theme of this work is to provide efficient algorithms for modified versions of fundamental genome rearrangement problems using more plausible rearrangement models. Two types of modified rearrangement models are explored. The first type is to restrict the set of allowed rearrangements as follows. It can be observed that certain groups of genes are preserved during evolution. This may be caused by functional constraints which prevented the destruction (Lathe et al., 2000; Sémon and Duret, 2006; Xie et al., 2003), certain properties of the rearrangements which shaped the gene orders (Eisen et al., 2000; Sankoff, 2002; Tillier and Collins, 2000), or just because no destructive rearrangement happened since the speciation of the gene orders. It can be assumed that gene groups, found in all studied gene orders, are not acquired independently. Accordingly, these gene groups should be preserved in plausible reconstructions of the course of evolution, in particular the gene groups should be present in the reconstructed putative ancestral gene orders. This can be achieved by restricting the set of rearrangements, which are allowed for the reconstruction, to those which preserve the gene groups of the given gene orders. Since it is difficult to determine functionally what a gene group is, it has been proposed to consider common combinatorial structures of the gene orders as gene groups (Marcotte et al., 1999; Overbeek et al., 1999). The second considered modification of the rearrangement model is extending the set of allowed rearrangement types. Different types of rearrangement operations have shuffled the gene orders during evolution. It should be attempted to use the same set of rearrangement operations for the reconstruction otherwise distorted or even wrong phylogenetic conclusions may be obtained in the worst case. Both possibilities have been considered for certain rearrangement problems before. Restricted sets of allowed rearrangements have been used successfully for the computation of parsimonious rearrangement scenarios consisting of inversions only where the gene groups are identified as common intervals (Bérard et al., 2007; Figeac and Varré, 2004). Extending the set of allowed rearrangement operations is a delicate task. On the one hand it is unknown which rearrangements have to be regarded because this is part of the phylogeny to be discovered. On the other hand, efficient exact rearrangement methods including several operations are still rare, in particular when transpositions should be included. For example, the problem to compute shortest rearrangement scenarios including transpositions is still of unknown computational complexity. Currently, only efficient approximation algorithms are known (e.g. Bader and Ohlebusch, 2007; Elias and Hartman, 2006). Two problems have been studied with respect to one or even both of these possibilities in the scope of this work. The first one is the inversion median problem. Given the gene orders of some taxa, this problem asks for potential ancestral gene orders such that the corresponding inversion scenario is parsimonious, i.e. has a minimum length. Solving this problem is an essential component 2 of algorithms for computing phylogenetic trees from gene arrangements (Bourque and Pevzner, 2002; Moret et al., 2002a, 2001). The unconstrained inversion median problem is NP-hard (Caprara, 2003). In Chapter 3 the inversion median problem is studied under the additional constraint to preserve gene groups of the input gene orders. Common intervals, i.e. sets of genes that appear consecutively in the gene orders, are used for modelling gene groups. The problem of finding such ancestral gene orders is called the preserving inversion median problem. Already the problem of finding a shortest inversion scenario for two gene orders is NP-hard (Figeac and Varré, 2004). Mitochondrial gene orders are a rich source for phylogenetic investigations because they are known for more than 1 000 species. Four rearrangement operations are reported at least in the literature to be relevant for the study of mitochondrial gene order evolution (Boore, 1999): That is inversions, transpositions, inverse transpositions, and tandem duplication random loss (TDRL). Efficient methods for a plausible reconstruction of genome rearrangements for mitochondrial gene orders using all four operations are presented in Chapter 4. An important rearrangement operation, in particular for the study of mitochondrial gene orders, is the tandem duplication random loss operation (e.g. Boore, 2000; Mauro et al., 2006). This rearrangement duplicates a part of a gene order followed by the random loss of one of the redundant copies of each gene. The gene order is rearranged depending on which copy is lost. This rearrangement should be regarded for reconstructing phylogeny from gene order data. But the properties of this rearrangement operation have rarely been studied (Bouvel and Rossin, 2009; Chaudhuri et al., 2006). The combinatorial properties of the TDRL operation are studied in Chapter 5. The enumeration and counting of sorting TDRLs, that is TDRL operations reducing the distance, is studied in particular. Closed formulas for computing the number of sorting TDRLs and methods for the enumeration are presented. Furthermore, TDRLs are one of the operations considered in Chapter 4. An interesting property of this rearrangement, distinguishing it from other rearrangements, is its asymmetry. That is the effects of a single TDRL can (in the most cases) not be reversed with a single TDRL. The use of this property for phylogeny reconstruction is studied in Section 4.3. This thesis is structured as follows. The existing approaches obeying similar types of modified rearrangement models as well as important concepts and computational methods to related problems are reviewed in Chapter 2. The combinatorial structures of gene orders that have been proposed for identifying gene groups, in particular common intervals, as well as the computational approaches for their computation are reviewed in Section 2.2. Approaches for computing parsimonious pairwise rearrangement scenarios are outlined in Section 2.3. Methods for the computation genome rearrangement scenarios obeying biologically motivated constraints, as introduced above, are detailed in Section 2.4. The approaches for the inversion median problem are covered in Section 2.5. Methods for the reconstruction of phylogenetic trees from gene arrangement data are briefly outlined in Section 2.6.3 1 Introduction Chapter 3 introduces the new algorithms CIP, ECIP, and TCIP for solving the preserving inversion median problem. The efficiency of the algorithm is empirically studied for simulated as well as mitochondrial data. The description of algorithms CIP and ECIP is based on Bernt et al. (2006b). TCIP has been described in Bernt et al. (2007a, 2008b). But the theoretical foundation of TCIP is extended significantly within this work in order to allow for more than three input permutations. Gene order rearrangement methods that have been developed for the reconstruction of the phylogeny of mitochondrial gene orders are presented in the fourth chapter. The presented algorithm CREx computes rearrangement scenarios for pairs of gene orders. CREx regards the four types of rearrangement operations which are important for mitochondrial gene orders. Based on CREx the algorithm TreeREx for assigning rearrangement events to a given tree is developed. The quality of the CREx reconstructions is analysed in a large empirical study for simulated gene orders. The results of TreeREx are analysed for several mitochondrial data sets. Algorithms CREx and TreeREx have been published in Bernt et al. (2008a, 2007c). The analysis of the mitochondrial gene orders of Echinodermata was included in Perseke et al. (2008). Additionally, a new and simple method is presented to explore the potential of the CREx method. The new method is applied to the complete mitochondrial data set. The problem of enumerating and counting sorting TDRLs is studied in Chapter 5. The theoretical results are covered to a large extent by Bernt et al. (2009b). The missing combinatorial explanation for some of the presented formulas is given here for the first time. Therefor, a new method for the enumeration and counting of sorting TDRLs has been developed (Bernt et al., 2009a)

Sorting permutations by cut-circularize-linearize-and-paste operations

Author: Chen Kun-Tze
Huang Keng-Hsuan
Lu Chin Lung
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background Genome rearrangements are studied on the basis of genome-wide analysis of gene orders and important in the evolution of species. In the last two decades, a variety of rearrangement operations, such as reversals, transpositions, block-interchanges, translocations, fusions and fissions, have been proposed to evaluate the differences between gene orders in two or more genomes. Usually, the computational studies of genome rearrangements are formulated as problems of sorting permutations by rearrangement operations. Result In this article, we study a sorting problem by cut-circularize-linearize-and-paste (CCLP) operations, which aims to find a minimum number of CCLP operations to sort a signed permutation representing a chromosome. The CCLP is a genome rearrangement operation that cuts a segment out of a chromosome, circularizes the segment into a temporary circle, linearizes the temporary circle as a linear segment, and possibly inverts the linearized segment and pastes it into the remaining chromosome. The CCLP operation can model many well-known rearrangements, such as reversals, transpositions and block-interchanges, and others not reported in the biological literature. In addition, it really occurs in the immune response of higher animals. To distinguish those CCLP operations from the reversal, we call them as non-reversal CCLP operations. In this study, we use permutation groups in algebra to design an <it>O</it>(<it>δn</it>) time algorithm for solving the weighted sorting problem by CCLP operations when the weight ratio between reversals and non-reversal CCLP operations is 1:2, where <it>n</it> is the number of genes in the given chromosome and <it>δ</it> is the number of needed CCLP operations. Conclusion The algorithm we propose in this study is very simple so that it can be easily implemented with 1-dimensional arrays and useful in the studies of phylogenetic tree reconstruction and human immune response to tumors.</p

Directory of Open Access Journals

Brunel University Research Archive

Recommended from our members

Schema theory for gene expression programming

Author: Huang Zhengwen
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2014
Field of study

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.This thesis studied a new variant of Evolutionary Algorithms called Gene Expression Programming. The evolution process of Gene Expression Programming was investigated from the practice to the theory. As a practice level, the original version of Gene Expression Programming was applied to a classification problem and an enhanced version of the algorithm was consequently developed. This allowed the development of a general understanding of each component of the genotype and phenotype separated representation system of the solution employed by the algorithm. Based on such an understanding, a version of the schema theory was developed for Gene Expression Programming. The genetic modifications provided by each genetic operator employed by this algorithm were analysed and a set of theorems predicting the propagation of the schema from one generation to another was developed. Also a set of experiments were performed to test the validity of the developed schema theory obtaining good agreement between the experimental results and the theoretical predictions

Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes

Author: Benjamin J Raphael
CL Kahn
CL Kahn
Crystal L Kahn
D Bertrand
D Sankoff
J Bailey
J Ma
K Chaudhuri
M Johnson
M Lajoie
M Marron
MA Alekseyev
N El-Mabrouk
N El-Mabrouk
O Elemento
P Pevzner
Shay Mozes
X Chen
Y Zhang
Z Jiang
Publication venue: BioMed Central
Publication date: 22/12/2009
Field of study

Background: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics comprised of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain this mosaic patterns is a model of repeated aggregation and subsequent duplication of genomic sequences. Results: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide an description of a sequence of duplication events as a context-free grammar (CFG). Conclusion: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.

CiteSeerX

Directory of Open Access Journals

The rise and fall of breakpoint reuse depending on genome resolution

Author: Attie Oliver
Darling Aaron E
Yancopoulos Sophia
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background During evolution, large-scale genome rearrangements of chromosomes shuffle the order of homologous genome sequences ("synteny blocks") across species. Some years ago, a controversy erupted in genome rearrangement studies over whether rearrangements recur, causing breakpoints to be reused. Methods We investigate this controversial issue using the synteny block's for human-mouse-rat reported by Bourque <it>et al</it>. and a series of synteny blocks we generated using Mauve at resolutions ranging from coarse to very fine-scale. We conducted analyses to test how resolution affects the traditional measure of the breakpoint reuse rate<it>.</it> Results We found that the inversion-based breakpoint reuse rate is low at fine-scale synteny block resolution and that it rises and eventually falls as synteny block resolution decreases. By analyzing the cycle structure of the breakpoint graph of human-mouse-rat synteny blocks for human-mouse and comparing with theoretically derived distributions for random genome rearrangements, we showed that the implied genome rearrangements at each level of resolution become more “random” as synteny block resolution diminishes. At highest synteny block resolutions the Hannenhalli-Pevzner inversion distance deviates from the Double Cut and Join distance, possibly due to small-scale transpositions or simply due to inclusion of erroneous synteny blocks. At synteny block resolutions as coarse as the Bourque <it>et al</it>. blocks, we show the breakpoint graph cycle structure has already converged to the pattern expected for a random distribution of synteny blocks. Conclusions The inferred breakpoint reuse rate depends on synteny block resolution in human-mouse genome comparisons. At fine-scale resolution, the cycle structure for the transformation appears less random compared to that for coarse resolution. Small synteny blocks may contain critical information for accurate reconstruction of genome rearrangement history and parameters.</p

CiteSeerX

Directory of Open Access Journals

OPUS - University of Technology Sydney

eScholarship - University of California

Analysis of Gene Order Evolution Beyond Single-Copy Genes

Author: A Bergeron
A Bergeron
A Siepel
A Xu
B Arden
B Ma
B Moret
B Vernot
C Chauve
C Zheng
C Zheng
C Zheng
C Zheng
C. Chauve
CM Zmasek
D Bader
D Bertrand
D Bertrand
D Durand
D Durand
D Fulkerson
D Sankoff
D Sankoff
D Sankoff
D Sankoff
D Sankoff
D Soltis
E Eichler
E Lyons
F Murat
F. Murat
G Blanc
G Blin
G Bourque
G Fertin
G Glusman
G Landau
G Shi
G Tesler
G Watterson
H Gavranovic
H Gavranović
I Wapinski
J Bowers
J Cotton
J Demuth
J Gordon
J Mixtacki
J Nadeau
J Salse
J-P Doyon
K Chen
K O’Brien
K Wolfe
L Zhang
L Zhang
M Alekseyev
M Goodman
M Hahn
M Lajoie
M Lajoie
M Lynch
M Muffato
M Sanderson
M Shannon
N El-Mabrouk
O Elemento
O Eulenstein
O Tremblay-Savard
P Bonizzoni
P Gorecki
P Pevzner
Q Zhu
R Guigó
R Hoberman
R LaRue
R Page
R Page
R Page
R Tatusov
R Warren
S Angibaud
S Hannenhalli
S Pham
S Schwartz
S Yancopoulos
S Yancopoulos
T Blomme
T Uno
T Vinař
V Bafna
V Shoja
W Fitch
W Li
WJ Kent
Z Adam
Z Fu
Z Yang
Publication venue: 'Springer Science and Business Media LLC'
Publication date
Field of study

Sorting by reversals and block-interchanges with various weight assignments

Author: A Bergeron
A Bergeron
C Mira
Chun-Yuan Lin
Chunhung Richard Lin
DA Bader
DA Christie
E Tannier
GH Lin
H Kaplan
I Elias
J Feng
KM Swenson
KM Swenson
M Bader
M Bader
MEMT Walter
N El-Mabrouk
N Eriksen
QP Gu
S Gog
S Hannenhalli
S Yancopoulos
T Hartman
V Bafna
VV Vazirani
Y Han
YC Lin
YC Lin
Ying Chih Lin
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study