50,865 research outputs found

The multiple gene duplication problem revisited

Author: Blanc
Bowers
Chen
Guigó
Guyot
M. S. Bansal
Mirkin
O. Eulenstein
Rensing
Rong
Schlueter
Sterck
Vision
Wang
Wapinski
Zhang
Publication venue: Oxford University Press
Publication date: 01/01/2008

Motivation: Deciphering the location of gene duplications and multiple gene duplication episodes on the Tree of Life is fundamental to understanding the way gene families and genomes evolve. The multiple gene duplication problem provides a framework for placing gene duplication events onto nodes of a given species tree, and detecting episodes of multiple gene duplication. One version of the multiple gene duplication problem was defined by Guigó et al. in 1996. Several heuristic solutions have since been proposed for this problem, but no exact algorithms were known.

CiteSeerX

Crossref

PubMed Central

Inferring Species Trees from Incongruent Multi-Copy Gene Trees Using the Robinson-Foulds Distance

Author: Burleigh J. Gordon
Chaudhary Ruchi
Fernández-Baca David
Publication date: 09/10/2012

We present a new method for inferring species trees from multi-copy gene trees. Our method is based on a generalization of the Robinson-Foulds (RF) distance to multi-labeled trees (mul-trees), i.e., gene trees in which multiple leaves can have the same label. Unlike most previous phylogenetic methods using gene trees, this method does not assume that gene tree incongruence is caused by a single, specific biological process, such as gene duplication and loss, deep coalescence, or lateral gene transfer. We prove that it is NP-hard to compute the RF distance between two mul-trees, but it is easy to calculate the generalized RF distance between a mul-tree and a singly-labeled tree. Motivated by this observation, we formulate the RF supertree problem for mul-trees (MulRF), which takes a collection of mul-trees and constructs a species tree that minimizes the total RF distance from the input mul-trees. We present a fast heuristic algorithm for the MulRF supertree problem. Simulation experiments demonstrate that the MulRF method produces more accurate species trees than gene tree parsimony methods when incongruence is caused by gene tree error, duplications and losses, and/or lateral gene transfer. Furthermore, the MulRF heuristic runs quickly on data sets containing hundreds of trees with up to a hundred taxa.

Comment: 16 pages, 11 figures
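
For orientation, the classic RF distance that the abstract generalizes can be sketched as follows. This is a minimal illustration for two rooted, singly-labeled trees only; it is not the paper's generalized distance for mul-trees and not the MulRF heuristic. The nested-tuple tree encoding and the names clusters and rf_distance are assumptions made for this example.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical encoding: a leaf is a label string, an internal node is a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Set[FrozenSet[str]]:
    """Collect the leaf set (cluster) below every internal node except the root."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root cluster is shared by any two trees on the same leaves
    return found


def rf_distance(a: Tree, b: Tree) -> int:
    """Classic rooted RF distance: number of clusters found in exactly one of the two trees."""
    return len(clusters(a) ^ clusters(b))


balanced = (("A", "B"), ("C", "D"))
caterpillar = ((("A", "B"), "C"), "D")
print(rf_distance(balanced, caterpillar))
```

The sketch assumes both trees carry the same leaf labels, each used once. With repeated labels, as in a mul-tree, distinct nodes can have identical leaf sets, so this set-based comparison no longer applies directly.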

arXiv.org e-Print Archive

Springer - Publisher Connector

Hidden breakpoints in genome alignments

Author: A. Rambaut
A.C.E. Darling
A.E. Darling
A.L. Delcher
C.D. Greenman
D. Medini
E. Tannier
G. Fudenberg
M. Blanchette
M. Nowacki
M.A. Umbarger
S. De
S. Schwartz
S.V. Angiuoli
V. Kolmogorov
Publication date: 01/01/2012

During the course of evolution, an organism's genome can undergo changes that affect the large-scale structure of the genome. These changes include gene gain, loss, duplication, chromosome fusion, fission, and rearrangement. When gene gain and loss occur in addition to other types of rearrangement, breakpoints of rearrangement can exist that are only detectable by comparison of three or more genomes. An arbitrarily large number of these "hidden" breakpoints can exist among genomes that exhibit no rearrangements in pairwise comparisons. We present an extension of the multichromosomal breakpoint median problem to genomes that have undergone gene gain and loss. We then demonstrate that the median distance among three genomes can be used to calculate a lower bound on the number of hidden breakpoints present. We provide an implementation of this calculation including the median distance, along with some practical improvements on the time complexity of the underlying algorithm. We apply our approach to measure the abundance of hidden breakpoints in simulated data sets under a wide range of evolutionary scenarios. We demonstrate that in simulations the hidden breakpoint counts depend strongly on relative rates of inversion and gene gain/loss. Finally, we apply current multiple genome aligners to the simulated genomes, and show that all aligners introduce a high degree of error in hidden breakpoint counts, and that this error grows with evolutionary distance in the simulation. Our results suggest that hidden breakpoint error may be pervasive in genome alignments.

Comment: 13 pages, 4 figures
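
A toy sketch may help show why pairwise comparison can miss rearrangements once gene content differs. It counts breakpoints between two genomes after restricting both to their shared genes. The signed-integer encoding of a single linear chromosome, the function names, and the three example genomes are all assumptions made for illustration; this is not the paper's breakpoint median computation or its lower bound.

```python
from typing import List, Set, Tuple

# Hypothetical encoding: one linear chromosome as a list of signed gene ids
# (the sign gives orientation). Telomeres are ignored in this sketch.
Genome = List[int]


def restrict(genome: Genome, keep: Set[int]) -> Genome:
    """Drop every gene whose id is not in `keep`, preserving order and sign."""
    return [g for g in genome if abs(g) in keep]


def adjacencies(genome: Genome) -> Set[Tuple[int, int]]:
    """Set of signed adjacencies; (x, y) and (-y, -x) denote the same adjacency."""
    return {min((x, y), (-y, -x)) for x, y in zip(genome, genome[1:])}


def pairwise_breakpoints(g1: Genome, g2: Genome) -> int:
    """Adjacencies of g1 that are absent from g2, counted on shared genes only."""
    shared = {abs(g) for g in g1} & {abs(g) for g in g2}
    return len(adjacencies(restrict(g1, shared)) - adjacencies(restrict(g2, shared)))


# Each genome lacks one of the three genes, so every pair shares a single gene.
g1, g2, g3 = [1, 2], [2, 3], [3, 1]
print(pairwise_breakpoints(g1, g2), pairwise_breakpoints(g2, g3), pairwise_breakpoints(g1, g3))
```

In this example every pairwise count is zero, yet the three orders (1 before 2, 2 before 3, 3 before 1) cannot all hold in one linear arrangement of the three genes. The conflict only becomes visible when all three genomes are considered together, which is the situation the abstract calls a hidden breakpoint.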

arXiv.org e-Print Archive

Crossref

OPUS - University of Technology Sydney

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)