eScholarship - University of California

Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes

Author: Benjamin J Raphael
CL Kahn
Crystal L Kahn
D Bertrand
D Sankoff
J Bailey
J Ma
K Chaudhuri
M Johnson
M Lajoie
M Marron
MA Alekseyev
N El-Mabrouk
O Elemento
P Pevzner
Shay Mozes
X Chen
Y Zhang
Z Jiang
Publication venue: BioMed Central
Publication date: 22/12/2009

Background: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics composed of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is a model of repeated aggregation and subsequent duplication of genomic sequences. Results: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). Conclusion: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.
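
The idea of building a target string from copied substrings of a fixed source can be illustrated with a deliberately simplified sketch. This is not the algorithm from the paper: it assumes copies may only be appended to the end of the target (concatenation only), so the count it returns is merely an upper bound on the duplication distance. The function name `concatenation_copy_count` is hypothetical and chosen for illustration.

```python
def concatenation_copy_count(source: str, target: str) -> int:
    """Minimum number of substrings of `source` whose concatenation is `target`.

    Simplifying assumption (not from the paper): copies are only appended,
    never inserted inside the string built so far.
    """
    count = 0
    i = 0
    while i < len(target):
        # Find the longest substring of source matching target from position i.
        best = 0
        for j in range(len(source)):
            k = 0
            while (j + k < len(source) and i + k < len(target)
                   and source[j + k] == target[i + k]):
                k += 1
            best = max(best, k)
        if best == 0:
            raise ValueError(f"character {target[i]!r} does not occur in source")
        # Taking the longest match is optimal here because every prefix of a
        # substring of source is itself a substring of source.
        i += best
        count += 1
    return count


print(concatenation_copy_count("abcde", "abcdeab"))  # "abcde" + "ab"
print(concatenation_copy_count("abcd", "abbccd"))    # "ab" + "bc" + "cd"
```

For source `abcd` and target `abbccd`, this sketch needs three appended copies, whereas copying `abcd` and then inserting a copy of `bc` inside it needs only two. Handling such insertions is what makes the general distance harder to compute than this greedy scan.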

CiteSeerX

Directory of Open Access Journals

Genome rearrangements with duplications

Author: B Hiller
C Zheng
D Bertrand
D Bryant
D Sankoff
Drosophila 12 Genomes Consortium
F Cabanillas
F Mitelman
G Blanc
G Blin
J Salse
K Swenson
M Bader
M Marron
M Ozery-Flato
Martin Bader
N El-Mabrouk
S Gog
S Hannenhalli
S Yancopoulos
T Hartman
V Bafna
X Chen
Publication venue: BioMed Central
Publication date: 01/01/2010

A framework for orthology assignment from gene rearrangement data

Author: A. Caprara
B. Larget
B.M.E. Moret
C. Thach Nguyen
D. Bryant
D. Sankoff
D.A. Bader
G. Tesler
J. Earnest-DeYoung
J. Tang
J.L. Boore
K.M. Swenson
M. Blanchette
M. Marron
M.E. Cosner
N. El-Mabrouk
S. Hannenhalli
S.R. Downie
X. Chen
Publication venue: Springer
Publication date: 01/01/2005

Abstract. Gene rearrangements have successfully been used in phylogenetic reconstruction and comparative genomics, but usually under the assumption that all genomes have the same gene content and that no gene is duplicated. While these assumptions allow one to work with organellar genomes, they are too restrictive when comparing nuclear genomes. The main challenge is how to deal with gene families, specifically, how to identify orthologs. While searching for orthologies is a common task in computational biology, it is usually done using sequence data. We approach that problem using gene rearrangement data, provide an optimization framework in which to phrase the problem, and present some preliminary theoretical results.

CiteSeerX

Genome aliquoting with double cut and join

Author: A Bergeron
A Caprara
D Sankoff
D Ware
David Sankoff
J Edmonds
J Mixtacki
MA Alekseyev
N El-Mabrouk
R Warren
Robert Warren
S Yancopoulos
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: The genome aliquoting problem is, given an observed genome A with n copies of each gene, presumed to descend from an n-way polyploidization event from an ordinary diploid genome B, followed by a history of chromosomal rearrangements, to reconstruct the identity of the original genome B'. The idea is to construct B', containing exactly one copy of each gene, so as to minimize the number of rearrangements d(A, B' ⊕ B' ⊕ ... ⊕ B') necessary to convert the observed genome B' ⊕ B' ⊕ ... ⊕ B' into A. Results: In this paper we make the first attempt to define and solve the genome aliquoting problem. We present a heuristic algorithm for the problem as well as the data from our experiments demonstrating its validity. Conclusion: The heuristic performs well, consistently giving a non-trivial result. The question as to the existence or non-existence of an exact solution to this problem remains open.

Directory of Open Access Journals

On the PATHGROUPS approach to rapid small phylogeny

Author: A Caprara
AC Siepel
AW Xu
C Zheng
Chunfang Zheng
D Sankoff
David Sankoff
E Tannier
G Fertin
KP Byrne
N El-Mabrouk
R Warren
S Yancopoulos
SM Hedtke
Z Adam
Publication venue: BioMed Central
Publication date: 01/01/2011

We present a data structure enabling rapid heuristic solution to the ancestral genome reconstruction problem for given phylogenies under genomic rearrangement metrics. The efficiency of the greedy algorithm is due to fast updating of the structure during run time and a simple priority scheme for choosing the next step. Since accuracy deteriorates for sets of highly divergent genomes, we investigate strategies for improving accuracy and expanding the range of data sets where accurate reconstructions can be expected. These include a more refined priority system and a two-step look-ahead, as well as iterative local improvements based on the median version of the problem, incorporating simulated annealing. We apply this to a set of yeast genomes to corroborate a recent gene sequence-based phylogeny.

Directory of Open Access Journals