
INRIA a CCSD electronic archive server

Gene Maps Linearization Using Genomic Rearrangement Distances

Author: Bowers J.
Danny Hermelin
El-Mabrouk N.
Eric Blais
Guillaume Blin
Jackson B.
Mathieu Blanchette
Nadia El-Mabrouk
Pierre Guillon
Sankoff D.
Yap I.
Publication venue: Mary Ann Liebert Inc

A fast algorithm for the multiple genome rearrangement problem with weighted reversals and transpositions

Author: A Bergeron
A Caprara
A Caprara
B Bourque
B Moret
B Moret
D Bader
D Sankoff
D Sankoff
E Tannier
Enno Ohlebusch
G Fritzsch
J Tang
M Bader
M Bader
M Bernt
M Blanchette
M Blanchette
M Cosner
Martin Bader
Mohamed I Abouelhoda
N Eriksen
P Pevzner
S Hannenhalli
S Wu
S Wu
T Hartman
T Liu
V Bafna
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract. Background: Due to recent progress in genome sequencing, more and more data for phylogenetic reconstruction based on rearrangement distances between genomes are becoming available. However, this phylogenetic reconstruction is a very challenging task. For the simplest distance measures (the breakpoint distance and the reversal distance), the problem is NP-hard even if one considers only three genomes. Results: In this paper, we present a new heuristic algorithm that directly constructs a phylogenetic tree w.r.t. the weighted reversal and transposition distance. Experimental results on previously published datasets show that constructing phylogenetic trees in this way yields better trees than constructing the trees w.r.t. the reversal distance and then recalculating their weight with the weighted reversal and transposition distance. An implementation of the algorithm can be obtained from the authors. Conclusion: The possibility of creating phylogenetic trees directly w.r.t. the weighted reversal and transposition distance results in biologically more realistic scenarios. Our algorithm can solve today's most challenging biological datasets in a reasonable amount of time.
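The breakpoint distance named in this abstract can be illustrated with a small sketch. This is not the authors' algorithm (their heuristic builds trees under the weighted reversal and transposition distance); it only counts breakpoints between two signed linear gene orders, assuming both are signed permutations of 1..n. The function name `breakpoint_distance` and the framing with 0 and n+1 are choices made for this illustration.

```python
def breakpoint_distance(source, target):
    """Count adjacencies of `source` that are not conserved in `target`.

    Both genomes are assumed to be signed permutations of 1..n,
    given as lists of nonzero ints (sign = gene orientation).
    """
    n = len(source)
    expected = list(range(1, n + 1))
    if sorted(abs(g) for g in source) != expected:
        raise ValueError("source is not a signed permutation of 1..n")
    if sorted(abs(g) for g in target) != expected:
        raise ValueError("target is not a signed permutation of 1..n")

    def framed(genome):
        # Add fixed end markers 0 and n+1 so the chromosome ends count too.
        return [0] + list(genome) + [n + 1]

    # An adjacency (a, b) is the same as (-b, -a) read from the other strand.
    tgt = framed(target)
    adjacencies = set()
    for a, b in zip(tgt, tgt[1:]):
        adjacencies.add((a, b))
        adjacencies.add((-b, -a))

    src = framed(source)
    return sum((a, b) not in adjacencies for a, b in zip(src, src[1:]))


if __name__ == "__main__":
    # One reversal of the segment (2, 3) creates two breakpoints.
    print(breakpoint_distance([1, -3, -2, 4], [1, 2, 3, 4]))
```

A single reversal changes at most two adjacencies, which is why the breakpoint count is often used as a cheap lower-bound style signal before computing more expensive rearrangement distances.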

Sorting by reversals, block interchanges, tandem duplications, and deletions

Author: D Bader
D Bertrand
D Christie
D Sankoff
D Sankoff
E Tannier
G Blanc
H Nagamochi
I Elias
J Mixtacki
K Swenson
M Marron
M Ozery-Flato
Martin Bader
N El-Mabrouk
N El-Mabrouk
N El-Mabrouk
N El-Mabrouk
R Warren
S Hannenhalli
S Yancopoulos
S Yancopoulos
T Hartman
T Hartman
V Bafna
X Chen
Y Han
Z Fu
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Finding sequences of evolutionary operations that transform one genome into another is a classic problem in comparative genomics. While most genome rearrangement algorithms assume that there is exactly one copy of each gene in both genomes, this does not reflect biological reality very well: most of the studied genomes contain duplicated gene content, which has to be removed before applying those algorithms. However, dealing with unequal gene content is a very challenging task, and only a few algorithms allow operations like duplications and deletions. Almost all of these algorithms restrict these operations to a fixed size. Results: In this paper, we present a heuristic algorithm to sort an ancestral genome (with unique gene content) into a genome of a descendant (with arbitrary gene content) by reversals, block interchanges, tandem duplications, and deletions, where tandem duplications and deletions are of arbitrary size. Conclusion: Experimental results show that our algorithm finds sorting sequences that are close to an optimal sorting sequence when the ancestor and the descendant are closely related. The quality of the results decreases as the genomes become more divergent or the genome size increases. Nevertheless, the calculated distances give a good approximation of the true evolutionary distances.

Sampling and counting genome rearrangement scenarios

Author: A Bergeron
A Bergeron
A Caprara
A Darling
A Karzanov
A Ouangraoua
A Rajaraman
AC Siepel
B Larget
C Chauve
C Zheng
D Sankoff
DVM Braga
E Tannier
E Tannier
G Brightwell
Heather Smith
I Miklós
I Miklós
I Miklós
I Miklós
I Miklós
I Miklós
I Miklós
István Miklós
JS Liu
KM Swenson
L Lovász
LG Valiant
MA Alekseyev
MA Alekseyev
MR Jerrum
MR Jerrum
N Metropolis
P Feijão
PL Erdős
R Durrett
R Warren
S Geman
S Hannenhalli
W Hastings
WM Fitch
Y Ajana
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Even for moderate-size inputs, there is a tremendous number of optimal rearrangement scenarios, regardless of the model and of the specific question to be answered. Therefore, giving a single optimal solution might be misleading and cannot be used for statistical inference. Statistically well-founded methods are necessary to sample uniformly from the solution space; a small number of samples is then sufficient for statistical inference.

SZTAKI Publication Repository

Comparative Analysis of DNA Replication Timing Reveals Conserved Large-Scale Chromosomal Architecture

Author: Amos Tanay
Andreas Polten
B Wen
BE Bernstein
BJ Harvey
CJ Pink
CL Chen
D Sankoff
D Sankoff
D Schubeler
DM MacAlpine
E Lieberman-Aiden
Eitan Yaffe
EJ White
F Chiaromonte
FM Pauler
I Hiratani
I Hiratani
Itamar Simon
J Ma
K Woodfine
K Woodfine
L Guelen
LD Hurst
MA Alekseyev
N Gilbert
NN Batada
R Desprat
RH Waterston
RM Kuhn
RS Hansen
RS Mani
S Farkash-Amar
S Farkash-Amar
S Jaschek
S Schwartz
Shlomit Farkash-Amar
T Karube
T Ryba
TS Mikkelsen
Wendy A. Bickmore
WJ Kent
Y Jeon
Zohar Yakhini
Publication venue: Public Library of Science
Publication date: 01/07/2010

Recent evidence suggests that the timing of DNA replication is coordinated across megabase-scale domains in metazoan genomes, yet the importance of this aspect of genome organization is unclear. Here we show that replication timing is remarkably conserved between human and mouse, uncovering large regions that may have been governed by similar replication dynamics since these species diverged. This conservation is both tissue-specific and independent of the genomic G+C content conservation. Moreover, we show that time of replication is globally conserved despite numerous large-scale genome rearrangements. We systematically identify rearrangement fusion points and demonstrate that replication time can diverge locally at these loci. Conversely, rearrangements are shown to be correlated with early replication and physical chromosomal proximity. These results suggest that large chromosomal domains of coordinated replication are shuffled by evolution while conserving the large-scale nuclear architecture of the genome.

Multichromosomal median and halving problems under different genomic distances

Author: A Bergeron
A Bergeron
A Bergeron
A Caprara
C Zheng
C Zheng
C Zheng
C Zheng
Chunfang Zheng
D Bryant
D Sankoff
David Sankoff
E Ohlebusch
E Tannier
Eric Tannier
G Bourque
G Fertin
G Jean
G Tesler
G Watterson
I Pe'er
J Aury
J Mixtacki
L Lovasz
M Alekseyev
M Bernt
M Ozery-Flato
MR Garey
N El-Mabrouk
P Berman
P Pevzner
R Lenne
R Warren
S Hannenhalli
S Hannenhalli
S Otto
S Yancopoulos
W Xu
X Chen
Y Lin
YC Lin
Z Adam
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: Genome median and genome halving are combinatorial optimization problems that aim at reconstructing ancestral genomes as well as the evolutionary events leading from the ancestor to extant species. Exploring complexity issues is a first step towards devising efficient algorithms. The complexity of the median problem for unichromosomal genomes (permutations) has been settled for both the breakpoint distance and the reversal distance. Although the multichromosomal case has often been assumed to be a simple generalization of the unichromosomal case, it is also a relaxation, so complexity in this context does not follow from existing results and is open for all distances. Results: We settle here the complexity of several genome median and halving problems, including a surprising polynomial result for the breakpoint median and guided halving problems in genomes with circular and linear chromosomes, showing that the multichromosomal problem is actually easier than the unichromosomal problem. Still other variants of these problems are NP-complete, including the DCJ double distance problem, previously mentioned as an open question. We list the remaining open problems. Conclusion: This theoretical study clears up a wide swathe of the algorithmic study of genome rearrangements with multiple multichromosomal genomes.

INRIA a CCSD electronic archive server

Hal-Diderot

Analysis of Gene Order Evolution Beyond Single-Copy Genes

Author: A Bergeron
A Bergeron
A Siepel
A Xu
B Arden
B Ma
B Moret
B Vernot
C Chauve
C Zheng
C Zheng
C Zheng
C Zheng
C. Chauve
CM Zmasek
D Bader
D Bertrand
D Bertrand
D Durand
D Durand
D Fulkerson
D Sankoff
D Sankoff
D Sankoff
D Sankoff
D Sankoff
D Soltis
E Eichler
E Lyons
F Murat
F. Murat
G Blanc
G Blin
G Bourque
G Fertin
G Glusman
G Landau
G Shi
G Tesler
G Watterson
H Gavranovic
H Gavranović
I Wapinski
J Bowers
J Cotton
J Demuth
J Gordon
J Mixtacki
J Nadeau
J Salse
J-P Doyon
K Chen
K O’Brien
K Wolfe
L Zhang
L Zhang
M Alekseyev
M Goodman
M Hahn
M Lajoie
M Lajoie
M Lynch
M Muffato
M Sanderson
M Shannon
N El-Mabrouk
O Elemento
O Eulenstein
O Tremblay-Savard
P Bonizzoni
P Gorecki
P Pevzner
Q Zhu
R Guigó
R Hoberman
R LaRue
R Page
R Page
R Page
R Tatusov
R Warren
S Angibaud
S Hannenhalli
S Pham
S Schwartz
S Yancopoulos
S Yancopoulos
T Blomme
T Uno
T Vinař
V Bafna
V Shoja
W Fitch
W Li
WJ Kent
Z Adam
Z Fu
Z Yang
Publication venue: Springer Science and Business Media LLC

Efficient large-scale protein sequence comparison and gene matching to identify orthologs and co-orthologs

Author: Altschul
Altschul
Arun S. Konagurthu
Arunachalam
Bandyopadhyay
Bansal
Calabrese
Dehal
Dice
Edgar
Edgar
Flicek
Fukuhara
Geoffrey I. Webb
Gordân
Haas
Hachiya
James C. Whisstock
Jiangning Song
Jun
Khalid Mahmood
Koohy
Koonin
Kriventseva
Kuhn
Kärkkäinen
Li
Mahmood
Needleman
Papadimitriou
Pearson
Pruess
Remm
Sakarya
Sankoff
Santini
Sjolander
Smith
Smith
Sonnhammer
Sorensen
Swidan
Vandepoele
Vinga
Vingron
Widmann
Woolfe
Xu
Yu
Zhi
Publication venue: Oxford University Press
Publication date: 01/01/2012

Broadly, computational ortholog assignment is a three-step process: (i) identify all putative homologs between the genomes, (ii) identify gene anchors and (iii) link anchors to identify best gene matches given their order and context. In this article, we engineer two methods to improve two important aspects of this pipeline [specifically steps (ii) and (iii)]. First, computing sequence similarity data [step (i)] is a computationally intensive task for large sequence sets, creating a bottleneck in the ortholog assignment pipeline. We have designed a fast and highly scalable sort-join method (afree) based on k-mer counts to rapidly compare all pairs of sequences in a large protein sequence set to identify putative homologs. Second, the availability of complex genomes containing large gene families with a prevalence of complex evolutionary events, such as duplications, has made the task of assigning orthologs and co-orthologs difficult. Here, we have developed an iterative graph matching strategy where at each iteration the best gene assignments are identified, resulting in a set of orthologs and co-orthologs. We find that the afree algorithm is faster than existing methods and maintains high accuracy in identifying similar genes. The iterative graph matching strategy also showed high accuracy in identifying complex gene relationships. Standalone afree is available from http://vbc.med.monash.edu.au/∼kmahmood/afree. EGM2, the complete ortholog assignment pipeline (including afree and the iterative graph matching method), is available from http://vbc.med.monash.edu.au/∼kmahmood/EGM2
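The idea of comparing sequences through shared k-mers, as mentioned in this abstract, can be sketched roughly as follows. This is not the afree implementation, whose details are not given here; it is a generic illustration that groups sequences by k-mer through an inverted index and counts the distinct k-mers each pair shares. The function names, the default `k=3`, and the `min_shared` threshold are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_pairs(seqs, k=3, min_shared=2):
    """Map each pair of sequence names to its number of shared distinct k-mers.

    `seqs` is a dict {name: sequence}. Only pairs sharing at least
    `min_shared` k-mers are returned, as candidate homologs.
    """
    # Inverted index: k-mer -> names of sequences containing it.
    index = defaultdict(list)
    for name in sorted(seqs):
        for kmer in kmers(seqs[name], k):
            index[kmer].append(name)

    # Joining on the k-mer avoids scoring pairs with nothing in common.
    shared = defaultdict(int)
    for names in index.values():
        for a, b in combinations(names, 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}


if __name__ == "__main__":
    proteins = {"p1": "MKVLAA", "p2": "MKVLGG", "p3": "WWWWWW"}
    print(shared_kmer_pairs(proteins))
```

Because only sequences that co-occur under some k-mer are ever paired, unrelated sequences such as `p3` in the usage example never enter the comparison, which is the general reason index-based joins scale better than scoring every pair directly.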

Monash University Research Portal

University of Melbourne Institutional Repository

Predicting RNA secondary structure by the comparative approach: how to select the homologous sequences

Author: A Lescoute
AM Rosenblad
C Papanicolaou
C Woese
C Zwieb
C Zwieb
D Chiu
D Gautheret
D Mathews
D Matthews
D Sankoff
E Bindewald
F Rousset
F Tahi
F Tahi
Fariza Tahi
I Hofacker
J Brown
K Han
K Horimoto
L Vawter
M Szymanski
M Zuker
N Savill
O Perriquet
P Baldi
P Doty
P Higgs
PP Gardner
R Nussinov
RJ Klein
RR Gutell
S Freier
S Lindgreen
Stéfan Engelen
WC Curtis
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract. Background: The secondary structure of an RNA must be known before the relationship between its structure and function can be determined. One way to predict the secondary structure of an RNA is to identify covarying residues that maintain the pairings (Watson-Crick, wobble and non-canonical pairings). This "comparative approach" consists of identifying mutations from homologous sequence alignments. The sequences must covary enough for compensatory mutations to be revealed, but comparison is difficult if they are too different. Thus the choice of homologous sequences is critical. While many possible combinations of homologous sequences may be used for prediction, only a few will give good structure predictions. This can be due to poor-quality alignment in stems or to the variability of certain sequences. This problem of sequence selection is currently unsolved. Results: This paper describes an algorithm, SSCA, which measures the suitability of sequences for the comparative approach. It is based on evolutionary models with structure constraints, particularly those on sequence variations and stem alignment. We propose three models, based on different constraints on sequence alignments. We show the results of the SSCA algorithm for predicting the secondary structure of several RNAs. SSCA enabled us to choose sets of homologous sequences that gave better predictions than arbitrarily chosen sets of homologous sequences. Conclusion: SSCA is an algorithm for selecting combinations of RNA homologous sequences suitable for secondary structure predictions with the comparative approach.

HAL Evry