14 research outputs found

Genomes containing Duplicates are Hard to compare

Author: Chauve Cedric
Fertin Guillaume
Rizzi Romeo
Vialette Stéphane
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2006

In this paper, we are interested in the algorithmic complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes. In that case, there are usually two main ways to compute a given (dis)similarity measure M between two genomes G1 and G2: the first model, that we will call the matching model, consists in making a one-to-one correspondence between genes of G1 and genes of G2, in such a way that M is optimized. The second model, called the exemplar model, consists in keeping in G1 (resp. G2) exactly one copy of each gene, thus deleting all the other copies, in such a way that M is optimized. We present here different results concerning the algorithmic complexity of computing three different similarity measures (number of common intervals, MAD number and SAD number) in those two models, basically showing that the problem becomes NP-complete for each of them as soon as genomes contain duplicates. We show indeed that for common intervals, MAD and SAD, the problem is NP-complete when genes are duplicated in genomes, in both the exemplar and matching models. In the case of MAD and SAD, we actually prove that, under both models, both MAD and SAD problems are APX-hard.

INRIA a CCSD electronic archive server

Genomes containing Duplicates are Hard to compare (Extended Abstract)

Author: Cedric Chauve
Guillaume Fertin
Romeo Rizzi
Stéphane Vialette

Abstract. In this paper, we are interested in the algorithmic complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes. In that case, there are usually two main ways to compute a given (dis)similarity measure M between two genomes G1 and G2: the first model, that we will call the matching model, consists in making a one-to-one correspondence between genes of G1 and genes of G2, in such a way that M is optimized. The second model, called the exemplar model, consists in keeping in G1 (resp. G2) exactly one copy of each gene, thus deleting all the other copies, in such a way that M is optimized. We present here different results concerning the algorithmic complexity of computing three different similarity measures (number of common intervals, MAD number and SAD number) in those two models, basically showing that the problem becomes NP-complete for each of them as soon as genomes contain duplicates. We show indeed that for common intervals, MAD and SAD, the problem is NP-complete when genes are duplicated in genomes, in both the exemplar and matching models. In the case of MAD and SAD, we actually prove that, under both models, both MAD and SAD problems are APX-hard.
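The exemplar model described in this abstract can be illustrated with a small brute-force sketch. It is not the authors' method: it assumes unsigned genomes given as Python lists of gene-family names, assumes both genomes contain the same families, and counts common intervals including singletons and the whole genome. The function names are hypothetical. The search enumerates every way of keeping one copy per family, so its cost grows exponentially with the number of duplicates.

```python
from itertools import product


def exemplarizations(genome):
    """Yield every genome obtained by keeping exactly one copy per gene family."""
    positions = {}
    for idx, gene in enumerate(genome):
        positions.setdefault(gene, []).append(idx)
    # Pick one position per family, then restore the original gene order.
    for choice in product(*positions.values()):
        yield [genome[i] for i in sorted(choice)]


def count_common_intervals(a, b):
    """Count gene sets that are contiguous in both duplicate-free genomes.

    Assumption: singletons and the full gene set are counted as intervals.
    """
    pos_in_b = {gene: i for i, gene in enumerate(b)}
    pi = [pos_in_b[gene] for gene in a]
    count = 0
    for i in range(len(pi)):
        lo = hi = pi[i]
        for j in range(i, len(pi)):
            lo, hi = min(lo, pi[j]), max(hi, pi[j])
            # a[i..j] is also contiguous in b iff its positions in b form a range.
            if hi - lo == j - i:
                count += 1
    return count


def best_exemplar_common_intervals(g1, g2):
    """Maximize the number of common intervals over all pairs of exemplarizations."""
    return max(
        count_common_intervals(a, b)
        for a in exemplarizations(g1)
        for b in exemplarizations(g2)
    )


if __name__ == "__main__":
    g1 = ["a", "b", "c", "a"]  # gene family "a" is duplicated
    g2 = ["b", "c", "a"]
    print(best_exemplar_common_intervals(g1, g2))
```

In this toy input, keeping the second copy of "a" in g1 gives the order b, c, a, which matches g2 exactly, so every interval is common.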

CiteSeerX

Genomes containing duplicates are hard to compare

Author: Chauve Cedric
Fertin Guillaume
Rizzi Romeo
Vialette Stéphane
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2006

Hal-Diderot

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Catalogo dei prodotti della ricerca

Hal-Diderot

HAL - UPEC / UPEM

HAL-Rennes 1

Comparing genomes with duplications: a computational complexity point of view

Author: Cedric Chauve
Guillaume Blin
Guillaume Fertin
Romeo Rizzi
Stéphane Vialette
Publication date: 01/01/2007

Abstract. In this paper, we are interested in the computational complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes or genomic markers, a problem that happens frequently when comparing whole nuclear genomes. Recently, several methods [1], [2] have been proposed that are based on two steps to compute a given (dis)similarity measure M between two genomes G1 and G2: First, one establishes a one-to-one correspondence between the genes of G1 and the genes of G2; second, once this correspondence is established, it explicitly defines a permutation and it is then possible to quantify their similarity using classical measures defined for permutations like the number of breakpoints. Hence, these methods rely on two elements: a way to establish a one-to-one correspondence between genes of a pair of genomes and a (dis)similarity measure for permutations. The problem is then, given a (dis)similarity measure for permutations, compute a correspondence that defines an optimal permutation for this measure. We are interested here in two models to compute a one-to-one correspondence: the exemplar model, where all but one copy is deleted in both genomes for each gene family, and the matching model, which computes a maximal correspondence for each gene family. We show that, for these two models and for three (dis)similarity measures on permutations, namely, the number of common intervals, the maximum adjacency disruption (MAD) number, and the summed adjacency disruption (SAD) number, the problem of computing an optimal correspondence is NP-complete and even APX-hard for the MAD number and the SAD number. Index Terms: Comparative genomics, computational complexity, common intervals, maximum adjacency disruption number, summed adjacency disruption number.
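The two-step scheme in this abstract (fix a one-to-one correspondence, then score the resulting permutation) can be sketched for the matching model as follows. This is an illustration under stated assumptions, not the paper's algorithm: genomes are unsigned lists of family names, unmatched copies are simply dropped, the measure is the number of breakpoints (the example measure named in the abstract) computed without framing elements, and all function names are hypothetical. The search is exhaustive and therefore only usable on tiny inputs.

```python
from itertools import combinations, permutations, product


def positions_by_family(genome):
    """Map each gene family to the list of positions of its copies."""
    out = {}
    for i, gene in enumerate(genome):
        out.setdefault(gene, []).append(i)
    return out


def family_matchings(pos1, pos2):
    """Yield every maximal one-to-one pairing between the copies of one family."""
    k = min(len(pos1), len(pos2))
    for sub1 in combinations(pos1, k):
        for sub2 in combinations(pos2, k):
            for perm in permutations(sub2):
                yield list(zip(sub1, perm))


def breakpoints(a, b):
    """Count adjacencies of a that are not adjacencies of b (unsigned, unframed)."""
    adj_b = {frozenset(pair) for pair in zip(b, b[1:])}
    return sum(frozenset(pair) not in adj_b for pair in zip(a, a[1:]))


def min_breakpoints_matching(g1, g2):
    """Minimize breakpoints over all maximal matchings of every shared family."""
    p1, p2 = positions_by_family(g1), positions_by_family(g2)
    shared = [fam for fam in p1 if fam in p2]
    options = [list(family_matchings(p1[fam], p2[fam])) for fam in shared]
    best = None
    for combo in product(*options):
        pairs = [pair for fam_pairs in combo for pair in fam_pairs]
        # Step 1: the matching relabels matched copies with shared unique ids.
        label1 = {i: n for n, (i, _) in enumerate(pairs)}
        label2 = {j: n for n, (_, j) in enumerate(pairs)}
        a = [label1[i] for i in sorted(label1)]
        b = [label2[j] for j in sorted(label2)]
        # Step 2: score the induced pair of permutations.
        score = breakpoints(a, b)
        best = score if best is None else min(best, score)
    return best


if __name__ == "__main__":
    print(min_breakpoints_matching(list("abab"), list("aabb")))
```

Swapping `breakpoints` for another permutation measure (for example a MAD or SAD implementation) leaves the enumeration unchanged, which reflects the separation between correspondence and measure described in the abstract.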

CiteSeerX

Catalogo dei prodotti della ricerca

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2

Author: Cedric Chauve
Guillaume Blin
Guillaume Fertin
Romeo Rizzi
Stéphane Vialette

In this paper, we are interested in the computational complexity of computing (dis)similarity measures between two genomes when they contain duplicated genes or genomic markers, a problem that happens frequently when comparing whole nuclear genomes. Recently, several methods ([1], [2]) have been proposed that are based on two steps to compute a given (dis)similarity measure M between two genomes G1 and G2: first, one establishes a one-to-one correspondence between genes of G1 and genes of G2; second, once this correspondence is established, it defines explicitly a permutation and it is then possible to quantify their similarity using classical measures defined for permutations, like the number of breakpoints. Hence these methods rely on two elements: a way to establish a one-to-one correspondence between genes of a pair of genomes, and a (dis)similarity measure for permutations. The problem is then, given a (dis)similarity measure for permutations, to compute a correspondence that defines an optimal permutation for this measure. We are interested here in two models to compute a one-to-one correspondence: the exemplar model, where all but one copy ar

CiteSeerX