
    On the Approximability of Comparing Genomes with Duplicates

    International audience
    A central problem in comparative genomics consists in computing a (dis-)similarity measure between two genomes, e.g. in order to construct a phylogenetic tree. A large number of such measures have been proposed in the recent past: number of reversals, number of breakpoints, number of common or conserved intervals, SAD, etc. In their initial definitions, all these measures assume that genomes contain no duplicates. However, genes are now known to be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between the genes of both genomes, chosen so as to optimize the studied measure. Then, after relabeling genes according to this matching and deleting the unmatched signed genes, two genomes without duplicates are obtained and the measure can be computed. In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar model, maximum matching model and non-maximum matching model). We prove that, for each model and each measure, computing a matching between two genomes that optimizes the measure is APX-hard. We show that this result remains true even for two genomes G1 and G2 such that G1 contains no duplicates and no gene of G2 appears more than twice. Therefore, our results extend those of [5–7]. Finally, we propose a 4-approximation algorithm for a measure closely related to the number of breakpoints, the number of adjacencies, under the maximum matching model, in the case where genomes contain the same number of duplications of each gene.
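The breakpoint and adjacency measures mentioned above are only well defined once duplicates have been removed. A minimal Python sketch of that final step, assuming two linear genomes given as signed permutations of the same genes (a negative integer means the opposite strand) and no framing genes added at the ends: a pair of consecutive genes (a, b) in G1 counts as an adjacency if G2 contains (a, b) or (-b, -a) consecutively, and as a breakpoint otherwise. The function names are hypothetical, and the hard part studied in the paper, choosing the matching, is not shown.

```python
from typing import List, Set, Tuple


def adjacency_set(genome: List[int]) -> Set[Tuple[int, int]]:
    """Consecutive signed pairs of a linear genome, recorded in both reading directions."""
    pairs: Set[Tuple[int, int]] = set()
    for a, b in zip(genome, genome[1:]):
        pairs.add((a, b))
        pairs.add((-b, -a))  # the same adjacency read on the opposite strand
    return pairs


def count_adjacencies(g1: List[int], g2: List[int]) -> int:
    """Number of consecutive pairs of g1 that are also consecutive (in some orientation) in g2."""
    pairs2 = adjacency_set(g2)
    return sum(1 for a, b in zip(g1, g1[1:]) if (a, b) in pairs2)


def count_breakpoints(g1: List[int], g2: List[int]) -> int:
    """Consecutive pairs of g1 that are not adjacencies in g2 (no framing genes)."""
    return max(len(g1) - 1, 0) - count_adjacencies(g1, g2)
```

For example, against [1, -3, -2, 4], the genome [1, 2, 3, 4] keeps only the adjacency (2, 3), so it has 1 adjacency and 2 breakpoints.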

    Comparing Bacterial Genomes by Searching their Common Intervals

    International audience
    Comparing bacterial genomes requires a dedicated measure, one that compares circular genomes on the basis of a set of conserved genes. Under this assumption, the common interval appears to be a good candidate. To support this, we propose an approach that computes the common intervals between two circular genomes while taking duplications into account. Applied to a concrete case, the comparison of E. coli and V. cholerae, it proves accurate: it highlights sets of conserved genes that have a strong impact on bacterial functions.
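To illustrate the underlying notion, here is a naive O(n²) Python sketch that enumerates common intervals, simplified to two linear genomes without duplicates: a set of at least two genes is a common interval if it occupies a contiguous segment in both genomes. The approach described above handles circular genomes and duplications, which this sketch deliberately leaves out; the function name is hypothetical.

```python
from typing import FrozenSet, Sequence, Set


def common_intervals(g1: Sequence[str], g2: Sequence[str]) -> Set[FrozenSet[str]]:
    """Gene sets (size >= 2) that are contiguous in both linear genomes.

    Assumes both genomes are permutations of the same genes (no duplicates).
    """
    pos2 = {gene: k for k, gene in enumerate(g2)}
    found: Set[FrozenSet[str]] = set()
    for i in range(len(g1)):
        lo = hi = pos2[g1[i]]
        for j in range(i + 1, len(g1)):
            p = pos2[g1[j]]
            lo, hi = min(lo, p), max(hi, p)
            # g1[i..j] is contiguous in g2 iff its positions there span exactly j - i + 1 slots
            if hi - lo == j - i:
                found.add(frozenset(g1[i:j + 1]))
    return found
```

For abcd versus badc it reports {a, b}, {c, d} and {a, b, c, d}. For circular genomes, one would additionally have to consider segments that wrap around the origin.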

    An ant colony optimization inspired algorithm for the set packing problem with application to railway infrastructure

    http://www.emse.fr/~delorme/Papiers/MIC05/MIC05_resume.pdf
    International audience
    The paper concerns an Ant Colony Optimisation (ACO) procedure as an approximation method for the railway infrastructure capacity (RIC) problem. Railway infrastructure managers now have to deal with operators' requests for increased capacity. Planning the construction or reconstruction of infrastructure must be done very carefully because of the huge investments required and the long-term implications. Usually, the capacity of one component of a rail system is assessed by measuring the maximum number of trains that can be operated on this component within a certain time period. In our work, we deal with two real situations. The first is the Pierrefitte-Gonnesse crossing point located north of Paris. The second is the Lille-Flandres station, which is the largest station in the north of France. Measuring the capacity of junctions is a matter of solving an optimisation problem called the saturation problem [1], which can be formulated as a Set Packing Problem (SPP). Given a finite set $I = \{1, \ldots, n\}$ of items and a collection $\{T_j\}_{j \in J}$, $J = \{1, \ldots, m\}$, of $m$ subsets of $I$, a packing is a subset $P \subseteq I$ such that $|T_j \cap P| \le 1$ for all $j \in J$. The set $J$ can also be seen as a set of exclusion constraints between some items of $I$. Each item $i \in I$ has a positive weight denoted by $c_i$, and the aim of the SPP is to compute the packing that maximises the total weight. This problem…
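To make the Set Packing Problem definition concrete, the following Python sketch checks the packing condition |T_j ∩ P| ≤ 1 and finds a maximum-weight packing by exhaustive enumeration. This is not the ACO procedure of the paper: it is exponential in n and only meant as a reference for tiny instances, for example to check a heuristic's output. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple


def is_packing(P: Set[int], T: List[Set[int]]) -> bool:
    """P is a packing iff it shares at most one item with every constraint set T_j."""
    return all(len(Tj & P) <= 1 for Tj in T)


def best_packing(c: Dict[int, float], T: List[Set[int]]) -> Tuple[float, FrozenSet[int]]:
    """Exhaustively search all subsets of I = c.keys(); exponential, tiny instances only."""
    items = sorted(c)
    best: Tuple[float, FrozenSet[int]] = (0.0, frozenset())
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            P = set(combo)
            if is_packing(P, T):
                w = sum(c[i] for i in P)  # total weight of this feasible packing
                if w > best[0]:
                    best = (w, frozenset(P))
    return best
```

With weights c = {1: 3, 2: 2, 3: 2, 4: 1} and T = [{1, 2}, {1, 3}, {3, 4}], the packings {1, 4} and {2, 3} both reach the optimum weight 4.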

    On the Approximability of Comparing Genomes with Duplicates

    International audience
    A central problem in comparative genomics consists in computing a (dis-)similarity measure between two genomes, e.g. in order to construct a phylogenetic tree. A large number of such measures have been proposed in the recent past: number of reversals, number of breakpoints, number of common or conserved intervals, etc. In their initial definitions, all these measures assume that genomes contain no duplicates. However, genes are now known to be duplicated within the same genome. One possible approach to overcome this difficulty is to establish a one-to-one correspondence (i.e. a matching) between the genes of both genomes, chosen so as to optimize the studied measure. Then, after relabeling genes according to this matching and deleting the unmatched signed genes, two genomes without duplicates are obtained and the measure can be computed. In this paper, we are interested in three measures (number of breakpoints, number of common intervals and number of conserved intervals) and three models of matching (exemplar, intermediate and maximum matching models). We prove that, for each model and each measure M, computing a matching between two genomes that optimizes M is APX-hard. We show that this result remains true even for two genomes G1 and G2 such that G1 contains no duplicates and no gene of G2 appears more than twice. Therefore, our results extend those of [7, 10, 13]. Moreover, in order to evaluate the possible existence of approximation algorithms concerning the number of breakpoints, we also study the complexity of the following decision problem: is there an exemplarization (resp. an intermediate matching, a maximum matching) that induces no breakpoint?
In particular, we extend a result of [13] by proving the problem to be NP-complete in the exemplar model for a new class of instances, we note that the problems are equivalent in the intermediate and exemplar models, and we show that the problem is in P in the maximum matching model. Finally, we focus on a fourth measure closely related to the number of breakpoints, namely the number of adjacencies, for which we give several constant-ratio approximation algorithms in the maximum matching model, in the case where genomes contain the same number of duplications of each gene.
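The exemplar model and the zero-breakpoint decision question can be illustrated with a brute-force Python sketch, assuming signed linear genomes over the same gene families, where a gene's family is its absolute value. An exemplarization keeps exactly one occurrence per family; the sketch tries every pair of exemplarizations and returns the minimum number of breakpoints, so a result of 0 answers the decision problem positively. It is exponential in the number of duplicates, in line with the hardness results above, and the function names are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List


def exemplarizations(genome: List[int]) -> Iterator[List[int]]:
    """Yield every subsequence keeping exactly one occurrence of each gene family (abs value)."""
    occ: Dict[int, List[int]] = {}
    for idx, g in enumerate(genome):
        occ.setdefault(abs(g), []).append(idx)
    families = sorted(occ)
    for choice in product(*(occ[f] for f in families)):
        keep = set(choice)  # one chosen position per family
        yield [g for idx, g in enumerate(genome) if idx in keep]


def breakpoints(g1: List[int], g2: List[int]) -> int:
    """Consecutive pairs of g1 not consecutive in g2 in either orientation (no framing)."""
    adj = set()
    for a, b in zip(g2, g2[1:]):
        adj.update({(a, b), (-b, -a)})
    return sum(1 for a, b in zip(g1, g1[1:]) if (a, b) not in adj)


def min_exemplar_breakpoints(g1: List[int], g2: List[int]) -> int:
    """Exhaustive minimum over all pairs of exemplarizations; exponential in the duplicates."""
    return min(breakpoints(e1, e2)
               for e1 in exemplarizations(g1)
               for e2 in exemplarizations(g2))
```

For instance, G1 = [1, 2, 3] and G2 = [2, 1, 2, 3] give 0, since keeping the second copy of gene 2 in G2 reproduces G1.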

    Efficient Tools for Computing the Number of Breakpoints and the Number of Adjacencies between two Genomes with Duplicate Genes

    International audience
    Comparing genomes of different species is a fundamental problem in comparative genomics. Recent research has led to the introduction of different measures between pairs of genomes: reversal distance, number of breakpoints, number of common or conserved intervals, etc. However, classical methods for computing such measures are seriously compromised when genomes have several copies of the same gene scattered across them. Most approaches to overcome this difficulty are based either on the exemplar model, which keeps exactly one copy in each genome of each duplicated gene, or on the maximum matching model, which keeps as many copies as possible of each duplicated gene. The goal is to find an exemplar matching, respectively a maximum matching, that optimizes the studied measure. Unfortunately, in the presence of duplications, this problem turns out to be NP-hard for each of the above-mentioned measures. In this paper, we propose to compute the minimum number of breakpoints and the maximum number of adjacencies between two genomes in the presence of duplications using two different approaches. The first is an exact, generic 0–1 linear programming approach, while the second is a collection of three heuristics. Each of these approaches is applied to each problem and for each of the following models: exemplar, maximum matching, and the intermediate model that we introduce here. All these programs are run on a well-known public benchmark dataset of γ-Proteobacteria, and their performance is discussed.

    The zero exemplar distance problem

    Given two genomes with duplicate genes, Zero Exemplar Distance is the problem of deciding whether the two genomes can be reduced to the same genome without duplicate genes by deleting all but one copy of each gene in each genome. Blin, Fertin, Sikora, and Vialette recently proved that Zero Exemplar Distance for monochromosomal genomes is NP-hard even if each gene appears at most two times in each genome, thereby settling an important open question on genome rearrangement in the exemplar model. In this paper, we give a very simple alternative proof of this result. We also study the problem Zero Exemplar Distance for multichromosomal genomes without gene order, and prove the analogous result that it is also NP-hard even if each gene appears at most two times in each genome. For the positive direction, we show that both variants of Zero Exemplar Distance admit polynomial-time algorithms if each gene appears exactly once in one genome and at least once in the other genome. In addition, we present a polynomial-time algorithm for the related problem Exemplar Longest Common Subsequence in the special case that each mandatory symbol appears exactly once in one input sequence and at least once in the other input sequence. This answers an open question of Bonizzoni et al. We also show that Zero Exemplar Distance for multichromosomal genomes without gene order is fixed-parameter tractable if the parameter is the maximum number of chromosomes in each genome.
    Comment: Strengthened and reorganized
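For the positive direction with a single unsigned chromosome, one simple way to see polynomiality (not necessarily the authors' algorithm) is the following: if every gene occurs exactly once in G1, an exemplarization of G2 equals G1 exactly when both genomes have the same gene set and G1 is a subsequence of G2. A minimal Python sketch of that check, with a hypothetical function name and assuming unsigned genes given as hashable symbols, runs in linear time after the set comparison.

```python
from typing import Sequence


def zero_exemplar_distance_special(g1: Sequence[str], g2: Sequence[str]) -> bool:
    """Decide Zero Exemplar Distance when every gene occurs exactly once in g1.

    An exemplarization of g2 keeps one copy of each gene, so it can equal g1
    iff the two genomes share the same gene set and g1 is a subsequence of g2.
    """
    if len(set(g1)) != len(g1):
        raise ValueError("g1 must not contain duplicate genes")
    if set(g1) != set(g2):
        return False
    it = iter(g2)
    # Greedy subsequence test: each gene of g1 must be found after the previous match.
    return all(any(x == gene for x in it) for gene in g1)
```

For example, abc against bacbc returns True (keep a, the second b and the last c), whereas abc against cba returns False.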

    Pseudo-Boolean Programming for Partially Ordered Genomes

    International audience
    Comparing genomes of different species is a crucial problem in comparative genomics. Different measures have been proposed to compare two genomes: number of common intervals, number of adjacencies, number of reversals, etc. These measures are classically computed between two totally ordered genomes. However, genetic mapping techniques often give rise to different maps with some unordered genes. Starting from a partial order between the genes of a genome, one method to find a total order consists in optimizing a given measure between a linear extension of this partial order and a given total order of a closely related, well-known genome. However, for most common measures, the problem turns out to be NP-hard. In this paper, we propose a (0, 1)-linear programming approach to compute a linear extension of one genome that maximizes the number of common intervals (resp. the number of adjacencies) between this linear extension and a given total order. Next, we propose an algorithm to find linear extensions of two partial orders that maximize the number of adjacencies.
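The optimization problem above can be stated as a small brute-force Python sketch: enumerate all orders of the genes, keep those that respect the partial order (given here as hypothetical (before, after) pairs), and return one that maximizes the number of adjacencies with a reference total order. For simplicity the sketch uses unsigned genes and counts a pair as an adjacency if it is consecutive in both orders, in either direction. It is exponential and only stands in for the (0, 1)-linear programming approach the paper proposes; all names are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Set, Tuple


def is_linear_extension(order: Sequence[str], before: Set[Tuple[str, str]]) -> bool:
    """True if every constraint (a, b) has a placed before b in the order."""
    pos = {g: k for k, g in enumerate(order)}
    return all(pos[a] < pos[b] for a, b in before)


def unsigned_adjacencies(x: Sequence[str], y: Sequence[str]) -> int:
    """Count consecutive pairs of x that are also consecutive (either direction) in y."""
    pairs_y = {frozenset(p) for p in zip(y, y[1:])}
    return sum(1 for p in zip(x, x[1:]) if frozenset(p) in pairs_y)


def best_linear_extension(genes: Sequence[str],
                          before: Set[Tuple[str, str]],
                          reference: Sequence[str]) -> Tuple[Optional[List[str]], int]:
    """Exhaustive search over all gene orders; tiny instances only."""
    best: Optional[List[str]] = None
    best_score = -1
    for order in permutations(genes):
        if is_linear_extension(order, before):
            score = unsigned_adjacencies(order, reference)
            if score > best_score:
                best, best_score = list(order), score
    return best, best_score
```

With a before b and a before c, and reference order acbd, the unique best extension is acbd with 3 adjacencies.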

    Integration of omics data to investigate common intervals

    International audience
    During the last decade, we have witnessed the huge impact of comparative genomics on our understanding of genomes (from genome organization to annotation). However, such genomic approaches quickly reach their limits when investigating the functional properties of genes in the wider genome context. This limitation may be overcome thanks to recent high-throughput experimental advances, such as metabolic and co-expression studies, which produce so-called omics data. Integrating these data with state-of-the-art computational genome comparison is therefore a natural evolution. This paper achieves such an integration and proposes a heuristic algorithm, IISCS, that incorporates omics knowledge into the IILCS heuristic, which is already known to be accurate for comparing genomes. When applied to bacteria, it highlights large functional units composed of several operons.

    A Pseudo-Boolean programming approach for computing the breakpoint distance between two genomes with duplicate genes

    International audience
    Comparing genomes of different species has become a crucial problem in comparative genomics. Recent research has resulted in different genomic distance definitions: number of breakpoints, number of common intervals, number of conserved intervals, Maximum Adjacency Disruption number (MAD), etc. Classical methods for computing genomic distances between whole genomes (usually based on permutations of gene order) are, however, seriously compromised for genomes in which several copies of the same gene may be scattered across the genome. Most approaches to overcoming this difficulty are based on the exemplar method (keep exactly one copy of each duplicated gene in each genome) and the maximum matching method (keep as many copies as possible of each duplicated gene in each genome). Unfortunately, in the presence of duplications most problems turn out to be NP-hard, and hence several heuristics have recently been proposed. Extending research initiated in [2], we propose in this paper a novel generic pseudo-Boolean approach for computing the exact breakpoint distance between two genomes in the presence of duplications, for both the exemplar and maximum matching methods. We illustrate the application of this methodology on a well-known public benchmark dataset of γ-Proteobacteria.