The Longest Common Exemplar Subsequence Problem
In this paper, we propose to find order-conserved subsequences of genomes by finding longest common exemplar subsequences of the genomes. The longest common exemplar subsequence problem, given two genomes, asks for a common exemplar subsequence of maximum length. We focus on genomes whose genes of the same gene family occur in at most s spans. We propose a dynamic programming algorithm with time complexity O(s·4^s·mn) to find a longest common exemplar subsequence of two genomes when one genome admits s-span genes of the same gene family, where m and n denote the numbers of genes in the two given genomes. Our algorithm can be extended to find longest common exemplar subsequences of more than two genomes.
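As a point of comparison, the exemplar constraint (each gene family used at most once in the common subsequence) can be captured by an exponential brute-force baseline. This sketch is illustrative only and is not the paper's O(s·4^s·mn) dynamic program; genomes are assumed to be given as strings of gene symbols:

```python
from itertools import combinations

def is_subsequence(s, t):
    # standard subsequence test via a shared iterator
    it = iter(t)
    return all(c in it for c in s)

def lces_brute(a, b):
    """Longest common exemplar subsequence of genomes a and b by brute
    force: each gene family may appear at most once. Exponential; a toy
    baseline, not the paper's dynamic program."""
    for r in range(len(a), 0, -1):
        for idx in combinations(range(len(a)), r):
            cand = "".join(a[i] for i in idx)
            if len(set(cand)) == r and is_subsequence(cand, b):
                return cand  # first hit at the largest r is optimal in length
    return ""
```

Enumerating subsequences of the shorter genome in decreasing length guarantees the first repetition-free hit is optimal.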
Exemplar Longest Common Subsequence (extended abstract)
In this paper we investigate the computational and approximation complexity of the Exemplar Longest Common Subsequence of a set of sequences (the ELCS problem), a generalization of the Longest Common Subsequence problem in which the input sequences are over the union of two disjoint sets of symbols, a set of mandatory symbols and a set of optional symbols. We show that different versions of the problem are APX-hard even for instances with two sequences. Moreover, we show that the related problem of determining the existence of a feasible solution of the Exemplar Longest Common Subsequence of two sequences is NP-hard. On the positive side, we give efficient algorithms for the ELCS problem on instances of two sequences where each mandatory symbol appears in total at most three times, or where the number of mandatory symbols is bounded by a constant.
The zero exemplar distance problem
Given two genomes with duplicate genes, \textsc{Zero Exemplar Distance} is
the problem of deciding whether the two genomes can be reduced to the same
genome without duplicate genes by deleting all but one copy of each gene in
each genome. Blin, Fertin, Sikora, and Vialette recently proved that
\textsc{Zero Exemplar Distance} for monochromosomal genomes is NP-hard even if
each gene appears at most two times in each genome, thereby settling an
important open question on genome rearrangement in the exemplar model. In this
paper, we give a very simple alternative proof of this result. We also study
the problem \textsc{Zero Exemplar Distance} for multichromosomal genomes
without gene order, and prove the analogous result that it is also NP-hard even
if each gene appears at most two times in each genome. For the positive
direction, we show that both variants of \textsc{Zero Exemplar Distance} admit
polynomial-time algorithms if each gene appears exactly once in one genome and
at least once in the other genome. In addition, we present a polynomial-time
algorithm for the related problem \textsc{Exemplar Longest Common Subsequence}
in the special case that each mandatory symbol appears exactly once in one
input sequence and at least once in the other input sequence. This answers an
open question of Bonizzoni et al. We also show that \textsc{Zero Exemplar
Distance} for multichromosomal genomes without gene order is fixed-parameter
tractable if the parameter is the maximum number of chromosomes in each genome.
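For intuition, the decision problem itself is easy to state as code. A minimal exponential sketch (an illustration, not one of the paper's algorithms), assuming monochromosomal genomes given as strings of gene symbols:

```python
from itertools import product

def zero_exemplar_distance(g1, g2):
    """Brute-force Zero Exemplar Distance: keep exactly one copy of each
    gene in each genome in every possible way and test whether the two
    reductions can be made equal. Exponential; illustration only."""
    def exemplars(g):
        # map each gene to the positions of its copies
        positions = {x: [i for i, c in enumerate(g) if c == x] for x in set(g)}
        # every way of keeping one position per gene
        for choice in product(*positions.values()):
            keep = set(choice)
            yield "".join(c for i, c in enumerate(g) if i in keep)

    reductions2 = set(exemplars(g2))
    return any(e in reductions2 for e in exemplars(g1))
```

The NP-hardness results above mean no polynomial-time algorithm is expected in general, even with at most two copies per gene.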
Variants of Constrained Longest Common Subsequence
In this work, we consider a variant of the classical Longest Common
Subsequence problem called Doubly-Constrained Longest Common Subsequence
(DC-LCS). Given two strings s1 and s2 over an alphabet A, a set Cs of strings,
and a function Co from A to N, the DC-LCS problem consists in finding the
longest subsequence s of s1 and s2 such that s is a supersequence of all the
strings in Cs and such that the number of occurrences in s of each symbol a in
A is upper bounded by Co(a). The DC-LCS problem provides a clear mathematical
formulation of a sequence comparison problem in Computational Biology and
generalizes two other constrained variants of the LCS problem: the Constrained
LCS and the Repetition-Free LCS. We present two results for the DC-LCS problem.
First, we describe a fixed-parameter algorithm where the parameter is the
length of the solution. Second, we prove a parameterized hardness result for
the Constrained LCS problem when the parameters are the number of constraint
strings and the size of the alphabet A. This hardness result also implies the
parameterized hardness of the DC-LCS problem (with the same parameters) and its
NP-hardness when the size of the alphabet is constant
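The interplay of the two constraints can be seen in a brute-force sketch (illustrative only; the paper's contributions are the fixed-parameter algorithm and the hardness results). The occurrence bounds Co are assumed to be given as a dict, with symbols absent from it disallowed:

```python
from itertools import combinations
from collections import Counter

def is_subseq(s, t):
    # standard subsequence test via a shared iterator
    it = iter(t)
    return all(c in it for c in s)

def dc_lcs(s1, s2, constraints, co):
    """Brute-force DC-LCS: longest common subsequence s of s1 and s2 that is
    a supersequence of every string in constraints and respects the per-symbol
    occurrence bounds co. Exponential; toy illustration only."""
    for r in range(len(s1), 0, -1):
        for idx in combinations(range(len(s1)), r):
            s = "".join(s1[i] for i in idx)
            counts = Counter(s)
            if (is_subseq(s, s2)
                    and all(counts[a] <= co.get(a, 0) for a in counts)
                    and all(is_subseq(c, s) for c in constraints)):
                return s
    return ""
```

Setting every bound to 1 recovers the Repetition-Free LCS; an empty bound-free setting with one constraint string recovers the Constrained LCS, matching the generalization claim above.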
Heuristic algorithms for the Longest Filled Common Subsequence Problem
At CPM 2017, Castelli et al. defined and studied a new variant of the Longest
Common Subsequence Problem, termed the Longest Filled Common Subsequence
Problem (LFCS). For the LFCS problem, the input consists of two strings A and B
and a multiset M of characters. The goal is to insert the characters from M
into the string B, thus obtaining a new string B*, such that the Longest Common
Subsequence (LCS) between A and B* is maximized. Castelli et al. show that the
problem is NP-hard and provide a 3/5-approximation algorithm for the problem.
In this paper we study the problem from the experimental point of view. We
introduce, implement, and test new heuristic algorithms and compare them with
the approximation algorithm of Castelli et al. Moreover, we introduce an
Integer Linear Program (ILP) model for the problem and use the state-of-the-art
ILP solver Gurobi to obtain exact solutions for moderately sized instances.
Comment: Accepted and presented as a proceedings paper at SYNASC 201
Repetition-free longest common subsequence of random sequences
A repetition-free Longest Common Subsequence (LCS) of two sequences x and y
is an LCS of x and y in which each symbol appears at most once. Let R denote
the length of a repetition-free LCS of two sequences of n symbols each,
chosen randomly, uniformly, and independently over a k-ary alphabet. We study
the asymptotic, in n and k, behavior of R and establish that there are three
distinct regimes, depending on the relative speed of growth of n and k. For
each regime we establish the limiting behavior of R. In fact, we do more, since
we actually establish tail bounds for large deviations of R from its limiting
behavior.
Our study is motivated by the so-called exemplar model proposed by Sankoff
(1999) and the related similarity measure introduced by Adi et al. (2007). A
natural question that arises in this context, which, as we show, is related to
long-standing open problems in the area of probabilistic combinatorics, is to
understand the asymptotic, in n and k, behavior of the parameter R.
The Extended Edit Distance Metric
Similarity search is an important problem in information retrieval. This
similarity is based on a distance measure. Symbolic representation of time
series has attracted many researchers recently, since it reduces the
dimensionality of these high-dimensional data objects. We propose a new
distance metric that is applied to symbolic data objects, and we test it on
time series databases in a classification task. We compare it to other
distances that are well known in the literature for symbolic data objects. We
also prove, mathematically, that our distance is a metric.
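For reference, the classic edit (Levenshtein) distance that such symbolic measures build on can be computed with the standard dynamic program. This sketch shows only that baseline, not the paper's extended metric:

```python
def edit_distance(s, t):
    """Classic Levenshtein distance via DP with a rolling row:
    O(len(s) * len(t)) time, O(len(t)) space."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))          # distances from "" to prefixes of t
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1,       # deletion
                         cur[j - 1] + 1,    # insertion
                         prev[j - 1] + cost)  # substitution / match
        prev = cur
    return prev[n]
```

Being a metric requires non-negativity, identity of indiscernibles, symmetry, and the triangle inequality; the classic distance satisfies all four, which is the property the paper proves for its extension.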
The Longest Filled Common Subsequence Problem
Inspired by a recent approach for genome reconstruction from incomplete data, we consider a variant of the longest common subsequence problem for the comparison of two sequences, one of which is incomplete, i.e. it has some missing elements. The new combinatorial problem, called Longest Filled Common Subsequence, given two sequences A and B, and a multiset M of symbols missing in B, asks for a sequence B* obtained by inserting the symbols of M into B so that B* induces a common subsequence with A of maximum length. First, we investigate the computational and approximation complexity of the problem and show that it is NP-hard and APX-hard when A contains at most two occurrences of each symbol. Then, we give a 3/5-approximation algorithm for the problem. Finally, we present a fixed-parameter algorithm, when the problem is parameterized by the number of symbols inserted in B that "match" symbols of A.
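A brute-force reference implementation makes the objective concrete (illustrative only; exponential, unlike the paper's approximation and fixed-parameter algorithms):

```python
from itertools import permutations

def lcs_len(a, b):
    # standard quadratic LCS-length DP
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1]
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[-1][-1]

def interleavings(x, y):
    # every string obtained by interleaving x and y, preserving both orders
    if not x or not y:
        yield x + y
        return
    for rest in interleavings(x[1:], y):
        yield x[0] + rest
    for rest in interleavings(x, y[1:]):
        yield y[0] + rest

def lfcs_brute(a, b, m_chars):
    """Brute-force LFCS: try every ordering of the multiset M and every way
    to interleave it into B, keeping the best LCS with A. Exponential."""
    best = lcs_len(a, b)  # inserting nothing is always allowed
    for order in set(permutations(m_chars)):
        for b_star in interleavings(b, "".join(order)):
            best = max(best, lcs_len(a, b_star))
    return best
```

Every candidate B* contains B as a subsequence, so the filled optimum is never worse than the plain LCS of A and B.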
Efficient Tools for Computing the Number of Breakpoints and the Number of Adjacencies between two Genomes with Duplicate Genes
Comparing genomes of different species is a fundamental problem in comparative genomics. Recent research has resulted in the introduction of different measures between pairs of genomes: reversal distance, number of breakpoints, number of common or conserved intervals, etc. However, classical methods used for computing such measures are seriously compromised when genomes have several copies of the same gene scattered across them. Most approaches to overcome this difficulty are based either on the exemplar model, which keeps exactly one copy in each genome of each duplicated gene, or on the maximum matching model, which keeps as many copies as possible of each duplicated gene. The goal is to find an exemplar matching, respectively a maximum matching, that optimizes the studied measure. Unfortunately, it turns out that, in the presence of duplications, this problem is NP-hard for each of the above-mentioned measures. In this paper, we propose to compute the minimum number of breakpoints and the maximum number of adjacencies between two genomes in the presence of duplications using two different approaches. The first is an exact, generic 0–1 linear programming approach, while the second is a collection of three heuristics. Each of these approaches is applied to each problem and to each of the following models: exemplar, maximum matching, and an intermediate model that we introduce here. All these programs are run on a well-known public benchmark dataset of γ-Proteobacteria, and their performances are discussed.
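As background, the breakpoint count is straightforward once genomes are duplication-free and unsigned. A minimal sketch of that classical case (the paper's setting, with duplicates, first requires choosing an exemplar or maximum matching):

```python
def breakpoints(g1, g2):
    """Number of breakpoints of g1 relative to g2 for duplication-free,
    unsigned genomes given as sequences of genes: consecutive pairs in g1
    that are adjacent (in either orientation) nowhere in g2."""
    adj = set()
    for a, b in zip(g2, g2[1:]):
        adj.add((a, b))
        adj.add((b, a))  # unsigned: adjacency holds in both directions
    return sum(1 for a, b in zip(g1, g1[1:]) if (a, b) not in adj)
```

With duplicates, each matching of gene copies induces a different breakpoint count, which is what the ILP and heuristics above optimize over.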
Master Texture Space: An Efficient Encoding for Projectively Mapped Objects
Projectively textured models are used in an increasingly large number of applications that dynamically combine images with a simple geometric surface in a viewpoint-dependent way. These models can provide visual fidelity while retaining the effects afforded by geometric approximation, such as shadow casting and accurate perspective distortion. However, the number of stored views can be quite large, and novel views must be synthesized during the rendering process because no single view may correctly texture the entire object surface. This work introduces the Master Texture encoding and demonstrates that the encoding increases the utility of projectively textured objects by reducing render-time operations. Encoding involves three steps: 1) all image regions that correspond to the same geometric mesh element are extracted and warped to a facet of uniform size and shape, 2) an efficient packing of these facets into a new Master Texture image is computed, and 3) the visibility of each pixel in the new Master Texture data is guaranteed using a simple algorithm to discard occluded pixels in each view. Because the encoding implicitly represents the multi-view geometry of the multiple images, a single texture mesh is sufficient to render the view-dependent model. More importantly, every Master Texture image can correctly texture the entire surface of the object, removing expensive computations such as visibility analysis from the rendering algorithm. A benefit of this encoding is the support for pixel-wise view synthesis. The utility of pixel-wise view synthesis is demonstrated with a real-time Master Texture encoded VDTM application. Pixel-wise synthesis is also demonstrated with an algorithm that distills a set of Master Texture images to a single view-independent Master Texture image.