Search CORE

121 research outputs found

Complexity and Algorithms for the Discrete Fr\'echet Distance Upper Bound with Imprecise Input

Author: Fan Chenglin
Zhu Binhai
Publication venue
Publication date: 10/09/2015
Field of study

We study the problem of computing the upper bound of the discrete Fr\'{e}chet distance for imprecise input, and prove that the problem is NP-hard. This solves an open problem posed in 2010 by Ahn \emph{et al}. If shortcuts are allowed, we show that the upper bound of the discrete Fr\'{e}chet distance with shortcuts for imprecise input can be computed in polynomial time and we present several efficient algorithms.Comment: 15 pages, 8 figure

arXiv.org e-Print Archive

CiteSeerX

The Tandem Duplication Distance Is NP-Hard

Author: Lafond Manuel
Zhu Binhai
Zou Peng
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 37th International Symposium on Theoretical Aspects of Computer Science (STACS 2020)
Publication date: 12/06/2019
Field of study

In computational biology, tandem duplication is an important biological phenomenon which can occur either at the genome or at the DNA level. A tandem duplication takes a copy of a genome segment and inserts it right after the segment - this can be represented as the string operation AXB ? AXXB. Tandem exon duplications have been found in many species such as human, fly or worm, and have been largely studied in computational biology. The Tandem Duplication (TD) distance problem we investigate in this paper is defined as follows: given two strings S and T over the same alphabet, compute the smallest sequence of tandem duplications required to convert S to T. The natural question of whether the TD distance can be computed in polynomial time was posed in 2004 by Leupold et al. and had remained open, despite the fact that tandem duplications have received much attention ever since. In this paper, we prove that this problem is NP-hard, settling the 16-year old open problem. We further show that this hardness holds even if all characters of S are distinct. This is known as the exemplar TD distance, which is of special relevance in bioinformatics. One of the tools we develop for the reduction is a new problem called the Cost-Effective Subgraph, for which we obtain W[1]-hardness results that might be of independent interest. We finally show that computing the exemplar TD distance between S and T is fixed-parameter tractable. Our results open the door to many other questions, and we conclude with several open problems

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Genomic Problems Involving Copy Number Profiles: Complexity and Algorithms

Author: Lafond Manuel
Zhu Binhai
Zou Peng
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual Symposium on Combinatorial Pattern Matching (CPM 2020)
Publication date: 01/01/2020
Field of study

Recently, due to the genomic sequence analysis in several types of cancer, the genomic data based on {\em copy number profiles} ({\em CNP} for short) are getting more and more popular. A CNP is a vector where each component is a non-negative integer representing the number of copies of a specific gene or segment of interest. In this paper, we present two streams of results. The first is the negative results on two open problems regarding the computational complexity of the Minimum Copy Number Generation (MCNG) problem posed by Qingge et al. in 2018. It was shown by Qingge et al. that the problem is NP-hard if the duplications are tandem and they left the open question of whether the problem remains NP-hard if arbitrary duplications are used. We answer this question affirmatively in this paper; in fact, we prove that it is NP-hard to even obtain a constant factor approximation. We also prove that the parameterized version is W[1]-hard, answering another open question by Qingge et al. The other result is positive and is based on a new (and more general) problem regarding CNP's. The \emph{Copy Number Profile Conforming (CNPC)} problem is formally defined as follows: given two CNP's

C_1

and

C_2

, compute two strings

S_1

and

S_2

with

cnp(S_1)=C_1

and

cnp(S_2)=C_2

such that the distance between

S_1

and

S_2

d(S_1,S_2)

, is minimized. Here,

d(S_1,S_2)

is a very general term, which means it could be any genome rearrangement distance (like reversal, transposition, and tandem duplication, etc). We make the first step by showing that if

d(S_1,S_2)

is measured by the breakpoint distance then the problem is polynomially solvable.Comment: 16 pages, 3 figure

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

A 2-Approximation Algorithm for the Complementary Maximal Strip Recovery Problem

Author: Guo Jiong
Jiang Haitao
Zhu Binhai
Zhu Daming
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 30th Annual Symposium on Combinatorial Pattern Matching (CPM 2019)
Publication date: 01/01/2019
Field of study

The Maximal Strip Recovery problem (MSR) and its complementary (CMSR) are well-studied NP-hard problems in computational genomics. The input of these dual problems are two signed permutations. The goal is to delete some gene markers from both permutations, such that, in the remaining permutations, each gene marker has at least one common neighbor. Equivalently, the resulting permutations could be partitioned into common strips of length at least two. Then MSR is to maximize the number of remaining genes, while the objective of CMSR is to delete the minimum number of gene markers. In this paper, we present a new approximation algorithm for the Complementary Maximal Strip Recovery (CMSR) problem. Our approximation factor is 2, improving the currently best 7/3-approximation algorithm. Although the improvement on the factor is not huge, the analysis is greatly simplified by a compensating method, commonly referred to as the non-oblivious local search technique. In such a method a substitution may not always increase the value of the current solution (it sometimes may even decrease the solution value), though it always improves the value of another function seemingly unrelated to the objective function

Dagstuhl Research Online Publication Server