    Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS

    We study sketching and streaming algorithms for the Longest Common Subsequence problem (LCS) on strings of small alphabet size |Sigma|. For the problem of deciding whether the LCS of strings x,y has length at least L, we obtain a sketch size and streaming space usage of O(L^{|Sigma| - 1} log L). We also prove matching unconditional lower bounds. As an application, we study a variant of LCS where each alphabet symbol is equipped with a weight that is given as input, and the task is to compute a common subsequence of maximum total weight. Using our sketching algorithm, we obtain an O(min{nm, n + m^{|Sigma|}})-time algorithm for this problem, on strings x,y of length n,m, with n >= m. We prove optimality of this running time up to lower-order factors, assuming the Strong Exponential Time Hypothesis.
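
    As a point of reference for the O(nm) term in the bound above, the weighted variant can be solved with the textbook quadratic dynamic program. The sketch below is only that baseline, not the sketching-based algorithm of the abstract; the function name `weighted_lcs` and the dictionary `weight` are illustrative assumptions, and weights are assumed to be non-negative.

```python
def weighted_lcs(x, y, weight):
    """Maximum total weight of a common subsequence of x and y.

    Textbook O(n*m) dynamic program with two rolling rows.
    `weight` maps each alphabet symbol to a non-negative number (assumption).
    """
    m = len(y)
    prev = [0] * (m + 1)  # prev[j]: best value for x[:i-1] and y[:j]
    for i in range(1, len(x) + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            # Either drop the last symbol of x or the last symbol of y ...
            best = max(prev[j], cur[j - 1])
            # ... or match them if they are equal.
            if x[i - 1] == y[j - 1]:
                best = max(best, prev[j - 1] + weight[x[i - 1]])
            cur[j] = best
        prev = cur
    return prev[m]


if __name__ == "__main__":
    print(weighted_lcs("abcb", "bacb", {"a": 5, "b": 1, "c": 1}))
```

    With unit weights this reduces to the ordinary LCS length.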

    Davenport constant with weights

    For the cyclic group $G=\mathbb{Z}/n\mathbb{Z}$ and any non-empty $A\subseteq\mathbb{Z}$, we define the Davenport constant of $G$ with weight $A$, denoted by $D_A(n)$, to be the least natural number $k$ such that for any sequence $(x_1, \ldots, x_k)$ with $x_i\in G$, there exist a non-empty subsequence $(x_{j_1}, \ldots, x_{j_l})$ and $a_1, \ldots, a_l\in A$ such that $\sum_{i=1}^{l} a_i x_{j_i} = 0$. Similarly, we define the constant $E_A(n)$ to be the least $t\in\mathbb{N}$ such that for all sequences $(x_1, \ldots, x_t)$ with $x_i \in G$, there exist indices $j_1, \ldots, j_n\in\mathbb{N}$, $1\leq j_1 < \cdots < j_n\leq t$, and $\vartheta_1, \ldots, \vartheta_n\in A$ with $\sum_{i=1}^{n} \vartheta_i x_{j_i} = 0$. In the present paper, we show that $E_A(n)=D_A(n)+n-1$. This solves the problem raised by Adhikari and Rath \cite{ar06}, Adhikari and Chen \cite{ac08}, Thangadurai \cite{th07} and Griffiths \cite{gr08}. Comment: 6 pages
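
    The two definitions above can be checked by exhaustive search for very small n. The following is a brute-force sketch, not a method from the paper; the function names are hypothetical. It relies only on the observation that the order of a sequence does not affect which weighted sums are attainable, so it enumerates multisets.

```python
from itertools import combinations_with_replacement


def reachable(seq, weights, n):
    """All (count, sum mod n) pairs obtainable from a weighted subsequence."""
    states = {(0, 0)}
    for x in seq:
        extended = set(states)  # option 1: skip x
        for count, total in states:
            for a in weights:  # option 2: take x with weight a
                extended.add((count + 1, (total + a * x) % n))
        states = extended
    return states


def has_zero_sum(seq, weights, n, length=None):
    """True if some non-empty (or exactly `length`-term) weighted sum is 0."""
    states = reachable(seq, weights, n)
    if length is None:
        return any(count >= 1 and total == 0 for count, total in states)
    return (length, 0) in states


def least_length(weights, n, length=None):
    """Smallest k such that every length-k sequence over Z/nZ qualifies."""
    k = 1
    while True:
        if all(has_zero_sum(seq, weights, n, length)
               for seq in combinations_with_replacement(range(n), k)):
            return k
        k += 1


def davenport(weights, n):       # D_A(n)
    return least_length(weights, n)


def egz(weights, n):             # E_A(n): subsequence of length exactly n
    return least_length(weights, n, length=n)


if __name__ == "__main__":
    for A, n in [({1}, 3), ({1, -1}, 4)]:
        print(sorted(A), n, davenport(A, n), egz(A, n))
```

    The search is exponential in the sequence length and is meant only to illustrate the definitions on toy instances.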

    Faster optimal univariate microaggregation

    Microaggregation is a method to coarsen a dataset, by optimally clustering data points in groups of at least $k$ points, thereby providing a $k$-anonymity type disclosure guarantee for each point in the dataset. Previous algorithms for univariate microaggregation had an $O(kn)$ time complexity. By rephrasing microaggregation as an instance of the concave least weight subsequence problem, in this work we provide improved algorithms that provide an optimal univariate microaggregation on sorted data in $O(n)$ time and space. We further show that our algorithms work not only for sum of squares cost functions, as typically considered, but seamlessly extend to many other cost functions used for univariate microaggregation tasks. In experiments we show that the presented algorithms lead to real world performance improvements.
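
    For orientation, the O(kn) baseline mentioned above can be sketched as a dynamic program over sorted data. This is not the O(n) algorithm of the abstract. It assumes the standard facts that optimal groups are contiguous in sorted order and need at most 2k - 1 points each, and it uses the sum-of-squares cost; the function name `microaggregate` is illustrative.

```python
import math


def microaggregate(sorted_values, k):
    """Optimal univariate microaggregation by an O(k*n) dynamic program.

    Assumes `sorted_values` is sorted and that clusters are contiguous runs
    of between k and 2k-1 points. Returns (total SSE, list of clusters).
    """
    n = len(sorted_values)
    if k < 1 or n < k:
        raise ValueError("need k >= 1 and at least k data points")

    # Prefix sums of values and squared values give O(1) cluster costs.
    p1, p2 = [0.0], [0.0]
    for v in sorted_values:
        p1.append(p1[-1] + v)
        p2.append(p2[-1] + v * v)

    def sse(j, i):
        """Sum of squared deviations from the mean of sorted_values[j:i]."""
        s = p1[i] - p1[j]
        return (p2[i] - p2[j]) - s * s / (i - j)

    best = [math.inf] * (n + 1)  # best[i]: optimal cost of the first i points
    best[0] = 0.0
    back = [0] * (n + 1)         # back[i]: start index of the last cluster
    for i in range(k, n + 1):
        for j in range(max(0, i - 2 * k + 1), i - k + 1):
            cost = best[j] + sse(j, i)
            if cost < best[i]:
                best[i], back[i] = cost, j

    clusters, i = [], n
    while i > 0:
        clusters.append(sorted_values[back[i]:i])
        i = back[i]
    clusters.reverse()
    return best[n], clusters


if __name__ == "__main__":
    print(microaggregate([1, 2, 3, 10, 11, 12], 3))
```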

    On the Fine-Grained Complexity of One-Dimensional Dynamic Programming

    In this paper, we investigate the complexity of one-dimensional dynamic programming, or more specifically, of the Least-Weight Subsequence (LWS) problem: Given a sequence of n data items together with weights for every pair of the items, the task is to determine a subsequence S minimizing the total weight of the pairs adjacent in S. A large number of natural problems can be formulated as LWS problems, yielding obvious O(n^2)-time solutions. In many interesting instances, the O(n^2)-many weights can be succinctly represented. Yet except for near-linear time algorithms for some specific special cases, little is known about when an LWS instantiation admits a subquadratic-time algorithm and when it does not. In particular, no lower bounds for LWS instantiations have been known before. In an attempt to remedy this situation, we provide a general approach to study the fine-grained complexity of succinct instantiations of the LWS problem: Given an LWS instantiation we identify a highly parallel core problem that is subquadratically equivalent. This provides either an explanation for the apparent hardness of the problem or an avenue to find improved algorithms as the case may be. More specifically, we prove subquadratic equivalences between the following pairs (an LWS instantiation and the corresponding core problem) of problems: a low-rank version of LWS and minimum inner product, finding the longest chain of nested boxes and vector domination, and a coin change problem which is closely related to the knapsack problem and (min,+)-convolution. Using these equivalences and known SETH-hardness results for some of the core problems, we deduce tight conditional lower bounds for the corresponding LWS instantiations. We also establish the (min,+)-convolution-hardness of the knapsack problem. 
    Furthermore, we revisit some of the LWS instantiations which are known to be solvable in near-linear time and explain their easiness in terms of the easiness of the corresponding core problems.
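
    The obvious O(n^2)-time solution referred to in this abstract can be sketched as follows. The sketch assumes the common formulation in which items are indexed 0..n, the subsequence must start at item 0 and end at item n, and the weights are given as a callable `w(i, j)` for i < j. These conventions and the function name are assumptions made for illustration.

```python
import math


def least_weight_subsequence(n, w):
    """Quadratic-time LWS: F[j] = min over i < j of F[i] + w(i, j), F[0] = 0.

    Returns (F[n], indices of an optimal subsequence from 0 to n).
    """
    F = [0.0] + [math.inf] * n
    parent = [-1] * (n + 1)
    for j in range(1, n + 1):
        for i in range(j):
            cost = F[i] + w(i, j)
            if cost < F[j]:
                F[j], parent[j] = cost, i

    # Walk the parent pointers back from n to recover the subsequence.
    path, j = [], n
    while j != -1:
        path.append(j)
        j = parent[j]
    path.reverse()
    return F[n], path


if __name__ == "__main__":
    # Toy weights: a jump of length exactly 2 is free, others are penalised.
    print(least_weight_subsequence(6, lambda i, j: (j - i - 2) ** 2))
```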

    The zero exemplar distance problem

    Given two genomes with duplicate genes, \textsc{Zero Exemplar Distance} is the problem of deciding whether the two genomes can be reduced to the same genome without duplicate genes by deleting all but one copy of each gene in each genome. Blin, Fertin, Sikora, and Vialette recently proved that \textsc{Zero Exemplar Distance} for monochromosomal genomes is NP-hard even if each gene appears at most two times in each genome, thereby settling an important open question on genome rearrangement in the exemplar model. In this paper, we give a very simple alternative proof of this result. We also study the problem \textsc{Zero Exemplar Distance} for multichromosomal genomes without gene order, and prove the analogous result that it is also NP-hard even if each gene appears at most two times in each genome. For the positive direction, we show that both variants of \textsc{Zero Exemplar Distance} admit polynomial-time algorithms if each gene appears exactly once in one genome and at least once in the other genome. In addition, we present a polynomial-time algorithm for the related problem \textsc{Exemplar Longest Common Subsequence} in the special case that each mandatory symbol appears exactly once in one input sequence and at least once in the other input sequence. This answers an open question of Bonizzoni et al. We also show that \textsc{Zero Exemplar Distance} for multichromosomal genomes without gene order is fixed-parameter tractable if the parameter is the maximum number of chromosomes in each genome. Comment: Strengthened and reorganized

    Why is it hard to beat $O(n^2)$ for Longest Common Weakly Increasing Subsequence?

    The Longest Common Weakly Increasing Subsequence problem (LCWIS) is a variant of the classic Longest Common Subsequence problem (LCS). Both problems can be solved with simple quadratic time algorithms. A recent line of research led to a number of matching conditional lower bounds for LCS and other related problems. However, the status of LCWIS remained open. In this paper we show that LCWIS cannot be solved in strongly subquadratic time unless the Strong Exponential Time Hypothesis (SETH) is false. The ideas which we developed can also be used to obtain a lower bound based on a safer assumption of NC-SETH, i.e. a version of SETH which talks about NC circuits instead of less expressive CNF formulas.
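
    One possible form of the simple quadratic-time algorithm mentioned above is sketched here. It adapts the usual dynamic program for the strictly increasing variant to weak increase; the function name `lcwis_length` is illustrative, and the sketch returns only the length, not the subsequence itself.

```python
def lcwis_length(a, b):
    """Length of a longest common weakly increasing subsequence, O(|a|*|b|).

    dp[j] is the best length of a common weakly increasing subsequence of
    the processed prefix of a and b[:j+1] that ends exactly at b[j].
    """
    dp = [0] * len(b)
    for x in a:
        cur = 0  # best dp value seen so far in this row with b[j'] <= x
        for j, y in enumerate(b):
            if y == x:
                # Remember the value from earlier rows before overwriting it,
                # so that x itself is not matched twice in the same row.
                old = dp[j]
                dp[j] = max(dp[j], cur + 1)
                cur = max(cur, old)
            elif y < x:
                cur = max(cur, dp[j])
    return max(dp, default=0)


if __name__ == "__main__":
    print(lcwis_length([1, 3, 2, 2, 4], [3, 1, 2, 2, 4]))
```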

    Viterbi Sequences and Polytopes

    A Viterbi path of length n of a discrete Markov chain is a sequence of n+1 states that has the greatest probability of occurring in the Markov chain. We divide the space of all Markov chains into Viterbi regions in which two Markov chains are in the same region if they have the same set of Viterbi paths. The Viterbi paths of regions of positive measure are called Viterbi sequences. Our main results are (1) each Viterbi sequence can be divided into a prefix, periodic interior, and suffix, and (2) as n increases to infinity (and the number of states remains fixed), the number of Viterbi regions remains bounded. The Viterbi regions correspond to the vertices of a Newton polytope of a polynomial whose terms are the probabilities of sequences of length n. We characterize Viterbi sequences and polytopes for two- and three-state Markov chains. Comment: 15 pages, 2 figures, to appear in Journal of Symbolic Computation
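
    To make the definition of a Viterbi path concrete, the following sketch computes one by the standard dynamic program. It assumes the chain is given by an initial distribution `initial` and a row-stochastic matrix `transition`; how the initial state is weighted is an assumption here, since the abstract does not say. Ties are broken towards the lowest state index.

```python
def viterbi_path(initial, transition, n):
    """Most probable sequence of n+1 states of a Markov chain.

    initial[s] is the probability of starting in state s and
    transition[s][t] the probability of moving from s to t.
    Returns (probability, list of n+1 state indices).
    """
    k = len(initial)
    best = list(initial)  # best[t]: max probability of a path ending in t
    back = []             # back[step][t]: predecessor of t on that path
    for _ in range(n):
        step, new_best = [], []
        for t in range(k):
            s = max(range(k), key=lambda s: best[s] * transition[s][t])
            step.append(s)
            new_best.append(best[s] * transition[s][t])
        back.append(step)
        best = new_best

    # Trace the predecessors back from the best final state.
    last = max(range(k), key=lambda t: best[t])
    path = [last]
    for step in reversed(back):
        path.append(step[path[-1]])
    path.reverse()
    return best[last], path


if __name__ == "__main__":
    print(viterbi_path([0.5, 0.5], [[0.9, 0.1], [0.5, 0.5]], 3))
```

    For long paths, sums of log-probabilities would avoid floating-point underflow; plain products keep the sketch short.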
