Search CORE

25 research outputs found

Tight Conditional Lower Bounds for Longest Common Increasing Subsequence

Author: Duraj Lech
Polak Adam
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 12th International Symposium on Parameterized and Exact Computation (IPEC 2017)
Publication date: 01/01/2017

We consider the canonical generalization of the well-studied Longest Increasing Subsequence problem to multiple sequences, called k-LCIS: Given k integer sequences X_1,...,X_k of length at most n, the task is to determine the length of the longest common subsequence of X_1,...,X_k that is also strictly increasing. Especially for the case of k=2 (called LCIS for short), several algorithms have been proposed that require quadratic time in the worst case. Assuming the Strong Exponential Time Hypothesis (SETH), we prove a tight lower bound, specifically, that no algorithm solves LCIS in (strongly) subquadratic time. Interestingly, the proof makes no use of normalization tricks common to hardness proofs for similar problems such as LCS. We further strengthen this lower bound to rule out O((nL)^{1-epsilon}) time algorithms for LCIS, where L denotes the solution size, and to rule out O(n^{k-epsilon}) time algorithms for k-LCIS. We obtain the same conditional lower bounds for the related Longest Common Weakly Increasing Subsequence problem.
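For orientation, the quadratic-time upper bound mentioned in the abstract can be illustrated with the classic dynamic program for the case k=2. This is a minimal sketch, not code from the paper; the function name `lcis_length` is an assumption made for illustration.

```python
def lcis_length(a, b):
    """Length of the longest common strictly increasing subsequence of a and b.

    Classic O(len(a) * len(b)) dynamic program (illustrative sketch only).
    """
    # best[j] = length of the longest common increasing subsequence of the
    # prefix of `a` processed so far and `b` that ends exactly at b[j].
    best = [0] * len(b)
    for x in a:
        current = 0  # best length over earlier positions of b with value < x
        for j, y in enumerate(b):
            if y == x:
                best[j] = max(best[j], current + 1)
            elif y < x:
                current = max(current, best[j])
    return max(best, default=0)
```

The two nested loops touch every pair of positions once, which is the quadratic behaviour that the SETH-based lower bound shows cannot be improved to strongly subquadratic time.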

arXiv.org e-Print Archive

Archivio istituzionale della Ricerca - Bocconi

Dagstuhl Research Online Publication Server

Jagiellonian University Repository

MPG.PuRe

Sketching, Streaming, and Fine-Grained Complexity of (Weighted) LCS

Author: Bringmann Karl
Chaudhury Bhaskar Ray
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 38th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS 2018)
Publication date: 01/01/2018

We study sketching and streaming algorithms for the Longest Common Subsequence problem (LCS) on strings of small alphabet size |Sigma|. For the problem of deciding whether the LCS of strings x,y has length at least L, we obtain a sketch size and streaming space usage of O(L^{|Sigma| - 1} log L). We also prove matching unconditional lower bounds. As an application, we study a variant of LCS where each alphabet symbol is equipped with a weight that is given as input, and the task is to compute a common subsequence of maximum total weight. Using our sketching algorithm, we obtain an O(min{nm, n + m^{|Sigma|}})-time algorithm for this problem, on strings x,y of length n,m, with n >= m. We prove optimality of this running time up to lower order factors, assuming the Strong Exponential Time Hypothesis.
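The O(nm) term in the running time above corresponds to the straightforward dynamic program for weighted LCS. The following is a minimal sketch of that baseline only (not the sketching-based algorithm of the paper); the name `weighted_lcs` and the dictionary `weight` are assumptions, and weights are assumed to be non-negative.

```python
def weighted_lcs(x, y, weight):
    """Maximum total weight of a common subsequence of x and y.

    `weight` maps each alphabet symbol to a non-negative weight (assumed).
    Runs in O(len(x) * len(y)) time and O(len(y)) space.
    """
    prev = [0] * (len(y) + 1)  # row of the DP table for the previous prefix of x
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, start=1):
            best = max(prev[j], cur[j - 1])  # skip cx or skip cy
            if cx == cy:
                # match the two symbols and collect the symbol's weight
                best = max(best, prev[j - 1] + weight[cx])
            cur.append(best)
        prev = cur
    return prev[-1]
```

With all weights equal to 1 this reduces to the ordinary LCS length.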

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

MPG.PuRe

Multivariate Fine-Grained Complexity of Longest Common Subsequence

Author: Bringmann Karl
Künnemann Marvin
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2018

We revisit the classic combinatorial pattern matching problem of finding a longest common subsequence (LCS). For strings $x$ and $y$ of length $n$, a textbook algorithm solves LCS in time $O(n^2)$, but although much effort has been spent, no $O(n^{2-\varepsilon})$-time algorithm is known. Recent work indeed shows that such an algorithm would refute the Strong Exponential Time Hypothesis (SETH) [Abboud, Backurs, Vassilevska Williams + Bringmann, Künnemann FOCS'15]. Despite the quadratic-time barrier, for over 40 years an enduring scientific interest continued to produce fast algorithms for LCS and its variations. Particular attention was put into identifying and exploiting input parameters that yield strongly subquadratic time algorithms for special cases of interest, e.g., differential file comparison. This line of research was successfully pursued until 1990, at which time significant improvements came to a halt. In this paper, using the lens of fine-grained complexity, our goal is to (1) justify the lack of further improvements and (2) determine whether some special cases of LCS admit faster algorithms than currently known. To this end, we provide a systematic study of the multivariate complexity of LCS, taking into account all parameters previously discussed in the literature: the input size $n:=\max\{|x|,|y|\}$, the length of the shorter string $m:=\min\{|x|,|y|\}$, the length $L$ of an LCS of $x$ and $y$, the numbers of deletions $\delta := m-L$ and $\Delta := n-L$, the alphabet size, as well as the numbers of matching pairs $M$ and dominant pairs $d$. For any class of instances defined by fixing each parameter individually to a polynomial in terms of the input size, we prove a SETH-based lower bound matching one of three known algorithms. Specifically, we determine the optimal running time for LCS under SETH as $(n+\min\{d, \delta \Delta, \delta m\})^{1\pm o(1)}$. [...] Comment: Presented at SODA'18. Full Version. 66 pages
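To make the parameters in this abstract concrete, the sketch below computes several of them with the textbook $O(n^2)$ dynamic program. It is an illustration under stated assumptions, not code from the paper: the function names are hypothetical, and the number of dominant pairs $d$ is omitted because its definition is not given in the abstract.

```python
from collections import Counter

def lcs_length(x, y):
    """Textbook dynamic program for the LCS length, O(|x| * |y|) time."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, start=1):
            if cx == cy:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def lcs_parameters(x, y):
    """Return n, m, L, delta, Delta and M as defined in the abstract."""
    n, m = max(len(x), len(y)), min(len(x), len(y))
    L = lcs_length(x, y)
    count_x, count_y = Counter(x), Counter(y)
    # M = number of index pairs (i, j) with x[i] == y[j]
    M = sum(count_x[c] * count_y[c] for c in count_x)
    return {"n": n, "m": m, "L": L, "delta": m - L, "Delta": n - L, "M": M}
```

For example, `lcs_parameters("ABCBDAB", "BDCABA")` reports $n=7$, $m=6$, $L=4$, $\delta=2$, $\Delta=3$ and $M=12$.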

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Automatic coding of medical problem lists

Author: Powsner Seth M.
Publication venue: EliScholar – A Digital Platform for Scholarly Publishing at Yale
Publication date: 01/01/1978

Yale University

Exact algorithms for the repetition-bounded longest common subsequence problem

Author: Asahiro Yuichi
Jansson Jesper
Lin Guohui
Miyano Eiji
Ono Hirotaka
Utashima Tadatoshi
Publication venue: 'Elsevier BV'
Publication date: 04/08/2020

In this paper, we study exact, exponential-time algorithms for a variant of the classic Longest Common Subsequence problem called the Repetition-Bounded Longest Common Subsequence problem (or RBLCS, for short): Let an alphabet S be a finite set of symbols and an occurrence constraint C_occ be a function C_occ: S → N, assigning an upper bound on the number of occurrences of each symbol in S. Given two sequences X and Y over the alphabet S and an occurrence constraint C_occ, the goal of RBLCS is to find a longest common subsequence of X and Y such that each symbol s ∈ S appears at most C_occ(s) times in the obtained subsequence. The special case where C_occ(s) = 1 for every symbol s ∈ S is known as the Repetition-Free Longest Common Subsequence problem (RFLCS) and has been studied previously; e.g., in [1], Adi et al. presented a simple (exponential-time) exact algorithm for RFLCS. However, they did not analyze its time complexity in detail, and to the best of our knowledge, there are no previous results on the running times of any exact algorithms for this problem. Without loss of generality, we will assume that |X| ≤ |Y| and |X| = n. In this paper, we first propose a simpler algorithm for RFLCS based on the strategy used in [1] and show explicitly that its running time is O(1.44225^n). Next, we provide a dynamic programming (DP) based algorithm for RBLCS and prove that its running time is O(1.44225^n) for any occurrence constraint C_occ, and even less in certain special cases. In particular, for RFLCS, our DP-based algorithm runs in O(1.41422^n) time, which is faster than the previous one. Furthermore, we prove NP-hardness and APX-hardness results for RBLCS on restricted instances.
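The problem definition above can be pinned down with a brute-force reference solver. This is only a sketch for checking small instances: it enumerates subsequences of the shorter input and is far slower than the algorithms described in the abstract, which it does not reproduce. The names `rblcs_bruteforce` and `c_occ` are assumptions.

```python
from collections import Counter
from itertools import combinations

def is_subsequence(s, t):
    """True if sequence s is a subsequence of sequence t."""
    it = iter(t)
    return all(ch in it for ch in s)  # each `in` consumes t up to the match

def rblcs_bruteforce(x, y, c_occ):
    """Length of a longest common subsequence of x and y in which every
    symbol s occurs at most c_occ[s] times (exhaustive search, tiny inputs only)."""
    if len(x) > len(y):
        x, y = y, x  # enumerate subsequences of the shorter sequence
    for size in range(len(x), 0, -1):  # try the longest candidates first
        for idx in combinations(range(len(x)), size):
            cand = [x[i] for i in idx]
            counts = Counter(cand)
            within_bounds = all(counts[s] <= c_occ[s] for s in counts)
            if within_bounds and is_subsequence(cand, y):
                return size
    return 0
```

Setting every bound to 1 gives the repetition-free special case (RFLCS).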

Kyutacar : Kyushu Institute of Technology Academic Repository

A Descriptive Tolerance Nearness Measure for Performing Graph Comparison

Author: Awais Syed Aqeel
Henry Christopher J.
Publication venue: 'IOS Press'
Publication date: 03/11/2018

Accepted version. This article proposes the tolerance nearness measure (TNM) as a computationally reduced alternative to the graph edit distance (GED) for performing graph comparisons. The TNM is defined within the context of near set theory, where the central idea is that determining similarity between sets of disjoint objects is at once intuitive and practically applicable. The TNM between two graphs is produced using the Bron-Kerbosch maximal clique enumeration algorithm. The result is that the TNM approach is less computationally complex than the bipartite-based GED algorithm. The contribution of this paper is the application of TNM to the problem of quantifying the similarity of disjoint graphs and that the maximal clique enumeration-based TNM produces comparable results to the GED when applied to the problem of content-based image processing, which becomes important as the number of nodes in a graph increases. "This research was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant 418413." https://content.iospress.com/articles/fundamenta-informaticae/fi174
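The maximal clique enumeration step named in the abstract can be sketched as follows. This shows only the basic Bron-Kerbosch recursion without pivoting, not the TNM computation itself, whose details the abstract does not give. The adjacency-dictionary input format and the function name are assumptions.

```python
def bron_kerbosch(graph):
    """Enumerate all maximal cliques of an undirected graph.

    `graph` maps each vertex to the set of its neighbours (assumed format).
    Basic recursion without pivoting; returns a list of vertex sets.
    """
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates that can extend it,
        # x: vertices already explored (used to avoid non-maximal reports)
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(graph), set())
    return cliques
```

For the graph `{1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}` the maximal cliques are `{1, 2, 3}` and `{3, 4}`.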

WinnSpace Repository

User error analysis and automatic correction for compiling

Publication venue: 'Office of Scientific and Technical Information (OSTI)'

Crossref

Spelling correction in the NLP system 'LOLITA': dictionary organisation and search algorithms

Author: Parker Brett Stephen
Publication date: 01/01/1994

This thesis describes the design and implementation of a spelling correction system and associated dictionaries, for the Natural Language Processing System 'LOLITA'. The dictionary storage is based upon a trie (M-ary tree) data-structure. The design of the dictionary is described, and the way in which the data-structure is implemented is also discussed. The spelling correction system makes use of the trie structure in order to limit repetition and 'garden path' searching. The spelling correction algorithms used are a variation on the 'reverse minimum edit-distance' technique. These algorithms have been modified in order to place more emphasis on generation in order of likelihood. The system will correct up to two simple errors (i.e. insertion, omission, substitution or transposition of characters) per word. The individual algorithms are presented in turn and their combination into a unified strategy to correct misspellings is demonstrated. The system was implemented in the programming language Haskell; a pure functional, class-based language, with non-strict semantics and polymorphic type-checking. The use of several features of this language, in particular lazy evaluation, and their corresponding advantages over more traditional languages are described. The dictionaries and spelling correcting facilities are in use in the LOLITA system. Issues pertaining to 'real word' error correction, arising from the system's use in an NLP context, are also discussed.
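The general idea of searching a trie under an edit-distance bound can be sketched as below. This is not the thesis's algorithm or its Haskell implementation: it is a Python illustration of a standard technique (one Levenshtein row per trie node, with pruning), it handles insertion, omission and substitution but not transposition, and all names are assumptions.

```python
def build_trie(words):
    """Build a nested-dict trie; the key None marks the end of a word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node[None] = w
    return root

def suggest(trie, word, max_errors=2):
    """Return (distance, dictionary word) pairs within max_errors edits,
    sorted by distance. Shared prefixes are evaluated only once."""
    results = []

    def walk(node, ch, prev_row):
        # Levenshtein row for the trie prefix extended by `ch`
        row = [prev_row[0] + 1]
        for i in range(1, len(word) + 1):
            row.append(min(row[i - 1] + 1,                        # insertion
                           prev_row[i] + 1,                       # omission
                           prev_row[i - 1] + (word[i - 1] != ch)))  # substitution
        if None in node and row[-1] <= max_errors:
            results.append((row[-1], node[None]))
        if min(row) <= max_errors:  # otherwise no extension can succeed
            for nxt, child in node.items():
                if nxt is not None:
                    walk(child, nxt, row)

    first_row = list(range(len(word) + 1))
    for ch, child in trie.items():
        if ch is not None:
            walk(child, ch, first_row)
    return sorted(results)
```

Sorting by distance is a simple stand-in for the "order of likelihood" emphasised in the abstract; the pruning test is what lets the trie cut off unpromising branches early.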

Durham e-Theses

Multivariate Fine-Grained Complexity of Longest Common Subsequence

Author: Bringmann K.
Künnemann M.
Publication date: 01/01/2018

We revisit the classic combinatorial pattern matching problem of finding a longest common subsequence (LCS). For strings $x$ and $y$ of length $n$, a textbook algorithm solves LCS in time $O(n^2)$, but although much effort has been spent, no $O(n^{2-\varepsilon})$-time algorithm is known. [...] the input size $n:=\max\{|x|,|y|\}$, the length of the shorter string $m:=\min\{|x|,|y|\}$, the length $L$ of an LCS of $x$ and $y$, the numbers of deletions $\delta := m-L$ and $\Delta := n-L$, the alphabet size, as well as the numbers of matching pairs $M$ and dominant pairs $d$. [...] $(n+\min\{d, \delta \Delta, \delta m\})^{1\pm o(1)}$. [...]

MPG.PuRe