Search CORE

362 research outputs found

Subsequence Automata with Default Transitions

Author: Bille Philip
Gørtz Inge Li
Skjoldjensen Frederik Rye
Publication venue
Publication date: 01/01/2016
Field of study

Let

S

be a string of length

n

with characters from an alphabet of size

\sigma

. The \emph{subsequence automaton} of

S

(often called the \emph{directed acyclic subsequence graph}) is the minimal deterministic finite automaton accepting all subsequences of

S

. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is

O(n\sigma)

and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with \emph{default transitions}, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the \emph{delay}, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter

k

1 < k \leq \sigma

, we present a subsequence automaton with default transitions of size

O(nk\log_{k}\sigma)

and delay

O(\log_k \sigma)

. Hence, with

k = 2

we obtain an automaton of size

O(n \log \sigma)

and delay

O(\log \sigma)

. On the other extreme, with

k = \sigma

, we obtain an automaton of size

O(n \sigma)

and delay

O(1)

, thus matching the bound for the standard subsequence automaton construction. Finally, we generalize the result to multiple strings. The key component of our result is a novel hierarchical automata construction of independent interest.Comment: Corrected typo

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Combinatorial Algorithms for Subsequence Matching: A Survey

Author: Kosche Maria
Koß Tore
Manea Florin
Siemer Stefan
Publication venue
Publication date: 10/10/2022
Field of study

In this paper we provide an overview of a series of recent results regarding algorithms for searching for subsequences in words or for the analysis of the sets of subsequences occurring in a word.Comment: This is a revised version of the paper with the same title which appeared in the Proceedings of NCMA 2022, EPTCS 367, 2022, pp. 11-27 (DOI: 10.4204/EPTCS.367.2). The revision consists in citing a series of relevant references which were not covered in the initial version, and commenting on how they relate to the results we survey. arXiv admin note: text overlap with arXiv:2206.1389

arXiv.org e-Print Archive

Longest Common Subsequence with Gap Constraints

Author: Adamson Duncan
Kosche Maria
Koß Tore
Manea Florin
Siemer Stefan
Publication venue
Publication date: 11/04/2023
Field of study

We consider the longest common subsequence problem in the context of subsequences with gap constraints. In particular, following Day et al. 2022, we consider the setting when the distance (i. e., the gap) between two consecutive symbols of the subsequence has to be between a lower and an upper bound (which may depend on the position of those symbols in the subsequence or on the symbols bordering the gap) as well as the case where the entire subsequence is found in a bounded range (defined by a single upper bound), considered by Kosche et al. 2022. In all these cases, we present effcient algorithms for determining the length of the longest common constrained subsequence between two given strings

arXiv.org e-Print Archive

Longest common parameterized subsequences with fixed common substring

Author: Gorbenko A.
Popov V.
Publication venue
Publication date: 01/01/2013
Field of study

In this paper we consider the problem of the longest common parameterized subsequence with fixed common substring (STR-IC-LCPS). in particular, we show that STR-IC-LCPS is NP-complete. We describe an approach to solve STR-IC-LCPS. This approach is based on an explicit reduction from the problem to the satisfiability problem

Institutional repository of Ural Federal University named after the first President of Russia B.N.Yeltsin

Regular expression constrained sequence alignment revisited

Author: Kucherov Gregory
Pinhas Tamar
Ziv-Ukelson Michal
Publication venue: 'Mary Ann Liebert Inc'
Publication date: 01/01/2011
Field of study

International audienceImposing constraints in the form of a finite automaton or a regular expression is an effective way to incorporate additional a priori knowledge into sequence alignment procedures. With this motivation, the Regular Expression Constrained Sequence Alignment Problem was introduced, which proposed an O(n^2t^4) time and O(n^2t^2) space algorithm for solving it, where n is the length of the input strings and t is the number of states in the input non-deterministic automaton. A faster O(n^2t^3) time algorithm for the same problem was subsequently proposed. In this article, we further speed up the algorithms for Regular Language Constrained Sequence Alignment by reducing their worst case time complexity bound to O(n^2t^3/log t). This is done by establishing an optimal bound on the size of Straight-Line Programs solving the maxima computation subproblem of the basic dynamic programming algorithm. We also study another solution based on a Steiner Tree computation. While it does not improve worst case, our simulations show that both approaches are efficient in practice, especially when the input automata are dense

HAL - Lille 3

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Matching and Compression of Strings with Automata and Word Packing

Author: Skjoldjensen Frederik Rye
Publication venue: DTU Compute
Publication date: 01/01/2017
Field of study

Online Research Database In Technology

Subsequences with Gap Constraints: Complexity Bounds for Matching and Analysis Problems

Author: Day Joel D.
Kosche Maria
Manea Florin
Schmid Markus L.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Algorithms and Computation (ISAAC 2022)
Publication date: 01/01/2022
Field of study

Dagstuhl Research Online Publication Server

Source Coding for Quasiarithmetic Penalties

Author: Baer Michael B.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Huffman coding finds a prefix code that minimizes mean codeword length for a given probability distribution over a finite number of items. Campbell generalized the Huffman problem to a family of problems in which the goal is to minimize not mean codeword length but rather a generalized mean known as a quasiarithmetic or quasilinear mean. Such generalized means have a number of diverse applications, including applications in queueing. Several quasiarithmetic-mean problems have novel simple redundancy bounds in terms of a generalized entropy. A related property involves the existence of optimal codes: For ``well-behaved'' cost functions, optimal codes always exist for (possibly infinite-alphabet) sources having finite generalized entropy. Solving finite instances of such problems is done by generalizing an algorithm for finding length-limited binary codes to a new algorithm for finding optimal binary codes for any quasiarithmetic mean with a convex cost function. This algorithm can be performed using quadratic time and linear space, and can be extended to other penalty functions, some of which are solvable with similar space and time complexity, and others of which are solvable with slightly greater complexity. This reduces the computational complexity of a problem involving minimum delay in a queue, allows combinations of previously considered problems to be optimized, and greatly expands the space of problems solvable in quadratic time and linear space. The algorithm can be extended for purposes such as breaking ties among possibly different optimal codes, as with bottom-merge Huffman coding.Comment: 22 pages, 3 figures, submitted to IEEE Trans. Inform. Theory, revised per suggestions of reader

arXiv.org e-Print Archive

CiteSeerX

Crossref

Acta Cybernetica : Volume 17. Number 4.

Author
Publication venue
Publication date: 01/01/2006
Field of study

University of Szeged