Sublinear Space Algorithms for the Longest Common Substring Problem

Author: Kociumaka Tomasz
Starikovskaya Tatiana
Vildhøj Hjalte Wedel
Publication date: 01/01/2014

Given $m$ documents of total length $n$, we consider the problem of finding a longest string common to at least $d \geq 2$ of the documents. This problem is known as the \emph{longest common substring (LCS) problem} and has a classic $O(n)$ space and $O(n)$ time solution (Weiner [FOCS'73], Hui [CPM'92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter $1 \leq \tau \leq n$, the LCS problem can be solved in $O(\tau)$ space and $O(n^2/\tau)$ time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a $\tau$-additive approximation to the LCS in $O(n^2/\tau)$ time and $O(1)$ space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in $O(\tau)$ space must use $\Omega(n\sqrt{\log(n/(\tau\log n))/\log\log(n/(\tau\log n))})$ time.

Comment: Accepted to the 22nd European Symposium on Algorithms

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology
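
As a point of reference for the constant-space end of the trade-off described in the abstract above, here is a minimal sketch of a textbook baseline for two strings. It is not the algorithm from the paper: it simply scans each diagonal of the implicit match matrix and tracks the longest run of equal characters, which takes O(|s|·|t|) time and O(1) extra space beyond the output. The function name `longest_common_substring` is an illustrative assumption.

```python
def longest_common_substring(s: str, t: str) -> str:
    """Return one longest common substring of s and t.

    Baseline sketch: O(len(s) * len(t)) time, O(1) extra space
    (apart from the returned string).
    """
    best_len = 0
    best_end = 0  # exclusive end index in s of the best run found so far

    # A diagonal is identified by shift = i - j, where i indexes s and j indexes t.
    for shift in range(-(len(t) - 1), len(s)):
        i = max(shift, 0)
        j = i - shift
        run = 0  # length of the current run of matching characters
        while i < len(s) and j < len(t):
            if s[i] == t[j]:
                run += 1
                if run > best_len:
                    best_len = run
                    best_end = i + 1
            else:
                run = 0
            i += 1
            j += 1

    return s[best_end - best_len:best_end]
```

For example, `longest_common_substring("banana", "ananas")` walks the diagonal with shift 1 and finds the run `"anana"`.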

The Longest Common Subsequence via Generalized Suffix Trees

Author: Afrin Tazin
Publication venue: The Research Repository @ WVU
Publication date: 01/01/2015

Given two strings S1 and S2, finding the longest common subsequence (LCS) is a classical problem in computer science. Many algorithms have been proposed to find the longest common subsequence between two strings. The most common and widely used method is the dynamic programming approach, which runs in quadratic time and takes quadratic space. Other algorithms have since been introduced to solve the LCS problem in less time and space. In this work, we present a new algorithm to find the longest common subsequence using the generalized suffix tree and a directed acyclic graph.

The generalized suffix tree (GST) is the combined suffix tree for a set of strings {S1, S2, ..., Sn}. Both the suffix tree and the generalized suffix tree can be computed in linear time and linear space. One application of the generalized suffix tree is finding the longest common substring between two strings, but finding the longest common subsequence is not straightforward using the generalized suffix tree. Here we describe how the GST can be used to find the common substrings between two strings and introduce a new approach to calculate the longest common subsequence (LCS) from the common substrings. This method takes a different view of the LCS problem, shedding more light on novel applications of the LCS. We also show how this method can motivate the development of new compression techniques for genome resequencing data.

The Research Repository @ WVU (West Virginia University)
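
The dynamic programming approach that the abstract above calls the most common method can be sketched as follows. This is the standard textbook table with quadratic time and space, not the suffix-tree method proposed in the thesis; the function name `longest_common_subsequence` is an illustrative assumption.

```python
def longest_common_subsequence(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b.

    Classic dynamic programming: O(len(a) * len(b)) time and space.
    """
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the prefixes a[:i] and b[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    # Walk back from dp[n][m] to recover one optimal subsequence.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

Unlike a common substring, the characters of the result need not be contiguous in either input: for `"AGGTAB"` and `"GXTXAYB"` the function returns `"GTAB"`.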

A novel learning automata game with local feedback for parallel optimization of hydropower production

Author: Fidje Jahn Thomas
Haraldseid Christian Kråkevik
Publication venue: Universitetet i Agder ; University of Agder
Publication date: 01/01/2017

Master's thesis, Information- and communication technology, IKT590, University of Agder, 2017. Hydropower optimization for multi-reservoir systems is classified as a combinatorial optimization problem with a large state space that is particularly difficult to solve. There exists no gold standard for solving such problems, and many proposed algorithms are domain specific. The literature describes several different techniques, among which linear programming approaches are extensively discussed, but these tend to succumb to the curse of dimensionality as the state vector dimensions increase. This thesis introduces LA LCS, a novel learning automata algorithm that utilizes a parallel form of local feedback. This enables each individual automaton to receive direct feedback, resulting in faster convergence. In addition, the algorithm is implemented using a parallel architecture on a CUDA-enabled GPU, along with exhaustive and random search. LA LCS has been verified through several scenarios. Experiments show that the algorithm is able to quickly adapt and find optimal production strategies for problems of variable complexity. The algorithm is empirically verified and shown to hold great promise for solving optimization problems, including hydropower production strategies.

Agder University Research Archive

Faster algorithms for longest common substring

Author: Charalampopoulos P. (Panagiotis)
Kociumaka T. (Tomasz)
Pissis S. (Solon)
Radoszewski J. (Jakub)
Publication date: 01/01/2021

In the classic longest common substring (LCS) problem, we are given two strings S and T, each of length at most n, over an alphabet of size σ, and we are asked to find a longest string occurring as a fragment of both S and T. Weiner, in his seminal paper that introduced the suffix tree, presented an O(n log σ)-time algorithm for this problem [SWAT 1973]. For polynomially-bounded integer alphabets, the linear-time construction of suffix trees by Farach yielded an O(n)-time algorithm for the LCS problem [FOCS 1997]. However, for small alphabets, this is not necessarily optimal for the LCS problem in the word RAM model of computation, in which the strings can be stored in O(n log σ / log n) space and read in O(n log σ / log n) time. We show that, in this model, we can compute an LCS in time O(n log σ / √(log n)), which is sublinear in n if σ = 2^{o(√(log n))} (in particular, if σ = O(1)), using optimal space O(n log σ / log n). We then lift our ideas to the problem of computing a k-mismatch LCS, which has received considerable attention in recent years. In this problem, the aim is to compute a longest substring of S that occurs in T with at most k mismatches. Flouri et al. showed how to compute a 1-mismatch LCS in O(n log n) time [IPL 2015]. Thankachan et al. extended this result to computing a k-mismatch LCS in O(n log^k n) time for k = O(1) [J. Comput. Biol. 2016]. We show an O(n log^{k-1/2} n)-time algorithm, for any constant integer k > 0 and irrespective of the alphabet size, using O(n) space as the previous approaches. We thus notably break through the well-known n log^k n barrier, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al. [STOC 2004] for string indexing with k errors.

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

HAL Descartes

Dagstuhl Research Online Publication Server

Hal-Diderot

Faster Algorithms for Longest Common Substring

Author: Charalampopoulos Panagiotis
Kociumaka Tomasz
Pissis Solon
Radoszewski Jakub
Publication venue: HAL CCSD
Publication date: 06/09/2021

International audience. In the classic longest common substring (LCS) problem, we are given two strings S and T, each of length at most n, over an alphabet of size σ, and we are asked to find a longest string occurring as a fragment of both S and T. Weiner, in his seminal paper that introduced the suffix tree, presented an O(n log σ)-time algorithm for this problem [SWAT 1973]. For polynomially-bounded integer alphabets, the linear-time construction of suffix trees by Farach yielded an O(n)-time algorithm for the LCS problem [FOCS 1997]. However, for small alphabets, this is not necessarily optimal for the LCS problem in the word RAM model of computation, in which the strings can be stored in O(n log σ / log n) space and read in O(n log σ / log n) time. We show that, in this model, we can compute an LCS in time O(n log σ / √(log n)), which is sublinear in n if σ = 2^{o(√(log n))} (in particular, if σ = O(1)), using optimal space O(n log σ / log n). We then lift our ideas to the problem of computing a k-mismatch LCS, which has received considerable attention in recent years. In this problem, the aim is to compute a longest substring of S that occurs in T with at most k mismatches. Flouri et al. showed how to compute a 1-mismatch LCS in O(n log n) time [IPL 2015]. Thankachan et al. extended this result to computing a k-mismatch LCS in O(n log^k n) time for k = O(1) [J. Comput. Biol. 2016]. We show an O(n log^{k−1/2} n)-time algorithm, for any constant k > 0 and irrespective of the alphabet size, using O(n) space as the previous approaches. We thus notably break through the well-known n log^k n barrier, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al. [STOC 2004] for string indexing with k errors.

INRIA a CCSD electronic archive server
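
To make the k-mismatch LCS definition in the abstract above concrete, here is a minimal sketch of a naive quadratic-time baseline. It is not the algorithm from the paper: on each diagonal of the implicit match matrix it slides a window containing at most k mismatching positions. The function name `k_mismatch_lcs` and the returned tuple layout are illustrative assumptions.

```python
from typing import Tuple


def k_mismatch_lcs(s: str, t: str, k: int) -> Tuple[int, int, int]:
    """Return (length, start_in_s, start_in_t) of a longest pair of
    equal-length substrings of s and t that differ in at most k positions.

    Naive baseline: O(len(s) * len(t)) time, O(1) extra space.
    """
    best = (0, 0, 0)
    # Each diagonal pairs s[i0 + x] with t[j0 + x] for x = 0, 1, 2, ...
    for shift in range(-(len(t) - 1), len(s)):
        i0 = max(shift, 0)
        j0 = i0 - shift
        diag_len = min(len(s) - i0, len(t) - j0)
        left = 0        # window start, as an offset along the diagonal
        mismatches = 0  # mismatching positions inside the current window
        for right in range(diag_len):
            if s[i0 + right] != t[j0 + right]:
                mismatches += 1
            # Shrink the window from the left until it has at most k mismatches.
            while mismatches > k:
                if s[i0 + left] != t[j0 + left]:
                    mismatches -= 1
                left += 1
            length = right - left + 1
            if length > best[0]:
                best = (length, i0 + left, j0 + left)
    return best
```

With `k = 0` this reduces to the exact longest common substring; with `k = 1`, the strings `"abcde"` and `"abxde"` match over their full length.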

Bounds on the Number of Longest Common Subsequences

Author: Greenberg Ronald I.
Publication date: 01/08/2003

This paper performs the analysis necessary to bound the running time of known, efficient algorithms for generating all longest common subsequences. That is, we bound the running time as a function of input size for algorithms with time essentially proportional to the output size. This paper considers both the case of computing all distinct LCSs and the case of computing all LCS embeddings. Also included is an analysis of how much better the efficient algorithms are than the standard method of generating LCS embeddings. A full analysis is carried out with running times measured as a function of the total number of input characters, and much of the analysis is also provided for cases in which the two input sequences are of the same specified length or of two independently specified lengths.

Comment: 13 pages. Corrected typos, corrected operation of hyperlinks, improved presentation

arXiv.org e-Print Archive

Loyola eCommons