7,502 research outputs found
Sublinear Space Algorithms for the Longest Common Substring Problem
Given documents of total length , we consider the problem of finding a
longest string common to at least of the documents. This problem is
known as the \emph{longest common substring (LCS) problem} and has a classic
space and time solution (Weiner [FOCS'73], Hui [CPM'92]).
However, the use of linear space is impractical in many applications. In this
paper we show that for any trade-off parameter , the LCS
problem can be solved in space and time, thus providing
the first smooth deterministic time-space trade-off from constant to linear
space. The result uses a new and very simple algorithm, which computes a
-additive approximation to the LCS in time and
space. We also show a time-space trade-off lower bound for deterministic
branching programs, which implies that any deterministic RAM algorithm solving
the LCS problem on documents from a sufficiently large alphabet in
space must use
time.Comment: Accepted to 22nd European Symposium on Algorithm
The Longest Common Subsequence via Generalized Suffix Trees
Given two strings S1 and S 2, finding the longest common subsequence (LCS) is a classical problem in computer science. Many algorithms have been proposed to find the longest common subsequence between two strings. The most common and widely used method is the dynamic programming approach, which runs in quadratic time and takes quadratic space. Other algorithms have been introduced later to solve the LCS problem in less time and space. In this work, we present a new algorithm to find the longest common subsequence using the generalized suffix tree and directed acyclic graph.;The Generalized suffix tree (GST) is the combined suffix tree for a set of strings {lcub}S1, S 2, ..., Sn{rcub}. Both the suffix tree and the generalized suffix tree can be calculated in linear time and linear space. One application for generalized suffix tree is to find the longest common substring between two strings. But finding the longest common subsequence is not straight forward using the generalized suffix tree. Here we describe how we can use the GST to find the common substrings between two strings and introduce a new approach to calculate the longest common subsequence (LCS) from the common substrings. This method takes a different view at the LCS problem, shading more light at novel applications of the LCS. We also show how this method can motivate the development of new compression techniques for genome resequencing data
A novel learning automata game with local feedback for parallel optimization of hydropower production
Master's thesis Information- and communication technology IKT590 - University of Agder 2017Hydropower optimization for multi-reservoir systems is classi ed as a combinatorial
optimization problem with large state-space that is particularly
di cult to solve. There exist no golden standard when solving such problems,
and many proposed algorithms are domain speci c.
The literature describes several di erent techniques where linear programming
approaches are extensively discussed, but tends to succumb to the curse
of dimensionality problem when the state vector dimensions increase. This
thesis introduces LA LCS, a novel learning automata algorithm that utilizes
a parallel form of local feedback. This enables each individual automaton
to receive direct feedback, resulting in faster convergence. In addition, the
algorithm is implemented using a parallel architecture on a CUDA enabled
GPU, along with exhaustive and random search.
LA LCS has been veri ed through several scenarios. Experiments show that
the algorithm is able to quickly adapt and nd optimal production strategies
for problems of variable complexity. The algorithm is empirically veri ed
and shown to hold great promise for solving optimization problems, including
hydropower production strategies
Faster algorithms for longest common substring
In the classic longest common substring (LCS) problem, we are given two strings S and T, each of length at most n, over an alphabet of size σ, and we are asked to find a longest string occurring as a fragment of both S and T. Weiner, in his seminal paper that introduced the suffix tree, presented an (n log σ)-time algorithm for this problem [SWAT 1973]. For polynomially-bounded integer alphabets, the linear-time construction of suffix trees by Farach yielded an (n)-time algorithm for the LCS problem [FOCS 1997]. However, for small alphabets, this is not necessarily optimal for the LCS problem in the word RAM model of computation, in which the strings can be stored in (n log σ/log n) space and read in (n log σ/log n) time. We show that, in this model, we can compute an LCS in time (n log σ / √{log n}), which is sublinear in n if σ = 2^{o(√{log n})} (in particular, if σ = (1)), using optimal space (n log σ/log n).
We then lift our ideas to the problem of computing a k-mismatch LCS, which has received considerable attention in recent years. In this problem, the aim is to compute a longest substring of S that occurs in T with at most k mismatches. Flouri et al. showed how to compute a 1-mismatch LCS in (n log n) time [IPL 2015]. Thankachan et al. extended this result to computing a k-mismatch LCS in (n log^k n) time for k = (1) [J. Comput. Biol. 2016]. We show an (n log^{k-1/2} n)-time algorithm, for any constant integer k > 0 and irrespective of the alphabet size, using (n) space as the previous approaches. We thus notably break through the well-known n log^k n barrier, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al. [STOC 2004] for string indexing with k errors. </p
Faster Algorithms for Longest Common Substring
International audienceIn the classic longest common substring (LCS) problem, we are given two strings S and T , each of length at most n, over an alphabet of size σ, and we are asked to find a longest string occurring as a fragment of both S and T. Weiner, in his seminal paper that introduced the suffix tree, presented an O(n log σ)-time algorithm for this problem [SWAT 1973]. For polynomially-bounded integer alphabets, the linear-time construction of suffix trees by Farach yielded an O(n)-time algorithm for the LCS problem [FOCS 1997]. However, for small alphabets, this is not necessarily optimal for the LCS problem in the word RAM model of computation, in which the strings can be stored in O(n log σ/ log n) space and read in O(n log σ/ log n) time. We show that, in this model, we can compute an LCS in time O(n log σ/ √ log n), which is sublinear in n if σ = 2 o(√ log n) (in particular, if σ = O(1)), using optimal space O(n log σ/ log n). We then lift our ideas to the problem of computing a k-mismatch LCS, which has received considerable attention in recent years. In this problem, the aim is to compute a longest substring of S that occurs in T with at most k mismatches. Flouri et al. showed how to compute a 1-mismatch LCS in O(n log n) time [IPL 2015]. Thankachan et al. extended this result to computing a k-mismatch LCS in O(n log k n) time for k = O(1) [J. Comput. Biol. 2016]. We show an O(n log k−1/2 n)-time algorithm, for any constant k > 0 and irrespective of the alphabet size, using O(n) space as the previous approaches. We thus notably break through the well-known n log k n barrier, which stems from a recursive heavy-path decomposition technique that was first introduced in the seminal paper of Cole et al. [STOC 2004] for string indexing with k errors
Bounds on the Number of Longest Common Subsequences
This paper performs the analysis necessary to bound the running time of
known, efficient algorithms for generating all longest common subsequences.
That is, we bound the running time as a function of input size for algorithms
with time essentially proportional to the output size. This paper considers
both the case of computing all distinct LCSs and the case of computing all LCS
embeddings. Also included is an analysis of how much better the efficient
algorithms are than the standard method of generating LCS embeddings. A full
analysis is carried out with running times measured as a function of the total
number of input characters, and much of the analysis is also provided for cases
in which the two input sequences are of the same specified length or of two
independently specified lengths.Comment: 13 pages. Corrected typos, corrected operation of hyperlinks,
improved presentatio
- …