46,330 research outputs found
Sublinear Space Algorithms for the Longest Common Substring Problem
Given documents of total length , we consider the problem of finding a
longest string common to at least of the documents. This problem is
known as the \emph{longest common substring (LCS) problem} and has a classic
space and time solution (Weiner [FOCS'73], Hui [CPM'92]).
However, the use of linear space is impractical in many applications. In this
paper we show that for any trade-off parameter , the LCS
problem can be solved in space and time, thus providing
the first smooth deterministic time-space trade-off from constant to linear
space. The result uses a new and very simple algorithm, which computes a
-additive approximation to the LCS in time and
space. We also show a time-space trade-off lower bound for deterministic
branching programs, which implies that any deterministic RAM algorithm solving
the LCS problem on documents from a sufficiently large alphabet in
space must use
time.Comment: Accepted to 22nd European Symposium on Algorithm
Efficient LZ78 factorization of grammar compressed text
We present an efficient algorithm for computing the LZ78 factorization of a
text, where the text is represented as a straight line program (SLP), which is
a context free grammar in the Chomsky normal form that generates a single
string. Given an SLP of size representing a text of length , our
algorithm computes the LZ78 factorization of in time
and space, where is the number of resulting LZ78 factors.
We also show how to improve the algorithm so that the term in the
time and space complexities becomes either , where is the length of the
longest LZ78 factor, or where is a quantity
which depends on the amount of redundancy that the SLP captures with respect to
substrings of of a certain length. Since where
is the alphabet size, the latter is asymptotically at least as fast as
a linear time algorithm which runs on the uncompressed string when is
constant, and can be more efficient when the text is compressible, i.e. when
and are small.Comment: SPIRE 201
BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction
A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN
Lobsters with an almost perfect matching are graceful
Let be a lobster with a matching that covers all but one vertex. We show
that in this case, is graceful.Comment: 4 page
Internal Pattern Matching Queries in a Text and Applications
We consider several types of internal queries: questions about subwords of a
text. As the main tool we develop an optimal data structure for the problem
called here internal pattern matching. This data structure provides
constant-time answers to queries about occurrences of one subword in
another subword of a given text, assuming that ,
which allows for a constant-space representation of all occurrences. This
problem can be viewed as a natural extension of the well-studied pattern
matching problem. The data structure has linear size and admits a linear-time
construction algorithm.
Using the solution to the internal pattern matching problem, we obtain very
efficient data structures answering queries about: primitivity of subwords,
periods of subwords, general substring compression, and cyclic equivalence of
two subwords. All these results improve upon the best previously known
counterparts. The linear construction time of our data structure also allows to
improve the algorithm for finding -subrepetitions in a text (a more
general version of maximal repetitions, also called runs). For any fixed
we obtain the first linear-time algorithm, which matches the linear
time complexity of the algorithm computing runs. Our data structure has already
been used as a part of the efficient solutions for subword suffix rank &
selection, as well as substring compression using Burrows-Wheeler transform
composed with run-length encoding.Comment: 31 pages, 9 figures; accepted to SODA 201
Log-space Algorithms for Paths and Matchings in k-trees
Reachability and shortest path problems are NL-complete for general graphs.
They are known to be in L for graphs of tree-width 2 [JT07]. However, for
graphs of tree-width larger than 2, no bound better than NL is known. In this
paper, we improve these bounds for k-trees, where k is a constant. In
particular, the main results of our paper are log-space algorithms for
reachability in directed k-trees, and for computation of shortest and longest
paths in directed acyclic k-trees.
Besides the path problems mentioned above, we also consider the problem of
deciding whether a k-tree has a perfect macthing (decision version), and if so,
finding a perfect match- ing (search version), and prove that these two
problems are L-complete. These problems are known to be in P and in RNC for
general graphs, and in SPL for planar bipartite graphs [DKR08].
Our results settle the complexity of these problems for the class of k-trees.
The results are also applicable for bounded tree-width graphs, when a
tree-decomposition is given as input. The technique central to our algorithms
is a careful implementation of divide-and-conquer approach in log-space, along
with some ideas from [JT07] and [LMR07].Comment: Accepted in STACS 201
Recommended from our members
A survey of induction algorithms for machine learning
Central to all systems for machine learning from examples is an induction algorithm. The purpose of the algorithm is to generalize from a finite set of training examples a description consistent with the examples seen, and, hopefully, with the potentially infinite set of examples not seen. This paper surveys four machine learning induction algorithms. The knowledge representation schemes and a PDL description of algorithm control are emphasized. System characteristics that are peculiar to a domain of application are de-emphasized. Finally, a comparative summary of the learning algorithms is presented
- …