46,330 research outputs found

    Sublinear Space Algorithms for the Longest Common Substring Problem

    Full text link
    Given mm documents of total length nn, we consider the problem of finding a longest string common to at least d2d \geq 2 of the documents. This problem is known as the \emph{longest common substring (LCS) problem} and has a classic O(n)O(n) space and O(n)O(n) time solution (Weiner [FOCS'73], Hui [CPM'92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter 1τn1 \leq \tau \leq n, the LCS problem can be solved in O(τ)O(\tau) space and O(n2/τ)O(n^2/\tau) time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a τ\tau-additive approximation to the LCS in O(n2/τ)O(n^2/\tau) time and O(1)O(1) space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in O(τ)O(\tau) space must use Ω(nlog(n/(τlogn))/loglog(n/(τlogn))\Omega(n\sqrt{\log(n/(\tau\log n))/\log\log(n/(\tau\log n)}) time.Comment: Accepted to 22nd European Symposium on Algorithm

    Efficient LZ78 factorization of grammar compressed text

    Full text link
    We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size nn representing a text SS of length NN, our algorithm computes the LZ78 factorization of TT in O(nN+mlogN)O(n\sqrt{N}+m\log N) time and O(nN+m)O(n\sqrt{N}+m) space, where mm is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the nNn\sqrt{N} term in the time and space complexities becomes either nLnL, where LL is the length of the longest LZ78 factor, or (Nα)(N - \alpha) where α0\alpha \geq 0 is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of SS of a certain length. Since m=O(N/logσN)m = O(N/\log_\sigma N) where σ\sigma is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when σ\sigma is constant, and can be more efficient when the text is compressible, i.e. when mm and nn are small.Comment: SPIRE 201

    BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

    Get PDF
    A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN

    Lobsters with an almost perfect matching are graceful

    Full text link
    Let TT be a lobster with a matching that covers all but one vertex. We show that in this case, TT is graceful.Comment: 4 page

    Internal Pattern Matching Queries in a Text and Applications

    Full text link
    We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword xx in another subword yy of a given text, assuming that y=O(x)|y|=\mathcal{O}(|x|), which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows to improve the algorithm for finding δ\delta-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed δ\delta we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of the efficient solutions for subword suffix rank & selection, as well as substring compression using Burrows-Wheeler transform composed with run-length encoding.Comment: 31 pages, 9 figures; accepted to SODA 201

    Log-space Algorithms for Paths and Matchings in k-trees

    Get PDF
    Reachability and shortest path problems are NL-complete for general graphs. They are known to be in L for graphs of tree-width 2 [JT07]. However, for graphs of tree-width larger than 2, no bound better than NL is known. In this paper, we improve these bounds for k-trees, where k is a constant. In particular, the main results of our paper are log-space algorithms for reachability in directed k-trees, and for computation of shortest and longest paths in directed acyclic k-trees. Besides the path problems mentioned above, we also consider the problem of deciding whether a k-tree has a perfect macthing (decision version), and if so, finding a perfect match- ing (search version), and prove that these two problems are L-complete. These problems are known to be in P and in RNC for general graphs, and in SPL for planar bipartite graphs [DKR08]. Our results settle the complexity of these problems for the class of k-trees. The results are also applicable for bounded tree-width graphs, when a tree-decomposition is given as input. The technique central to our algorithms is a careful implementation of divide-and-conquer approach in log-space, along with some ideas from [JT07] and [LMR07].Comment: Accepted in STACS 201
    corecore