Search CORE

46,330 research outputs found

Sublinear Space Algorithms for the Longest Common Substring Problem

Author: Kociumaka Tomasz
Starikovskaya Tatiana
Vildhøj Hjalte Wedel
Publication venue
Publication date: 01/01/2014
Field of study

Given

m

documents of total length

n

, we consider the problem of finding a longest string common to at least

d \geq 2

of the documents. This problem is known as the \emph{longest common substring (LCS) problem} and has a classic

O(n)

space and

O(n)

time solution (Weiner [FOCS'73], Hui [CPM'92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter

1 \leq \tau \leq n

, the LCS problem can be solved in

O(\tau)

space and

O(n^2/\tau)

time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a

\tau

-additive approximation to the LCS in

O(n^2/\tau)

time and

O(1)

space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in

O(\tau)

space must use

\Omega(n\sqrt{\log(n/(\tau\log n))/\log\log(n/(\tau\log n)})

time.Comment: Accepted to 22nd European Symposium on Algorithm

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Efficient LZ78 factorization of grammar compressed text

Author: A. Amir
A. Jeż
E. Ukkonen
E.M. McCreight
J. Jansson
J. Westbrook
J. Ziv
J. Ziv
K. Goto
K. Goto
M. Crochemore
M. Li
M. Li
M.A. Bender
O. Berkman
P. Weiner
R. Cilibrasi
T. Kida
V. Freschi
W. Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012
Field of study

We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size

n

representing a text

S

of length

N

, our algorithm computes the LZ78 factorization of

T

O(n\sqrt{N}+m\log N)

time and

O(n\sqrt{N}+m)

space, where

m

is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the

n\sqrt{N}

term in the time and space complexities becomes either

nL

, where

L

is the length of the longest LZ78 factor, or

(N - \alpha)

where

\alpha \geq 0

is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of

S

of a certain length. Since

m = O(N/\log_\sigma N)

where

\sigma

is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when

\sigma

is constant, and can be more efficient when the text is compressible, i.e. when

m

and

n

are small.Comment: SPIRE 201

arXiv.org e-Print Archive

Crossref

BOOL-AN: A method for comparative sequence analysis and phylogenetic reconstruction

Author: Ari Eszter
Horváth Arnold
Ittzés Péter
Jakó Éena
Podani János
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009
Field of study

A novel discrete mathematical approach is proposed as an additional tool for molecular systematics which does not require prior statistical assumptions concerning the evolutionary process. The method is based on algorithms generating mathematical representations directly from DNA/RNA or protein sequences, followed by the output of numerical (scalar or vector) and visual characteristics (graphs). The binary encoded sequence information is transformed into a compact analytical form, called the Iterative Canonical Form (or ICF) of Boolean functions, which can then be used as a generalized molecular descriptor. The method provides raw vector data for calculating different distance matrices, which in turn can be analyzed by neighbor-joining or UPGMA to derive a phylogenetic tree, or by principal coordinates analysis to get an ordination scattergram. The new method and the associated software for inferring phylogenetic trees are called the Boolean analysis or BOOL-AN

Crossref

Repository of the Academy's Library

Lobsters with an almost perfect matching are graceful

Author: Krop Elliot
Publication venue
Publication date: 17/02/2014
Field of study

Let

T

be a lobster with a matching that covers all but one vertex. We show that in this case,

T

is graceful.Comment: 4 page

arXiv.org e-Print Archive

CiteSeerX

Internal Pattern Matching Queries in a Text and Applications

Author: Kociumaka Tomasz
Radoszewski Jakub
Rytter Wojciech
Waleń Tomasz
Publication venue
Publication date: 13/10/2014
Field of study

We consider several types of internal queries: questions about subwords of a text. As the main tool we develop an optimal data structure for the problem called here internal pattern matching. This data structure provides constant-time answers to queries about occurrences of one subword

x

in another subword

y

of a given text, assuming that

|y|=\mathcal{O}(|x|)

, which allows for a constant-space representation of all occurrences. This problem can be viewed as a natural extension of the well-studied pattern matching problem. The data structure has linear size and admits a linear-time construction algorithm. Using the solution to the internal pattern matching problem, we obtain very efficient data structures answering queries about: primitivity of subwords, periods of subwords, general substring compression, and cyclic equivalence of two subwords. All these results improve upon the best previously known counterparts. The linear construction time of our data structure also allows to improve the algorithm for finding

\delta

-subrepetitions in a text (a more general version of maximal repetitions, also called runs). For any fixed

\delta

we obtain the first linear-time algorithm, which matches the linear time complexity of the algorithm computing runs. Our data structure has already been used as a part of the efficient solutions for subword suffix rank & selection, as well as substring compression using Burrows-Wheeler transform composed with run-length encoding.Comment: 31 pages, 9 figures; accepted to SODA 201

arXiv.org e-Print Archive

Crossref

Log-space Algorithms for Paths and Matchings in k-trees

Author: Das Bireswar
Datta Samir
Nimbhorkar Prajakta
Publication venue
Publication date: 01/01/2010
Field of study

Reachability and shortest path problems are NL-complete for general graphs. They are known to be in L for graphs of tree-width 2 [JT07]. However, for graphs of tree-width larger than 2, no bound better than NL is known. In this paper, we improve these bounds for k-trees, where k is a constant. In particular, the main results of our paper are log-space algorithms for reachability in directed k-trees, and for computation of shortest and longest paths in directed acyclic k-trees. Besides the path problems mentioned above, we also consider the problem of deciding whether a k-tree has a perfect macthing (decision version), and if so, finding a perfect match- ing (search version), and prove that these two problems are L-complete. These problems are known to be in P and in RNC for general graphs, and in SPL for planar bipartite graphs [DKR08]. Our results settle the complexity of these problems for the class of k-trees. The results are also applicable for bounded tree-width graphs, when a tree-decomposition is given as input. The technique central to our algorithms is a careful implementation of divide-and-conquer approach in log-space, along with some ideas from [JT07] and [LMR07].Comment: Accepted in STACS 201

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Recommended from our members

A survey of induction algorithms for machine learning

Author: Porter Bruce W.
Publication venue: eScholarship, University of California
Publication date: 01/01/1983
Field of study

Central to all systems for machine learning from examples is an induction algorithm. The purpose of the algorithm is to generalize from a finite set of training examples a description consistent with the examples seen, and, hopefully, with the potentially infinite set of examples not seen. This paper surveys four machine learning induction algorithms. The knowledge representation schemes and a PDL description of algorithm control are emphasized. System characteristics that are peculiar to a domain of application are de-emphasized. Finally, a comparative summary of the learning algorithms is presented

eScholarship - University of California