189 research outputs found

Graph compression using graph grammars

Author: Peternek Fabian Hans Adolf
Publication venue: The University of Edinburgh
Publication date: 02/07/2018

This thesis presents work done on compressed graph representations via hyperedge replacement grammars. It comprises two main parts. First, the RePair compression scheme, known for strings and trees, is generalized to graphs using graph grammars. Given an object, the scheme produces a small context-free grammar generating the object (called a “straight-line grammar”). The theoretical foundations of this generalization are presented, followed by a description of a prototype implementation. This implementation is then evaluated on real-world and synthetic graphs. The experiments show that several graphs can be compressed more strongly by the new method than by current state-of-the-art approaches. The second part considers algorithmic questions of straight-line graph grammars. Two algorithms are presented to traverse the graph represented by such a grammar. Both algorithms have advantages and disadvantages: the first one works with any grammar, but its runtime per traversal step depends on the input grammar. The second algorithm needs only constant time per traversal step, but works for a restricted class of grammars and requires quadratic preprocessing time and space. Finally, speed-up algorithms are considered. These are algorithms that can decide specific problems in time depending only on the size of the compressed representation, and might thus be faster than a traditional algorithm would be on the decompressed structure. The idea of such algorithms is to reuse computation already done for the rules of the grammar. The possible speed-up achieved this way is proportional to the compression ratio of the grammar. The main results here are a method to answer “regular path queries” and a method to decide whether two grammars generate isomorphic trees.
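As a rough illustration of the RePair idea mentioned in the abstract, the following sketch applies it to plain strings (the setting the abstract says the scheme was already known for), not to graphs. It is a simplified quadratic-time version written for this listing, not the thesis's implementation; the function names `repair` and `expand` and the rule names `R0`, `R1`, ... are assumptions made here.

```python
from collections import Counter

def repair(text):
    """Simplified RePair on a string: repeatedly replace the most frequent
    adjacent pair of symbols with a fresh nonterminal.

    Returns (start_sequence, rules), which together form a straight-line
    grammar generating exactly `text`. Pair counts include overlapping
    occurrences, so this is a sketch rather than an optimized version.
    """
    seq = list(text)            # single characters are the terminals
    rules = {}                  # hypothetical names: "R0", "R1", ...
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:           # no pair repeats, nothing left to gain
            break
        name = f"R{next_id}"
        next_id += 1
        rules[name] = pair
        # Left-to-right, non-overlapping replacement of the chosen pair.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(symbol, rules):
    """Decompress one grammar symbol back into the string it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)
```

Calling `repair("abababab")` yields two rules and a two-symbol start sequence, and joining `expand(s, rules)` over the start sequence reproduces the input.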

Edinburgh Research Archive

Fast Label Extraction in the CDAWG

Author: A Blumer
D Belazzougui
D Gusfield
J Sirén
L Gasieniec
LS Russo
M Crochemore
M Raffinot
MA Bender
O Berkman
T Gagie
V Mäkinen
Publication date: 26/09/2017

The compact directed acyclic word graph (CDAWG) of a string $T$ of length $n$ takes space proportional just to the number $e$ of right extensions of the maximal repeats of $T$, and it is thus an appealing index for highly repetitive datasets, like collections of genomes from similar species, in which $e$ grows significantly more slowly than $n$. We reduce from $O(m\log{\log{n}})$ to $O(m)$ the time needed to count the number of occurrences of a pattern of length $m$, using an existing data structure that takes an amount of space proportional to the size of the CDAWG. This implies a reduction from $O(m\log{\log{n}}+\mathtt{occ})$ to $O(m+\mathtt{occ})$ in the time needed to locate all the $\mathtt{occ}$ occurrences of the pattern. We also reduce from $O(k\log{\log{n}})$ to $O(k)$ the time needed to read the $k$ characters of the label of an edge of the suffix tree of $T$, and we reduce from $O(m\log{\log{n}})$ to $O(m)$ the time needed to compute the matching statistics between a query of length $m$ and $T$, using an existing representation of the suffix tree based on the CDAWG. All such improvements derive from extracting the label of a vertex or of an arc of the CDAWG using a straight-line program induced by the reversed CDAWG. Comment: 16 pages, 1 figure. In proceedings of the 24th International Symposium on String Processing and Information Retrieval (SPIRE 2017). arXiv admin note: text overlap with arXiv:1705.0864
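To make the term "matching statistics" concrete, here is a naive sketch of the usual definition, assumed here: for each position i of the query, the length of the longest prefix of the query suffix starting at i that occurs somewhere in T. This brute-force version is only meant to pin down the output; it does not use a CDAWG and comes nowhere near the O(m) bound stated in the abstract. The function name is an assumption.

```python
def matching_statistics(query, text):
    """Naive matching statistics.

    ms[i] is the length of the longest prefix of query[i:] that occurs
    as a substring of `text`. Brute force, for illustration only.
    """
    ms = []
    for i in range(len(query)):
        length = 0
        # Extend the match one character at a time while it still occurs.
        while i + length < len(query) and query[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms
```

For example, with query `"abxab"` and text `"cabab"`, the character `x` never occurs in the text, so the entry at position 2 is 0.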

arXiv.org e-Print Archive

Crossref

Traversing Grammar-Compressed Trees with Constant Delay

Author: Lohrey Markus
Maneth Sebastian
Reh Carl Philipp
Publication date: 10/11/2015

A grammar-compressed ranked tree is represented with a linear space overhead so that a single traversal step, i.e., the move to the parent or the i-th child, can be carried out in constant time. Moreover, we extend our data structure such that equality of subtrees can be checked in constant time.

arXiv.org e-Print Archive

Crossref

Edinburgh Research Explorer

Topics in combinatorial pattern matching

Author: Vildhøj Hjalte Wedel
Publication venue: Technical University of Denmark
Publication date: 01/01/2015

Online Research Database In Technology

Regular Languages meet Prefix Sorting

Author: Alanko Jarno
D'Agostino Giovanna
Policriti Alberto
Prezza Nicola
Publication date: 09/07/2019

Indexing strings via prefix (or suffix) sorting is, arguably, one of the most successful algorithmic techniques developed in the last decades. Can indexing be extended to languages? The main contribution of this paper is to initiate the study of the sub-class of regular languages accepted by an automaton whose states can be prefix-sorted. Starting from the recent notion of Wheeler graph [Gagie et al., TCS 2017], which naturally extends the concept of prefix sorting to labeled graphs, we investigate the properties of Wheeler languages, that is, regular languages admitting an accepting Wheeler finite automaton. Interestingly, we characterize this family as the natural extension of regular languages endowed with the co-lexicographic ordering: when sorted, the strings belonging to a Wheeler language are partitioned into a finite number of co-lexicographic intervals, each formed by elements from a single Myhill-Nerode equivalence class. Moreover: (i) We show that every Wheeler NFA (WNFA) with $n$ states admits an equivalent Wheeler DFA (WDFA) with at most $2n-1-|\Sigma|$ states that can be computed in $O(n^3)$ time. This is in sharp contrast with general NFAs. (ii) We describe a quadratic algorithm to prefix-sort a proper superset of the WDFAs, an $O(n\log n)$-time online algorithm to sort acyclic WDFAs, and an optimal linear-time offline algorithm to sort general WDFAs. By contribution (i), our algorithms can also be used to index any WNFA at the moderate price of doubling the automaton's size. (iii) We provide a minimization theorem that characterizes the smallest WDFA recognizing the same language as any input WDFA. The corresponding constructive algorithm runs in optimal linear time in the acyclic case, and in $O(n\log n)$ time in the general case. (iv) We show how to compute the smallest WDFA equivalent to any acyclic DFA in nearly-optimal time. Comment: added minimization theorems; uploaded submitted version; New version with new results (W-MH theorem, linear determinization), added author: Giovanna D'Agostin
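The co-lexicographic ordering referred to in the abstract compares strings from their last character backwards. A minimal sketch of that ordering, assuming plain Python strings and ordinary character comparison (the function name is chosen here for illustration):

```python
def colex_sort(strings):
    """Sort strings co-lexicographically.

    Two strings are compared right to left, which is the same as
    comparing their reversals in ordinary lexicographic order.
    """
    return sorted(strings, key=lambda s: s[::-1])
```

On the input `["ab", "ba", "aa", "b", "a"]`, every string ending in `a` lands before every string ending in `b`, so strings sharing a final character form contiguous intervals of the sorted list.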

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Subpath Queries on Compressed Graphs: A Survey

Author: Prezza Nicola
Publication venue: MDPI AG
Publication date: 13/12/2020

Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query’s length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index can be stored in space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: today, virtually any DNA aligner employs compressed indexes. Recent trends considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately making it possible to index regular languages and promising to shed new light on problems such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages.
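The count and locate queries described in the abstract can be sketched with a plain suffix array. This is a different and simpler structure than the suffix tree or the compressed indexes the survey covers, and it is used here only because it fits in a few lines. The construction below is deliberately naive, and each query costs O(m log n) string comparisons rather than time proportional to the pattern length alone. All function names are assumptions.

```python
def build_suffix_array(text):
    """Naive suffix array: starting positions of all suffixes, sorted."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def suffix_range(text, sa, pattern):
    """Half-open range [start, end) of suffix-array entries whose suffix
    begins with `pattern`, found with two binary searches."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                      # first suffix with m-prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:                      # first suffix with m-prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return start, lo

def count(text, sa, pattern):
    """Number of occurrences of `pattern` in `text`."""
    start, end = suffix_range(text, sa, pattern)
    return end - start

def locate(text, sa, pattern):
    """Sorted starting positions of all occurrences of `pattern`."""
    start, end = suffix_range(text, sa, pattern)
    return sorted(sa[start:end])
```

The index `sa` is built once off-line; afterwards `count` and `locate` answer queries for any pattern without scanning the whole text.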

Multidisciplinary Digital Publishing Institute

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

31st International Symposium on Theoretical Aspects of Computer Science: STACS '14, March 5th to March 8th, 2014, Lyon, France

Author: STACS <31 2014, Lyon>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/03/2014

Digitale Bibliothek Thüringen