
    Fast and Compact Regular Expression Matching

    We study four problems in string matching, namely, regular expression matching, approximate regular expression matching, string edit distance, and subsequence indexing, on a standard word RAM model of computation that allows logarithmic-sized words to be manipulated in constant time. We show how to improve the space and/or remove a dependency on the alphabet size for each problem, either by improving the tabulation technique of an existing algorithm or by combining known algorithms in a new way.
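
    The sketch below is only an assumed illustration of the basic word-level trick such algorithms build on: the current NFA state set is packed into a single machine word (a bitmask), so one pass over the input updates all states at once. The toy NFA, hand-built here for the expression (a|b)*ac, is not the paper's tabulation-based construction.

```python
# Toy NFA for (a|b)*ac, given as character -> list of (source, target) edges.
TRANS = {
    'a': [(0, 0), (0, 1)],
    'b': [(0, 0)],
    'c': [(1, 2)],
}
START, ACCEPT = 0, 2

def nfa_match(text: str) -> bool:
    """Simulate the NFA with the whole state set stored in one bitmask."""
    states = 1 << START                      # current state set as a bitmask
    for ch in text:
        nxt = 0
        for src, dst in TRANS.get(ch, []):
            if states & (1 << src):          # src is alive, so dst becomes alive
                nxt |= 1 << dst
        states = nxt
    return bool(states & (1 << ACCEPT))

print(nfa_match("abac"), nfa_match("abc"))   # True False
```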

    From Regular Expression Matching to Parsing

    Given a regular expression $R$ and a string $Q$, the regular expression parsing problem is to determine if $Q$ matches $R$ and, if so, determine how it matches, e.g., by a mapping of the characters of $Q$ to the characters in $R$. Regular expression parsing makes finding matches of a regular expression even more useful by allowing us to directly extract subpatterns of the match, e.g., for extracting IP addresses from internet traffic analysis or extracting subparts of genomes from genetic databases. We present a new general technique for efficiently converting a large class of algorithms that determine if a string $Q$ matches a regular expression $R$ into algorithms that can construct a corresponding mapping. As a consequence, we obtain the first efficient linear-space solutions for regular expression parsing.
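
    To make the matching-versus-parsing distinction concrete, the assumed snippet below uses Python's backtracking re engine (a familiar stand-in, not the paper's linear-space algorithm) with capture groups: matching answers yes/no, while parsing also reports how the string matched, e.g. which characters form the IP address mentioned in the abstract. The sample log line is made up.

```python
import re

# Hypothetical sample input; the pattern's four groups expose the "how it
# matched" mapping that plain yes/no matching does not give us.
LOG_LINE = "GET /index.html from 192.168.0.17 port 443"
IP_PATTERN = re.compile(r"(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})")

m = IP_PATTERN.search(LOG_LINE)
if m:
    print(m.group(0))    # the whole match: '192.168.0.17'
    print(m.groups())    # how it matched:  ('192', '168', '0', '17')
```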

    String Indexing with Compressed Patterns

    Given a string $S$ of length $n$, the classic string indexing problem is to preprocess $S$ into a compact data structure that supports efficient subsequent pattern queries. In this paper we consider the basic variant where the pattern is given in compressed form and the goal is to achieve query time that is fast in terms of the compressed size of the pattern. This captures the common client-server scenario, where a client submits a query and communicates it in compressed form to a server. Instead of the server decompressing the query before processing it, we consider how to efficiently process the compressed query directly. Our main result is a novel linear space data structure that achieves near-optimal query time for patterns compressed with the classic Lempel-Ziv 1977 (LZ77) compression scheme. Along the way we develop several data structural techniques of independent interest, including a novel data structure that compactly encodes all LZ77 compressed suffixes of a string in linear space and a general decomposition of tries that reduces the search time from logarithmic in the size of the trie to logarithmic in the length of the pattern.
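
    For context, the sketch below shows the form in which such a compressed query pattern arrives: a greedy LZ77 factorization into (offset, length, next character) phrases. It is a naive quadratic reference implementation used only to illustrate the input format, not the paper's index or query algorithm.

```python
def lz77_factorize(s: str):
    """Greedy LZ77 factorization into (offset, length, next_char) phrases.
    Naive O(n^2) scan; overlapping (self-referential) copies are allowed."""
    phrases, i = [], 0
    while i < len(s):
        best_len, best_off = 0, 0
        for j in range(i):                       # try every earlier start
            l = 0
            while i + l < len(s) and s[j + l] == s[i + l]:
                l += 1
            if l > best_len:
                best_len, best_off = l, i - j
        nxt = s[i + best_len] if i + best_len < len(s) else ''
        phrases.append((best_off, best_len, nxt))
        i += best_len + 1
    return phrases

def lz77_decode(phrases):
    """Invert the factorization, handling overlapping copies char by char."""
    out = []
    for off, length, nxt in phrases:
        start = len(out) - off
        for k in range(length):
            out.append(out[start + k])
        if nxt:
            out.append(nxt)
    return ''.join(out)

phrases = lz77_factorize("ananas")
print(phrases)                 # [(0, 0, 'a'), (0, 0, 'n'), (2, 3, 's')]
print(lz77_decode(phrases))    # 'ananas'
```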

    Fast evaluation of union-intersection expressions

    We show how to represent sets in a linear space data structure such that expressions involving unions and intersections of sets can be computed in a worst-case efficient way. This problem has applications in e.g. information retrieval and database systems. We mainly consider the RAM model of computation, and sets of machine words, but also state our results in the I/O model. On a RAM with word size $w$, a special case of our result is that the intersection of $m$ (preprocessed) sets, containing $n$ elements in total, can be computed in expected time $O(n (\log w)^2 / w + km)$, where $k$ is the number of elements in the intersection. If the first of the two terms dominates, this is a factor $w^{1-o(1)}$ faster than the standard solution of merging sorted lists. We show a cell probe lower bound of time $\Omega(n/(w m \log m) + (1-\tfrac{\log k}{w}) k)$, meaning that our upper bound is nearly optimal for small $m$. Our algorithm uses a novel combination of approximate set representations and word-level parallelism.
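
    The sketch below illustrates only the baseline word-level parallelism idea: each set is a bitmap over a fixed universe, so a block of $w$ universe elements is combined by a single AND/OR on one word. Python integers stand in for the packed words, and the expression format and set names are assumptions for the example; this is the textbook approach, not the paper's approximate-representation algorithm.

```python
def to_bitmap(elements):
    """Encode a set of small non-negative integers as one packed bitmap."""
    bm = 0
    for e in elements:
        bm |= 1 << e
    return bm

def evaluate(expr, sets):
    """Evaluate a nested union/intersection expression such as
    ('and', ('or', 'A', 'B'), 'C') over bitmap-encoded sets."""
    if isinstance(expr, str):
        return sets[expr]
    op, *args = expr
    vals = [evaluate(a, sets) for a in args]
    result = vals[0]
    for v in vals[1:]:
        result = result & v if op == 'and' else result | v
    return result

sets = {name: to_bitmap(s) for name, s in
        {'A': {1, 2, 3, 8}, 'B': {2, 3, 9}, 'C': {3, 8, 9}}.items()}
hits = evaluate(('and', ('or', 'A', 'B'), 'C'), sets)
print([i for i in range(10) if hits >> i & 1])   # -> [3, 8, 9]
```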

    Space-Efficient Re-Pair Compression

    Re-Pair is an effective grammar-based compression scheme achieving strong compression rates in practice. Let $n$, $\sigma$, and $d$ be the text length, alphabet size, and dictionary size of the final grammar, respectively. In their original paper, the authors show how to compute the Re-Pair grammar in expected linear time and $5n + 4\sigma^2 + 4d + \sqrt{n}$ words of working space on top of the text. In this work, we propose two algorithms improving on the space of their original solution. Our model assumes a memory word of $\lceil\log_2 n\rceil$ bits and a re-writable input text composed of $n$ such words. Our first algorithm runs in expected $\mathcal O(n/\epsilon)$ time and uses $(1+\epsilon)n + \sqrt n$ words of space on top of the text for any parameter $0 < \epsilon \leq 1$ chosen in advance. Our second algorithm runs in expected $\mathcal O(n\log n)$ time and improves the space to $n + \sqrt n$ words.
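
    For readers unfamiliar with the scheme itself, here is a toy Re-Pair sketch: repeatedly replace the most frequent adjacent symbol pair with a fresh nonterminal and record the rule. It runs in quadratic time, counts overlapping pair occurrences naively, and makes no attempt at the small working space that is the point of the paper; it only shows what grammar Re-Pair produces.

```python
from collections import Counter

def repair(text):
    """Toy Re-Pair: replace the most frequent adjacent pair until none repeats."""
    seq = list(text)
    rules = {}                                  # nonterminal -> (left, right)
    next_sym = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))      # naive count (ignores overlaps)
        if not pairs:
            break
        pair, freq = pairs.most_common(1)[0]
        if freq < 2:
            break
        new = ('N', next_sym)                   # fresh nonterminal
        next_sym += 1
        rules[new] = pair
        out, i = [], 0
        while i < len(seq):                     # greedy left-to-right rewrite
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(new)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules

def expand(sym, rules):
    """Expand a symbol back into the text it derives."""
    if sym in rules:
        left, right = rules[sym]
        return expand(left, rules) + expand(right, rules)
    return sym

seq, rules = repair("abracadabra")
assert ''.join(expand(s, rules) for s in seq) == "abracadabra"
print(len(rules), seq)
```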

    Subsequence Automata with Default Transitions

    Let $S$ be a string of length $n$ with characters from an alphabet of size $\sigma$. The \emph{subsequence automaton} of $S$ (often called the \emph{directed acyclic subsequence graph}) is the minimal deterministic finite automaton accepting all subsequences of $S$. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $O(n\sigma)$ and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with \emph{default transitions}, that is, special transitions to be taken only if none of the regular transitions match the current character, and which do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and provide a full trade-off between the size of the automaton and the \emph{delay}, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter $k$, $1 < k \leq \sigma$, we present a subsequence automaton with default transitions of size $O(nk\log_{k}\sigma)$ and delay $O(\log_k \sigma)$. Hence, with $k = 2$ we obtain an automaton of size $O(n \log \sigma)$ and delay $O(\log \sigma)$. On the other extreme, with $k = \sigma$, we obtain an automaton of size $O(n \sigma)$ and delay $O(1)$, thus matching the bound for the standard subsequence automaton construction. Finally, we generalize the result to multiple strings. The key component of our result is a novel hierarchical automata construction of independent interest.
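
    The sketch below is the standard $O(n\sigma)$-size baseline that the default-transition construction shrinks: state $i$ means "$i$ characters of $S$ have been consumed", and the transition on character $c$ jumps to the position of the next occurrence of $c$ after $i$. The example string and queries are assumptions for illustration.

```python
def build_subsequence_automaton(S, alphabet):
    """nxt[i][c] = 1-based position of the next c strictly after position i,
    or None if there is none. States are 0..len(S)."""
    n = len(S)
    nxt = [{c: None for c in alphabet} for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        nxt[i] = dict(nxt[i + 1])   # inherit the table of the next state
        nxt[i][S[i]] = i + 1        # and shortcut the character at position i
    return nxt

def is_subsequence(P, nxt):
    """P is a subsequence of S iff every transition is defined."""
    state = 0
    for c in P:
        state = nxt[state].get(c)
        if state is None:
            return False
    return True

nxt = build_subsequence_automaton("banana", set("ban"))
print(is_subsequence("bnn", nxt), is_subsequence("nab", nxt))   # True False
```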