
1,106 research outputs found

Improved Approximate String Matching and Regular Expression Matching on Ziv-Lempel Compressed Texts

Author: A. Amir, E.W. Myers, G. Navarro, G.M. Landau, J. Kärkkäinen, J. Ziv, K. Thompson, M. Dietzfelbinger, M. Farach, P. Sellers, R. Cole, T.A. Welch, V. Mäkinen
Publication date: 01/01/2007

We study the approximate string matching and regular expression matching problems for the case when the text to be searched is compressed with the Ziv-Lempel adaptive dictionary compression schemes. We present a time-space trade-off that leads to algorithms improving the previously known complexities for both problems. In particular, we significantly improve the space bounds, which in practical applications are likely to be a bottleneck.
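For orientation, approximate string matching on plain, uncompressed text can be sketched with the classical column-wise dynamic programming attributed to Sellers. This is not the compressed-text algorithm of the paper above; the function name `approx_match_ends` and its parameters are illustrative assumptions.

```python
def approx_match_ends(pattern: str, text: str, k: int) -> list[int]:
    """Return the 0-based end positions in `text` where some substring
    matches `pattern` with edit distance at most `k`.

    Illustrative sketch on UNCOMPRESSED text; runs in O(len(pattern) * len(text)) time.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # previous column, row i-1
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # previous column, row i
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends
```

With `k = 0` the function reports exact occurrences only; larger `k` admits insertions, deletions, and substitutions.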

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Southern Denmark Research Output

Online Research Database In Technology

A Faster Implementation of Online Run-Length Burrows-Wheeler Transform

Author: A Bowe, D Belazzougui, G Navarro, J Sirén, J Ziv, JI Munro, MA Bender, V Mäkinen, W Hon
Publication date: 14/10/2017

Run-length encoding Burrows-Wheeler Transformed strings, resulting in Run-Length BWT (RLBWT), is a powerful tool for processing highly repetitive strings. We propose a new algorithm for online RLBWT working in run-compressed space, which runs in $O(n\lg r)$ time and $O(r\lg n)$ bits of space, where $n$ is the length of the input string $S$ received so far and $r$ is the number of runs in the BWT of the reversed $S$. We improve the state-of-the-art algorithm for online RLBWT in terms of empirical construction time. Adopting the dynamic list for maintaining a total order, we can replace rank queries in a dynamic wavelet tree on a run-length compressed string by the direct comparison of labels in a dynamic list. The empirical results for various benchmarks show the efficiency of our algorithm, especially for highly repetitive strings. Comment: In Proc. IWOCA201
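To make the quantity $r$ concrete, the following is a minimal offline sketch that builds a BWT by sorting rotations and then run-length encodes it. It is not the online, run-compressed-space algorithm described above, and it transforms the string as given rather than its reverse; the function names and the `$` sentinel are assumptions made for illustration.

```python
from itertools import groupby


def bwt(s: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform via sorted rotations.

    Assumes `sentinel` does not occur in `s` and sorts before every
    other character. Quadratic-or-worse cost; for illustration only.
    """
    assert sentinel not in s
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s: str) -> list[tuple[str, int]]:
    """Collapse maximal runs of equal characters into (char, length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


if __name__ == "__main__":
    transformed = bwt("banana")
    runs = run_length_encode(transformed)
    # The number of runs corresponds to r for this (non-reversed) string.
    print(transformed, runs, len(runs))
```

The number of pairs returned by `run_length_encode` is the run count; for highly repetitive inputs it tends to be much smaller than the string length, which is what makes run-compressed space attractive.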

arXiv.org e-Print Archive

Crossref

Efficient LZ78 factorization of grammar compressed text

Author: A. Amir, A. Jeż, E. Ukkonen, E.M. McCreight, J. Jansson, J. Westbrook, J. Ziv, K. Goto, M. Crochemore, M. Li, M.A. Bender, O. Berkman, P. Weiner, R. Cilibrasi, T. Kida, V. Freschi, W. Rytter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

We present an efficient algorithm for computing the LZ78 factorization of a text, where the text is represented as a straight line program (SLP), which is a context free grammar in the Chomsky normal form that generates a single string. Given an SLP of size $n$ representing a text $S$ of length $N$, our algorithm computes the LZ78 factorization of $S$ in $O(n\sqrt{N}+m\log N)$ time and $O(n\sqrt{N}+m)$ space, where $m$ is the number of resulting LZ78 factors. We also show how to improve the algorithm so that the $n\sqrt{N}$ term in the time and space complexities becomes either $nL$, where $L$ is the length of the longest LZ78 factor, or $(N - \alpha)$, where $\alpha \geq 0$ is a quantity which depends on the amount of redundancy that the SLP captures with respect to substrings of $S$ of a certain length. Since $m = O(N/\log_\sigma N)$, where $\sigma$ is the alphabet size, the latter is asymptotically at least as fast as a linear time algorithm which runs on the uncompressed string when $\sigma$ is constant, and can be more efficient when the text is compressible, i.e. when $m$ and $n$ are small. Comment: SPIRE 201
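As a point of reference for what an LZ78 factorization is, here is a minimal sketch that factorizes an uncompressed string directly. It is not the SLP-based algorithm of the paper, and the name `lz78_factorize` is an illustrative assumption; a trie would replace the string-keyed dictionary in a linear-time version.

```python
def lz78_factorize(text: str) -> list[str]:
    """Return the LZ78 factors of `text`.

    Each factor is the longest previously seen factor (possibly empty)
    extended by one new character. Sketch on UNCOMPRESSED input.
    """
    seen = {""}      # set of factors emitted so far, plus the empty string
    factors = []
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in seen:
            # Keep extending while the prefix is already a known factor.
            current = candidate
        else:
            factors.append(candidate)
            seen.add(candidate)
            current = ""
    if current:
        # The final factor may coincide with an earlier one.
        factors.append(current)
    return factors
```

The length of the returned list plays the role of $m$ in the bounds above; for a string of one repeated character it grows only on the order of the square root of the input length.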

arXiv.org e-Print Archive

Crossref

Dynamic Fluctuation Phenomena in Double Membrane Films

Author: A. M. Bellocq, A. M. Polyakov, C. Dominicis de, D. Förster, D. Nelson, E. I. Kats, F. Brochard, G. Porte, H. K. Janssen, H. Kleinert, H. W. Wyld, I. E. Dzyaloshinskii, J. N. Israelashvili, L. D. Landau, L. Peliti, P. B. Canham, P. C. Martin, P. G. Gennes de, P. Richetti, R. Bar-Ziv, S. A. Safran, S. K. Ma, S. M. Block, S. V. Malinin, V. V. Lebedev, W. Helfrich
Publication venue: 'Pleiades Publishing Ltd'
Publication date: 18/11/1997

The dynamics of double membrane films is investigated in the long-wavelength limit, including the overdamped squeezing mode. We demonstrate that thermal fluctuations essentially modify the character of the mode due to its nonlinear coupling to the transversal shear hydrodynamic mode. The corresponding Green function acquires, as a function of the frequency, a cut along the imaginary semi-axis. Fluctuations increase the attenuation of the squeezing mode, which becomes larger than the 'bare' value. Comment: 7 pages, Revte

arXiv.org e-Print Archive

Crossref

MPG.PuRe

Numerical Observation of a Tubular Phase in Anisotropic Membranes

Author: Gudmar Thorleifsson, H. Flyvbjerg, J. V. Selinger, L. Radzihovsky, M. Bowick, M. S. Spector, Marco Falcioni, Mark Bowick, P. Nelson, R. Bar-Ziv, T. Lubensky, Y. Kantor
Publication venue: 'American Physical Society (APS)'
Publication date: 01/01/1997

We provide the first numerical evidence for the existence of a tubular phase, predicted by Radzihovsky and Toner (RT), for anisotropic tethered membranes without self-avoidance. Incorporating anisotropy into the bending rigidity of a simple model of a tethered membrane with free boundary conditions, we show that the model indeed has two phase transitions corresponding to the flat-to-tubular and tubular-to-crumpled transitions. For the tubular phase we measure the Flory exponent $\nu_F$ and the roughness exponent $\zeta$. We find $\nu_F=0.305(14)$ and $\zeta=0.895(60)$, which are in reasonable agreement with the theoretical predictions of RT, $\nu_F=1/4$ and $\zeta=1$. Comment: 8 pages, LaTeX, REVTEX, final published versio

arXiv.org e-Print Archive

CiteSeerX

Crossref

Syracuse University Research Facility and Collaborative Environment

Composite repetition-aware data structures

Author: A Blumer, A Lempel, D Arroyuelo, D Belazzougui, DE Willard, J Radoszewski, J Sirén, J Ziv, M Crochemore, M Raffinot, P Ferragina, S Kreft, T Gagie, V Mäkinen, W Rytter
Publication date: 01/01/2015

In highly repetitive strings, like collections of genomes from the same species, distinct measures of repetition all grow sublinearly in the length of the text, and indexes targeted to such strings typically depend only on one of these measures. We describe two data structures whose size depends on multiple measures of repetition at once, and that provide competitive tradeoffs between the time for counting and reporting all the exact occurrences of a pattern, and the space taken by the structure. The key component of our constructions is the run-length encoded BWT (RLBWT), which takes space proportional to the number of BWT runs: rather than augmenting RLBWT with suffix array samples, we combine it with data structures from LZ77 indexes, which take space proportional to the number of LZ77 factors, and with the compact directed acyclic word graph (CDAWG), which takes space proportional to the number of extensions of maximal repeats. The combination of CDAWG and RLBWT also enables a new representation of the suffix tree, whose size again depends on the number of extensions of maximal repeats, and that is powerful enough to support matching statistics and constant-space traversal. Comment: (the name of the third co-author was inadvertently omitted from previous version

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università degli Studi di Udine

Archivio istituzionale della ricerca - Università degli Studi di Venezia Ca' Foscari

Archivio della ricerca- LUISS Libera Università Internazionale degli Studi Sociali Guido Carli di Roma

Analysis of the uncertainty in the monetary valuation of ecosystem services - a case study at the river basin scale

Author: Acuña V, Boithias L, Corominas L, Kumar V, Marqués M, Schuhmacher M, Terrado M, Ziv G
Publication venue: 'Elsevier BV'
Publication date: 01/01/2016

Ecosystem services provide multiple benefits to human wellbeing and are increasingly considered by policy-makers in environmental management. However, the uncertainty related to the monetary valuation of these benefits is not yet adequately defined or integrated by policy-makers. Given this background, our aim was to quantify different sources of uncertainty when performing monetary valuation of ecosystem services, in order to provide a series of guidelines to reduce them. With an example of 4 ecosystem services (i.e., water provisioning, waste treatment, erosion protection, and habitat for species) provided at the river basin scale, we quantified the uncertainty associated with the following sources: (1) the number of services considered, (2) the number of benefits considered for each service, (3) the valuation metrics (i.e. valuation methods) used to value benefits, and (4) the uncertainty of the parameters included in the valuation metrics. Results indicate that the highest uncertainty was caused by the number of services considered, as well as by the number of benefits considered for each service, whereas the parametric uncertainty was similar to the one related to the selection of valuation metric, thus suggesting that the parametric uncertainty, which is the only uncertainty type commonly considered, was less critical than the structural uncertainty, which is in turn mainly dependent on the decision-making context. Given the uncertainty associated with the valuation structure, special attention should be given to the selection of services, benefits and metrics according to a given context.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

DUGiDocs – Universitat de Girona

HAL-INSU

White Rose Research Online

Suffix Tree of Alignment: An Efficient Index for Similar Data

Author: A. Amir, D. Gusfield, E. Ukkonen, E.M. McCreight, G. Navarro, H.H. Do, J. Ziv, K. Sadakane, M. Crochemore, M. Farach-Colton, P. Bille, R. Grossi, R.A. Baeza-Yates, S. Huang, S. Karlin, S. Kuruppu, V. Levenshtein, V. Mäkinen
Publication date: 01/01/2013

We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings $A$ and $B$ is a compacted trie representing all suffixes in $A$ and $B$. It has $|A|+|B|$ leaves and can be constructed in $O(|A|+|B|)$ time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity which is usually represented as an alignment of $A$ and $B$. In this paper we propose a space/time-efficient suffix tree of alignment which wisely exploits the similarity in an alignment. Our suffix tree for an alignment of $A$ and $B$ has $|A| + l_d + l_1$ leaves, where $l_d$ is the sum of the lengths of all parts of $B$ different from $A$ and $l_1$ is the sum of the lengths of some common parts of $A$ and $B$. We did not compromise the pattern search to reduce the space. Our suffix tree can be searched for a pattern $P$ in $O(|P|+occ)$ time, where $occ$ is the number of occurrences of $P$ in $A$ and $B$. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires $O(|A| + l_d + l_1 + l_2)$ time, where $l_2$ is the sum of the lengths of other common substrings of $A$ and $B$. When the suffix tree of $A$ is already given, it requires $O(l_d + l_1 + l_2)$ time. Comment: 12 page

arXiv.org e-Print Archive

CiteSeerX

Crossref

King's Research Portal

Dictionary-based methods for information extraction

Author: A. Baronchelli, Benedetto, Bennett, E. Caglioti, E. Pizzi, Lempel, Li, Puglisi, Shannon, V. Loreto, Wyner, Ziv
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004

In this paper, we present a general method for information extraction that exploits the features of data compression techniques. We first define and focus our attention on the so-called dictionary of a sequence. Dictionaries are intrinsically interesting, and a study of their features can be very useful for investigating the properties of the sequences they have been extracted from, e.g. DNA strings. We then describe a procedure of string comparison between dictionary-created sequences (or artificial texts) that gives very good results in several contexts. We finally present some results on self-consistent classification problems.

arXiv.org e-Print Archive

City Research Online

Crossref

Archivio della ricerca- Università di Roma La Sapienza