74 research outputs found

Bubble-Flip---A New Generation Algorithm for Prefix Normal Words

Author: Cicalese Ferdinando
Lipták Zsuzsanna
Rossi Massimiliano
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017

We present a new recursive generation algorithm for prefix normal words. These are binary strings with the property that no substring has more 1s than the prefix of the same length. The new algorithm uses two operations on binary strings, which exploit certain properties of prefix normal words in a smart way. We introduce infinite prefix normal words and show that one of the operations used by the algorithm, if applied repeatedly to extend the string, produces an ultimately periodic infinite word, which is prefix normal. Moreover, based on the original finite word, we can predict both the length and the density of an ultimate period of this infinite word.

Comment: 30 pages, 3 figures, accepted in Theoret. Comp. Sc. This is the journal version of the paper with the same title at LATA 2018 (12th International Conference on Language and Automata Theory and Applications, Tel Aviv, April 9-11, 2018).
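The definition in this abstract (no substring has more 1s than the prefix of the same length) can be checked directly. The following is a minimal brute-force sketch of that test only, not the Bubble-Flip generation algorithm from the paper; the function name `is_prefix_normal` is an assumption made for illustration.

```python
def is_prefix_normal(w: str) -> bool:
    """Check the definition directly: for every length k, no substring of
    length k may contain more 1s than the prefix of length k.
    Hypothetical helper, quadratic time; not the paper's algorithm."""
    n = len(w)
    # ones[i] holds the number of 1s in w[:i] (prefix sums).
    ones = [0] * (n + 1)
    for i, ch in enumerate(w):
        ones[i + 1] = ones[i] + (ch == "1")
    for length in range(1, n + 1):
        prefix_ones = ones[length]
        for start in range(n - length + 1):
            if ones[start + length] - ones[start] > prefix_ones:
                return False
    return True


if __name__ == "__main__":
    print(is_prefix_normal("110101"))  # True
    print(is_prefix_normal("101101"))  # False: "11" has more 1s than the prefix "10"
```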

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca

On the Parikh-de-Bruijn grid

Author: Burcsi Péter
Lipták Zsuzsanna
Smyth W. F.
Publication date: 01/01/2017

We introduce the Parikh-de-Bruijn grid, a graph whose vertices are fixed-order Parikh vectors, and whose edges are given by a simple shift operation. This graph gives structural insight into the nature of sets of Parikh vectors as well as that of the Parikh set of a given string. We show its utility by proving some results on Parikh-de-Bruijn strings, the abelian analog of de-Bruijn sequences.

Comment: 18 pages, 3 figures, 1 table
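To make the terminology concrete: the Parikh vector of a string counts the occurrences of each alphabet symbol, and the order-k Parikh set of a string collects the Parikh vectors of its length-k substrings. The sketch below computes both under that reading; the function names and the explicit `alphabet` argument are assumptions for illustration, and the grid's shift edges are not modelled here.

```python
from collections import Counter


def parikh_vector(t: str, alphabet: str) -> tuple:
    """Count how often each symbol of `alphabet` occurs in t."""
    counts = Counter(t)
    return tuple(counts[c] for c in alphabet)


def parikh_set(s: str, k: int, alphabet: str) -> set:
    """Parikh vectors of all length-k substrings of s (hypothetical helper)."""
    return {parikh_vector(s[i:i + k], alphabet) for i in range(len(s) - k + 1)}


if __name__ == "__main__":
    # Length-2 substrings of "aabab" are aa, ab, ba, ab.
    print(parikh_set("aabab", 2, "ab"))
```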

arXiv.org e-Print Archive

Catalogo dei prodotti della ricerca

Binary Jumbled String Matching for Highly Run-Length Compressible Texts

Author: Badkobeh Golnaz
Fici Gabriele
Kroon Steve
Lipták Zsuzsanna
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

The Binary Jumbled String Matching problem is defined as: Given a string $s$ over $\{a,b\}$ of length $n$ and a query $(x,y)$, with $x,y$ non-negative integers, decide whether $s$ has a substring $t$ with exactly $x$ $a$'s and $y$ $b$'s. Previous solutions created an index of size $O(n)$ in a pre-processing step, which was then used to answer queries in constant time. The fastest algorithms for construction of this index have running time $O(n^2/\log n)$ [Burcsi et al., FUN 2010; Moosa and Rahman, IPL 2010], or $O(n^2/\log^2 n)$ in the word-RAM model [Moosa and Rahman, JDA 2012]. We propose an index constructed directly from the run-length encoding of $s$. The construction time of our index is $O(n+\rho^2\log \rho)$, where $O(n)$ is the time for computing the run-length encoding of $s$ and $\rho$ is the length of this encoding; this is no worse than previous solutions if $\rho = O(n/\log n)$ and better if $\rho = o(n/\log n)$. Our index $L$ can be queried in $O(\log \rho)$ time. While $|L| = O(\min(n, \rho^{2}))$ in the worst case, preliminary investigations have indicated that $|L|$ may often be close to $\rho$. Furthermore, the algorithm for constructing the index is conceptually simple and easy to implement. In an attempt to shed light on the structure and size of our index, we characterize it in terms of the prefix normal forms of $s$ introduced in [Fici and Lipták, DLT 2011].

Comment: v2: only small cosmetic changes; v3: new title, weakened conjectures on size of Corner Index (we no longer conjecture it to be always linear in size of RLE); removed experimental part on random strings (these are valid but limited in their predictive power w.r.t. general strings); v3 published in IP
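As an illustration of the problem statement, the sketch below builds the kind of size-O(n) index the abstract attributes to previous solutions, using the naive quadratic construction rather than the run-length-based index proposed in the paper. It relies on the known interval property of binary strings: for each length k, the numbers of a's occurring in length-k substrings form a contiguous range, so storing the minimum and maximum per length suffices. The names `build_index` and `has_jumbled_match` are assumptions for illustration.

```python
def build_index(s: str):
    """For every length k, record the min and max number of 'a's over all
    length-k substrings of s. Naive O(n^2) construction (hypothetical helper)."""
    n = len(s)
    # pre[i] holds the number of 'a's in s[:i].
    pre = [0] * (n + 1)
    for i, ch in enumerate(s):
        pre[i + 1] = pre[i] + (ch == "a")
    min_a = [0] * (n + 1)
    max_a = [0] * (n + 1)
    for k in range(1, n + 1):
        counts = [pre[i + k] - pre[i] for i in range(n - k + 1)]
        min_a[k] = min(counts)
        max_a[k] = max(counts)
    return min_a, max_a


def has_jumbled_match(index, x: int, y: int) -> bool:
    """Answer query (x, y) in constant time: is there a substring with
    exactly x 'a's and y 'b's? Uses the interval property of binary strings."""
    min_a, max_a = index
    k = x + y
    if k >= len(min_a):
        return False  # longer than the string itself
    return min_a[k] <= x <= max_a[k]


if __name__ == "__main__":
    idx = build_index("aabab")
    print(has_jumbled_match(idx, 2, 1))  # True, e.g. "aab"
    print(has_jumbled_match(idx, 0, 2))  # False, "bb" never occurs
```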

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca

Archivio istituzionale della ricerca - Università di Palermo

On Infinite Prefix Normal Words

Author: Cicalese Ferdinando
Lipták Zsuzsanna
Rossi Massimiliano
Publication date: 15/11/2018

Prefix normal words are binary words that have no factor with more 1s than the prefix of the same length. Finite prefix normal words were introduced in [Fici and Lipták, DLT 2011]. In this paper, we study infinite prefix normal words and explore their relationship to some known classes of infinite binary words. In particular, we establish a connection between prefix normal words and Sturmian words, between prefix normal words and abelian complexity, and between prefix normality and lexicographic order.

Comment: 20 pages, 4 figures, accepted at SOFSEM 2019 (45th International Conference on Current Trends in Theory and Practice of Computer Science, Nový Smokovec, Slovakia, January 27-30, 2019).

arXiv.org e-Print Archive

PubMed Central

Catalogo dei prodotti della ricerca

Suffix Sorting via Matching Statistics

Author: Lipták Zsuzsanna
Masillo Francesco
Puglisi Simon J.
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/01/2022

Funding Information: Academy of Finland grants 339070 and 351150.

Publisher Copyright: © Zsuzsanna Lipták, Francesco Masillo, and Simon J. Puglisi.

We introduce a new algorithm for constructing the generalized suffix array of a collection of highly similar strings. As a first step, we construct a compressed representation of the matching statistics of the collection with respect to a reference string. We then use this data structure to distribute suffixes into a partial order, and subsequently to speed up suffix comparisons to complete the generalized suffix array. Our experimental evidence with a prototype implementation (a tool we call sacamats) shows that on string collections with highly similar strings we can construct the suffix array in time competitive with or faster than the fastest available methods. Along the way, we describe a heuristic for fast computation of the matching statistics of two strings, which may be of independent interest.

Peer reviewed
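For readers unfamiliar with the term, the matching statistics of a string with respect to a reference are commonly defined as follows: for each position i, the length of the longest prefix of the suffix starting at i that occurs somewhere in the reference. The sketch below computes this naively under that standard definition. It is not the compressed representation or the heuristic described in the paper, and the function name is an assumption for illustration.

```python
def matching_statistics(s: str, ref: str) -> list:
    """For each position i of s, return the length of the longest prefix of
    s[i:] that occurs as a substring of ref. Naive illustration only."""
    ms = []
    for i in range(len(s)):
        length = 0
        # Extend the match one character at a time while it still occurs in ref.
        while i + length < len(s) and s[i:i + length + 1] in ref:
            length += 1
        ms.append(length)
    return ms


if __name__ == "__main__":
    print(matching_statistics("ananas", "banana"))  # [5, 4, 3, 2, 1, 0]
```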

Dagstuhl Research Online Publication Server

Catalogo dei prodotti della ricerca

Helsingin yliopiston digitaalinen arkisto

Pattern Discovery in Colored Strings

Author: Lipták Zsuzsanna
Puglisi Simon J.
Rossi Massimiliano
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2021

In this paper, we consider the problem of identifying patterns of interest in colored strings. A colored string is a string where each position is assigned one of a finite set of colors. Our task is to find substrings of the colored string that always occur followed by the same color at the same distance. The problem is motivated by applications in embedded systems verification, in particular, assertion mining. The goal there is to automatically find properties of the embedded system from the analysis of its simulation traces. We show that, in our setting, the number of patterns of interest is upper-bounded by $\mathcal{O}(n^2)$, where $n$ is the length of the string. We introduce a baseline algorithm, running in $\mathcal{O}(n^2)$ time, which identifies all patterns of interest satisfying certain minimality conditions, for all colors in the string. For the case where one is interested in patterns related to one color only, we also provide a second algorithm which runs in $\mathcal{O}(n^2\log n)$ time in the worst case but is faster than the baseline algorithm in practice. Both solutions use suffix trees, and the second algorithm also uses an appropriately defined priority queue, which allows us to reduce the number of computations. We performed an experimental evaluation of the proposed approaches over both synthetic and real-world datasets, and found that the second algorithm outperforms the first algorithm on all simulated data, while on the real-world data, the performance varies between a slight slowdown (on half of the datasets) and a speedup by a factor of up to 11.

Comment: 22 pages, 5 figures, 2 tables, published in ACM Journal of Experimental Algorithmics. This is the journal version of the paper with the same title at SEA 2020 (18th Symposium on Experimental Algorithms, Catania, Italy, June 16-18, 2020).

arXiv.org e-Print Archive

Catalogo dei prodotti della ricerca

Judgment of the employment by systematic data collection method : [absztrakt]

Author: Dabasi Halász Zsuzsanna
Lipták Katalin
Siposné Nándori Eszter
Publication venue: Universitas Szeged Press
Publication date: 01/01/2009

University of Szeged

In memoriam G. Fekete Éva

Author: Dabasi-Halász Zsuzsanna
Lipták Katalin
Publication venue: 'Tér és Társadalom'
Publication date: 01/01/2017

Repository of the Academy's Library

On Compressing Collections of Substring Samples

Author: Badkobeh Golnaz
Giuliani Sara
Lipták Zsuzsanna
Puglisi Simon J.
Publication date: 01/01/2022

Publisher Copyright: © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Given a string X = X[1..n] of length n, and integers m and s, such that n > m ≥ 2s > 0, we consider the problem of compressing the string S formed by concatenating the substrings of X of length m starting at positions i ≡ 1 (mod s). In particular, we provide an upper bound of (2n − m)/s + 2z + (m − s) on the size of the Lempel-Ziv (LZ77) parsing of S, where z is the size of the parsing of X. We also show that a related bound holds regardless of the order in which the substrings are concatenated in the formation of S. If X is viewed as a genome sequence, the above substring sampling process corresponds to an idealized model of short read DNA sequencing.

Peer reviewed
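The sampling process and the stated bound can be illustrated with a small sketch. It assumes that only complete length-m substrings are sampled and that they are concatenated in left-to-right order; the abstract's 1-based positions i ≡ 1 (mod s) become 0-based offsets 0, s, 2s, and so on. The function names are hypothetical, and z (the size of the LZ77 parsing of X) is supplied by the caller rather than computed.

```python
def sample_substrings(x: str, m: int, s: int) -> list:
    """Length-m substrings of x starting at 1-based positions i ≡ 1 (mod s).
    Assumes only complete substrings are taken (hypothetical helper)."""
    n = len(x)
    if not (n > m >= 2 * s > 0):
        raise ValueError("requires n > m >= 2s > 0")
    return [x[i:i + m] for i in range(0, n - m + 1, s)]


def lz_bound(n: int, m: int, s: int, z: int) -> float:
    """Upper bound from the abstract: (2n - m)/s + 2z + (m - s)."""
    return (2 * n - m) / s + 2 * z + (m - s)


if __name__ == "__main__":
    samples = sample_substrings("abcdefgh", 4, 2)
    print(samples)           # ['abcd', 'cdef', 'efgh']
    print("".join(samples))  # abcdcdefefgh
    # z = 8 is the caller's assumption for a string of 8 distinct symbols.
    print(lz_bound(8, 4, 2, 8))  # 24.0
```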

Goldsmiths Research Online

Catalogo dei prodotti della ricerca

Helsingin yliopiston digitaalinen arkisto