Search CORE

4,735 research outputs found

The asymptotic number of prefix normal words

Author: Balister Paul
Gerke Stefanie
Publication venue: 'Elsevier BV'
Publication date: 19/03/2019

We show that the number of prefix normal binary words of length $n$ is $2^{n-\Theta((\log n)^2)}$. We also show that the maximum number of binary words of length $n$ with a given fixed prefix normal form is $2^{n-O(\sqrt{n\log n})}$. Comment: 9 pages

arXiv.org e-Print Archive

Crossref

Royal Holloway - Pure

Algorithms and Data Structures for Coding, Indexing, and Mining of Sequential Data

Author: Rossi Massimiliano
Publication date: 01/01/2020

In recent years, the production of sequential data has been rapidly increasing. This requires solving challenging problems about how to represent information, how to retrieve information, and how to extract knowledge from sequential data. These questions belong to the areas of coding, indexing, and mining, respectively. In this thesis, we investigate problems from those three areas. Coding refers to the way in which information is represented. Coding aims at generating optimal codes, that is, codes with minimum expected length. Codes can be generated for different purposes, from data compression to error detection/correction. The Lempel-Ziv 77 parsing produces an asymptotically optimal code in terms of compression. We study algorithms to efficiently decompress strings from the Lempel-Ziv 77 parsing, using memory proportional to the size of the parsing itself. We provide the first implementation of an algorithm by Bille et al., the only work we are aware of on this problem. We present a practical evaluation of this approach and several optimizations which improve the performance on all datasets we tested. Through the Ulam-Rényi game, it is possible to provide optimal adaptive error-correcting codes. The game consists of discovering an unknown $m$-bit number by asking membership questions, the answers to which can be erroneous. Questions are formulated knowing the answers to all previous ones. We want to find an optimal strategy, i.e., a strategy that can identify any $m$-bit number using the theoretical minimum number of questions. We studied the case where questions are a union of up to a fixed number of intervals, and up to three answers can be erroneous. We first show that for any sufficiently large $m$, there exists a strategy to identify an initially unknown $m$-bit number which uses at most four intervals per question. We further refine our main tool to turn the above asymptotic result into a complete characterization of those instances of the Ulam-Rényi game that admit optimal strategies. Indexing refers to the way in which information is retrieved. An index for texts permits finding all occurrences of any substring, without traversing the whole text. Many applications require searching for approximate substrings. One of these is the problem of jumbled pattern matching, where two strings match if one is a permutation of the other. We study combinatorial aspects of prefix normal words, a class of binary words introduced in this context. These words can be used as indices for the Indexed Binary Jumbled Pattern Matching problem. We present a new recursive generation algorithm for prefix normal words that is competitive with the previous one but also allows listing all prefix normal words sharing the same prefix. This yields new insights that may help solve the problem of counting the number of prefix normal words of a given length. We then introduce infinite prefix normal words, and we show that one of the operations used by the algorithm, when repeatedly applied to extend a word, produces an infinite prefix normal word. This motivates the search for other operations that produce infinite prefix normal words. We found that one of these operations establishes a connection between prefix normal words and Sturmian words. We also explored the relationship between prefix normal words and Abelian complexity, as well as between prefix normal words and lexicographic order. Mining refers to the way in which information is converted into knowledge. The process of knowledge discovery covers several processing steps, including knowledge extraction. We analyze the problem of mining assertions for an embedded system from its simulation traces. This problem can be modeled as a pattern discovery problem on colored strings.
We present two problems of pattern discovery on colored strings: patterns for one color only, or for all colors at the same time. We present two suffix tree-based algorithms. The first algorithm solves both the one color problem and the all colors problem. We then introduce modifications which improve the performance of the algorithm on both synthetic and real data. We implemented and evaluated the proposed approaches, highlighting the time trade-offs that can be obtained. A different way of knowledge extraction is based on the information-theoretic perspective of Pearl's model of causality. It has been postulated that the true causality direction between two phenomena A and B is related to the problem of finding the minimum entropy joint distribution between A and B. This problem is known to be NP-hard, and greedy algorithms have recently been proposed. We provide a novel analysis of one of the proposed heuristics, showing that this algorithm guarantees an additive approximation of 1 bit. We then provide a general criterion for guaranteeing an additive approximation factor of 1. This criterion may be of independent interest in other contexts where couplings are used.
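
As an illustration of the minimum entropy joint distribution problem mentioned above, the following is a minimal sketch of one natural greedy heuristic: repeatedly pair the largest remaining mass of each marginal. It is an assumption that this matches the heuristic analyzed in the thesis; the names `greedy_coupling` and `entropy` and the tolerance `eps` are hypothetical.

```python
import heapq
from math import log2


def greedy_coupling(p, q, eps=1e-12):
    """Build a joint distribution with marginals p and q by a greedy rule.

    Assumed variant: at each step take the largest remaining mass of p and
    of q, and assign the smaller of the two to the corresponding joint cell.
    """
    # Max-heaps simulated with negated masses; entries are (-mass, index).
    heap_p = [(-m, i) for i, m in enumerate(p) if m > eps]
    heap_q = [(-m, j) for j, m in enumerate(q) if m > eps]
    heapq.heapify(heap_p)
    heapq.heapify(heap_q)

    joint = {}
    while heap_p and heap_q:
        neg_mp, i = heapq.heappop(heap_p)
        neg_mq, j = heapq.heappop(heap_q)
        mp, mq = -neg_mp, -neg_mq
        r = min(mp, mq)
        joint[(i, j)] = joint.get((i, j), 0.0) + r
        # Push back whatever mass is left on either side.
        if mp - r > eps:
            heapq.heappush(heap_p, (-(mp - r), i))
        if mq - r > eps:
            heapq.heappush(heap_q, (-(mq - r), j))
    return joint


def entropy(masses):
    """Shannon entropy in bits of a collection of probability masses."""
    return -sum(m * log2(m) for m in masses if m > 0)
```

For example, `greedy_coupling([0.5, 0.5], [0.5, 0.25, 0.25])` yields a joint distribution whose entropy can be compared with the lower bound `max(entropy(p), entropy(q))`.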

Catalogo dei prodotti della ricerca

Canonical Trees, Compact Prefix-free Codes and Sums of Unit Fractions: A Probabilistic Analysis

Author: Heuberger Clemens
Krenn Daniel
Wagner Stephan
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2015

For fixed $t\ge 2$, we consider the class of representations of $1$ as a sum of unit fractions whose denominators are powers of $t$ or, equivalently, the class of canonical compact $t$-ary Huffman codes or, equivalently, rooted $t$-ary plane "canonical" trees. We study the probabilistic behaviour of the height (the limit distribution is shown to be normal), the number of distinct summands (normal distribution), the path length (normal distribution), the width (main term of the expectation and concentration property) and the number of leaves at maximum distance from the root (discrete distribution).
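
To make the objects concrete, here is a small sketch that counts the representations of 1 as a sum of exactly `n` unit fractions whose denominators are powers of `t`, assuming representations are counted up to the order of the summands. It walks a tree level by level: every open node either becomes a leaf (a summand $1/t^k$) or an internal node with `t` children. The function name `count_representations` is hypothetical and not taken from the paper.

```python
from functools import lru_cache


def count_representations(n, t=2):
    """Count multisets of n unit fractions 1/t^k (k >= 0) that sum to 1."""

    @lru_cache(maxsize=None)
    def go(open_nodes, leaves_left):
        # open_nodes: nodes available at the current depth.
        # leaves_left: summands still to be placed at this depth or deeper.
        if open_nodes == 0:
            return 1 if leaves_left == 0 else 0
        if open_nodes > leaves_left:
            # Every open node needs at least one leaf below it.
            return 0
        total = 0
        for leaves_here in range(open_nodes + 1):
            internal = open_nodes - leaves_here
            total += go(t * internal, leaves_left - leaves_here)
        return total

    return go(1, n)


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, count_representations(n, t=2))
```

For $t = 2$ and $n = 4$ the two representations are $\tfrac12+\tfrac14+\tfrac18+\tfrac18$ and $\tfrac14+\tfrac14+\tfrac14+\tfrac14$.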

arXiv.org e-Print Archive

Stellenbosch University SUNScholar Repository

Normal, Abby Normal, Prefix Normal

Author: A. Amir
A. Butman
F. Cicalese
F. Ruskey
G. Benson
J. Ian Munro
K. Dührkop
L. Parida
L.-K. Lee
S. Böcker
T. Gagie
T. Kociumaka
T.M. Moosa
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

A prefix normal word is a binary word with the property that no substring has more 1s than the prefix of the same length. This class of words is important in the context of binary jumbled pattern matching. In this paper we present results about the number $pnw(n)$ of prefix normal words of length $n$, showing that $pnw(n) = \Omega\left(2^{n - c\sqrt{n\ln n}}\right)$ for some $c$ and $pnw(n) = O\left(\frac{2^n (\ln n)^2}{n}\right)$. We introduce efficient algorithms for testing the prefix normal property and a "mechanical algorithm" for computing prefix normal forms. We also include games which can be played with prefix normal words. In these games Alice wishes to stay normal but Bob wants to drive her "abnormal" -- we discuss which parameter settings allow Alice to succeed. Comment: Accepted at FUN '1
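
The definition above translates directly into a brute-force check. The sketch below is a straightforward quadratic-time test and an exhaustive count for small $n$; it is not the efficient testing algorithm from the paper, and the names `max_ones`, `is_prefix_normal` and `pnw` are chosen here for illustration.

```python
from itertools import product


def max_ones(w, k):
    """Maximum number of 1s in any length-k substring of w (sliding window)."""
    ones = sum(w[:k])
    best = ones
    for i in range(k, len(w)):
        ones += w[i] - w[i - k]
        best = max(best, ones)
    return best


def is_prefix_normal(w):
    """True if no substring of w has more 1s than the prefix of the same length."""
    return all(sum(w[:k]) == max_ones(w, k) for k in range(1, len(w) + 1))


def pnw(n):
    """Number of prefix normal words of length n, by exhaustive enumeration."""
    return sum(is_prefix_normal(w) for w in product((0, 1), repeat=n))


if __name__ == "__main__":
    print([pnw(n) for n in range(1, 6)])  # [2, 3, 5, 8, 14]
```

For instance, `1101` is prefix normal, while `1011` is not: its substring `11` has two 1s but its length-2 prefix `10` has only one.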

arXiv.org e-Print Archive

Crossref

Catalogo dei prodotti della ricerca

Archivio istituzionale della ricerca - Università di Palermo

Multiplicative measures on free groups

Author: Borovik Alexandre V.
Myasnikov Alexei G.
Remeslennikov Vladimir N.
Publication date: 01/01/2002

We introduce a family of atomic measures on free groups generated by no-return random walks. These measures are shown to be very convenient for comparing "relative sizes" of subgroups, context-free and regular subsets (that is, subsets generated by finite automata) of free groups. Many asymptotic characteristics of subsets and subgroups are naturally expressed as analytic properties of related generating functions. We introduce a hierarchy of asymptotic behaviour "at infinity" of subsets in the free groups, more sensitive than the traditionally used asymptotic density, and apply it to normal subgroups and regular subsets. Comment: LaTeX, requires amssymb.sty; 31 pp. Version 3: more detail in Example 2 and Tauberian theorem

arXiv.org e-Print Archive

CiteSeerX

Distributional convergence for the number of symbol comparisons used by QuickSort

Author: Fill James Allen
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2012

Most previous studies of the sorting algorithm QuickSort have used the number of key comparisons as a measure of the cost of executing the algorithm. Here we suppose that the n independent and identically distributed (i.i.d.) keys are each represented as a sequence of symbols from a probabilistic source and that QuickSort operates on individual symbols, and we measure the execution cost as the number of symbol comparisons. Assuming only a mild "tameness" condition on the source, we show that there is a limiting distribution for the number of symbol comparisons after normalization: first centering by the mean and then dividing by n. Additionally, under a condition that grows more restrictive as p increases, we have convergence of moments of orders p and smaller. In particular, we have convergence in distribution and convergence of moments of every order whenever the source is memoryless, that is, whenever each key is generated as an infinite string of i.i.d. symbols. This is somewhat surprising; even for the classical model that each key is an i.i.d. string of unbiased ("fair") bits, the mean exhibits periodic fluctuations of order n. Comment: Published in at http://dx.doi.org/10.1214/12-AAP866 the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org
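
The cost measure described above can be sketched in a few lines: keys are compared symbol by symbol, and every pair of symbols examined counts as one symbol comparison. This is only an illustration under stated assumptions: keys are finite strings rather than infinite ones, the first element serves as pivot, and the names `compare` and `quicksort` are hypothetical.

```python
def compare(a, b, counter):
    """Compare two keys symbol by symbol, counting each symbol comparison.

    counter is a one-element list used as a mutable tally.
    """
    for x, y in zip(a, b):
        counter[0] += 1
        if x != y:
            return -1 if x < y else 1
    return 0  # no difference found within the (finite) compared prefix


def quicksort(keys, counter):
    """QuickSort with the first key as pivot; returns a new sorted list."""
    if len(keys) <= 1:
        return list(keys)
    pivot, rest = keys[0], keys[1:]
    smaller, larger = [], []
    for key in rest:
        if compare(key, pivot, counter) < 0:
            smaller.append(key)
        else:
            larger.append(key)
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)


if __name__ == "__main__":
    tally = [0]
    print(quicksort(["10", "01", "11"], tally), tally[0])
```

In this small run there are two key comparisons but three symbol comparisons, because `11` and `10` agree on their first symbol.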

arXiv.org e-Print Archive

CiteSeerX

Crossref