
    Asymptotics of Minimal Deterministic Finite Automata Recognizing a Finite Binary Language

    We show that the number of minimal deterministic finite automata with n+1 states recognizing a finite binary language grows asymptotically for n → ∞ like Θ(n! 8^n e^{3 a_1 n^{1/3}} n^{7/8}), where a_1 ≈ -2.338 is the largest root of the Airy function. For this purpose, we use a new asymptotic enumeration method proposed by the same authors in a recent preprint (2019). We first derive a new two-parameter recurrence relation for the number of such automata up to a given size. Using this result, we prove by induction tight bounds that are sufficiently accurate for large n to determine the asymptotic form using adapted Newton polygons.
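    For reference, the claimed asymptotic estimate can be written as a display formula; the LaTeX snippet below merely restates the result quoted in the abstract (the name M_{n+1} for the counting sequence is introduced here for readability and is not taken from the paper):

```latex
% Restatement of the asymptotic estimate quoted in the abstract:
% M_{n+1} counts minimal DFAs with n+1 states recognizing a finite binary
% language, and a_1 \approx -2.338 is the largest root of the Airy function.
\[
  M_{n+1} \;=\; \Theta\!\left( n!\, 8^{n}\, e^{3 a_1 n^{1/3}}\, n^{7/8} \right),
  \qquad n \to \infty.
\]
```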

    The Wavelet Trie: Maintaining an Indexed Sequence of Strings in Compressed Space

    An indexed sequence of strings is a data structure for storing a string sequence that supports random access, searching, range counting and analytics operations, both for exact matches and prefix search. String sequences lie at the core of column-oriented databases, log processing, and other storage and query tasks. In these applications each string can appear several times and the order of the strings in the sequence is relevant. The prefix structure of the strings is relevant as well: common prefixes are sought in strings to extract interesting features from the sequence. Moreover, space-efficiency is highly desirable as it translates directly into higher performance, since more data can fit in fast memory. We introduce and study the problem of compressed indexed sequences of strings, i.e., representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while preserving provably good performance for the supported operations. We present a new data structure for this problem, the Wavelet Trie, which combines the classical Patricia Trie with the Wavelet Tree, a succinct data structure for storing a compressed sequence. The resulting Wavelet Trie smoothly adapts to a sequence of strings that changes over time. It improves on the state-of-the-art compressed data structures by supporting a dynamic alphabet (i.e., the set of distinct strings) and prefix queries, both crucial requirements in the aforementioned applications, and on traditional indexes by reducing space occupancy to close to the entropy of the sequence.
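    As a rough illustration of one half of this construction, the sketch below is a plain, pointer-based wavelet tree over a sequence of strings, supporting access and rank; it is not the Wavelet Trie itself (which replaces the alphabet splitting with a Patricia trie and uses succinct bitvectors), and all names are illustrative.

```python
# Minimal, pointer-based wavelet tree over a sequence of strings (illustrative).
class WaveletTree:
    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.leaf = len(self.alphabet) <= 1
        if self.leaf:
            return
        mid = len(self.alphabet) // 2
        left_syms = set(self.alphabet[:mid])
        # One bit per position: 0 = symbol in the left half, 1 = right half.
        self.bits = [0 if s in left_syms else 1 for s in seq]
        self.left = WaveletTree([s for s in seq if s in left_syms],
                                self.alphabet[:mid])
        self.right = WaveletTree([s for s in seq if s not in left_syms],
                                 self.alphabet[mid:])

    def access(self, i):
        """Return the i-th string of the sequence (0-based)."""
        if self.leaf:
            return self.alphabet[0]
        b = self.bits[i]
        j = sum(1 for k in range(i) if self.bits[k] == b)  # rank within child
        return (self.right if b else self.left).access(j)

    def rank(self, s, i):
        """Number of occurrences of s among the first i positions."""
        if self.leaf:
            return i if self.alphabet and self.alphabet[0] == s else 0
        b = 0 if s in self.alphabet[:len(self.alphabet) // 2] else 1
        j = sum(1 for k in range(i) if self.bits[k] == b)
        return (self.right if b else self.left).rank(s, j)

wt = WaveletTree(["GET", "POST", "GET", "PUT", "GET"])
assert wt.access(3) == "PUT"
assert wt.rank("GET", 4) == 2   # "GET" appears twice among the first 4 strings
```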

    Binary Decision Diagrams: from Tree Compaction to Sampling

    Any Boolean function corresponds to a complete full binary decision tree. This tree can in turn be represented in a maximally compact form as a directed acyclic graph in which common subtrees are factored and shared, keeping only one copy of each unique subtree. This yields the celebrated and widely used structure called the reduced ordered binary decision diagram (ROBDD). We propose to revisit the classical compaction process to give a new way of enumerating ROBDDs of a given size without considering fully expanded trees and the compaction step. Our method also provides an unranking procedure for the set of ROBDDs. As a by-product, we obtain a uniform random and exhaustive sampler for ROBDDs for a given number of variables and size.
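    For readers unfamiliar with the compaction step mentioned above, here is a small hash-consing sketch of classical ROBDD construction (merge nodes with identical (var, low, high) triples and skip nodes whose two children coincide). It only illustrates the standard structure; it is not the enumeration, unranking, or sampling method of the paper, and all names are illustrative.

```python
# Classical ROBDD construction via hash-consing (illustrative sketch).
TRUE, FALSE = "1", "0"   # terminal nodes
unique = {}              # (var, low, high) -> node id
nodes = {}               # node id -> (var, low, high)

def mk(var, low, high):
    """Return the canonical node for (var, low, high), applying both reduction rules."""
    if low == high:                    # redundant test: skip the node entirely
        return low
    key = (var, low, high)
    if key not in unique:              # share structurally identical subgraphs
        node_id = "n{}".format(len(nodes))
        unique[key] = node_id
        nodes[node_id] = key
    return unique[key]

# Build the ROBDD of f(x1, x2) = x1 XOR x2 with variable order x1 < x2.
low_child  = mk("x2", FALSE, TRUE)     # x1 = 0: f equals x2
high_child = mk("x2", TRUE, FALSE)     # x1 = 1: f equals not x2
root = mk("x1", low_child, high_child)

print(root, nodes)   # three internal nodes: the diagram is already maximally compact
```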

    Investigations into Proof Structures

    We introduce and elaborate a novel formalism for the manipulation and analysis of proofs as objects in a global manner. In this first approach the formalism is restricted to first-order problems characterized by condensed detachment. It is applied in an exemplary manner to a coherent and comprehensive formal reconstruction and analysis of historical proofs of a widely studied problem due to Łukasiewicz. The underlying approach opens the door towards new systematic ways of generating lemmas in the course of proof search, to the effect of reducing the search effort and finding shorter proofs. Among the numerous experiments reported along this line, a proof of Łukasiewicz's problem was automatically discovered that is much shorter than any proof found before by man or machine. (Comment: this article is a continuation of arXiv:2104.1364.)

    Strict monotonic trees arising from evolutionary processes: combinatorial and probabilistic study

    In this paper we study two models of labelled random trees that generalise the original unlabelled Schröder tree. Our new models can be seen as models for phylogenetic trees in which nodes represent species and labels encode the order of appearance of these species, and thus the chronology of evolution. One important feature of our trees is that they can be generated efficiently thanks to a dynamical, recursive construction. Our first model is an increasing tree in the classical sense (labels increase along each branch of the tree and each label appears only once). To better model phylogenetic trees, we relax the rules of labelling by allowing repetitions in the second model. For each of the two models, we provide asymptotic theorems for different characteristics of the tree (e.g., degree of the root, degree distribution, height, etc.), thus giving extensive information about the typical shapes of these trees. We also provide efficient algorithms to generate large trees in the two models. The proofs are based on a combination of analytic combinatorics, probabilistic methods, and bijective methods (we exhibit bijections between our models and well-known models from the literature, such as permutations and Stirling numbers of both kinds). It turns out that even though our models are labelled, they can be specified simply in the world of ordinary generating functions. However, the resulting generating functions are formal. Then, by applying Borel transforms, the models become amenable to the techniques of analytic combinatorics.
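    The two labelled models are not fully specified in the abstract, so the sketch below uses the classical random recursive tree as a stand-in: it is the simplest example of an increasing tree in the sense described above (labels increase along every branch and each label appears once) and illustrates the dynamical, recursive style of generation, but it is not one of the paper's models.

```python
import random

def random_recursive_tree(n, seed=None):
    """Grow an increasing tree on labels 1..n: node i attaches to a uniformly
    random parent among the already inserted nodes 1..i-1, so labels increase
    along every root-to-leaf branch and each label appears exactly once."""
    rng = random.Random(seed)
    parent = {1: None}                      # node 1 is the root
    for i in range(2, n + 1):
        parent[i] = rng.randint(1, i - 1)   # dynamical, recursive insertion
    return parent

tree = random_recursive_tree(10, seed=42)
root_degree = sum(1 for node, p in tree.items() if p == 1)
print(tree)
print("degree of the root:", root_degree)
```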

    Space-Efficient Data Structures for Collections of Textual Data

    This thesis focuses on the design of succinct and compressed data structures for collections of string-based data, specifically sequences of semi-structured documents in textual format, sets of strings, and sequences of strings. The study of such collections is motivated by a large number of applications both in theory and practice. For textual semi-structured data, we introduce the concept of semi-index, a succinct construction that speeds up access to documents encoded with textual semi-structured formats, such as JSON and XML, by storing separately a compact description of their parse trees, hence avoiding the need to re-parse the documents every time they are read. For string dictionaries, we describe a data structure based on a path decomposition of the compacted trie built on the string set. The tree topology is encoded using succinct data structures, while the node labels are compressed using a simple dictionary-based scheme. We also describe a variant of the path-decomposed trie for scored string sets, where each string has a score. This data structure can efficiently support top-k completion queries, that is, given a string p and an integer k, return the k highest-scored strings among those prefixed by p. For sequences of strings, we introduce the problem of compressed indexed sequences of strings, that is, representing indexed sequences of strings in nearly-optimal compressed space, both in the static and dynamic settings, while supporting random access, searching, and counting operations, both for exact matches and prefix search. We present a new data structure, the Wavelet Trie, that solves the problem by combining a Patricia trie with a wavelet tree. The Wavelet Trie improves on the state-of-the-art compressed data structures for sequences by supporting a dynamic alphabet and prefix queries. Finally, we discuss the practical implementation of the succinct primitives used throughout the thesis for the experiments. These primitives are implemented as part of a publicly available library, Succinct, using state-of-the-art algorithms along with some improvements.
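    To make the top-k completion operation concrete, here is a simple uncompressed baseline over a sorted array of (string, score) pairs; the thesis answers the same queries with a compressed path-decomposed trie, and none of the names below are taken from it.

```python
import bisect
import heapq

def top_k_completions(sorted_entries, p, k):
    """sorted_entries: list of (string, score) pairs sorted by string.
    Return the k highest-scored strings having prefix p."""
    keys = [s for s, _ in sorted_entries]
    lo = bisect.bisect_left(keys, p)
    hi = bisect.bisect_left(keys, p + "\uffff")   # crude end of the prefix range
    best = heapq.nlargest(k, sorted_entries[lo:hi], key=lambda e: e[1])
    return [s for s, _ in best]

entries = sorted([("rome", 50), ("romance", 30), ("roman", 80),
                  ("rob", 10), ("rose", 70)])
print(top_k_completions(entries, "rom", 2))   # ['roman', 'rome']
```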

    Data Structures for Efficient String Algorithms

    This thesis deals with data structures that are mostly useful in the area of string matching and string mining. Our main result is an O(n)-time preprocessing scheme for an array of n numbers such that subsequent queries asking for the position of a minimum element in a specified interval can be answered in constant time (so-called RMQs, for Range Minimum Queries). The space for this data structure is 2n+o(n) bits, which is shown to be asymptotically optimal in a general setting. This improves all previous results on this problem. The main techniques for deriving this result rely on combinatorial properties of arrays and so-called Cartesian Trees. For compressible input arrays we show that further space can be saved, while not affecting the time bounds. For the two-dimensional variant of the RMQ-problem we give a preprocessing scheme with quasi-optimal time bounds, but with an asymptotic increase in space consumption by a factor of log(n). It is well known that algorithms for answering RMQs in constant time are useful for many different algorithmic tasks (e.g., the computation of lowest common ancestors in trees); in the second part of this thesis we give several new applications of the RMQ-problem. We show that our preprocessing scheme for RMQ (and a variant thereof) leads to improvements in the space and time consumption of the Enhanced Suffix Array, a collection of arrays that can be used for many tasks in pattern matching. In particular, we will see that in conjunction with the suffix- and LCP-array, 2n+o(n) bits of additional space (coming from our RMQ-scheme) are sufficient to find all occ occurrences of a (usually short) pattern of length m in a (usually long) text of length n in O(m*s+occ) time, where s denotes the size of the alphabet. This is certainly optimal if the size of the alphabet is constant; for non-constant alphabets we can improve this to O(m*log(s)+occ) locating time, replacing our original scheme with a data structure of size approximately 2.54n bits. Again by using RMQs, we then show how to solve frequency-related string mining tasks in optimal time. In a final chapter we propose a space- and time-optimal algorithm for computing suffix arrays on texts that are logically divided into words, if one is just interested in finding all word-aligned occurrences of a pattern. Apart from the theoretical improvements made in this thesis, most of our algorithms are also of practical value; we underline this fact by empirical tests and comparisons on real-world problem instances. In most cases our algorithms clearly outperform previous approaches.
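    To make the RMQ interface concrete, the sketch below implements the textbook sparse-table solution: O(n log n) words of space and O(1) queries by combining two overlapping power-of-two windows. It only illustrates the query interface; the 2n+o(n)-bit scheme of the thesis is far more space-efficient and is not reproduced here.

```python
class SparseTableRMQ:
    """Textbook sparse-table RMQ: O(n log n) words of space, O(1) queries.
    query(l, r) returns the position of a minimum element of A[l..r]."""
    def __init__(self, A):
        self.A = A
        n = len(A)
        self.log = [0] * (n + 1)
        for i in range(2, n + 1):
            self.log[i] = self.log[i // 2] + 1
        # table[j][i] = position of a minimum element in A[i .. i + 2^j - 1]
        self.table = [list(range(n))]
        j = 1
        while (1 << j) <= n:
            prev, row = self.table[j - 1], []
            for i in range(n - (1 << j) + 1):
                a, b = prev[i], prev[i + (1 << (j - 1))]
                row.append(a if A[a] <= A[b] else b)
            self.table.append(row)
            j += 1

    def query(self, l, r):
        j = self.log[r - l + 1]
        a = self.table[j][l]
        b = self.table[j][r - (1 << j) + 1]
        return a if self.A[a] <= self.A[b] else b

A = [5, 2, 4, 7, 1, 3, 6]
rmq = SparseTableRMQ(A)
assert rmq.query(0, 3) == 1   # minimum of A[0..3] is A[1] = 2
assert rmq.query(2, 6) == 4   # minimum of A[2..6] is A[4] = 1
```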

    Improved Cardinality Bounds for Rectangle Packing Representations

    Axis-aligned rectangle packings can be characterized by the set of spatial relations that hold for pairs of rectangles (west, south, east, north). A representation of a packing consists of one satisfied spatial relation for each pair. We call a set of representations complete for n ∈ ℕ if it contains a representation of every packing of any n rectangles. Both in theory and in practice, the fastest known algorithms for a large class of rectangle packing problems enumerate a complete set R of representations. The running time of these algorithms is dominated by the (exponential) size of R. In this thesis, we improve the best known lower and upper bounds on the minimum cardinality of complete sets of representations. The new upper bound implies theoretically faster algorithms for many rectangle packing problems, for example in chip design, while the new lower bound imposes a limit on the running time that can be achieved by any algorithm following this approach. The proofs of both results are based on pattern-avoiding permutations. Finally, we empirically compute the minimum cardinality of complete sets of representations for small n. Our computations directly suggest two conjectures, connecting the well-known Baxter permutations with a set of permutations avoiding an apparently new pattern, which in turn seems to generate complete sets of representations of minimum cardinality.
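    To make the notion of a spatial relation concrete, the toy sketch below decides which of the four relations hold for a pair of axis-aligned rectangles under one plausible convention (r1 is west of r2 when r1 lies entirely to the left of r2, and so on); the exact definition used in the thesis may differ, and all names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x1: float  # left
    y1: float  # bottom
    x2: float  # right
    y2: float  # top

def spatial_relations(r1: Rect, r2: Rect):
    """Relations of r1 with respect to r2 under one plausible convention:
    'west' means r1 lies entirely to the left of r2, and similarly for the
    other directions. In a packing (no two rectangles overlap) at least one
    relation holds for every pair."""
    rel = set()
    if r1.x2 <= r2.x1: rel.add("west")
    if r1.x1 >= r2.x2: rel.add("east")
    if r1.y2 <= r2.y1: rel.add("south")
    if r1.y1 >= r2.y2: rel.add("north")
    return rel

a = Rect(0, 0, 2, 1)
b = Rect(3, 0, 5, 2)
print(spatial_relations(a, b))   # {'west'}: a is west of b
```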