
    Non-Contiguous Pattern Avoidance in Binary Trees

    We consider the enumeration of binary trees avoiding non-contiguous binary tree patterns. We begin by modifying a known algorithm that counts binary trees avoiding a single contiguous tree pattern. Next, we use this algorithm to prove several theorems about the generating function whose nth coefficient gives the number of n-leaf trees avoiding a pattern. In addition, we investigate and structurally explain the recurrences that arise from these generating functions. Finally, we examine the enumeration of binary trees avoiding multiple tree patterns.
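As a point of reference only, here is a brute-force sketch of contiguous pattern avoidance. It is not the algorithm the paper modifies. It enumerates every full binary tree with n leaves and discards those with a contiguous occurrence of the pattern, where a pattern leaf may match any subtree. The tuple encoding (None for a leaf, (left, right) for an internal node) and all function names are assumptions made for illustration.

```python
from functools import lru_cache
from typing import Optional, Tuple

# Hypothetical encoding of a full binary tree: None is a leaf,
# and a pair (left, right) is an internal node.
Tree = Optional[Tuple["Tree", "Tree"]]


@lru_cache(maxsize=None)
def trees_with_leaves(n: int) -> Tuple[Tree, ...]:
    """All full binary trees with exactly n >= 1 leaves (Catalan many)."""
    if n == 1:
        return (None,)
    result = []
    for k in range(1, n):  # k leaves on the left, n - k on the right
        for left in trees_with_leaves(k):
            for right in trees_with_leaves(n - k):
                result.append((left, right))
    return tuple(result)


def matches_at_root(pattern: Tree, tree: Tree) -> bool:
    """Contiguous match anchored at the root: each internal node of the pattern
    must sit on an internal node of the tree; a pattern leaf matches any subtree."""
    if pattern is None:
        return True
    if tree is None:
        return False
    return matches_at_root(pattern[0], tree[0]) and matches_at_root(pattern[1], tree[1])


def contains(tree: Tree, pattern: Tree) -> bool:
    """True if the pattern occurs contiguously at some node of the tree."""
    if matches_at_root(pattern, tree):
        return True
    if tree is None:
        return False
    return contains(tree[0], pattern) or contains(tree[1], pattern)


def count_avoiders(pattern: Tree, n: int) -> int:
    """Brute force: number of n-leaf trees with no contiguous occurrence of the pattern."""
    return sum(1 for t in trees_with_leaves(n) if not contains(t, pattern))


BALANCED_4 = ((None, None), (None, None))
print([count_avoiders(BALANCED_4, n) for n in range(2, 8)])  # [1, 2, 4, 8, 16, 32]
```

The balanced 4-leaf pattern is avoided exactly by trees in which every internal node has at least one leaf child, which gives 2^(n-2) avoiders for n >= 2. Because the search space grows like the Catalan numbers, brute force is only practical for small n.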

    Non-Contiguous Pattern Avoidance in Binary Trees

    In this paper we consider the enumeration of binary trees avoiding non-contiguous binary tree patterns. We begin by computing closed formulas for the number of trees avoiding a single binary tree pattern with 4 or fewer leaves and compare these results to analogous work for contiguous tree patterns. Next, we give an explicit generating function that counts binary trees avoiding a single non-contiguous tree pattern according to number of leaves. In addition, we enumerate binary trees that simultaneously avoid more than one tree pattern. Finally, we explore connections between pattern-avoiding trees and pattern-avoiding permutations. Comment: 21 pages, 2 figures, 1 table.

    Enumeration of Binary Trees and Universal Types

    Binary unlabeled ordered trees (hereafter called binary trees) have been studied at least since Euler, who enumerated them. The number of such trees with n nodes is now known as the Catalan number. Over the years, various interesting questions about the statistics of such trees have been investigated (e.g., height and path length distributions for a randomly selected tree). Binary trees find an abundance of applications in computer science. Recently, however, Seroussi posed a new and interesting problem motivated by information-theoretic considerations: how many binary trees of a given path length (sum of depths) are there? This question arose in the study of universal types of sequences. Two sequences of length p have the same universal type if they generate the same set of phrases in the incremental parsing of the Lempel-Ziv'78 scheme; one can prove that such sequences converge to the same empirical distribution. It turns out that the number of distinct types of sequences of length p equals the number T_p of binary (unlabeled and ordered) trees of given path length p (and also the number of distinct Lempel-Ziv'78 parsings of length-p sequences). We first show that the number of binary trees with given path length p is asymptotically $T_p \sim 2^{\frac{2p}{\log_2 p}\left(1+O(\log^{-2/3} p)\right)}$. Then we establish various limiting distributions for the number of nodes (number of phrases in the Lempel-Ziv'78 scheme) when a tree is selected randomly among all trees of given path length p. Throughout, we use methods of analytic algorithmics, such as generating functions and complex asymptotics, as well as methods of applied mathematics, such as the WKB method and matched asymptotics.
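Small values of T_p can be computed exactly with a dynamic program over size and path length. This sketch assumes three conventions: a binary tree is either empty or a root with optional left and right subtrees, the root sits at depth 0, and T_p counts nonempty trees. The function names are hypothetical. The recurrence rests on one observation: placing subtrees of sizes k and n - 1 - k under a new root pushes each of their n - 1 nodes one level deeper, so it adds n - 1 to the path length.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_by_size_and_path_length(n: int, p: int) -> int:
    """Number of binary trees with n nodes whose path length
    (sum of node depths, root at depth 0) equals p."""
    if n == 0:
        return 1 if p == 0 else 0
    # The n - 1 non-root nodes each gain one level of depth under the new root.
    rest = p - (n - 1)
    if rest < 0:
        return 0
    total = 0
    for k in range(n):  # k nodes on the left, n - 1 - k on the right
        for p_left in range(rest + 1):
            total += (count_by_size_and_path_length(k, p_left)
                      * count_by_size_and_path_length(n - 1 - k, rest - p_left))
    return total


def trees_with_path_length(p: int) -> int:
    """T_p: nonempty binary trees of path length p, summed over all sizes.
    An n-node tree has path length at least n - 1, so only n <= p + 1 contributes."""
    return sum(count_by_size_and_path_length(n, p) for n in range(1, p + 2))


print([trees_with_path_length(p) for p in range(12)])
```

The table has O(p^2) entries, each filled in polynomial time. Exact values for moderate p can therefore be set against the asymptotic formula without enumerating the trees themselves.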

    A lattice structure for ancestral configurations arising from the relationship between gene trees and species trees

    To a given gene tree topology G and species tree topology S with leaves labeled bijectively from a fixed set X, one can associate a set of ancestral configurations, each of which encodes a set of gene lineages that can be found at a given node of a species tree. We introduce a lattice structure on ancestral configurations, studying the directed graphs that provide graphical representations of lattices of ancestral configurations. For a matching gene tree topology and species tree topology, we present a method for defining the digraph of ancestral configurations from the tree topology by using iterated Cartesian products of graphs. We show that a specific set of paths on the digraph of ancestral configurations is in bijection with the set of labeled histories, a well-known phylogenetic object that enumerates possible temporal orderings of the coalescences of a tree. For each of a series of tree families, we obtain closed-form expressions for the number of labeled histories by using this bijection to count paths on associated digraphs. Finally, we prove that our lattice construction extends to nonmatching tree pairs, and we use it to characterize pairs (G,S) having the maximal number of ancestral configurations for a fixed G. We discuss how the construction provides new methods for performing enumerations of combinatorial aspects of gene and species trees. Comment: 20 pages, 15 figures. This version contains reference updates, first author name update, minor changes to the text.
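Labeled histories can also be counted directly, without the digraph of ancestral configurations. The sketch below uses a standard alternative technique, not the paper's bijective method. A labeled history is an ordering of the internal (coalescence) nodes that is consistent with ancestry. The count therefore follows from interleaving the two subtrees' orderings, and it agrees with the closed form $\frac{(n-1)!}{\prod_v (\ell_v - 1)}$, where n is the number of leaves and $\ell_v$ is the number of leaves below internal node v. The nested-tuple encoding with string leaf labels and the function names are assumptions.

```python
from math import comb, factorial
from typing import Tuple, Union

# Hypothetical encoding: a leaf is a label string, an internal node is (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]


def internal_nodes(tree: Tree) -> int:
    """Number of internal (coalescence) nodes in the subtree."""
    if isinstance(tree, str):
        return 0
    return 1 + internal_nodes(tree[0]) + internal_nodes(tree[1])


def labeled_histories(tree: Tree) -> int:
    """Orderings of coalescences: interleave the left and right subtrees'
    orderings in all comb(a + b, a) ways; the root is always the final event."""
    if isinstance(tree, str):
        return 1
    a, b = internal_nodes(tree[0]), internal_nodes(tree[1])
    return labeled_histories(tree[0]) * labeled_histories(tree[1]) * comb(a + b, a)


def labeled_histories_closed_form(tree: Tree) -> int:
    """(n - 1)! divided by the product, over internal nodes v, of (leaves below v) - 1."""
    denominator = 1

    def leaves(t: Tree) -> int:
        nonlocal denominator
        if isinstance(t, str):
            return 1
        count = leaves(t[0]) + leaves(t[1])
        denominator *= count - 1
        return count

    n = leaves(tree)
    return factorial(n - 1) // denominator


balanced8 = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
print(labeled_histories(balanced8), labeled_histories_closed_form(balanced8))  # 80 80
```

A caterpillar (fully unbalanced) tree has exactly one labeled history. The balanced 8-leaf tree has 7!/(7·3·3) = 80.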

    Enumeration of General t-ary Trees and Universal Types

    We consider t-ary trees characterized by their numbers of nodes and their total path length. In a t-ary tree a parent node may have up to t child nodes; when t=2 these are binary trees. We give asymptotic expansions for the total number of trees with n nodes and path length p, when n and p are large. We consider several different ranges of n and p. For n→∞ and p=O(n^{3/2}) we recover the Airy distribution for the path length in trees with many nodes, and also obtain higher-order asymptotic results. For p→∞ and an appropriate range of n we obtain a limiting Gaussian distribution for the number of nodes in trees with large path lengths. The mean and variance are expressed in terms of the maximal root of the Airy function. Singular perturbation methods, such as asymptotic matching and WKB-type expansions, are used throughout, and they are combined with more standard methods of analytic combinatorics, such as generating functions, singularity analysis, and the saddle point method. The results are applicable to problems in information theory that involve data compression schemes which parse long sequences into shorter phrases. Numerical studies show the accuracy of the various asymptotic approximations. Key Words: Trees; Universal Types; Asymptotics; Path Length; Singular Perturbation
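Exact counts by number of nodes n and path length p can be tabulated with a dynamic program over the t child slots of the root. This sketch assumes that a t-ary tree is either empty or a root with t ordered, possibly empty, child slots; the paper's precise convention may differ. Summing over p should recover the Fuss–Catalan numbers $\frac{1}{(t-1)n+1}\binom{tn}{n}$, which makes a convenient sanity check. The function names are hypothetical.

```python
from functools import lru_cache
from math import comb
from typing import Dict


@lru_cache(maxsize=None)
def forest(t: int, slots: int, m: int) -> Dict[int, int]:
    """Path-length distribution {p: count} for an ordered sequence of `slots`
    (possibly empty) t-ary subtrees holding m nodes in total."""
    if slots == 0:
        return {0: 1} if m == 0 else {}
    result: Dict[int, int] = {}
    for k in range(m + 1):  # k nodes go into the first slot
        for p1, c1 in tree(t, k).items():
            for p2, c2 in forest(t, slots - 1, m - k).items():
                result[p1 + p2] = result.get(p1 + p2, 0) + c1 * c2
    return result


@lru_cache(maxsize=None)
def tree(t: int, n: int) -> Dict[int, int]:
    """Path-length distribution {p: count} for t-ary trees with n nodes (root at depth 0).
    Cached dicts are shared, so callers must not mutate them."""
    if n == 0:
        return {0: 1}
    # Hanging the t subtrees below the root adds 1 to the depth of each of the n - 1 other nodes.
    return {p + (n - 1): c for p, c in forest(t, t, n - 1).items()}


n = 8
dist = tree(3, n)
total = sum(dist.values())
print(total, comb(3 * n, n) // (2 * n + 1))        # both are the ternary Fuss–Catalan number
print(sum(p * c for p, c in dist.items()) / total)  # mean path length over 8-node ternary trees
```

Tables like this make it possible to compare exact values against the asymptotic approximations for moderate n and p. The table grows polynomially, so it does not reach the large-n regime directly.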

    Data Structures for Efficient String Algorithms

    This thesis deals with data structures that are mostly useful in the area of string matching and string mining. Our main result is an O(n)-time preprocessing scheme for an array of n numbers such that subsequent queries asking for the position of a minimum element in a specified interval can be answered in constant time (so-called range minimum queries, or RMQs). The space for this data structure is 2n+o(n) bits, which is shown to be asymptotically optimal in a general setting. This improves all previous results on this problem. The main techniques for deriving this result rely on combinatorial properties of arrays and so-called Cartesian trees. For compressible input arrays we show that further space can be saved without affecting the time bounds. For the two-dimensional variant of the RMQ problem we give a preprocessing scheme with quasi-optimal time bounds, but with an asymptotic increase in space consumption by a factor of log(n). It is well known that algorithms for answering RMQs in constant time are useful for many different algorithmic tasks (e.g., the computation of lowest common ancestors in trees); in the second part of this thesis we give several new applications of the RMQ problem. We show that our preprocessing scheme for RMQ (and a variant thereof) leads to improvements in the space and time consumption of the Enhanced Suffix Array, a collection of arrays that can be used for many tasks in pattern matching. In particular, we will see that, in conjunction with the suffix and LCP arrays, 2n+o(n) bits of additional space (coming from our RMQ scheme) suffice to find all occ occurrences of a (usually short) pattern of length m in a (usually long) text of length n in O(m*s+occ) time, where s denotes the size of the alphabet.
    This is certainly optimal if the size of the alphabet is constant; for non-constant alphabets we can improve this to O(m*log(s)+occ) locating time, replacing our original scheme with a data structure of size approximately 2.54n bits. Again using RMQs, we then show how to solve frequency-related string mining tasks in optimal time. In a final chapter we propose a space- and time-optimal algorithm for computing suffix arrays on texts that are logically divided into words, if one is only interested in finding all word-aligned occurrences of a pattern. Apart from the theoretical improvements made in this thesis, most of our algorithms are also of practical value; we underline this fact by empirical tests and comparisons on real-world problem instances. In most cases our algorithms outperform previous approaches in all respects.
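The sketch below illustrates the RMQ query interface with a sparse table. This is a simpler textbook structure and not the thesis's scheme: it needs O(n log n) words instead of 2n+o(n) bits. Like the thesis's scheme, it answers each query in constant time, here by covering [left, right] with two overlapping power-of-two windows. The class and method names are hypothetical.

```python
from typing import List


class SparseTableRMQ:
    """Position-of-minimum queries in O(1) after O(n log n) preprocessing."""

    def __init__(self, values: List[int]) -> None:
        self.values = values
        n = len(values)
        # table[j][i] = index of the leftmost minimum in values[i : i + 2**j]
        self.table: List[List[int]] = [list(range(n))]
        j = 1
        while (1 << j) <= n:
            prev, half = self.table[j - 1], 1 << (j - 1)
            row = []
            for i in range(n - (1 << j) + 1):
                a, b = prev[i], prev[i + half]
                row.append(a if values[a] <= values[b] else b)  # prefer left on ties
            self.table.append(row)
            j += 1

    def query(self, left: int, right: int) -> int:
        """Index of the leftmost minimum in values[left..right], both ends inclusive."""
        if not 0 <= left <= right < len(self.values):
            raise IndexError("invalid range")
        j = (right - left + 1).bit_length() - 1  # largest power of two that fits
        a = self.table[j][left]
        b = self.table[j][right - (1 << j) + 1]
        return a if self.values[a] <= self.values[b] else b


rmq = SparseTableRMQ([5, 2, 4, 7, 1, 3, 6, 1])
print(rmq.query(0, 3), rmq.query(2, 7))  # 1 4
```

The abstract's contribution is to achieve the same constant query time within 2n+o(n) bits, using combinatorial properties of Cartesian trees. This sketch only shows what such a query returns.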