Non-Contiguous Pattern Avoidance in Binary Trees
We consider the enumeration of binary trees avoiding non-contiguous binary tree patterns. We begin by modifying a known algorithm that counts binary trees avoiding a single contiguous tree pattern. Next, we use this algorithm to prove several theorems about the generating function whose nth coefficient gives the number of n-leaf trees avoiding a pattern. In addition, we investigate and structurally explain the recurrences that arise from these generating functions. Finally, we examine the enumeration of binary trees avoiding multiple tree patterns.
Non-Contiguous Pattern Avoidance in Binary Trees
In this paper we consider the enumeration of binary trees avoiding
non-contiguous binary tree patterns. We begin by computing closed formulas for
the number of trees avoiding a single binary tree pattern with 4 or fewer
leaves and compare these results to analogous work for contiguous tree
patterns. Next, we give an explicit generating function that counts binary
trees avoiding a single non-contiguous tree pattern according to number of
leaves. In addition, we enumerate binary trees that simultaneously avoid more
than one tree pattern. Finally, we explore connections between pattern-avoiding
trees and pattern-avoiding permutations. Comment: 21 pages, 2 figures, 1 table
Enumeration of Binary Trees and Universal Types
Binary unlabeled ordered trees (further called binary trees) were studied at least since Euler, who enumerated them. The number of such trees with n nodes is now known as the Catalan number. Over the years various interesting questions about the statistics of such trees were investigated (e.g., height and path length distributions for a randomly selected tree). Binary trees find an abundance of applications in computer science. However, recently Seroussi posed a new and interesting problem motivated by information theory considerations: how many binary trees of a given path length (sum of depths) are there? This question arose in the study of universal types of sequences. Two sequences of length p have the same universal type if they generate the same set of phrases in the incremental parsing of the Lempel-Ziv'78 scheme since one proves that such sequences converge to the same empirical distribution. It turns out that the number of distinct types of sequences of length p corresponds to the number of binary (unlabeled and ordered) trees, T_p, of given path length p (and also the number of distinct Lempel-Ziv'78 parsings of length p sequences). We first show that the number of binary trees with given path length p is asymptotically equal to T_p ~ 2^{2p/(log_2 p) (1+O(log^{-2/3} p))}. Then we establish various limiting distributions for the number of nodes (number of phrases in the Lempel-Ziv'78 scheme) when a tree is selected randomly among all trees of given path length p. Throughout, we use methods of analytic algorithmics such as generating functions and complex asymptotics, as well as methods of applied mathematics such as the WKB method and matched asymptotics.
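The counting problem in this abstract can be explored for small p with a short brute-force sketch. It is not the paper's method (which is asymptotic); it only illustrates the quantity being counted. The function names are hypothetical, and the sketch assumes that path length is the sum of node depths with the root at depth 0 and that only nonempty trees are counted. The paper's exact convention may differ.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_trees(n, p):
    """Number of ordered binary trees with n nodes and path length p.

    Assumption: path length is the sum of node depths, root at depth 0.
    """
    if n == 0:
        return 1 if p == 0 else 0
    # Attaching two subtrees under a root pushes each of the n - 1
    # non-root nodes one level deeper, adding n - 1 to the path length.
    rest = p - (n - 1)
    if rest < 0:
        return 0
    total = 0
    for k in range(n):              # k nodes in the left subtree
        for q in range(rest + 1):   # path length of the left subtree
            total += count_trees(k, q) * count_trees(n - 1 - k, rest - q)
    return total


def trees_with_path_length(p):
    """Number of nonempty binary trees (any node count) with path length p."""
    # A tree with n >= 1 nodes has path length at least n - 1, so n <= p + 1.
    return sum(count_trees(n, p) for n in range(1, p + 2))


if __name__ == "__main__":
    print([trees_with_path_length(p) for p in range(10)])
```

Summing count_trees(n, p) over all p for a fixed n recovers the Catalan number for n nodes, which gives a quick consistency check on the recurrence.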
A lattice structure for ancestral configurations arising from the relationship between gene trees and species trees
To a given gene tree topology and species tree topology with leaves
labeled bijectively from a fixed set, one can associate a set of ancestral
configurations, each of which encodes a set of gene lineages that can be found
at a given node of a species tree. We introduce a lattice structure on
ancestral configurations, studying the directed graphs that provide graphical
representations of lattices of ancestral configurations. For a matching gene
tree topology and species tree topology, we present a method for defining the
digraph of ancestral configurations from the tree topology by using iterated
cartesian products of graphs. We show that a specific set of paths on the
digraph of ancestral configurations is in bijection with the set of labeled
histories -- a well-known phylogenetic object that enumerates possible temporal
orderings of the coalescences of a tree. For each of a series of tree families,
we obtain closed-form expressions for the number of labeled histories by using
this bijection to count paths on associated digraphs. Finally, we prove that
our lattice construction extends to nonmatching tree pairs, and we use it to
characterize pairs having the maximal number of ancestral
configurations for a fixed . We discuss how the construction provides new
methods for performing enumerations of combinatorial aspects of gene and
species trees. Comment: 20 pages, 15 figures. This version contains reference updates, first
author name update, minor changes to the tex
Enumeration of General t-ary Trees and Universal Types
We consider t-ary trees characterized by their numbers of nodes and their total path length. When t=2 these are called binary trees, and in such trees a parent node may have up to t child nodes. We give asymptotic expansions for the total number of trees with n nodes and path length p, when n and p are large. We consider several different ranges of n and p. For n→∞ and p=O(n^{3/2}) we recover the Airy distribution for the path length in trees with many nodes, and also obtain higher order asymptotic results. For p→∞ and an appropriate range of n we obtain a limiting Gaussian distribution for the number of nodes in trees with large path lengths. The mean and variance are expressed in terms of the maximal root of the Airy function. Singular perturbation methods, such as asymptotic matching and WKB type expansions, are used throughout, and they are combined with more standard methods of analytic combinatorics, such as generating functions, singularity analysis, saddle point method, etc. The results are applicable to problems in information theory that involve data compression schemes which parse long sequences into shorter phrases. Numerical studies show the accuracy of the various asymptotic approximations. Key Words: Trees; Universal Types; Asymptotics; Path Length; Singular Perturbation
Data Structures for Efficient String Algorithms
This thesis deals with data structures that are mostly useful in the area of string matching and string mining. Our main result is an O(n)-time preprocessing scheme for an array of n numbers such that subsequent queries asking for the position of a minimum element in a specified interval can be answered in constant time (so-called RMQs for Range Minimum Queries). The space for this data structure is 2n+o(n) bits, which is shown to be asymptotically optimal in a general setting. This improves all previous results on this problem. The main techniques for deriving this result rely on combinatorial properties of arrays and so-called Cartesian Trees. For compressible input arrays we show that further space can be saved, while not affecting the time bounds. For the two-dimensional variant of the RMQ-problem we give a preprocessing scheme with quasi-optimal time bounds, but with an asymptotic increase in space consumption of a factor of log(n).
It is well known that algorithms for answering RMQs in constant time are useful for many different algorithmic tasks (e.g., the computation of lowest common ancestors in trees); in the second part of this thesis we give several new applications of the RMQ-problem. We show that our preprocessing scheme for RMQ (and a variant thereof) leads to improvements in the space- and time-consumption of the Enhanced Suffix Array, a collection of arrays that can be used for many tasks in pattern matching. In particular, we will see that in conjunction with the suffix- and LCP-array 2n+o(n) bits of additional space (coming from our RMQ-scheme) are sufficient to find all occ occurrences of a (usually short) pattern of length m in a (usually long) text of length n in O(m*s+occ) time, where s denotes the size of the alphabet. This is certainly optimal if the size of the alphabet is constant; for non-constant alphabets we can improve this to O(m*log(s)+occ) locating time, replacing our original scheme with a data structure of size approximately 2.54n bits. Again by using RMQs, we then show how to solve frequency-related string mining tasks in optimal time. In a final chapter we propose a space- and time-optimal algorithm for computing suffix arrays on texts that are logically divided into words, if one is just interested in finding all word-aligned occurrences of a pattern.
Apart from the theoretical improvements made in this thesis, most of our algorithms are also of practical value; we underline this fact by empirical tests and comparisons on real-world problem instances. In most cases our algorithms outperform previous approaches in every respect.
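To make the RMQ query model in this abstract concrete, here is a minimal sketch of a classical sparse table. It is not the thesis's scheme: it uses O(n log n) words of space and O(n log n) preprocessing, not 2n+o(n) bits and O(n) time. It does answer each query in constant time, which is the interface the abstract describes. The function names are hypothetical.

```python
def build_sparse_table(values):
    """Precompute argmin positions for all windows of power-of-two length.

    table[j][i] is the position of a minimum of values[i : i + 2**j].
    """
    n = len(values)
    table = [list(range(n))]  # windows of length 1
    j = 1
    while (1 << j) <= n:
        prev = table[-1]
        half = 1 << (j - 1)
        level = []
        for i in range(n - (1 << j) + 1):
            # A window of length 2**j is two adjacent windows of length 2**(j-1).
            left, right = prev[i], prev[i + half]
            level.append(left if values[left] <= values[right] else right)
        table.append(level)
        j += 1
    return table


def range_min_index(values, table, lo, hi):
    """Position of a minimum element in values[lo..hi] (inclusive bounds)."""
    # Cover the interval with two possibly overlapping power-of-two windows.
    j = (hi - lo + 1).bit_length() - 1
    left = table[j][lo]
    right = table[j][hi - (1 << j) + 1]
    return left if values[left] <= values[right] else right


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    st = build_sparse_table(data)
    print(range_min_index(data, st, 4, 7))
```

The two-window trick works because taking the minimum over overlapping ranges is harmless; the thesis's space-efficient structure replaces the table with a succinct encoding based on Cartesian Trees.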