38,156 research outputs found

    Enumeration of General t-ary Trees and Universal Types

    Get PDF
    We consider t-ary trees characterized by their numbers of nodes and their total path length. When t=2 these are called binary trees, and in such trees a parent node may have up to t child nodes. We give asymptotic expansions for the total number of trees with nodes and path length p, when n and p are large. We consider several different ranges of n and p. For n→∞ and p=O(n^{3/2}) we recover the Airy distribution for the path length in trees with many nodes, and also obtain higher order asymptotic results. For p→∞ and an appropriate range of n we obtain a limiting Gaussian distribution for the number of nodes in trees with large path lengths. The mean and variance are expressed in terms of the maximal root of the Airy function. Singular perturbation methods, such as asymptotic matching and WKB type expansions, are used throughout, and they are combined with more standard methods of analytic combinatorics, such as generating functions, singularity analysis, saddle point method, etc. The results are applicable to problems in information theory, that involve data compression schemes which parse long sequence into shorter phrases. Numerical studies show the accuracy of the various asymptotic approximations. Key Words: Trees; Universal Types; Asymptotics; Path Length; Singular Perturbation

    Maximal clades in random binary search trees

    Full text link
    We study maximal clades in random phylogenetic trees with the Yule-Harding model or, equivalently, in binary search trees. We use probabilistic methods to reprove and extend earlier results on moment asymptotics and asymptotic normality. In particular, we give an explanation of the curious phenomenon observed by Drmota, Fuchs and Lee (2014) that asymptotic normality holds, but one should normalize using half the variance.Comment: 25 page

    Longest Common Pattern between two Permutations

    Full text link
    In this paper, we give a polynomial (O(n^8)) algorithm for finding a longest common pattern between two permutations of size n given that one is separable. We also give an algorithm for general permutations whose complexity depends on the length of the longest simple permutation involved in one of our permutations

    Tree Compression with Top Trees Revisited

    Get PDF
    We revisit tree compression with top trees (Bille et al, ICALP'13) and present several improvements to the compressor and its analysis. By significantly reducing the amount of information stored and guiding the compression step using a RePair-inspired heuristic, we obtain a fast compressor achieving good compression ratios, addressing an open problem posed by Bille et al. We show how, with relatively small overhead, the compressed file can be converted into an in-memory representation that supports basic navigation operations in worst-case logarithmic time without decompression. We also show a much improved worst-case bound on the size of the output of top-tree compression (answering an open question posed in a talk on this algorithm by Weimann in 2012).Comment: SEA 201

    Prospects and limitations of full-text index structures in genome analysis

    Get PDF
    The combination of incessant advances in sequencing technology producing large amounts of data and innovative bioinformatics approaches, designed to cope with this data flood, has led to new interesting results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article brings a comprehensive state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, but also their practical limitations, are explained and compared
    corecore