
    Transfer Theorems and Asymptotic Distributional Results for m-ary Search Trees

    We derive asymptotics of moments and identify limiting distributions, under the random permutation model on m-ary search trees, for functionals that satisfy recurrence relations of a simple additive form. Many important functionals, including the space requirement, internal path length, and the so-called shape functional, fall under this framework. The approach is based on establishing transfer theorems that link the order of growth of the input into a particular (deterministic) recurrence to the order of growth of the output. The transfer theorems are used in conjunction with the method of moments to establish limit laws. It is shown that (i) for small toll sequences $(t_n)$ [roughly, $t_n = O(n^{1/2})$] we have asymptotic normality if $m \leq 26$ and typically periodic behavior if $m \geq 27$; (ii) for moderate toll sequences [roughly, $t_n = \omega(n^{1/2})$ but $t_n = o(n)$] we have convergence to non-normal distributions if $m \leq m_0$ (where $m_0 \geq 26$) and typically periodic behavior if $m \geq m_0 + 1$; and (iii) for large toll sequences [roughly, $t_n = \omega(n)$] we have convergence to non-normal distributions for all values of $m$.
    Comment: 35 pages, 1 figure. Version 2 consists of an expansion and rearrangement of the introductory material to aid exposition and the shortening of Appendices A and B.
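    The additive recurrence behind these functionals can be made concrete for the mean. The sketch below assumes the usual split of the random permutation model: for $n \ge m-1$ the root holds $m-1$ keys and $X_n = t_n + \sum_{j=1}^{m} X^{(j)}_{I_j}$, where the first subtree receives $j$ keys with probability $\binom{n-1-j}{m-2}/\binom{n}{m-1}$. As a hypothetical boundary convention it takes $X_n = t_n$ for $n \le m-2$. The function name `mean_additive_functional` is illustrative; the code computes exact means only and does not reproduce the paper's transfer theorems or limit laws.

```python
from fractions import Fraction
from math import comb
from typing import Callable, List


def mean_additive_functional(m: int, n_max: int,
                             toll: Callable[[int], int]) -> List[Fraction]:
    """Exact E[X_n], n = 0..n_max, for X_n = t_n + X_{I_1} + ... + X_{I_m}
    on random m-ary search trees (random permutation model).

    Assumption (hypothetical boundary convention): for 0 <= n <= m-2 the
    tree is a single partially filled node and X_n = t_n deterministically.
    """
    if m < 2:
        raise ValueError("m must be at least 2")
    mean: List[Fraction] = []
    for n in range(n_max + 1):
        if n < m - 1:
            mean.append(Fraction(toll(n)))
            continue
        # The root holds m-1 keys; the other n-m+1 keys are split among
        # m subtrees, with P(I_1 = j) = C(n-1-j, m-2) / C(n, m-1).
        weighted = sum(comb(n - 1 - j, m - 2) * mean[j]
                       for j in range(n - m + 2))
        # The m subtree sizes are exchangeable, hence the factor m.
        mean.append(toll(n) + m * weighted / comb(n, m - 1))
    return mean


# Space requirement (number of nodes): toll 1 for every non-empty tree.
space = mean_additive_functional(3, 50, lambda n: 1 if n >= 1 else 0)
# Internal path length of a binary search tree (m = 2): toll n - 1.
ipl = mean_additive_functional(2, 50, lambda n: max(n - 1, 0))
print(float(space[50]), float(ipl[50]))
```

For $m = 2$ the internal-path-length means agree with the classical $2(n+1)H_n - 4n$, which is a convenient check of the split probabilities.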

    Singularity analysis, Hadamard products, and tree recurrences

    We present a toolbox for extracting asymptotic information on the coefficients of combinatorial generating functions. This toolbox notably includes a treatment of the effect of Hadamard products on singularities in the context of the complex Tauberian technique known as singularity analysis. As a consequence, it becomes possible to unify the analysis of a number of divide-and-conquer algorithms, or equivalently random tree models, including several classical methods for sorting, searching, and dynamically managing equivalence relations.
    Comment: 47 pages. Submitted for publication.
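    A small numerical illustration of two standard facts that singularity analysis combines with Hadamard products: $[z^n](1-z)^{-\alpha} \sim n^{\alpha-1}/\Gamma(\alpha)$ for $\alpha > 0$, and consequently the Hadamard product $f \odot g = \sum_n f_n g_n z^n$ of $(1-z)^{-a}$ and $(1-z)^{-b}$ has coefficients growing like $n^{a+b-2}/(\Gamma(a)\Gamma(b))$, so it behaves like a singularity of exponent $-(a+b-1)$ at $z = 1$. The function names are illustrative; this only checks the asymptotics numerically and is not the paper's toolbox.

```python
import math
from typing import List


def coeffs_power(alpha: float, n_max: int) -> List[float]:
    """Coefficients [z^n](1 - z)^(-alpha) for n = 0..n_max.

    Uses the ratio c_n / c_{n-1} = (alpha + n - 1) / n of the binomial series.
    """
    c = [1.0]
    for n in range(1, n_max + 1):
        c.append(c[-1] * (alpha + n - 1) / n)
    return c


def hadamard(f: List[float], g: List[float]) -> List[float]:
    """Hadamard (coefficient-wise) product of two truncated power series."""
    return [x * y for x, y in zip(f, g)]


def transfer_estimate(alpha: float, n: int) -> float:
    """Leading term n^(alpha - 1) / Gamma(alpha) given by singularity analysis."""
    return n ** (alpha - 1) / math.gamma(alpha)


N = 10_000
a, b = 0.5, 1.5
f, g = coeffs_power(a, N), coeffs_power(b, N)
h = hadamard(f, g)
# Both ratios tend to 1; the relative error here is O(1/N).
print(f[N] / transfer_estimate(a, N))
print(h[N] / (N ** (a + b - 2) / (math.gamma(a) * math.gamma(b))))
```

With $a = 1/2$ the coefficients are $\binom{2n}{n}/4^n$, so the first ratio is the familiar $\binom{2n}{n}/4^n \sim 1/\sqrt{\pi n}$.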

    Limiting distributions for additive functionals on Catalan trees

    Additive tree functionals represent the cost of many divide-and-conquer algorithms. We derive the limiting distribution of the additive functionals induced by toll functions of the form (a) $n^\alpha$ when $\alpha > 0$ and (b) $\log n$ (the so-called shape functional) on uniformly distributed binary search trees, sometimes called Catalan trees. The Gaussian law obtained in the latter case complements the central limit theorem for the shape functional under the random permutation model. Our results give rise to an apparently new family of distributions containing the Airy distribution ($\alpha = 1$) and the normal distribution [case (b), and case (a) as $\alpha \downarrow 0$]. The main theoretical tools employed are recent results relating asymptotics of the generating functions of sequences to those of their Hadamard product, and the method of moments.
    Comment: 30 pages, 4 figures. Version 2 adds background information on singularity analysis and streamlines the presentation.
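    To make the Catalan model concrete, the sketch below computes exact means of an additive functional $X(T) = X(L) + X(R) + t_{|T|}$ on uniformly random binary trees, using the fact that the left subtree has $k$ nodes with probability $C_k C_{n-1-k}/C_n$ ($C_n$ the Catalan numbers), and cross-checks the recurrence by enumerating every tree. It assumes $X(\text{empty}) = 0$ and integer tolls, so the shape functional ($t_n = \log n$) would need floats instead of `Fraction`. All names are illustrative, and only means are computed, not the limiting distributions.

```python
from fractions import Fraction
from typing import Callable, List, Optional, Tuple

Tree = Optional[Tuple["Tree", "Tree"]]  # None is the empty tree


def catalan(n_max: int) -> List[int]:
    c = [1]
    for n in range(1, n_max + 1):
        c.append(sum(c[k] * c[n - 1 - k] for k in range(n)))
    return c


def catalan_mean(n_max: int, toll: Callable[[int], int]) -> List[Fraction]:
    """E[X_n] for X(T) = X(L) + X(R) + t_|T| on uniform binary trees, X(empty) = 0."""
    c = catalan(n_max)
    mean = [Fraction(0)]
    for n in range(1, n_max + 1):
        # Left subtree has k nodes with probability C_k C_{n-1-k} / C_n;
        # by symmetry the right subtree contributes the same expectation.
        s = sum(c[k] * c[n - 1 - k] * mean[k] for k in range(n))
        mean.append(toll(n) + 2 * s / c[n])
    return mean


def all_trees(n: int) -> List[Tree]:
    """Every binary tree with n nodes (there are C_n of them)."""
    if n == 0:
        return [None]
    return [(left, right)
            for k in range(n)
            for left in all_trees(k)
            for right in all_trees(n - 1 - k)]


def additive(tree: Tree, toll: Callable[[int], int]) -> Tuple[int, int]:
    """Return (size, X(tree)) by direct recursion over the tree."""
    if tree is None:
        return 0, 0
    ls, lx = additive(tree[0], toll)
    rs, rx = additive(tree[1], toll)
    size = ls + rs + 1
    return size, lx + rx + toll(size)


def brute_force_mean(n: int, toll: Callable[[int], int]) -> Fraction:
    trees = all_trees(n)
    return Fraction(sum(additive(t, toll)[1] for t in trees), len(trees))


def toll_linear(n: int) -> int:
    return n  # t_n = n^alpha with alpha = 1


print(catalan_mean(6, toll_linear)[6], brute_force_mean(6, toll_linear))
```

With $t_n = n$ the functional equals the total path length plus $n$, which is the $\alpha = 1$ case whose limit the abstract identifies as Airy.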

    The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance

    For two decades, the Colless index has been the most frequently used statistic for assessing the balance of phylogenetic trees. In this article, this statistic is studied under the Yule and uniform models of phylogenetic trees. The main tool of analysis is a coupling argument with another well-known index called the Sackin statistic. Asymptotics for the mean, variance and covariance of these two statistics are obtained, as well as their limiting joint distribution for large phylogenies. Under the Yule model, the limiting distribution arises as a solution of a functional fixed-point equation. Under the uniform model, the limiting distribution is the Airy distribution. The cornerstone of this study is the fact that the probabilistic models for phylogenetic trees are strongly related to the random permutation and the Catalan models for binary search trees.
    Comment: Published at http://dx.doi.org/10.1214/105051606000000547 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org).
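    For readers new to the two indices, here is a minimal sketch using their standard definitions: the Colless index sums $|n_L - n_R|$ (leaf counts of the two children) over internal nodes, and the Sackin index sums leaf depths, equivalently the number of leaves below each internal node. It also computes exact means under the Yule model, assuming the standard fact that the root's left subtree has $k$ leaves with probability $1/(n-1)$ for $k = 1, \dots, n-1$. Trees are nested pairs of leaf labels; all names are illustrative, and the uniform model is not covered.

```python
from fractions import Fraction
from typing import List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]  # a leaf label or an ordered pair


def colless_sackin(tree: Tree) -> Tuple[int, int, int]:
    """Return (leaves, Colless index, Sackin index) of a rooted binary tree."""
    if isinstance(tree, str):
        return 1, 0, 0
    nl, cl, sl = colless_sackin(tree[0])
    nr, cr, sr = colless_sackin(tree[1])
    n = nl + nr
    # Colless adds the leaf imbalance at this node; Sackin adds one unit of
    # depth for every leaf below this node.
    return n, cl + cr + abs(nl - nr), sl + sr + n


def yule_means(n_max: int) -> List[Tuple[Fraction, Fraction]]:
    """Exact (E[Colless], E[Sackin]) under the Yule model for n = 1..n_max leaves.

    Assumes the root's left subtree has k leaves with probability 1/(n-1),
    k = 1..n-1. Index 0 of the result is a placeholder.
    """
    means = [(Fraction(0), Fraction(0)), (Fraction(0), Fraction(0))]
    for n in range(2, n_max + 1):
        ks = range(1, n)
        imbalance = Fraction(sum(abs(2 * k - n) for k in ks), n - 1)
        sub_c = 2 * sum(means[k][0] for k in ks) / (n - 1)
        sub_s = 2 * sum(means[k][1] for k in ks) / (n - 1)
        means.append((imbalance + sub_c, n + sub_s))
    return means


caterpillar = ((("a", "b"), "c"), "d")
balanced = (("a", "b"), ("c", "d"))
print(colless_sackin(caterpillar), colless_sackin(balanced))
print(yule_means(4)[4])
```

The Sackin means produced this way match the known Yule formula $2n\sum_{k=2}^{n} 1/k$, which is a quick way to validate the split probabilities.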

    Average Case and Distributional Analysis of Dual-Pivot Quicksort

    In 2009, Oracle replaced the long-serving sorting algorithm in its Java 7 runtime library by a new dual-pivot Quicksort variant due to Vladimir Yaroslavskiy. The decision was based on the strikingly good performance of Yaroslavskiy's implementation in running time experiments. At that time, no precise investigations of the algorithm were available to explain its superior performance; on the contrary, previous theoretical studies of other dual-pivot Quicksort variants even discouraged the use of two pivots. In 2012, two of the authors gave an average case analysis of a simplified version of Yaroslavskiy's algorithm, proving that savings in the number of comparisons are possible. However, Yaroslavskiy's algorithm needs more swaps, which renders the analysis inconclusive. To settle the issue, we herein extend our analysis to the fully detailed style of Knuth: we determine the exact number of executed Java Bytecode instructions. Surprisingly, Yaroslavskiy's algorithm needs slightly more Bytecode instructions than a simple implementation of classic Quicksort, contradicting observed running times. As in Oracle's library implementation, we incorporate the use of Insertionsort on small subproblems and show that it indeed speeds up Yaroslavskiy's Quicksort in terms of Bytecodes; but even with optimal Insertionsort thresholds, the new Quicksort variant needs slightly more Bytecode instructions on average. Finally, we show that the (suitably normalized) costs of Yaroslavskiy's algorithm converge to a random variable whose distribution is characterized by a fixed-point equation. From that, we compute variances of costs and show that for large n, costs are concentrated around their mean.
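    To show what the two-pivot scheme does, here is a Python sketch of Yaroslavskiy-style dual-pivot partitioning: pivots $p \le q$ are taken from the two ends of the range, and the array is split into elements $< p$, elements in $[p, q]$, and elements $\ge q$, each sorted recursively. It counts key comparisons and optionally switches to insertion sort on small ranges below a `threshold` (the value 7 in the usage is arbitrary). This is an illustration under those assumptions, not Oracle's Java code, and its comparison counts reflect this code rather than the paper's Bytecode cost model.

```python
import random
from typing import List, Tuple


def _insertion_sort(a: List[int], left: int, right: int, cnt: List[int]) -> None:
    for i in range(left + 1, right + 1):
        x, j = a[i], i - 1
        while j >= left:
            cnt[0] += 1
            if a[j] <= x:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x


def _dual_pivot(a: List[int], left: int, right: int,
                cnt: List[int], threshold: int) -> None:
    if right - left < 1:
        return
    if right - left + 1 <= threshold:
        _insertion_sort(a, left, right, cnt)
        return
    cnt[0] += 1
    if a[left] > a[right]:
        a[left], a[right] = a[right], a[left]
    p, q = a[left], a[right]  # pivots, p <= q
    l = k = left + 1
    g = right - 1
    # Elements left of l are < p, elements right of g are >= q, and
    # scanned elements in between lie in [p, q].
    while k <= g:
        cnt[0] += 1
        if a[k] < p:
            a[k], a[l] = a[l], a[k]
            l += 1
        else:
            cnt[0] += 1
            if a[k] >= q:
                # Skip elements on the right that already belong there.
                while True:
                    cnt[0] += 1
                    if a[g] > q and k < g:
                        g -= 1
                    else:
                        break
                a[k], a[g] = a[g], a[k]
                g -= 1
                cnt[0] += 1
                if a[k] < p:
                    a[k], a[l] = a[l], a[k]
                    l += 1
        k += 1
    l -= 1
    g += 1
    # Move the pivots into their final positions.
    a[left], a[l] = a[l], a[left]
    a[right], a[g] = a[g], a[right]
    _dual_pivot(a, left, l - 1, cnt, threshold)
    _dual_pivot(a, l + 1, g - 1, cnt, threshold)
    _dual_pivot(a, g + 1, right, cnt, threshold)


def dual_pivot_sort(data: List[int], threshold: int = 0) -> Tuple[List[int], int]:
    """Return (sorted copy, number of key comparisons)."""
    a, cnt = list(data), [0]
    _dual_pivot(a, 0, len(a) - 1, cnt, threshold)
    return a, cnt[0]


random.seed(0)
values = random.sample(range(10_000), 10_000)
for t in (0, 7):
    result, comparisons = dual_pivot_sort(values, threshold=t)
    assert result == sorted(values)
    print(t, comparisons)
```

Counting swaps as well would need a second counter; swaps are exactly the extra cost that, according to the abstract, made a comparison-only analysis inconclusive.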