24 research outputs found

    B-urns

    The fringe of a B-tree with parameter $m$ is considered as a particular Pólya urn with $m$ colors. More precisely, the asymptotic behaviour of this fringe, when the number of stored keys tends to infinity, is studied through the composition vector of the fringe nodes. We establish its typical behaviour together with the fluctuations around it. The well-known phase transition in Pólya urns has the following effect on B-trees: for $m\leq 59$, the fluctuations are asymptotically Gaussian, whereas for $m\geq 60$, the composition vector is oscillating; after scaling, the fluctuations of such an urn strongly converge to a random variable $W$. This limit is $\mathbb C$-valued and it does not seem to follow any classical law. Several properties of $W$ are shown: existence of exponential moments, characterization of its distribution as the solution of a smoothing equation, existence of a density with respect to the Lebesgue measure on $\mathbb C$, support of $W$. Moreover, a few representations of the composition vector for various values of $m$ illustrate the different kinds of convergence

    Analytic urns

    This article describes a purely analytic approach to urn models of the generalized or extended Pólya-Eggenberger type, in the case of two types of balls and constant "balance," that is, constant row sum. The treatment starts from a quasilinear first-order partial differential equation associated with a combinatorial renormalization of the model and bases itself on elementary conformal mapping arguments coupled with singularity analysis techniques. Probabilistic consequences in the case of "subtractive" urns are new representations for the probability distribution of the urn's composition at any time n, structural information on the shape of moments of all orders, estimates of the speed of convergence to the Gaussian limit and an explicit determination of the associated large deviation function. In the general case, analytic solutions involve Abelian integrals over the Fermat curve x^h+y^h=1. Several urn models, including a classical one associated with balanced trees (2-3 trees and fringe-balanced search trees) and related to a previous study of Panholzer and Prodinger, as well as all urns of balance 1 or 2 and a sporadic urn of balance 3, are shown to admit of explicit representations in terms of Weierstraß elliptic functions: these elliptic models appear precisely to correspond to regular tessellations of the Euclidean plane. Comment: Published at http://dx.doi.org/10.1214/009117905000000026 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org

    No Bits Left Behind

    One of the key tenets of database system design is making efficient use of storage and memory resources. However, existing database system implementations are actually extremely wasteful of such resources; for example, most systems leave a great deal of empty space in tuples, index pages, and data pages, and spend many CPU cycles reading cold records from disk that are never used. In this paper, we identify a number of such sources of waste, and present a series of techniques that limit this waste (e.g., forcing better memory locality for hot data and using empty space in index pages to cache popular tuples) without substantially complicating interfaces or system design. We show that these techniques effectively reduce memory requirements for real scenarios from the Wikipedia database (by up to 17.8×) while increasing query performance (by up to 8×)

    On Optimal Balance in B-Trees: What Does It Cost to Stay in Perfect Shape?

    Any B-tree has height at least ceil[log_B(n)]. Static B-trees achieving this height are easy to build. In the dynamic case, however, standard B-tree rebalancing algorithms only maintain a height within a constant factor of this optimum. We investigate exactly how close to ceil[log_B(n)] the height of dynamic B-trees can be maintained as a function of the rebalancing cost. In this paper, we prove a lower bound on the cost of maintaining optimal height ceil[log_B(n)], which shows that this cost must increase from Omega(1/B) to Omega(n/B) rebalancing per update as n grows from one power of B to the next. We also provide an almost matching upper bound, demonstrating this lower bound to be essentially tight. We then give a variant upper bound which can maintain near-optimal height at low cost. As two special cases, we can maintain optimal height for all but a vanishing fraction of values of n using Theta(log_B(n)) amortized rebalancing cost per update and we can maintain a height of optimal plus one using O(1/B) amortized rebalancing cost per update. More generally, for any rebalancing budget, we can maintain (as n grows from one power of B to the next) optimal height essentially up to the point where the lower bound requires the budget to be exceeded, after which optimal height plus one is maintained. Finally, we prove that this balancing scheme gives B-trees with very good storage utilization

    Online Data Structures in External Memory

    The original publication is available at www.springerlink.com. The data sets for many of today's computer applications are too large to fit within the computer's internal memory and must instead be stored on external storage devices such as disks. A major performance bottleneck can be the input/output communication (or I/O) between the external and internal memories. In this paper we discuss a variety of online data structures for external memory, some very old and some very new, such as hashing (for dictionaries), B-trees (for dictionaries and 1-D range search), buffer trees (for batched dynamic problems), interval trees with weight-balanced B-trees (for stabbing queries), priority search trees (for 3-sided 2-D range search), and R-trees and other spatial structures. We also discuss several open problems along the way

    Space saving generalization of B-trees with 2/3 utilization

    The paper studies balanced trees with variable-length records. It generalizes the concept of the B-tree with unfixed key length introduced in [1] and the S(1)-tree of [2]. The main property of the new trees, called S(b)-trees, is their local incompressibility: any sequence of b + 1 neighboring nodes of the tree cannot be compressed into b well-formed nodes. The case of S(2)-trees is studied in detail. For these trees, a 2/3 − ε utilization lower bound is proven, where ε is inversely proportional to the tree branching. Logarithmic running time algorithms for search, insertion, and deletion are presented