24 research outputs found
B-urns
The fringe of a B-tree with parameter $m$ is considered as a particular
Pólya urn with $m$ colors. More precisely, the asymptotic behaviour of this
fringe, when the number of stored keys tends to infinity, is studied through
the composition vector of the fringe nodes. We establish its typical behaviour
together with the fluctuations around it. The well-known phase transition in
Pólya urns has the following effect on B-trees: for $m \leq 59$, the
fluctuations are asymptotically Gaussian, whereas for $m \geq 60$, the
composition vector is oscillating; after scaling, the fluctuations of such an
urn strongly converge to a random variable $W$. This limit is $\mathbb{C}$-valued and it does not seem to follow any classical law. Several properties
of $W$ are shown: existence of exponential moments, characterization of its
distribution as the solution of a smoothing equation, existence of a density
relative to the Lebesgue measure on $\mathbb{C}$, support of $W$. Moreover, a
few representations of the composition vector for various values of $m$
illustrate the different kinds of convergence.
Analytic urns
This article describes a purely analytic approach to urn models of the
generalized or extended Pólya-Eggenberger type, in the case of two types of
balls and constant "balance," that is, constant row sum. The treatment starts
from a quasilinear first-order partial differential equation associated with a
combinatorial renormalization of the model and bases itself on elementary
conformal mapping arguments coupled with singularity analysis techniques.
Probabilistic consequences in the case of "subtractive" urns are new
representations for the probability distribution of the urn's composition at
any time n, structural information on the shape of moments of all orders,
estimates of the speed of convergence to the Gaussian limit and an explicit
determination of the associated large deviation function. In the general case,
analytic solutions involve Abelian integrals over the Fermat curve x^h+y^h=1.
Several urn models, including a classical one associated with balanced trees
(2-3 trees and fringe-balanced search trees) and related to a previous study of
Panholzer and Prodinger, as well as all urns of balance 1 or 2 and a sporadic
urn of balance 3, are shown to admit of explicit representations in terms of
Weierstrass elliptic functions: these elliptic models appear precisely to
correspond to regular tessellations of the Euclidean plane.
Comment: Published at http://dx.doi.org/10.1214/009117905000000026 in the
Annals of Probability (http://www.imstat.org/aop/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
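A balanced two-colour urn of the kind described in this abstract can be simulated in a few lines. The sketch below is illustrative only: the function name `simulate_urn` and the starting composition are hypothetical, and the replacement matrix ((-2, 3), (4, -3)) is an assumption, not taken from the abstract. It is the usual fringe model for 2-3 trees, in which a leaf holding one key contributes two balls of the first colour and a leaf holding two keys contributes three balls of the second colour. Both rows sum to 1, so the urn has balance 1, and the negative diagonal makes it "subtractive".

```python
import random

def simulate_urn(matrix, start, steps, seed=0):
    """Simulate a two-colour urn and return the composition after each draw."""
    (a, b), (c, d) = matrix
    if a + b != c + d:
        raise ValueError("urn is not balanced: the row sums differ")
    rng = random.Random(seed)
    x, y = start
    history = [(x, y)]
    for _ in range(steps):
        # Pick one ball uniformly at random among the x + y balls present.
        if rng.randrange(x + y) < x:
            x, y = x + a, y + b   # first colour drawn: apply the first row
        else:
            x, y = x + c, y + d   # second colour drawn: apply the second row
        if x < 0 or y < 0:
            raise ValueError("urn is not tenable from this starting composition")
        history.append((x, y))
    return history

# Assumed fringe model for 2-3 trees (balance 1); not stated in the abstract.
TWO_THREE_TREE = ((-2, 3), (4, -3))
history = simulate_urn(TWO_THREE_TREE, start=(2, 0), steps=1000)
```

Because the balance is constant, the total number of balls after n draws is deterministic (here 2 + n); only the split between the two colours is random, and that split is the quantity whose distribution the analytic approach describes.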
No Bits Left Behind
One of the key tenets of database system design is making efficient
use of storage and memory resources. However, existing database
system implementations are actually extremely wasteful of such
resources; for example, most systems leave a great deal of empty
space in tuples, index pages, and data pages, and spend many
CPU cycles reading cold records from disk that are never used.
In this paper, we identify a number of such sources of waste, and
present a series of techniques that limit this waste (e.g., forcing
better memory locality for hot data and using empty space in index
pages to cache popular tuples) without substantially complicating
interfaces or system design. We show that these techniques
effectively reduce memory requirements for real scenarios from
the Wikipedia database (by up to 17.8×) while increasing query
performance (by up to 8×)
On Optimal Balance in B-Trees: What Does It Cost to Stay in Perfect Shape?
Any B-tree has height at least ceil[log_B(n)]. Static B-trees achieving this height are easy to build. In the dynamic case, however, standard B-tree rebalancing algorithms only maintain a height within a constant factor of this optimum. We investigate exactly how close to ceil[log_B(n)] the height of dynamic B-trees can be maintained as a function of the rebalancing cost. In this paper, we prove a lower bound on the cost of maintaining optimal height ceil[log_B(n)], which shows that this cost must increase from Omega(1/B) to Omega(n/B) rebalancing per update as n grows from one power of B to the next. We also provide an almost matching upper bound, demonstrating this lower bound to be essentially tight. We then give a variant upper bound which can maintain near-optimal height at low cost. As two special cases, we can maintain optimal height for all but a vanishing fraction of values of n using Theta(log_B(n)) amortized rebalancing cost per update and we can maintain a height of optimal plus one using O(1/B) amortized rebalancing cost per update. More generally, for any rebalancing budget, we can maintain (as n grows from one power of B to the next) optimal height essentially up to the point where the lower bound requires the budget to be exceeded, after which optimal height plus one is maintained. Finally, we prove that this balancing scheme gives B-trees with very good storage utilization
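The height bound in the first two sentences of this abstract can be checked with a small sketch. It assumes that n counts the items stored at the bottom level, that every node has at most B children, and that height counts the levels above those items; the function names `min_height` and `static_level_sizes` are hypothetical. The sketch covers only the static case and does not model the rebalancing schemes of the paper.

```python
def min_height(n, B):
    """Smallest h with B**h >= n, i.e. ceil(log_B(n)), using integers only."""
    if n < 1 or B < 2:
        raise ValueError("need n >= 1 and B >= 2")
    h, capacity = 0, 1
    while capacity < n:
        capacity *= B   # one more level multiplies the capacity by B
        h += 1
    return h

def static_level_sizes(n, B):
    """Node counts per level, bottom level first, when nodes are packed full."""
    if n < 1 or B < 2:
        raise ValueError("need n >= 1 and B >= 2")
    sizes = [n]
    while sizes[-1] > 1:
        # Group the nodes of the current level B at a time (ceiling division).
        sizes.append(-(-sizes[-1] // B))
    return sizes
```

Packing every level as full as possible gives a static tree with `len(static_level_sizes(n, B)) - 1` levels above the items, which equals `min_height(n, B)`. The difficulty the abstract addresses is keeping that height under updates: when n passes a power of B, the optimal height jumps by one.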
Online Data Structures in External Memory
The original publication is available at www.springerlink.com
The data sets for many of today's computer applications are
too large to fit within the computer's internal memory and must instead
be stored on external storage devices such as disks. A major performance
bottleneck can be the input/output communication (or I/O) between
the external and internal memories. In this paper we discuss a variety of
online data structures for external memory, some very old and some very
new, such as hashing (for dictionaries), B-trees (for dictionaries and 1-D
range search), buffer trees (for batched dynamic problems), interval trees
with weight-balanced B-trees (for stabbing queries), priority search trees
(for 3-sided 2-D range search), and R-trees and other spatial structures.
We also discuss several open problems along the way
Space saving generalization of B-trees with 2/3 utilization
The paper studies balanced trees with variable length records. It generalizes the concept of B-tree with unfixed key length introduced in [1] and S(1)-tree of [2]. The main property of the new trees, called S(b)-trees, is their local incompressibility. That is, any sequence consisting of b + 1 neighboring nodes of the tree cannot be compressed into b well-formed nodes. The case of S(2)-trees is studied in detail. For these trees, a 2/3 − ε utilization lower bound is proven, where ε is inversely proportional to the tree branching. Logarithmic running time algorithms for search, insertion, and deletion are presented