Smooth heaps and a dual view of self-adjusting data structures
We present a new connection between self-adjusting binary search trees (BSTs)
and heaps, two fundamental, extensively studied, and practically relevant
families of data structures. Roughly speaking, we map an arbitrary heap
algorithm within a natural model, to a corresponding BST algorithm with the
same cost on a dual sequence of operations (i.e. the same sequence with the
roles of time and key-space switched). This is the first general transformation
between the two families of data structures.
There is a rich theory of dynamic optimality for BSTs (i.e. the theory of
competitiveness between BST algorithms). The lack of an analogous theory for
heaps has been noted in the literature. Through our connection, we transfer all
instance-specific lower bounds known for BSTs to a general model of heaps,
initiating a theory of dynamic optimality for heaps.
On the algorithmic side, we obtain a new, simple and efficient heap
algorithm, which we call the smooth heap. We show the smooth heap to be the
heap-counterpart of Greedy, the BST algorithm with the strongest proven and
conjectured properties from the literature, widely believed to be
instance-optimal. Assuming the optimality of Greedy, the smooth heap is also
optimal within our model of heap algorithms. As corollaries of results known
for Greedy, we obtain instance-specific upper bounds for the smooth heap, with
applications in adaptive sorting.
Intriguingly, the smooth heap, although derived from a non-practical BST
algorithm, is simple and easy to implement (e.g. it stores no auxiliary data
besides the keys and tree pointers). It can be seen as a variation on the
popular pairing heap data structure, extending it with a "power-of-two-choices"
type of heuristic.
Comment: Presented at STOC 2018, light revision, additional figure
New Paths from Splay to Dynamic Optimality
Consider the task of performing a sequence of searches in a binary search
tree. After each search, an algorithm is allowed to arbitrarily restructure the
tree, at a cost proportional to the amount of restructuring performed. The cost
of an execution is the sum of the time spent searching and the time spent
optimizing those searches with restructuring operations. This notion was
introduced by Sleator and Tarjan in (JACM, 1985), along with an algorithm and a
conjecture. The algorithm, Splay, is an elegant procedure for performing
adjustments while moving searched items to the top of the tree. The conjecture,
called "dynamic optimality," is that the cost of splaying is always within a
constant factor of the optimal algorithm for performing searches. The
conjecture stands to this day. In this work, we attempt to lay the foundations
for a proof of the dynamic optimality conjecture.
Comment: An earlier version of this work appeared in the Proceedings of the
Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms. arXiv admin note:
text overlap with arXiv:1907.0630
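As an illustration of the Splay procedure described in this abstract, the following is a minimal recursive sketch of a textbook splay operation, not code from the paper. The names `Node`, `splay`, `insert`, and `in_order` are hypothetical, and `insert` is a plain unbalanced BST insertion used only to build an example tree.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def rotate_right(node):
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot

def rotate_left(node):
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot

def splay(root, key):
    """Move key (or the last node on its search path) to the root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:            # zig-zig: two rotations in the same direction
            root.left.left = splay(root.left.left, key)
            root = rotate_right(root)
        elif key > root.left.key:          # zig-zag: rotate the child first
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = rotate_left(root.left)
        return root if root.left is None else rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:               # zig-zig (mirror case)
        root.right.right = splay(root.right.right, key)
        root = rotate_left(root)
    elif key < root.right.key:             # zig-zag (mirror case)
        root.right.left = splay(root.right.left, key)
        if root.right.left is not None:
            root.right = rotate_right(root.right)
    return root if root.right is None else rotate_left(root)

def insert(root, key):
    """Plain BST insertion without rebalancing (for building examples only)."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def in_order(root):
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)
```

Each search is followed by `root = splay(root, key)`, so the searched item ends up at the top of the tree while the in-order sequence of keys is unchanged.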
The PGM-index: a multicriteria, compressed and learned approach to data indexing
The recent introduction of learned indexes has shaken the foundations of the
decades-old field of indexing data structures. Combining, or even replacing,
classic design elements such as B-tree nodes with machine learning models has
proven to give outstanding improvements in the space footprint and time
efficiency of data systems. However, these novel approaches are based on
heuristics, so they lack guarantees on both their time and space
requirements. We propose the Piecewise Geometric Model index (PGM-index for
short), which achieves guaranteed I/O-optimality in query operations and
learns an optimal number of linear models. Its distinctive recursive
construction makes it a purely learned data structure, rather than a hybrid of
traditional and learned indexes (such as RMI and FITing-tree). We show that the
PGM-index improves the space of the FITing-tree by 63.3% and of the B-tree by
more than four orders of magnitude, while matching or even improving their
query time efficiency. We complement this result by proposing three variants of
the PGM-index. First, we design a compressed PGM-index that further reduces its
space footprint by exploiting the repetitiveness at the level of the learned
linear models it is composed of. Second, we design a PGM-index that adapts
itself to the distribution of the queries, thus resulting in the first known
distribution-aware learned index to date. Finally, given its flexibility in the
offered space-time trade-offs, we propose the multicriteria PGM-index that
efficiently auto-tunes itself in a few seconds over hundreds of millions of keys
to the possibly evolving space-time constraints imposed by the application of
use.
We remark to the reader that this paper is an extended and improved version
of our previous paper titled "Superseding traditional indexes by orchestrating
learning and geometry" (arXiv:1903.00507).
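To make the idea of indexing with linear models concrete, here is a minimal single-level sketch. It rests on several assumptions that are not stated in the abstract: the integer error bound `epsilon`, the class name `LearnedIndex`, and the method `find` are all hypothetical. It uses a simple greedy "shrinking cone" segmentation, which does not in general produce the optimal number of models that the PGM-index learns, and it omits the recursive construction entirely.

```python
import bisect

class LearnedIndex:
    """Single-level piecewise linear index over sorted, distinct numeric keys."""

    def __init__(self, keys, epsilon):
        self.keys = keys
        self.epsilon = epsilon          # assumed integer bound on the position error
        self.segments = []              # (first_key, slope, first_position)
        start, n = 0, len(keys)
        while start < n:
            x0 = keys[start]
            lo, hi = 0.0, float("inf")  # interval of slopes that fit every point so far
            end = start + 1
            while end < n:
                dx, dy = keys[end] - x0, end - start
                new_lo = max(lo, (dy - epsilon) / dx)
                new_hi = min(hi, (dy + epsilon) / dx)
                if new_lo > new_hi:     # no slope fits this point: close the segment
                    break
                lo, hi = new_lo, new_hi
                end += 1
            slope = lo if hi == float("inf") else (lo + hi) / 2
            self.segments.append((x0, slope, start))
            start = end
        self.first_keys = [s[0] for s in self.segments]

    def find(self, key):
        """Return the position of key in keys, or None if it is absent."""
        i = bisect.bisect_right(self.first_keys, key) - 1
        if i < 0:
            return None
        x0, slope, pos0 = self.segments[i]
        pred = int(pos0 + slope * (key - x0))
        # Search only a window around the prediction; the extra slot on each
        # side absorbs truncation and floating-point rounding.
        lo = max(0, pred - self.epsilon - 1)
        hi = min(len(self.keys), pred + self.epsilon + 2)
        j = bisect.bisect_left(self.keys, key, lo, hi)
        if j < len(self.keys) and self.keys[j] == key:
            return j
        return None
```

A lookup first locates the segment responsible for the key, then runs a binary search over a window of roughly `2 * epsilon` positions instead of the whole array. In this sketch a larger `epsilon` tends to yield fewer segments at the price of a wider search window, which is the kind of space-time trade-off the abstract refers to.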
SpK: A fast atomic and microphysics code for the high-energy-density regime
SpK is part of the numerical codebase at Imperial College London used to model high energy density physics (HEDP) experiments. SpK is an efficient atomic and microphysics code used to perform detailed configuration accounting calculations of electronic and ionic stage populations, opacities and emissivities for use in post-processing and radiation hydrodynamics simulations. This is done using screened hydrogenic atomic data supplemented by the NIST energy level database. An extended Saha model solves for chemical equilibrium with extensions for non-ideal physics, such as ionisation potential depression, and non-thermal-equilibrium corrections. Spectral data, such as opacity, are stored in a tree-heap (treap) data structure, which is dynamic and thus allows easy insertion of points around spectral lines without a priori knowledge of the ion stage populations. Results from SpK are compared to other codes, and descriptions of radiation transport solutions which use SpK data are given. The treap data structure and SpK’s computational efficiency allow inline post-processing of 3D hydrodynamics simulations with a dynamically evolving spectrum stored in a treap.
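The following is a minimal sketch of how a treap can hold a spectrum that gains points over time. It assumes points are keyed by photon energy and carry a single opacity value; the names `Node`, `insert`, and `in_order` are hypothetical and are not taken from SpK.

```python
import random

class Node:
    def __init__(self, energy, opacity):
        self.energy = energy               # BST key (assumed: photon energy)
        self.opacity = opacity             # stored spectral value
        self.priority = random.random()    # random heap priority, balances the tree in expectation
        self.left = None
        self.right = None

def rotate_right(node):
    pivot = node.left
    node.left = pivot.right
    pivot.right = node
    return pivot

def rotate_left(node):
    pivot = node.right
    node.right = pivot.left
    pivot.left = node
    return pivot

def insert(root, energy, opacity):
    """Insert a spectral point, or overwrite the opacity if the energy exists."""
    if root is None:
        return Node(energy, opacity)
    if energy == root.energy:
        root.opacity = opacity
    elif energy < root.energy:
        root.left = insert(root.left, energy, opacity)
        if root.left.priority > root.priority:    # restore the max-heap order on priorities
            root = rotate_right(root)
    else:
        root.right = insert(root.right, energy, opacity)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root

def in_order(root):
    """Return the spectrum as (energy, opacity) pairs sorted by energy."""
    if root is None:
        return []
    return in_order(root.left) + [(root.energy, root.opacity)] + in_order(root.right)
```

Starting from a coarse energy grid, extra points can be inserted near a spectral line at any time with `root = insert(root, energy, opacity)`, and `in_order(root)` always yields the spectrum in increasing energy without rebuilding the grid.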
Analysis and solution of different algorithmic problems
The goal of competitive programming is being able to find abstract solutions
for some given algorithmic problems, and also being able to code those
ideas into an efficient and correct computer program. Performing this activity
at a high level requires a bit of natural ability, (at least) hundreds of training
hours, and a wide range of knowledge, obviously including many algorithms
and data structures, some of them not trivial at all.
This project constitutes a compilation of problems from several different
relevant topics in competitive programming, with an explanation and analysis
of their solution. Most of these problems were solved while training with the
UPC programming teams, which have dominated their regional competition
for more than one decade.
The author hopes that this collection may eventually increase the interest
of some readers towards competitive programming.
Ranked Queries in Index Data Structures
A ranked query is a query which returns the top-ranking elements of a set, sorted by rank, where the rank corresponds to some sort of preference function defined on the items of the set. This thesis investigates the problem of adding rank query capabilities to several index data structures on top of their existing functionality. First, we introduce the concept of rank-sensitive data structures, based on the existing concept of output-sensitive data structures. Rank-sensitive data structures are output-sensitive data structures which are additionally given a ranking of the items stored and as a result of a query return only the k best-ranking items satisfying the given query, sorted according to rank, where k is specified at query time. We explore several ways of adding rank-sensitivity to different data structures and the different trade-offs which this incurs. The second part of the work deals with the first efficient dynamic version of the Cartesian tree – a data structure intrinsically related to rank queries
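To illustrate why the Cartesian tree relates to ranked queries, here is a minimal sketch of the standard static, stack-based construction followed by a best-first traversal that reports the k best-ranking items in rank order. It is not the dynamic version developed in the thesis. The function names are hypothetical, and the sketch assumes that a smaller rank value means a better rank.

```python
import heapq

def build_cartesian_tree(ranks):
    """Build a min-rooted Cartesian tree over ranks in linear time.

    Returns (root, left, right), where left[i] and right[i] are child
    indices and -1 marks a missing child (or an empty tree for root).
    """
    n = len(ranks)
    left = [-1] * n
    right = [-1] * n
    stack = []                       # indices on the rightmost path, ranks increasing
    for i in range(n):
        last = -1
        while stack and ranks[stack[-1]] > ranks[i]:
            last = stack.pop()
        left[i] = last               # everything popped becomes the left subtree of i
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right

def top_k(ranks, k):
    """Return the indices of the k best-ranking items, best first."""
    root, left, right = build_cartesian_tree(ranks)
    if root == -1:
        return []
    result = []
    frontier = [(ranks[root], root)]
    while frontier and len(result) < k:
        _, i = heapq.heappop(frontier)
        result.append(i)
        for child in (left[i], right[i]):
            if child != -1:
                heapq.heappush(frontier, (ranks[child], child))
    return result
```

Because every node ranks no worse than its descendants, the traversal only touches nodes adjacent to those already reported, so after construction the cost of a query grows with k rather than with the size of the set.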