In pursuit of the dynamic optimality conjecture
In 1985, Sleator and Tarjan introduced the splay tree, a self-adjusting
binary search tree algorithm. Splay trees were conjectured to perform within a
constant factor of any offline rotation-based search tree algorithm on every
sufficiently long sequence---any binary search tree algorithm that has this
property is said to be dynamically optimal. However, currently neither splay
trees nor any other tree algorithm is known to be dynamically optimal. Here we
survey the progress that has been made in the almost thirty years since the
conjecture was first formulated, and present a binary search tree algorithm
that is dynamically optimal if any binary search tree algorithm is dynamically
optimal.
Comment: Preliminary version of a paper to appear in the Conference on Space
Efficient Data Structures, Streams and Algorithms, to be held in August 2013
in honor of Ian Munro's 66th birthday
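Since the survey's subject is the splay tree, a minimal sketch of its restructuring primitive may help. The Python below (class and function names are ours, chosen for illustration) implements the classic bottom-up splay with zig, zig-zig, and zig-zag rotations; insertion, deletion, and the amortized analysis are omitted.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None

def _rotate(x):
    """Rotate x up over its parent, preserving the BST order."""
    p, g = x.parent, x.parent.parent
    if p.left is x:                     # right rotation
        p.left, x.right = x.right, p
        if p.left: p.left.parent = p
    else:                               # left rotation
        p.right, x.left = x.left, p
        if p.right: p.right.parent = p
    x.parent, p.parent = g, x
    if g:
        if g.left is p: g.left = x
        else: g.right = x

def splay(x):
    """Move x to the root via zig, zig-zig, and zig-zag steps."""
    while x.parent:
        p, g = x.parent, x.parent.parent
        if g is None:                               # zig
            _rotate(x)
        elif (g.left is p) == (p.left is x):        # zig-zig
            _rotate(p); _rotate(x)
        else:                                       # zig-zag
            _rotate(x); _rotate(x)
    return x
```

Splaying the accessed node to the root after every search is the entire self-adjustment rule; the conjecture concerns whether this simple local policy already matches the best offline rotation-based strategy up to a constant factor.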
On merging the fields of neural networks and adaptive data structures to yield new pattern recognition methodologies
The aim of this talk is to explain a pioneering exploratory research endeavour that attempts to merge two completely different fields in Computer Science so as to yield fascinating results. These are the well-established fields of Neural Networks (NNs) and Adaptive Data Structures (ADS) respectively. The field of NNs deals with the training and learning capabilities of a large number of neurons, each possessing minimal computational properties. On the other hand, the field of ADS concerns designing, implementing and analyzing data structures which adaptively change with time so as to optimize some access criteria. In this talk, we shall demonstrate how these fields can be merged, so that the neural elements are themselves linked together using a data structure. This structure can be a singly-linked or doubly-linked list, or even a Binary Search Tree (BST). While the results themselves are quite generic, we shall, as a prima facie case, present results in which a Self-Organizing Map (SOM) with an underlying BST structure is adaptively restructured using conditional rotations. These rotations on the nodes of the tree are local and are performed in constant time, guaranteeing a decrease in the Weighted Path Length (WPL) of the entire tree. As a result, the algorithm, referred to as the Tree-based Topology-Oriented SOM with Conditional Rotations (TTO-CONROT), converges in such a manner that the neurons are ultimately placed in the input space so as to represent its stochastic distribution. Moreover, the neighborhood properties of the neurons conform to the BST that best represents the data.
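The conditional-rotation test can be made concrete with a short sketch. The Python below is our reconstruction for illustration, not the authors' code: each node records how often it has been accessed, and a child is rotated above its parent only if doing so decreases the WPL. Subtree counts are recomputed naively here for clarity; the actual scheme maintains them incrementally so that both the test and the rotation take constant time.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.count = 0                    # accesses recorded at this node
        self.left = self.right = None

def subtree_count(v):
    """Total accesses recorded in the subtree rooted at v."""
    if v is None:
        return 0
    return v.count + subtree_count(v.left) + subtree_count(v.right)

def wpl_decreases(parent, child):
    """True iff rotating `child` above `parent` lowers the WPL.

    Rotating a left child x above p lifts x and its outer subtree one
    level, pushes p and its other subtree one level down, and leaves
    x's inner subtree at the same depth; the WPL therefore drops
    exactly when the lifted side carries more accesses than the
    pushed-down side.
    """
    s_p, s_x = subtree_count(parent), subtree_count(child)
    inner = subtree_count(child.right if parent.left is child else child.left)
    return 2 * s_x - inner > s_p
```

Applying the rotation only when `wpl_decreases` holds is what guarantees the monotone decrease in WPL claimed above.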
Learning-Augmented B-Trees
We study learning-augmented binary search trees (BSTs) and B-Trees via Treaps
with composite priorities. The result is a simple search tree where the depth
of each item is determined by its predicted weight w_x. To achieve the
result, each item x has the composite priority -⌊log log(1/w_x)⌋ + U(0, 1),
where U(0, 1) is the uniform random variable. This generalizes the recent
learning-augmented BSTs [Lin-Luo-Woodruff ICML '22], which only work for
Zipfian distributions, to arbitrary inputs and predictions. It also gives the
first B-Tree data structure that can provably take advantage of localities in
the access sequence via online self-reorganization. The data structure is
robust to prediction errors and handles insertions, deletions, as well as
prediction updates.
Comment: 25 pages
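The composite-priority construction translates directly into a treap. Below is a hedged Python sketch: `TreapNode`, the guard for weights near 1, and the exact flooring are our reading of the rule, for illustration only. The point is that heavier predicted weights yield larger integer priority classes, with U(0, 1) breaking ties so that each class is internally a uniformly random treap.

```python
import math
import random

class TreapNode:
    def __init__(self, key, weight):      # predicted weight w_x in (0, 1]
        self.key = key
        cls = math.log2(max(1.0, math.log2(1.0 / weight)))
        self.priority = -math.floor(cls) + random.random()
        self.left = self.right = None

def rotate_right(p):
    x = p.left
    p.left, x.right = x.right, p
    return x

def rotate_left(p):
    x = p.right
    p.right, x.left = x.left, p
    return x

def insert(root, node):
    """Standard treap insert: BST order on keys, max-heap on priorities."""
    if root is None:
        return node
    if node.key < root.key:
        root.left = insert(root.left, node)
        if root.left.priority > root.priority:
            root = rotate_right(root)
    else:
        root.right = insert(root.right, node)
        if root.right.priority > root.priority:
            root = rotate_left(root)
    return root
```

Heavier predicted weights produce larger priorities and hence smaller expected depth, matching the stated dependence of depth on the predicted weight.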
Locally Self-Adjusting Skip Graphs
We present a distributed self-adjusting algorithm for skip graphs that
minimizes the average routing costs between arbitrary communication pairs by
performing topological adaptation to the communication pattern. Our algorithm
is fully decentralized, conforms to the CONGEST model (i.e. uses O(log n)-bit
messages), and requires O(log n) bits of memory for each node, where n is the
total number of nodes. Upon each communication request, our algorithm first
establishes communication by using the standard skip graph routing, and then
locally and partially reconstructs the skip graph topology to perform
topological adaptation. We propose a computational model for such algorithms,
as well as a yardstick (the working set property) to evaluate them. Our
working set property can also be used to evaluate self-adjusting algorithms
for other graph classes where multiple tree-like subgraphs overlap (e.g.
hypercube networks). We derive a lower bound on the amortized routing cost for
any algorithm that follows our model and serves an unknown sequence of
communication requests. We show that the routing cost of our algorithm is at
most a constant factor more than the amortized routing cost of any algorithm
conforming to our computational model. We also show that the expected
transformation cost for our algorithm is at most a logarithmic factor more
than the amortized routing cost of any algorithm conforming to our
computational model.
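For readers unfamiliar with the yardstick, the following Python computes the classic working set bound in its single-sequence form: the cost charged to a request for x is logarithmic in the number of distinct items requested since the previous request for x. The paper adapts this idea to communication pairs in a graph; this scalar version is our simplification.

```python
import math

def working_set_bound(requests):
    last_seen = {}                       # item -> index of its last request
    total = 0.0
    for t, x in enumerate(requests):
        start = last_seen.get(x, -1) + 1
        distinct = len(set(requests[start:t])) + 1   # +1 counts x itself
        total += math.log2(distinct + 1)
        last_seen[x] = t
    return total

# Temporally clustered requests are charged far less than requests
# scattered over many distinct items.
print(working_set_bound(["a", "a", "a", "b", "b", "a"]))   # clustered: cheap
print(working_set_bound(["a", "b", "c", "d", "e", "a"]))   # scattered: costly
```

A self-adjusting topology that satisfies such a property keeps recently communicating pairs close together, which is exactly the locality the algorithm above exploits.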
On Resource Pooling and Separation for LRU Caching
Caching systems using the Least Recently Used (LRU) principle have now become
ubiquitous. A fundamental question for these systems is whether the cache
space should be pooled together or divided to serve multiple flows of data
item requests in order to minimize the miss probabilities. In this paper, we
show that there is no straight yes-or-no answer to this question: the answer
depends on complex combinations of critical factors, including, e.g., request
rates, overlapped data items across different request flows, and data item
popularities and their sizes. Specifically, we characterize the asymptotic
miss probabilities for multiple competing request flows under resource
pooling and separation for LRU caching when the cache size is large.
Analytically, we show that it is asymptotically optimal to jointly serve
multiple flows if their data item sizes and popularity distributions are
similar and their arrival rates do not differ significantly; the
self-organizing property of LRU caching automatically optimizes the resource
allocation among them asymptotically. Otherwise, separating these flows could
be better, e.g., when data sizes vary significantly. We also quantify critical
points beyond which resource pooling is better than separation for each of the
flows when the overlapped data items exceed certain levels. Technically, we
generalize existing results on the asymptotic miss probability of LRU caching
for a broad class of heavy-tailed distributions and extend them to multiple
competing flows with varying data item sizes, which also validates the Che
approximation under certain conditions. These results provide new insights
into improving the performance of caching systems.
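The pooling-vs-separation trade-off is easy to probe empirically. The sketch below is ours, and all parameters (Zipf exponents, cache sizes, request counts) are arbitrary choices for illustration, not the paper's settings: two Zipf-like flows either share one LRU cache of size 2c or each get a dedicated cache of size c, and we compare the empirical miss ratios.

```python
import bisect
import itertools
import random
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def request(self, key):
        """Return True on a hit; otherwise insert, evicting the LRU item."""
        if key in self.items:
            self.items.move_to_end(key)
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)      # evict least recently used
        self.items[key] = None
        return False

def make_zipf_sampler(n_items, alpha):
    """O(log n) sampling from a Zipf(alpha) popularity distribution."""
    cum = list(itertools.accumulate(1.0 / (i + 1) ** alpha
                                    for i in range(n_items)))
    return lambda: bisect.bisect_left(cum, random.random() * cum[-1])

def miss_ratio(caches, flows, n_requests=200_000):
    misses = 0
    for _ in range(n_requests):
        f = random.randrange(len(flows))        # pick a flow uniformly
        sample, cache_id = flows[f]
        key = (f, sample())                     # flows have disjoint items
        misses += not caches[cache_id].request(key)
    return misses / n_requests

random.seed(0)
c = 200
heavy = make_zipf_sampler(5000, 0.8)            # heavy-tailed popularities
light = make_zipf_sampler(5000, 1.5)            # concentrated popularities
print("pooled:", miss_ratio([LRUCache(2 * c)], [(heavy, 0), (light, 0)]))
print("split: ", miss_ratio([LRUCache(c), LRUCache(c)],
                            [(heavy, 0), (light, 1)]))
```

Varying the exponents and rates reproduces the qualitative message of the paper: similar flows favor pooling, while strongly dissimilar ones can favor separation.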
Combining Binary Search Trees
We present a general transformation for combining a constant number of binary search tree data structures (BSTs) into a single BST whose running time is within a constant factor of the minimum of any "well-behaved" bound on the running time of the given BSTs, for any online access sequence. (A BST has a well-behaved bound with f(n) overhead if it spends at most O(f(n)) time per access and its bound satisfies a weak sense of closure under subsequences.) In particular, we obtain a BST data structure that is O(log log n)-competitive, satisfies the working set bound (and thus satisfies the static finger bound and the static optimality bound), satisfies the dynamic finger bound, satisfies the unified bound with an additive O(log log n) factor, and performs each access in worst-case O(log n) time.
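To make "the minimum of well-behaved bounds" concrete, the following script (ours, not the paper's transformation) evaluates two of the classical bounds listed above on an access sequence over integer keys and reports the smaller one, which is the kind of per-sequence target the combined BST tracks within a constant factor.

```python
import math
from collections import Counter

def static_optimality_bound(seq):
    """Sum over accesses of log2(1 / empirical frequency of the key)."""
    freq = Counter(seq)
    n = len(seq)
    return sum(math.log2(n / freq[x]) for x in seq)

def dynamic_finger_bound(seq):
    """Each access costs log2(rank distance from the previous access + 2)."""
    return sum(math.log2(abs(b - a) + 2) for a, b in zip(seq, seq[1:]))

seq = [1, 2, 1, 1, 3, 2, 1, 4, 1, 2] * 50
bounds = {
    "static optimality": static_optimality_bound(seq),
    "dynamic finger": dynamic_finger_bound(seq),
}
print(bounds, "-> min:", min(bounds, key=bounds.get))
```

Different sequences make different bounds the smallest, which is why a single structure matching their minimum is stronger than matching any one of them alone.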
Self-Improving Algorithms
We investigate ways in which an algorithm can improve its expected
performance by fine-tuning itself automatically with respect to an unknown
input distribution D. We assume here that D is of product type. More precisely,
suppose that we need to process a sequence I_1, I_2, ... of inputs I = (x_1,
x_2, ..., x_n) of some fixed length n, where each x_i is drawn independently
from some arbitrary, unknown distribution D_i. The goal is to design an
algorithm for these inputs so that eventually the expected running time will be
optimal for the input distribution D = D_1 * D_2 * ... * D_n.
We give such self-improving algorithms for two problems: (i) sorting a
sequence of numbers and (ii) computing the Delaunay triangulation of a planar
point set. Both algorithms achieve optimal expected limiting complexity. The
algorithms begin with a training phase during which they collect information
about the input distribution, followed by a stationary regime in which the
algorithms settle to their optimized incarnations.
Comment: 26 pages, 8 figures, preliminary versions appeared at SODA 2006 and
SoCG 2008. Thorough revision to improve the presentation of the paper
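The training-then-stationary template can be illustrated with a toy self-improving sorter. In this hedged sketch (ours, not the paper's construction), the training phase learns bucket boundaries from sample inputs and the stationary phase sorts new inputs via those buckets; pooling all coordinates into one quantile table is our simplification of the paper's per-coordinate structures and entropy-optimality analysis.

```python
import bisect
import random

class SelfImprovingSorter:
    """Toy self-improving sorter: learn where inputs tend to fall, then
    place values into nearly balanced buckets before a final sort."""

    def __init__(self, n_buckets=16):
        self.n_buckets = n_buckets
        self.boundaries = []

    def train(self, samples):
        """Training phase: estimate quantile boundaries from sample inputs."""
        pooled = sorted(x for inp in samples for x in inp)
        step = max(1, len(pooled) // self.n_buckets)
        self.boundaries = pooled[step::step][: self.n_buckets - 1]

    def sort(self, inp):
        """Stationary regime: bucket by learned quantiles, then sort the
        (expectedly small) buckets and concatenate."""
        buckets = [[] for _ in range(self.n_buckets)]
        for x in inp:
            buckets[bisect.bisect_right(self.boundaries, x)].append(x)
        out = []
        for b in buckets:
            out.extend(sorted(b))
        return out

random.seed(1)
# Product-type distribution D: coordinate i is drawn from its own D_i.
make_input = lambda: [random.gauss(i % 10, 1.0) for i in range(100)]
sorter = SelfImprovingSorter()
sorter.train([make_input() for _ in range(50)])
inp = make_input()
assert sorter.sort(inp) == sorted(inp)
```

Once trained, most elements land in small, nearly sorted buckets, so the expected work in the stationary regime adapts to the input distribution rather than to the worst case.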