A Queueing Analysis of Hashing With Lazy Deletion
Hashing with lazy deletion is a simple method for maintaining a dynamic dictionary: items are inserted and sought as usual in a separate-chaining hash table; however, items that no longer need to be in the data structure remain until a later insertion operation stumbles on them and removes them from the table. Because hashing with lazy deletion does not delete items as soon as possible, it keeps more items in the dictionary than methods that use more careful deletion strategies. On the other hand, its space overhead is much smaller than that of those more careful methods, so if the number of extra items is not too large, hashing with lazy deletion can be a practical algorithm when space is scarce. In this paper, we analyze the expected amount of excess space used by hashing with lazy deletion.
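The mechanism is easy to state in code. Below is a minimal sketch, assuming each item carries an explicit expiry time; the class and method names are illustrative, not from the paper. On every insertion, the one chain that is touched is first purged of expired items, so deletion work is deferred until a collision revisits the bucket.

```python
# A minimal sketch of hashing with lazy deletion, assuming each item
# carries an explicit expiry time; names are illustrative.

class LazyDeletionHashTable:
    def __init__(self, num_buckets=64):
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value, expires_at, now):
        chain = self._chain(key)
        # Lazy deletion: expired items are purged only from the one
        # chain this insertion happens to touch.
        chain[:] = [(k, v, t) for (k, v, t) in chain if t > now]
        chain.append((key, value, expires_at))

    def lookup(self, key, now):
        # Expired items may still be physically present; ignore them.
        for k, v, t in self._chain(key):
            if k == key and t > now:
                return v
        return None

table = LazyDeletionHashTable()
table.insert("job-42", "payload", expires_at=5.0, now=0.0)
print(table.lookup("job-42", now=3.0))  # 'payload'
print(table.lookup("job-42", now=6.0))  # None: expired, removed lazily
```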
The Maximum Size of Dynamic Data Structures
This paper develops two probabilistic methods that allow the analysis of the maximum data
structure size encountered during a sequence of insertions and deletions in data structures such as priority queues, dictionaries, linear lists, and symbol tables, and in sweepline structures for geometry and Very-Large-Scale-Integration (VLSI) applications. The notion of the "maximum" is basic to issues of resource preallocation. The methods here are applied to combinatorial models of file histories and probabilistic models, as well as to a non-Markovian process (algorithm) for processing sweepline information in an efficient way, called "hashing with lazy deletion" (HwLD). Expressions are derived for the expected maximum data
structure size that are asymptotically exact, that is, correct up to lower-order terms; in several cases of interest the expected value of the maximum size is asymptotically equal to the maximum expected size. This solves several open problems, including longstanding questions in queueing theory. Both of these approaches are robust and rely upon novel applications of techniques from the analysis of algorithms. At a high level, the first method isolates the primary contribution to the maximum and bounds the lesser effects. In the second technique the continuous-time probabilistic model is related to its discrete analog--the maximum slot occupancy in hashing.
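As a rough illustration of the distinction between the expected maximum size and the maximum expected size, here is a Monte Carlo sketch under an assumed M/M/infinity-style model (Poisson arrivals, exponentially distributed lifetimes); the model, parameters, and names are my assumptions for illustration, not the paper's analysis. For large arrival rates the simulated expected maximum stays close to the stationary mean, consistent with the asymptotic equality described above.

```python
# A Monte Carlo sketch, assuming an M/M/inf-style model (Poisson
# arrivals at `rate`, exponential lifetimes with mean `mean_life`);
# the model and all names here are assumptions for illustration.
import random

def simulated_max_size(rate=50.0, mean_life=1.0, horizon=10.0):
    t, departures, max_alive = 0.0, [], 0
    while t < horizon:
        t += random.expovariate(rate)                   # next arrival
        departures = [d for d in departures if d > t]   # drop departed
        departures.append(t + random.expovariate(1.0 / mean_life))
        max_alive = max(max_alive, len(departures))
    return max_alive

runs = [simulated_max_size() for _ in range(200)]
print("expected maximum size ~", sum(runs) / len(runs))
print("maximum expected size =", 50.0)  # stationary mean rate*mean_life
```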
Sorting using complete subintervals and the maximum number of runs in a randomly evolving sequence
We study the space requirements of a sorting algorithm where only items that
will be adjacent at the end are kept together. This is equivalent to the
following combinatorial problem: Consider a string of fixed length n that
starts as a string of 0's and then evolves by changing each 0 to 1, with the n
changes done in random order. What is the maximal number of runs of 1's?
We give asymptotic results for the distribution and mean. It turns out that,
as in many problems involving a maximum, the maximum is asymptotically normal,
with fluctuations of order n^{1/2}, and is to first order well approximated by
the number of runs at the instant when the expectation is maximized, in this
case when half the elements have changed to 1; there is also a second-order
term of order n^{1/3}.
We also treat some variations, including priority queues. The proofs use
methods originally developed for random graphs.
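The combinatorial process is easy to simulate directly. The sketch below (names are illustrative) flips the n positions in a uniformly random order, maintains the current number of runs of 1's incrementally, and records the maximum:

```python
# A direct simulation sketch of the evolving 0/1 string; names are
# illustrative. Flipping position i creates, extends, or merges runs,
# so the run count changes by +1, 0, or -1.
import random

def max_runs_of_ones(n):
    bits = [0] * n
    best = runs = 0
    for i in random.sample(range(n), n):    # the n flips, random order
        bits[i] = 1
        left = i > 0 and bits[i - 1] == 1
        right = i < n - 1 and bits[i + 1] == 1
        runs += 1 - left - right            # +1 new, 0 extend, -1 merge
        best = max(best, runs)
    return best

print(max_runs_of_ones(10**5))  # near n/4, the expected count at p = 1/2
```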
Logical Algorithms meets CHR: A meta-complexity result for Constraint Handling Rules with rule priorities
This paper investigates the relationship between the Logical Algorithms
language (LA) of Ganzinger and McAllester and Constraint Handling Rules (CHR).
We present a translation schema from LA to CHR-rp (CHR with rule priorities)
and show that the meta-complexity theorem for LA can be applied to a subset of
CHR-rp via inverse translation. Inspired by the high-level implementation
proposal for Logical Algorithms by Ganzinger and McAllester and based on a new
scheduling algorithm, we propose an alternative implementation for CHR-rp that
gives strong complexity guarantees and results in a new and accurate
meta-complexity theorem for CHR-rp. It is furthermore shown that the
translation from Logical Algorithms to CHR-rp, combined with the new CHR-rp
implementation, satisfies the complexity requirements for the Logical
Algorithms meta-complexity result to hold.
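To make the rule-priority idea concrete, here is a heavily simplified, hypothetical Python sketch of priority-driven rule firing in the spirit of CHR-rp: among all applicable rule instances, one with the highest priority (lowest number) fires first, and matching restarts from the top after every firing. The rule encoding and the gcd example are illustrative assumptions, not the paper's implementation or its complexity-preserving scheduler.

```python
# A hypothetical, much-simplified sketch of CHR-rp-style execution:
# the highest-priority applicable rule fires first; the store is a
# multiset of numeric "gcd" constraints. Illustrative only.
from collections import Counter

def run(rules, store):
    """Fire rules in priority order (lower number = higher priority)
    until none applies, re-checking from the top after each firing."""
    while True:
        for pri, match, apply in sorted(rules, key=lambda r: r[0]):
            inst = match(store)
            if inst is not None:
                apply(store, inst)
                break           # restart from the highest priority
        else:
            return              # no rule fired: final state reached

# Two prioritized rules computing a gcd by repeated subtraction,
# mimicking   1 :: gcd(0) <=> true
#             2 :: gcd(N) \ gcd(M) <=> M >= N | gcd(M - N)
def match_zero(s):
    return 0 if s[0] > 0 else None

def apply_zero(s, _):
    s[0] -= 1                   # drop one gcd(0) constraint

def match_sub(s):
    vals = sorted(v for v in s.elements() if v > 0)
    return (vals[0], vals[1]) if len(vals) >= 2 else None

def apply_sub(s, inst):
    n, m = inst                 # n <= m
    s[m] -= 1
    s[m - n] += 1               # replace gcd(m) by gcd(m - n)

rules = [(1, match_zero, apply_zero), (2, match_sub, apply_sub)]
store = Counter([12, 18])
run(rules, store)
print(+store)                   # Counter({6: 1}): the gcd survives
```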
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because issues of poor run-time
performance are not such a problem these days, given the computational power
that is available. This paper presents an overview of techniques for Nearest
Neighbour classification, focusing on: mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours, and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
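For readers who want a concrete starting point before consulting the paper's Appendix, here is a minimal, self-contained k-NN sketch (plain Python, Euclidean distance, majority vote); it is an illustrative stand-in, not the paper's published code.

```python
# A minimal k-nearest-neighbour classifier: rank training examples by
# Euclidean distance to the query, take the k closest, majority-vote.
from collections import Counter
import math

def knn_classify(train, query, k=3):
    """train: list of (feature_vector, label); query: feature_vector."""
    neighbours = sorted(
        train,
        key=lambda ex: math.dist(ex[0], query)  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"),
         ((4.0, 4.2), "b"), ((3.8, 4.0), "b")]
print(knn_classify(train, (1.1, 1.0), k=3))  # -> 'a'
```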