19,585 research outputs found
Optimal Binary Search Trees with Near Minimal Height
Suppose we have n keys, n access probabilities for the keys, and n+1 access
probabilities for the gaps between the keys. Let h_min(n) be the minimal height
of a binary search tree for n keys. We consider the problem of constructing an
optimal binary search tree with near minimal height, i.e. with height h <=
h_min(n) + Delta for some fixed Delta. It is shown that for any fixed Delta
optimal binary search trees with near minimal height can be constructed in time
O(n^2). This is as fast as in the unrestricted case.
So far, the best known algorithms for the construction of height-restricted
optimal binary search trees have running time O(L n^2), where L is the
maximal permitted height. Compared to these algorithms, our algorithm is faster
by a factor of at least log n, because L is bounded below by log n.
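To make the height-restricted problem concrete, here is a minimal Python sketch of the textbook dynamic program extended with a height parameter. It is not the O(n^2) algorithm of the abstract, nor the O(L n^2) algorithms it mentions: this naive version tries every root and runs in O(L n^3) time. The function name `restricted_obst_cost`, the 0-based indexing, and the convention that height counts levels of keys (empty tree 0, single key 1) are assumptions made for illustration only.

```python
from functools import lru_cache
from math import inf


def restricted_obst_cost(p, q, max_height):
    """Minimum expected number of comparisons of a BST with height <= max_height.

    p[k] is the access probability of key k (n keys, 0-based).
    q[k] is the probability of the gap before key k; q[n] is the last gap.
    Height counts levels of keys: empty tree = 0, single key = 1 (assumption).
    """
    n = len(p)
    assert len(q) == n + 1

    # Prefix sums so that weight(i, j) takes constant time.
    pp = [0.0]
    for x in p:
        pp.append(pp[-1] + x)
    qq = [0.0]
    for x in q:
        qq.append(qq[-1] + x)

    def weight(i, j):
        # Total probability of keys i..j-1 and of gaps i..j.
        return (pp[j] - pp[i]) + (qq[j + 1] - qq[i])

    @lru_cache(maxsize=None)
    def cost(i, j, h):
        # Cheapest tree over keys i..j-1 using at most h levels.
        if i == j:
            return 0.0  # empty subtree
        if j - i > (1 << h) - 1:
            return inf  # too many keys for h levels
        best = inf
        for r in range(i, j):  # naive: try every key as root
            best = min(best, cost(i, r, h - 1) + cost(r + 1, j, h - 1))
        return best + weight(i, j)

    return cost(0, n, max_height)
```

With a skewed distribution such as p = [0.8, 0.1, 0.1] and all gap probabilities zero, the unrestricted optimum is a path rooted at the heavy key, so limiting the height to 2 forces the balanced tree and raises the cost.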
Optimal Hierarchical Layouts for Cache-Oblivious Search Trees
This paper proposes a general framework for generating cache-oblivious
layouts for binary search trees. A cache-oblivious layout attempts to minimize
cache misses on any hierarchical memory, independent of the number of memory
levels and attributes at each level such as cache size, line size, and
replacement policy. Recursively partitioning a tree into contiguous subtrees
and prescribing an ordering amongst the subtrees, Hierarchical Layouts
generalize many commonly used layouts for trees such as in-order, pre-order and
breadth-first. They also generalize the various flavors of the van Emde Boas
layout, which have previously been used as cache-oblivious layouts.
Hierarchical Layouts thus unify all previous attempts at deriving layouts for
search trees.
The paper then derives a new locality measure (the Weighted Edge Product)
that mimics the probability of cache misses at multiple levels, and shows that
layouts that reduce this measure perform better. We analyze the various degrees
of freedom in the construction of Hierarchical Layouts, and investigate the
relative effect of each of these decisions in the construction of
cache-oblivious layouts. Optimizing the Weighted Edge Product for complete
binary search trees, we introduce the MinWEP layout, and show that it
outperforms previously used cache-oblivious layouts by almost 20%.
Comment: Extended version with proofs added to the appendix
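As an illustration of one of the layouts mentioned above, the following Python sketch computes a van Emde Boas ordering of a complete binary tree. It is not the MinWEP layout and does not compute the Weighted Edge Product. Nodes are named by their breadth-first (heap) index starting at 1, and the tree is split so that the top part gets floor(height / 2) levels. Both the naming and the split rule are assumptions; other flavors of the layout round the split differently.

```python
def veb_order(height, root=1):
    """Return the heap indices of a complete binary tree in van Emde Boas order.

    height is the number of levels (height >= 1); root is the heap index of
    the subtree root. Assumption: the top subtree gets floor(height / 2) levels.
    """
    if height == 1:
        return [root]
    top = height // 2
    bottom = height - top
    # Lay out the top subtree first.
    order = veb_order(top, root)
    # The bottom subtrees hang `top` levels below root; their roots are
    # consecutive heap indices starting at the leftmost such descendant.
    first = root << top
    for k in range(1 << top):
        order.extend(veb_order(bottom, first + k))
    return order
```

For a tree with four levels this yields 1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15: the two-level top tree is stored contiguously, followed by each two-level bottom tree in left-to-right order.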
Tree Compression with Top Trees Revisited
We revisit tree compression with top trees (Bille et al., ICALP'13) and
present several improvements to the compressor and its analysis. By
significantly reducing the amount of information stored and guiding the
compression step using a RePair-inspired heuristic, we obtain a fast compressor
achieving good compression ratios, addressing an open problem posed by Bille et
al. We show how, with relatively small overhead, the compressed file can be
converted into an in-memory representation that supports basic navigation
operations in worst-case logarithmic time without decompression. We also show a
much improved worst-case bound on the size of the output of top-tree
compression (answering an open question posed in a talk on this algorithm by
Weimann in 2012).
Comment: SEA 201
A survey of induction algorithms for machine learning
Central to all systems for machine learning from examples is an induction algorithm. The purpose of the algorithm is to generalize from a finite set of training examples a description consistent with the examples seen, and, hopefully, with the potentially infinite set of examples not seen. This paper surveys four machine learning induction algorithms. The knowledge representation schemes and a PDL description of algorithm control are emphasized. System characteristics that are peculiar to a domain of application are de-emphasized. Finally, a comparative summary of the learning algorithms is presented.
Finger Search in Grammar-Compressed Strings
Grammar-based compression, where one replaces a long string by a small
context-free grammar that generates the string, is a simple and powerful
paradigm that captures many popular compression schemes. Given a grammar, the
random access problem is to compactly represent the grammar while supporting
random access, that is, given a position in the original uncompressed string
report the character at that position. In this paper we study the random access
problem with the finger search property, that is, the time for a random access
query should depend on the distance between a specified index f, called the
finger, and the query index i. We consider both a static variant,
where we first place a finger and subsequently access indices near the finger
efficiently, and a dynamic variant that also supports moving the finger in
time that depends on the distance moved.
Let n be the size of the grammar, and let N be the size of the string. For
the static variant we give a linear space representation that supports placing
the finger in O(log N) time and subsequently accessing in O(log D) time,
where D is the distance between the finger and the accessed index. For the
dynamic variant we give a linear space representation that supports placing the
finger in O(log N) time and accessing and moving the finger in
O(log D + log log N) time. Compared to the best linear space solution to random
access, we improve an O(log N) query bound to O(log D) for the static
variant and to O(log D + log log N) for the dynamic variant, while
maintaining linear space. As an application of our results we obtain an
improved solution to the longest common extension problem in grammar compressed
strings. To obtain our results, we introduce several new techniques of
independent interest, including a novel van Emde Boas style decomposition of
grammars.
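The random access problem itself can be sketched in a few lines of Python. This is only the baseline idea, not the data structure of the abstract and with no finger support: store the expansion length of every nonterminal and walk down from the start symbol, so a query takes time proportional to the height of the grammar. The grammar encoding (a dict mapping each symbol to either a one-character string or a pair of symbols) and the function names are assumptions made for this example.

```python
def expansion_lengths(rules, start):
    """Length of the string generated by each symbol reachable from start."""
    lengths = {}

    def length(sym):
        if sym not in lengths:
            rhs = rules[sym]
            if isinstance(rhs, str):
                lengths[sym] = 1  # terminal rule: a single character
            else:
                left, right = rhs
                lengths[sym] = length(left) + length(right)
        return lengths[sym]

    length(start)
    return lengths


def access(rules, lengths, start, i):
    """Return character i (0-based) of the string generated by start."""
    if not 0 <= i < lengths[start]:
        raise IndexError(i)
    sym = start
    while not isinstance(rules[sym], str):
        left, right = rules[sym]
        if i < lengths[left]:
            sym = left  # position falls inside the left expansion
        else:
            i -= lengths[left]  # skip the left expansion
            sym = right
    return rules[sym]


# Hypothetical example grammar: Fibonacci words, F6 generates "abaababa".
rules = {
    "F1": "b",
    "F2": "a",
    "F3": ("F2", "F1"),
    "F4": ("F3", "F2"),
    "F5": ("F4", "F3"),
    "F6": ("F5", "F4"),
}
lengths = expansion_lengths(rules, "F6")
text = "".join(access(rules, lengths, "F6", i) for i in range(lengths["F6"]))
```

Each query here restarts from the start symbol. A finger-search structure instead aims to make the cost depend on the distance between the finger and the queried position.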
A survey of max-type recursive distributional equations
In certain problems in a variety of applied probability settings (from
probabilistic analysis of algorithms to statistical physics), the central
requirement is to solve a recursive distributional equation of the form X =^d
g((\xi_i,X_i),i\geq 1). Here (\xi_i) and g(\cdot) are given and the X_i are
independent copies of the unknown distribution X. We survey this area,
emphasizing examples where the function g(\cdot) is essentially a "maximum"
or "minimum" function. We draw attention to the theoretical question of
endogeny: in the associated recursive tree process X_i, are the X_i measurable
functions of the innovations process (\xi_i)?
Comment: Published at http://dx.doi.org/10.1214/105051605000000142 in the
Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute
of Mathematical Statistics (http://www.imstat.org)