Main Memory Adaptive Indexing for Multi-core Systems
Adaptive indexing is a concept that considers index creation in databases as
a by-product of query processing, as opposed to traditional full index creation,
where the indexing effort is performed up front before answering any queries.
Adaptive indexing has received a considerable amount of attention, and several
algorithms have been proposed over the past few years, along with a recent
experimental study comparing a large number of existing methods. Until now,
however, most adaptive indexing algorithms have been designed single-threaded,
yet with multi-core systems already well established, the idea of designing
parallel algorithms for adaptive indexing is very natural. In this regard, only
one parallel algorithm for adaptive indexing has recently appeared in the
literature: the parallel version of standard cracking. In this paper, we
describe three alternative parallel algorithms for adaptive indexing, including
a second variant of a parallel standard cracking algorithm. Additionally, we
describe a hybrid parallel sorting algorithm, and a NUMA-aware method based on
sorting. We then thoroughly compare all these algorithms experimentally, along
with a variant of a recently published parallel version of radix sort. Parallel
sorting algorithms serve as a realistic baseline for multi-threaded adaptive
indexing techniques. In total we experimentally compare seven parallel
algorithms. Additionally, we extensively profile all considered algorithms. The
initial set of experiments considered in this paper indicates that our parallel
algorithms significantly improve over previously known ones. Our results
suggest that, although adaptive indexing algorithms are a good design choice in
single-threaded environments, the rules change considerably in the parallel
case. That is, in future highly-parallel environments, sorting algorithms could
be serious alternatives to adaptive indexing.
Comment: 26 pages, 7 figures
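The idea of building an index as a by-product of query processing can be illustrated with a minimal single-threaded sketch of cracking. This is not the paper's implementation and none of its parallel variants: the class name `CrackerColumn`, its methods, and the use of list comprehensions instead of an in-place partition are all assumptions made for illustration.

```python
import bisect


class CrackerColumn:
    """Hypothetical, single-threaded sketch of a cracked column."""

    def __init__(self, values):
        self.data = list(values)  # copy that queries gradually reorganise
        self.pivots = []          # sorted pivot values seen so far
        self.bounds = []          # bounds[i]: first position holding a value >= pivots[i]

    def _crack(self, pivot):
        """Partition the piece that contains `pivot`; return its split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.bounds[i]  # already cracked on this value
        lo = self.bounds[i - 1] if i > 0 else 0
        hi = self.bounds[i] if i < len(self.bounds) else len(self.data)
        piece = self.data[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        # A real implementation would partition in place; this keeps the sketch short.
        self.data[lo:hi] = smaller + larger
        pos = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.bounds.insert(i, pos)
        return pos

    def range_query(self, low, high):
        """Return all values v with low <= v < high, refining the index as a side effect."""
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


column = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
answer = column.range_query(7, 14)
```

Each query only touches the pieces that contain its bounds, so the column becomes progressively more ordered as more queries arrive.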
Reordering Rows for Better Compression: Beyond the Lexicographic Order
Sorting database tables before compressing them improves the compression
rate. Can we do better than the lexicographical order? For minimizing the
number of runs in a run-length encoding compression scheme, the best approaches
to row-ordering are derived from traveling salesman heuristics, although there
is a significant trade-off between running time and compression. A new
heuristic, Multiple Lists, which is a variant on Nearest Neighbor that trades
off compression for a major running-time speedup, is a good option for very
large tables. However, for some compression schemes, it is more important to
generate long runs rather than few runs. For this case, another novel
heuristic, Vortex, is promising. We find that we can improve run-length
encoding by up to a factor of 3, whereas we can improve prefix coding by up to
80%: these gains are on top of the gains due to lexicographically sorting the table.
We prove that the new row reordering is optimal (within 10%) at minimizing the
runs of identical values within columns, in a few cases.
Comment: to appear in ACM TOD
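A small sketch can show why row order matters for run-length encoding and why the lexicographic order is not always the best one. The function `count_runs` and the four-row toy table are assumptions made for illustration; the code does not implement Multiple Lists or Vortex.

```python
def count_runs(rows):
    """Total number of runs of identical values, summed over all columns."""
    if not rows:
        return 0
    total = 0
    for c in range(len(rows[0])):
        total += 1  # the first row always opens a run in column c
        for prev, cur in zip(rows, rows[1:]):
            if prev[c] != cur[c]:
                total += 1
    return total


# Hypothetical toy table with two columns.
table = [("b", 1), ("a", 2), ("b", 2), ("a", 1)]

unsorted_runs = count_runs(table)               # 7 runs
lexicographic_runs = count_runs(sorted(table))  # 6 runs

# A hand-picked order that reverses the second column inside the "b" block.
reflected = [("a", 1), ("a", 2), ("b", 2), ("b", 1)]
reflected_runs = count_runs(reflected)          # 5 runs
```

On this table the lexicographic order saves one run over the unsorted input, and the hand-picked order saves one more, which is the kind of gap that row-reordering heuristics try to exploit.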
Just Sort It! A Simple and Effective Approach to Active Preference Learning
We address the problem of learning a ranking by using adaptively chosen
pairwise comparisons. Our goal is to recover the ranking accurately but to
sample the comparisons sparingly. If all comparison outcomes are consistent
with the ranking, the optimal solution is to use an efficient sorting
algorithm, such as Quicksort. But how do sorting algorithms behave if some
comparison outcomes are inconsistent with the ranking? We give favorable
guarantees for Quicksort for the popular Bradley-Terry model, under natural
assumptions on the parameters. Furthermore, we empirically demonstrate that
sorting algorithms lead to a very simple and effective active learning
strategy: repeatedly sort the items. This strategy performs as well as
state-of-the-art methods (and much better than random sampling) at a minuscule
fraction of the computational cost.
Comment: Accepted at ICML 201
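The setting can be sketched as Quicksort driven by a comparison oracle whose answers follow the Bradley-Terry model, where item a beats item b with probability w_a / (w_a + w_b). This is a minimal sketch, not the authors' code: the function names, the example weights, and the seed are assumptions.

```python
import random


def bradley_terry_compare(weights, rng):
    """Build a noisy oracle: compare(a, b) is True when a is reported to beat b."""
    def compare(a, b):
        p_a_wins = weights[a] / (weights[a] + weights[b])
        return rng.random() < p_a_wins
    return compare


def noisy_quicksort(items, compare, rng):
    """Quicksort with a random pivot; returns items ordered from best to worst.

    Each item is compared to the pivot exactly once, so inconsistent
    outcomes can misplace items but never prevent termination.
    """
    items = list(items)
    if len(items) <= 1:
        return items
    pivot = items.pop(rng.randrange(len(items)))
    better, worse = [], []
    for x in items:
        (better if compare(x, pivot) else worse).append(x)
    return noisy_quicksort(better, compare, rng) + [pivot] + noisy_quicksort(worse, compare, rng)


rng = random.Random(0)
# Hypothetical strengths: item 0 is the strongest, item 4 the weakest.
weights = {0: 16.0, 1: 8.0, 2: 4.0, 3: 2.0, 4: 1.0}
ranking = noisy_quicksort(weights.keys(), bradley_terry_compare(weights, rng), rng)
```

Calling `noisy_quicksort` several times and aggregating the resulting rankings is one way to read "repeatedly sort the items"; how the runs are combined is left open here.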
Analysis of Algorithms for Permutations Biased by Their Number of Records
The topic of the article is the parametric study of the complexity of
algorithms on arrays of pairwise distinct integers. We introduce a model that
takes into account the non-uniformness of data, which we call the Ewens-like
distribution of parameter θ for records on permutations: the weight θ^r
of a permutation depends on its number r of records. We show that
this model is meaningful for the notion of presortedness, while still being
mathematically tractable. Our results describe the expected value of several
classical permutation statistics in this model, and give the expected running
time of three algorithms: the Insertion Sort, and two variants of the Min-Max
search
- …
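A brute-force sketch of a record-biased distribution on permutations may make the model in the last abstract above concrete. It assumes that a record is a left-to-right maximum and that a permutation with r records gets weight `theta ** r`; the parameter name `theta` and all function names are our own choices, and the enumeration is only practical for small n.

```python
from fractions import Fraction
from itertools import permutations


def count_records(perm):
    """Number of left-to-right maxima (entries larger than everything before them)."""
    records, best = 0, None
    for value in perm:
        if best is None or value > best:
            records += 1
            best = value
    return records


def record_weights(n, theta):
    """Unnormalised weight theta ** records for every permutation of 1..n."""
    return {p: theta ** count_records(p) for p in permutations(range(1, n + 1))}


def expected_records(n, theta):
    """Exact expected number of records under the normalised weights."""
    weights = record_weights(n, theta)
    total = sum(weights.values())
    weighted = sum(count_records(p) * w for p, w in weights.items())
    return Fraction(weighted, total)


uniform = expected_records(3, 1)  # theta = 1 recovers the uniform distribution
biased = expected_records(3, 2)   # theta > 1 favours permutations with many records
```

With `theta` above 1 the mass shifts toward permutations with many records, which are closer to sorted order; this matches the abstract's remark that the model is meaningful for presortedness.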