### Faster Clustering via Preprocessing

We examine the efficiency of clustering a set of points, when the
encompassing metric space may be preprocessed in advance. In computational
problems of this genre, there is a first stage of preprocessing, whose input is
a collection of points $M$; the next stage receives as input a query set
$Q\subset M$, and should report a clustering of $Q$ according to some
objective, such as 1-median, in which case the answer is a point $a\in M$
minimizing $\sum_{q\in Q} d_M(a,q)$.
We design fast algorithms that approximately solve such problems under
standard clustering objectives like $p$-center and $p$-median, when the metric
$M$ has low doubling dimension. By leveraging the preprocessing stage, our
algorithms achieve query time that is near-linear in the query size $n=|Q|$,
and is (almost) independent of the total number of points $m=|M|$.

Comment: 24 pages
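The 1-median objective defined above can be illustrated with a brute-force baseline. The sketch below is not the preprocessing-based algorithm of the paper; it simply evaluates $\sum_{q\in Q} d_M(a,q)$ for every candidate $a\in M$, so it takes time proportional to $|M|\cdot|Q|$. The function name `one_median` and the 1-D example metric are assumptions made for illustration.

```python
from typing import Callable, Sequence, TypeVar

P = TypeVar("P")


def one_median(M: Sequence[P], Q: Sequence[P], dist: Callable[[P, P], float]) -> P:
    """Return a point a in M minimizing sum(dist(a, q) for q in Q).

    Brute force: every candidate in M is scored against every query point,
    so the cost is O(|M| * |Q|) distance evaluations.
    """
    return min(M, key=lambda a: sum(dist(a, q) for q in Q))


# Hypothetical example: points on the real line with the absolute-difference metric.
M = [0, 1, 2, 5, 10]
Q = [0, 2, 10]
best = one_median(M, Q, lambda x, y: abs(x - y))
print(best)  # 2 (total distance 2 + 0 + 8 = 10)
```

The point of the abstract is precisely to avoid the dependence on $|M|$ that this baseline has at query time.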

### A Simple Algorithm for Approximating the Text-To-Pattern Hamming Distance

Computing the Hamming distance between a given pattern of length $m$ and each location in a text of length $n$, both over a general alphabet $\Sigma$, is one of the most fundamental tasks in string algorithms. The fastest known runtime for exact computation is $\tilde{O}(n\sqrt{m})$. We recently introduced a complicated randomized algorithm for obtaining a $(1 \pm \epsilon)$ approximation for each location in the text in $O((n/\epsilon) \log(1/\epsilon) \log n \log m \log |\Sigma|)$ total time, breaking a barrier that stood for 22 years. In this paper, we introduce an elementary and simple randomized algorithm that takes $O((n/\epsilon) \log n \log m)$ time.
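To make the problem statement concrete, here is a minimal sketch of the naive exact computation, which compares the pattern against every alignment and therefore takes $O(nm)$ time. It is a reference baseline only, not the approximation algorithm described in the abstract; the function name `text_to_pattern_hamming` is an assumption.

```python
from typing import List, Sequence


def text_to_pattern_hamming(text: Sequence, pattern: Sequence) -> List[int]:
    """Return, for each alignment i, the number of mismatches between
    pattern and text[i : i + len(pattern)].

    Naive O(n * m) baseline over an arbitrary alphabet.
    """
    n, m = len(text), len(pattern)
    return [
        sum(1 for j in range(m) if text[i + j] != pattern[j])
        for i in range(n - m + 1)
    ]


print(text_to_pattern_hamming("abcabc", "abd"))  # [1, 3, 3, 1]
```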

### Color-Distance Oracles and Snippets

In the snippets problem we are interested in preprocessing a text T so that given two pattern queries P_1 and P_2, one can quickly locate the occurrences of the patterns in T that are the closest to each other. A closely related problem is that of constructing a color-distance oracle, where the goal is to preprocess a set of points from some metric space, in which every point is associated with a set of colors, so that given two colors one can quickly locate two points associated with those colors, that are as close as possible to each other.
We introduce efficient data structures for both color-distance oracles and the snippets problem. Moreover, we prove conditional lower bounds for these problems from both the 3SUM conjecture and the Combinatorial Boolean Matrix Multiplication conjecture.
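The color-distance query described above can be stated as a short brute-force routine. This sketch does no preprocessing at all and inspects every pair of points carrying the two query colors, so it only illustrates what an oracle must return, not the data structures from the paper. The function name, the dictionary-of-color-sets representation, and the 1-D example are assumptions.

```python
from itertools import product
from typing import Callable, Dict, Hashable, Optional, Set, Tuple


def closest_colored_pair(
    colors: Dict[Hashable, Set[str]],
    c1: str,
    c2: str,
    dist: Callable[[Hashable, Hashable], float],
) -> Optional[Tuple[Hashable, Hashable]]:
    """Return a pair (p, q) with c1 in colors[p] and c2 in colors[q]
    minimizing dist(p, q), or None if either color is unused."""
    with_c1 = [p for p, cs in colors.items() if c1 in cs]
    with_c2 = [p for p, cs in colors.items() if c2 in cs]
    if not with_c1 or not with_c2:
        return None
    return min(product(with_c1, with_c2), key=lambda pair: dist(*pair))


# Hypothetical example: points on a line, each tagged with a set of colors.
colors = {0: {"red"}, 3: {"blue"}, 7: {"red", "green"}, 8: {"blue"}}
print(closest_colored_pair(colors, "red", "blue", lambda x, y: abs(x - y)))  # (7, 8)
```

In the snippets setting, the points would be occurrence positions in the text $T$ and the colors would correspond to the patterns occurring there.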

### Answering Spatial Multiple-Set Intersection Queries Using 2-3 Cuckoo Hash-Filters

We show how to answer spatial multiple-set intersection queries in
$O(n(\log w)/w + kt)$ expected time, where $n$ is the total size of the $t$ sets involved in
the query, w is the number of bits in a memory word, k is the output size, and
c is any fixed constant. This improves the asymptotic performance over previous
solutions and is based on an interesting data structure, known as 2-3 cuckoo
hash-filters. Our results apply in the word-RAM model (or practical RAM model),
which allows for constant-time bit-parallel operations, such as bitwise AND,
OR, NOT, and MSB (most-significant 1-bit), as exist in modern CPUs and GPUs.
Our solutions apply to any multiple-set intersection queries in spatial data
sets that can be reduced to one-dimensional range queries, such as spatial join
queries for one-dimensional points or sets of points stored along space-filling
curves, which are used in GIS applications.

Comment: Full version of paper from 2017 ACM SIGSPATIAL International
Conference on Advances in Geographic Information Systems
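The bit-parallel operations mentioned above (bitwise AND and MSB) can be illustrated with plain bitsets. This is not a 2-3 cuckoo hash-filter; it is a simplified sketch that assumes the sets are drawn from a small integer universe and uses a Python integer as a stand-in for a sequence of machine words. All function names are assumptions.

```python
from functools import reduce
from typing import Iterable, List


def to_bitset(items: Iterable[int]) -> int:
    """Encode a set of non-negative integers as a bitset (bit x set iff x is present)."""
    bits = 0
    for x in items:
        bits |= 1 << x
    return bits


def intersect_all(bitsets: Iterable[int]) -> int:
    """Intersect any number of bitsets with bitwise AND (at least one bitset required)."""
    return reduce(lambda a, b: a & b, bitsets)


def elements_descending(bits: int) -> List[int]:
    """List the members of a bitset by repeatedly extracting the most-significant 1-bit."""
    out = []
    while bits:
        msb = bits.bit_length() - 1  # index of the most-significant 1-bit
        out.append(msb)
        bits ^= 1 << msb  # clear that bit
    return out


sets = [{1, 5, 9, 40}, {5, 9, 12, 40}, {0, 5, 40}]
common = intersect_all(to_bitset(s) for s in sets)
print(elements_descending(common))  # [40, 5]
```

Each AND processes a whole word of candidate elements at once, which is the kind of word-level parallelism the word-RAM model makes available.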

### Selection in the Presence of Memory Faults, with Applications to In-place Resilient Sorting

The selection problem, where one wishes to locate the $k^{th}$ smallest
element in an unsorted array of size $n$, is one of the basic problems studied
in computer science. The main focus of this work is designing algorithms for
solving the selection problem in the presence of memory faults. These can
happen as the result of cosmic rays, alpha particles, or hardware failures.
Specifically, the computational model assumed here is a faulty variant of the
RAM model (abbreviated as FRAM), which was introduced by Finocchi and Italiano.
In this model, the content of memory cells might get corrupted adversarially
during the execution, and the algorithm is given an upper bound $\delta$ on the
number of corruptions that may occur.
The main contribution of this work is a deterministic resilient selection
algorithm with optimal $O(n)$ worst-case running time. Interestingly, the running
time does not depend on the number of faults, and the algorithm does not need
to know $\delta$.
The aforementioned resilient selection algorithm can be used to improve the
complexity bounds for resilient $k$-d trees developed by Gieseke, Moruz and
Vahrenhold. Specifically, the time complexity for constructing a $k$-d tree is
improved from $O(n\log^2 n + \delta^2)$ to $O(n \log n)$.
Besides the deterministic algorithm, a randomized resilient selection
algorithm is developed, which is simpler than the deterministic one, and has
$O(n + \alpha)$ expected time complexity and $O(1)$ space complexity (i.e., is
in-place). This algorithm is used to develop the first resilient sorting
algorithm that is in-place and achieves optimal $O(n\log n + \alpha\delta)$
expected running time.

Comment: 26 pages
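For reference, the selection problem itself can be sketched with classical randomized quickselect, which runs in expected $O(n)$ time using $O(1)$ extra space on a fault-free machine. This is the standard non-resilient technique, not the resilient algorithm from the paper: a single corrupted memory cell can mislead it. The function name and the 1-indexed convention for `k` are assumptions.

```python
import random
from typing import List


def quickselect(a: List[int], k: int) -> int:
    """Return the k-th smallest element of a (k is 1-indexed).

    Classical randomized quickselect with Lomuto partitioning.
    Reorders `a` in place; assumes memory is never corrupted.
    """
    if not 1 <= k <= len(a):
        raise ValueError("k must satisfy 1 <= k <= len(a)")
    lo, hi, target = 0, len(a) - 1, k - 1
    while lo < hi:
        # Move a random pivot to the end of the active range.
        p = random.randint(lo, hi)
        a[p], a[hi] = a[hi], a[p]
        pivot = a[hi]
        # Partition: elements smaller than the pivot go to the left.
        store = lo
        for i in range(lo, hi):
            if a[i] < pivot:
                a[i], a[store] = a[store], a[i]
                store += 1
        a[store], a[hi] = a[hi], a[store]
        # Recurse (iteratively) into the side that contains the target rank.
        if target == store:
            return a[store]
        if target < store:
            hi = store - 1
        else:
            lo = store + 1
    return a[lo]


print(quickselect([7, 2, 9, 4, 1], 3))  # 4
```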