Hardness of Bichromatic Closest Pair with Jaccard Similarity
Consider collections $\mathcal{A}$ and $\mathcal{B}$ of red and blue sets,
respectively. Bichromatic Closest Pair is the problem of finding a pair from
$\mathcal{A} \times \mathcal{B}$ that has similarity higher than a given
threshold according to some similarity measure. Our focus here is the classic
Jaccard similarity $|a \cap b| / |a \cup b|$
for $(a, b) \in \mathcal{A} \times \mathcal{B}$.
We consider the approximate version of the problem where we are given
thresholds $j_1 > j_2$ and wish to return a pair from $\mathcal{A} \times \mathcal{B}$
that has Jaccard similarity higher than $j_2$ if there exists a
pair in $\mathcal{A} \times \mathcal{B}$ with Jaccard similarity at least $j_1$.
The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani
(STOC '98), instantiated with the MinHash LSH function of Broder et al., solves
this problem in $\tilde{O}(n^{2-\delta})$ time if $j_1 \ge j_2^{1-\delta}$. In
particular, for $\delta = \Omega(1)$, the approximation ratio
$j_1/j_2 = 1/j_2^{\delta}$ increases polynomially in $1/j_2$.
In this paper we give a corresponding hardness result. Assuming the
Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general
solution that solves the Bichromatic Closest Pair problem in
$O(n^{2-\Omega(1)})$ time for $j_1/j_2 = 1/j_2^{o(1)}$. Specifically, assuming
OVC, we prove that for any $\delta > 0$ there exists an $\varepsilon > 0$ such that
Bichromatic Closest Pair with Jaccard similarity requires time
$\Omega(n^{2-\delta})$ for any choice of thresholds $j_2 < j_1 < 1 - \delta$ that
satisfy $j_1 \le j_2^{1-\varepsilon}$.
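To make the problem statement concrete, the following Python sketch shows the exact Jaccard similarity, a quadratic-time scan for the approximate problem, and a small MinHash estimator. It is an illustration only, not the algorithm or the reduction from the paper: the function names, the use of integer set elements, and the linear hash family modulo a Mersenne prime are all assumptions made for this example.

```python
import random

P = (1 << 61) - 1  # assumed modulus; set elements must be integers in [0, P)


def jaccard(a, b):
    """Exact Jaccard similarity |a & b| / |a | b| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def scan_for_pair(reds, blues, j2):
    """Quadratic-time baseline: return the first (red, blue) pair whose
    similarity exceeds j2, or None. If some pair has similarity >= j1 > j2,
    a pair is always returned, which is what the approximate problem asks."""
    for a in reds:
        for b in blues:
            if jaccard(a, b) > j2:
                return a, b
    return None


def make_minhash(k, seed=0):
    """Return a function mapping a non-empty set of ints to a k-entry signature.
    Each entry is the minimum of (c1 * x + c0) mod P over the set; this
    simple hash family is a stand-in for a random permutation."""
    rng = random.Random(seed)
    coefs = [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]

    def signature(s):
        return [min((c1 * x + c0) % P for x in s) for c1, c0 in coefs]

    return signature


def minhash_estimate(sig_a, sig_b):
    """Fraction of agreeing signature entries, an estimate of the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
```

The scan compares all pairs, so it takes time quadratic in the number of sets; the hardness result above says that, under OVC, no algorithm can beat this by a polynomial factor when the two thresholds are too close together.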
Pseudorandom Hashing for Space-bounded Computation with Applications in Streaming
We revisit Nisan's classical pseudorandom generator (PRG) for space-bounded
computation (STOC 1990) and its applications in streaming algorithms. We
describe a new generator, HashPRG, that can be thought of as a symmetric
version of Nisan's generator over larger alphabets. Our generator allows a
trade-off between seed length and the time needed to compute a given block of
the generator's output. HashPRG can be used to obtain derandomizations with
much better update time and without sacrificing space for a large number
of data stream algorithms, such as $F_p$ estimation in the parameter regimes $p > 2$ and $0 < p < 2$, and CountSketch with tight estimation guarantees as
analyzed by Minton and Price (SODA 2014) which assumed access to a random
oracle. We also show a recent analysis of Private CountSketch can be
derandomized using our techniques.
For a $d$-dimensional vector $x$ being updated in a turnstile stream, we show
that $\|x\|_\infty$ can be estimated up to an additive error of
$\varepsilon \|x\|_2$ using
$O(\varepsilon^{-2} \log(1/\varepsilon) \log d)$ bits of space. Additionally, the update time of this algorithm is $O(\log 1/\varepsilon)$ in the Word RAM model. We show that the space complexity of
this algorithm is optimal up to constant factors. However, for vectors $x$ with
$\|x\|_\infty = \Theta(\|x\|_2)$, we show that the lower bound can be
broken by giving an algorithm that uses $O(\varepsilon^{-2} \log d)$ bits of
space which approximates $\|x\|_\infty$ up to an additive error of
$\varepsilon \|x\|_2$. We use our aforementioned derandomization of the
CountSketch data structure to obtain this algorithm, and using the time-space
trade-off of HashPRG, we show that the update time of this algorithm is also
$O(\log 1/\varepsilon)$ in the Word RAM model.
Comment: Minor writing improvement
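For readers unfamiliar with CountSketch, here is a minimal Python sketch of the data structure under turnstile updates, together with a naive estimate of the largest absolute coordinate. It is not HashPRG and involves no derandomization: the hash functions are ordinary seeded polynomial hashes, and the class name, parameters, and the brute-force loop over all coordinates are assumptions chosen for illustration.

```python
import random
from statistics import median

P = (1 << 61) - 1  # Mersenne prime used as the modulus for polynomial hashing


class CountSketch:
    def __init__(self, rows, width, seed=0):
        # Use an odd number of rows so the median is one of the row estimates.
        rng = random.Random(seed)
        self.rows, self.width = rows, width
        self.table = [[0] * width for _ in range(rows)]
        # Per row: a degree-1 polynomial picks the bucket and a degree-3
        # polynomial picks the sign (assumed hash families, seeded normally).
        self.bucket_coef = [[rng.randrange(1, P) for _ in range(2)] for _ in range(rows)]
        self.sign_coef = [[rng.randrange(1, P) for _ in range(4)] for _ in range(rows)]

    @staticmethod
    def _poly(coef, x):
        acc = 0
        for c in coef:  # Horner evaluation modulo P
            acc = (acc * x + c) % P
        return acc

    def _bucket(self, r, i):
        return self._poly(self.bucket_coef[r], i) % self.width

    def _sign(self, r, i):
        return 1 if self._poly(self.sign_coef[r], i) & 1 else -1

    def update(self, i, delta):
        """Turnstile update x[i] += delta; delta may be negative."""
        for r in range(self.rows):
            self.table[r][self._bucket(r, i)] += self._sign(r, i) * delta

    def estimate(self, i):
        """Median over rows of the signed bucket contents for coordinate i."""
        return median(self._sign(r, i) * self.table[r][self._bucket(r, i)]
                      for r in range(self.rows))

    def linf_estimate(self, d):
        """Naive estimate of the largest |x[i]| by querying every coordinate."""
        return max(abs(self.estimate(i)) for i in range(d))
```

A short usage example: after `cs = CountSketch(rows=5, width=16)` and the updates `cs.update(3, 5)` and `cs.update(3, -2)`, the call `cs.estimate(3)` returns 3, because no other coordinate contributes noise to the buckets that coordinate 3 hashes to.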
Udenfor museet - indenfor murene (Outside the Museum, Inside the Walls)
The article presents an action-research-inspired outreach project (Rødder) at Storstrøm Prison which, through a series of trial actions, experiments with cultural-history outreach to the prison's inmates. Based on five qualitative interviews with inmates, it discusses the museum's role as an agent of formative education and how knowledge of the past can become a "common third" in young adults' conversations about identity and belonging.
Triangle Counting in Dynamic Graph Streams
Estimating the number of triangles in graph streams using a limited amount of
memory has become a popular topic in the last decade. Different variations of
the problem have been studied, depending on whether the graph edges are
provided in an arbitrary order or as incidence lists. However, with a few
exceptions, the algorithms have considered insert-only streams. We
present a new algorithm estimating the number of triangles in dynamic
graph streams where edges can be both inserted and deleted. We show that our
algorithm achieves better time and space complexity than previous solutions for
various graph classes, for example sparse graphs with a relatively small number
of triangles. Also, for graphs with constant transitivity coefficient, a common
situation in real graphs, this is the first algorithm achieving constant
processing time per edge. The result is achieved by a novel approach combining
sampling of vertex triples and sparsification of the input graph. In the course
of the analysis of the algorithm we present a lower bound on the number of
pairwise independent 2-paths in general graphs which might be of independent
interest. At the end of the paper we discuss lower bounds on the space
complexity of triangle counting algorithms that make no assumptions on the
structure of the graph.
Comment: New version of a SWAT 2014 paper with improved result
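As a rough illustration of the vertex-triple sampling idea under insertions and deletions, the Python sketch below fixes a random sample of vertex triples up front and tracks how many of each triple's three possible edges are currently present. It is a simplified baseline, not the paper's algorithm: it omits the sparsification step entirely, and the class name, parameters, and the assumption that the graph is simple (an edge is inserted at most once before being deleted) are introduced here for the example.

```python
import random
from math import comb


class TripleSampler:
    def __init__(self, n, samples, seed=0):
        rng = random.Random(seed)
        self.n = n
        # Sample vertex triples uniformly at random (with replacement across samples).
        self.triples = [tuple(sorted(rng.sample(range(n), 3))) for _ in range(samples)]
        self.edge_count = [0] * samples  # edges currently present inside each triple
        # Map each vertex pair to the indices of sampled triples containing it.
        self.watchers = {}
        for t, (a, b, c) in enumerate(self.triples):
            for pair in ((a, b), (a, c), (b, c)):
                self.watchers.setdefault(pair, []).append(t)

    def update(self, u, v, delta):
        """Process an edge insertion (delta=+1) or deletion (delta=-1)."""
        pair = (min(u, v), max(u, v))
        for t in self.watchers.get(pair, ()):
            self.edge_count[t] += delta

    def estimate(self):
        """Scale the fraction of sampled triples that are triangles by C(n, 3)."""
        closed = sum(1 for c in self.edge_count if c == 3)
        return closed / len(self.triples) * comb(self.n, 3)
```

Each update touches only the sampled triples that contain the edge's endpoints, so the work per edge depends on the sample, not on the size of the graph. The estimate is unbiased, but its variance is high when triangles are rare.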
- …