Hardness of Bichromatic Closest Pair with Jaccard Similarity
Consider collections $\mathcal{A}$ and $\mathcal{B}$ of red and blue sets,
respectively. Bichromatic Closest Pair is the problem of finding a pair from
$\mathcal{A} \times \mathcal{B}$ that has similarity higher than a given
threshold according to some similarity measure. Our focus here is the classic
Jaccard similarity $|\mathbf{a} \cap \mathbf{b}| / |\mathbf{a} \cup \mathbf{b}|$
for $(\mathbf{a}, \mathbf{b}) \in \mathcal{A} \times \mathcal{B}$.
We consider the approximate version of the problem where we are given
thresholds $j_1 > j_2$ and wish to return a pair from $\mathcal{A} \times \mathcal{B}$
that has Jaccard similarity higher than $j_2$ if there exists a
pair in $\mathcal{A} \times \mathcal{B}$ with Jaccard similarity at least $j_1$.
The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani
(STOC '98), instantiated with the MinHash LSH function of Broder et al., solves
this problem in $\tilde{O}(n^{2-\delta})$ time if $j_1 \ge j_2^{1-\delta}$. In
particular, for $\delta = \Omega(1)$, the approximation ratio
$j_1 / j_2 = 1 / j_2^{\delta}$ increases polynomially in $1 / j_2$.
In this paper we give a corresponding hardness result. Assuming the
Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general
solution that solves the Bichromatic Closest Pair problem in
$O(n^{2-\Omega(1)})$ time for $j_1 / j_2 = 1 / j_2^{o(1)}$. Specifically, assuming
OVC, we prove that for any $\delta > 0$ there exists an $\varepsilon > 0$ such that
Bichromatic Closest Pair with Jaccard similarity requires time
$\Omega(n^{2-\delta})$ for any choice of thresholds $j_2 < j_1 < 1 - \delta$ that
satisfy $j_1 \le j_2^{1-\varepsilon}$.
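To make the problem statement concrete, the following is a minimal Python sketch, not taken from the paper. It computes exact Jaccard similarity, solves the approximate problem by quadratic brute force (the baseline that the hardness result says cannot be improved much in general), and estimates similarity with a MinHash-style signature. The function names, the use of `hash((seed, x))` as a stand-in for a min-wise independent hash family, and the convention for two empty sets are all assumptions made for illustration.

```python
import random
from itertools import product


def jaccard(a, b):
    """Exact Jaccard similarity |a ∩ b| / |a ∪ b| of two sets."""
    union = a | b
    if not union:
        return 1.0  # assumed convention for two empty sets
    return len(a & b) / len(union)


def bichromatic_pair_bruteforce(reds, blues, j2):
    """Return some (red, blue) pair with similarity > j2, or None.

    Checking all pairs takes quadratic time. It trivially meets the
    approximate guarantee: if a pair with similarity >= j1 > j2 exists,
    some pair above j2 is found.
    """
    for a, b in product(reds, blues):
        if jaccard(a, b) > j2:
            return a, b
    return None


def minhash_signature(s, seeds):
    """One minimum per seed; hash((seed, x)) is an illustrative stand-in
    for a min-wise independent hash function. Requires a non-empty set."""
    return [min(hash((seed, x)) for x in s) for seed in seeds]


def minhash_estimate(sig_a, sig_b):
    """Fraction of agreeing coordinates, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    rng = random.Random(0)
    seeds = [rng.getrandbits(32) for _ in range(256)]
    a, b = set(range(0, 80)), set(range(40, 120))
    print("exact:", jaccard(a, b))
    print("estimate:", minhash_estimate(minhash_signature(a, seeds),
                                        minhash_signature(b, seeds)))
```

The LSH algorithm mentioned in the abstract goes further than this sketch: it groups signature coordinates into buckets so that only colliding pairs are compared, which is where the subquadratic running time comes from.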
Reverse Thinking in Spatial Queries
In recent years, a growing body of research has addressed spatial queries that capture the influence of query objects. Among these, the reverse k nearest neighbors (RkNN) query is the most extensively studied. The reverse k furthest neighbors (RkFN) query is the natural complement of the RkNN query. The RkNN query was introduced to reflect the influence of the query object. Because this representation is intuitive, the RkNN query has attracted significant attention in the database community. Later, reverse top-k queries were introduced and have also been used extensively to represent influence. In many scenarios, considering the influence of a spatial object involves reverse thinking. That is, whether an object is influential to another object depends on how the other object assesses this object, rather than on how this object regards the other object. In this thesis, we study three problems that involve reverse thinking.
We first study the problem of efficiently computing RkFN queries. We are the first to propose a solution for arbitrary values of k. Based on several interesting observations, we present an efficient algorithm to process RkFN queries. We also present a rigorous theoretical analysis of various important aspects of the problem and of our algorithm. An extensive experimental study demonstrates that our algorithm outperforms the state-of-the-art algorithm even for k=1. The accuracy of our theoretical analysis is also verified.
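For readers unfamiliar with the query, here is a naive quadratic-time Python sketch of an RkFN query. It is not the thesis's algorithm. It assumes the monochromatic setting, in which a data point p belongs to the answer when the query q is among the k furthest neighbors of p, measured against the other data points. The function name `rkfn_bruteforce` and the tie-handling rule are illustrative choices.

```python
from math import dist


def rkfn_bruteforce(points, q, k):
    """Return every data point p for which q is one of p's k furthest
    neighbors among the other data points plus q (Euclidean distance)."""
    result = []
    for i, p in enumerate(points):
        d_q = dist(p, q)
        # Count the other data points that lie strictly further from p than q.
        further = sum(
            1 for j, o in enumerate(points) if j != i and dist(p, o) > d_q
        )
        # q is among the k furthest neighbors if fewer than k points beat it.
        if further < k:
            result.append(p)
    return result


if __name__ == "__main__":
    data = [(0, 0), (1, 0), (10, 0)]
    print(rkfn_bruteforce(data, (-5, 0), k=1))
```

With the sample data, only the point (10, 0) has the query as its single furthest neighbor; the other two points are further from (10, 0) than from the query.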
We then study the problem of selecting a set of representative products, considering both diversity and coverage, based on reverse top-k queries. Since this problem is NP-hard, we employ a greedy algorithm. We adopt MinHash and KMV synopses to support set operations. Our experimental study demonstrates the performance of the proposed algorithm.
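The greedy strategy can be illustrated with a simplified Python sketch. It assumes that each product's reverse top-k result (the set of users who rank that product in their top-k) has already been computed, and it optimizes coverage only. The diversity term of the thesis's objective is omitted, and exact Python sets are used where the thesis relies on MinHash and KMV synopses. The name `greedy_representatives` and the input layout are hypothetical.

```python
def greedy_representatives(coverage, budget):
    """Greedy selection of at most `budget` products.

    coverage: dict mapping a product id to the set of users in its
              reverse top-k result (assumed precomputed).
    Returns the chosen products, in selection order, and the users covered.
    """
    chosen, covered = [], set()
    candidates = dict(coverage)  # shallow copy, so the input is not mutated
    for _ in range(budget):
        # Pick the product that adds the most users not yet covered.
        best = max(
            candidates,
            key=lambda p: len(candidates[p] - covered),
            default=None,
        )
        if best is None or not (candidates[best] - covered):
            break  # no candidate adds anything new
        chosen.append(best)
        covered |= candidates.pop(best)
    return chosen, covered


if __name__ == "__main__":
    sample = {"p1": {1, 2, 3}, "p2": {3, 4}, "p3": {4, 5, 6, 7}}
    print(greedy_representatives(sample, budget=2))
```

For plain coverage objectives of this kind, greedy selection is a standard heuristic; whether its usual approximation guarantee carries over to the combined diversity and coverage objective is not stated in the abstract.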
We also study the problem of maximizing the spatial influence of a facility bundle based on RkNN queries. We are the first to study this problem. We prove its NP-hardness and propose a branch-and-bound best-first search algorithm that greedily selects the currently best facility until the required number of facilities is reached. We introduce the concept of the kNN region, which allows us to avoid redundant calculation using a dynamic programming technique. Experiments show that our algorithm is orders of magnitude better than our baseline algorithm.
LIPIcs, Volume 251, ITCS 2023, Complete Volume