11 research outputs found

    Hardness of Bichromatic Closest Pair with Jaccard Similarity

    Consider collections $\mathcal{A}$ and $\mathcal{B}$ of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from $\mathcal{A}\times \mathcal{B}$ that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity $|\textbf{a}\cap \textbf{b}|/|\textbf{a}\cup \textbf{b}|$ for $(\textbf{a},\textbf{b})\in \mathcal{A}\times \mathcal{B}$. We consider the approximate version of the problem where we are given thresholds $j_1>j_2$ and wish to return a pair from $\mathcal{A}\times \mathcal{B}$ that has Jaccard similarity higher than $j_2$ if there exists a pair in $\mathcal{A}\times \mathcal{B}$ with Jaccard similarity at least $j_1$. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in $\tilde O(n^{2-\delta})$ time if $j_1\ge j_2^{1-\delta}$. In particular, for $\delta=\Omega(1)$, the approximation ratio $j_1/j_2=1/j_2^{\delta}$ increases polynomially in $1/j_2$. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in $O(n^{2-\Omega(1)})$ time for $j_1/j_2=1/j_2^{o(1)}$. Specifically, assuming OVC, we prove that for any $\delta>0$ there exists an $\varepsilon>0$ such that Bichromatic Closest Pair with Jaccard similarity requires time $\Omega(n^{2-\delta})$ for any choice of thresholds $j_2<j_1<1-\delta$ that satisfy $j_1\le j_2^{1-\varepsilon}$.
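
The quantities in this abstract can be illustrated with a minimal Python sketch. It is not the paper's construction or the full LSH algorithm: it shows the exact Jaccard similarity, a simple MinHash-style estimator, and the quadratic-time brute-force search that sub-quadratic methods aim to beat. All function names, the modulus `P`, and the linear hash family are assumptions made for illustration; sets are assumed to be non-empty and to contain non-negative integers smaller than `P`.

```python
import random

P = (1 << 61) - 1  # Mersenne prime used as the hash modulus (assumed choice)

def jaccard(a, b):
    """Exact Jaccard similarity |a ∩ b| / |a ∪ b| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def make_hashes(k, seed=0):
    """Draw k linear hash functions x -> (a*x + b) mod P (illustrative family)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(P)) for _ in range(k)]

def signature(s, hashes):
    """MinHash-style signature: the minimum hash value of s under each function."""
    return [min((a * x + b) % P for x in s) for a, b in hashes]

def estimate(sig_a, sig_b):
    """Fraction of agreeing coordinates, which approximates the Jaccard similarity."""
    return sum(u == v for u, v in zip(sig_a, sig_b)) / len(sig_a)

def find_pair_above(A, B, j2):
    """Brute force over A x B: return some pair with similarity > j2, else None."""
    for a in A:
        for b in B:
            if jaccard(a, b) > j2:
                return a, b
    return None
```

For example, `jaccard({1, 2, 3}, {2, 3, 4})` evaluates to `0.5`, since the sets share two of four distinct elements. The brute-force search compares every red set with every blue set, which is the quadratic behaviour the hardness result says cannot be improved polynomially when $j_1/j_2=1/j_2^{o(1)}$.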

    27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany


    Reverse Thinking in Spatial Queries

    In recent years, a growing body of research has addressed spatial queries concerning the influence of query objects. Among these, the reverse k nearest neighbors (RkNN) query is the most extensively studied. The reverse k furthest neighbors (RkFN) query is the natural complement of the RkNN query. The RkNN query was introduced to reflect the influence of the query object. Since this representation is intuitive, the RkNN query has attracted significant attention in the database community. Later, reverse top-k queries were introduced and have also been used extensively to represent influence. In many scenarios, considering the influence of a spatial object involves reverse thinking. That is, whether an object is influential to another object depends on how the other object assesses this object, rather than on how this object considers the other. In this thesis, we study three problems that involve reverse thinking. We first study the problem of efficiently computing RkFN queries. We are the first to propose a solution for arbitrary values of k. Based on several interesting observations, we present an efficient algorithm to process RkFN queries. We also present a rigorous theoretical analysis of various important aspects of the problem and our algorithm. An extensive experimental study demonstrates that our algorithm outperforms the state-of-the-art algorithm even for k=1. The accuracy of our theoretical analysis is also verified. We then study the problem of selecting a set of representative products, considering both diversity and coverage, based on reverse top-k queries. Since this problem is NP-hard, we employ a greedy algorithm. We adopt MinHash and KMV synopses to assist set operations. Our experimental study demonstrates the performance of the proposed algorithm. We also study the problem of maximizing the spatial influence of a facility bundle based on RkNN queries. We are the first to study this problem. We prove its NP-hardness and propose a branch-and-bound best-first search algorithm that greedily selects the currently best facility until the required number of facilities is reached. We introduce the concept of a kNN region, which allows us to avoid redundant calculation by means of dynamic programming. Experiments show that our algorithm is orders of magnitude better than our baseline algorithm.
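
To make the two query types concrete, here is a minimal brute-force Python sketch. It assumes the common monochromatic definitions: a point p belongs to RkNN(q) if fewer than k other data points are strictly closer to p than q is, and to RkFN(q) if fewer than k other data points are strictly farther from p than q is. The function names and the tie handling are assumptions for illustration; this is not the thesis's algorithm, which is designed to avoid exactly this quadratic scan.

```python
from math import dist

def rknn(points, q, k):
    """Points that would count q among their k nearest neighbors (brute force)."""
    result = []
    for i, p in enumerate(points):
        d_q = dist(p, q)
        # Count the other data points lying strictly closer to p than q does.
        closer = sum(1 for j, o in enumerate(points) if j != i and dist(p, o) < d_q)
        if closer < k:
            result.append(p)
    return result

def rkfn(points, q, k):
    """Points that would count q among their k furthest neighbors (brute force)."""
    result = []
    for i, p in enumerate(points):
        d_q = dist(p, q)
        # Count the other data points lying strictly farther from p than q does.
        farther = sum(1 for j, o in enumerate(points) if j != i and dist(p, o) > d_q)
        if farther < k:
            result.append(p)
    return result
```

With `points = [(0, 0), (1, 0), (10, 0)]`, the call `rknn(points, (2, 0), 1)` returns `[(1, 0), (10, 0)]`: the point `(0, 0)` is excluded because `(1, 0)` is closer to it than the query is.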

    Recommendation Support for Multi-Attribute Databases


    Differential Privacy in Distributed Settings


    LIPIcs, Volume 251, ITCS 2023, Complete Volume
