
11 research outputs found

Hardness of Bichromatic Closest Pair with Jaccard Similarity

Authors: Nielsen Nina Mesing Stausholm; Pagh Rasmus; Thorup Mikkel
Publication date: 01/01/2019

Consider collections $\mathcal{A}$ and $\mathcal{B}$ of red and blue sets, respectively. Bichromatic Closest Pair is the problem of finding a pair from $\mathcal{A}\times \mathcal{B}$ that has similarity higher than a given threshold according to some similarity measure. Our focus here is the classic Jaccard similarity $|\textbf{a}\cap \textbf{b}|/|\textbf{a}\cup \textbf{b}|$ for $(\textbf{a},\textbf{b})\in \mathcal{A}\times \mathcal{B}$. We consider the approximate version of the problem where we are given thresholds $j_1>j_2$ and wish to return a pair from $\mathcal{A}\times \mathcal{B}$ that has Jaccard similarity higher than $j_2$ if there exists a pair in $\mathcal{A}\times \mathcal{B}$ with Jaccard similarity at least $j_1$. The classic locality sensitive hashing (LSH) algorithm of Indyk and Motwani (STOC '98), instantiated with the MinHash LSH function of Broder et al., solves this problem in $\tilde O(n^{2-\delta})$ time if $j_1\ge j_2^{1-\delta}$. In particular, for $\delta=\Omega(1)$, the approximation ratio $j_1/j_2=1/j_2^{\delta}$ increases polynomially in $1/j_2$. In this paper we give a corresponding hardness result. Assuming the Orthogonal Vectors Conjecture (OVC), we show that there cannot be a general solution that solves the Bichromatic Closest Pair problem in $O(n^{2-\Omega(1)})$ time for $j_1/j_2=1/j_2^{o(1)}$. Specifically, assuming OVC, we prove that for any $\delta>0$ there exists an $\varepsilon>0$ such that Bichromatic Closest Pair with Jaccard similarity requires time $\Omega(n^{2-\delta})$ for any choice of thresholds $j_2<j_1<1-\delta$ that satisfy $j_1\le j_2^{1-\varepsilon}$.
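To make the problem statement concrete, the following is a minimal Python sketch of the trivial quadratic-time baseline for the approximate problem. It is not the LSH algorithm or any construction from the paper; the function names `jaccard` and `approx_bichromatic_closest_pair`, and the convention that two empty sets have similarity 0, are assumptions made purely for illustration.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    union = a | b
    if not union:
        return 0.0  # assumed convention for two empty sets
    return len(a & b) / len(union)


def approx_bichromatic_closest_pair(reds, blues, j1, j2):
    """Brute-force baseline over all red/blue pairs.

    Returns some pair (a, b) with Jaccard similarity > j2, or None if
    no such pair exists. If a pair with similarity >= j1 exists, then
    because j1 > j2 the scan is guaranteed to return a pair.
    """
    if not j1 > j2:
        raise ValueError("thresholds must satisfy j1 > j2")
    for a, b in product(reds, blues):
        if jaccard(a, b) > j2:
            return a, b
    return None
```

Any pair with similarity above the lower threshold is a valid answer, so `j1` only serves as a sanity check in this baseline. The hardness result in the abstract concerns whether the roughly n^2 comparisons performed here can be avoided.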

arXiv.org e-Print Archive

Copenhagen University Research Information System

Dagstuhl Research Online Publication Server

The IT University of Copenhagen's Repository

27th Annual European Symposium on Algorithms: ESA 2019, September 9-11, 2019, Munich/Garching, Germany

Author: ESA <27. 2019, München>
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Publication date: 01/09/2019

Digitale Bibliothek Thüringen

Similarity Search: Algorithms for Sets and other High Dimensional Data

Author: Ahle Thomas Dybdahl
Publication date: 01/01/2019

The IT University of Copenhagen's Repository

Reverse Thinking in Spatial Queries

Author: Wang Shenlu
Publication venue: UNSW, Sydney
Publication date: 01/01/2018

In recent years, an increasing number of studies have addressed spatial queries concerning the influence of query objects. Among these, the reverse k nearest neighbors (RkNN) query is the most extensively studied. The reverse k furthest neighbors (RkFN) query is the natural complement of the RkNN query. The RkNN query was introduced to reflect the influence of the query object. Since this representation is intuitive, the RkNN query has attracted significant attention in the database community. Later, reverse top-k queries were introduced and have also been used extensively to represent influence. In many scenarios, considering the influence of a spatial object involves reverse thinking. That is, whether an object is influential to another object depends on how the other object assesses this object, rather than on how this object considers the other object. In this thesis, we study three problems that involve reverse thinking. We first study the problem of efficiently computing RkFN queries. We are the first to propose a solution for arbitrary values of k. Based on several interesting observations, we present an efficient algorithm to process RkFN queries. We also present a rigorous theoretical analysis of various important aspects of the problem and our algorithm. An extensive experimental study demonstrates that our algorithm outperforms the state-of-the-art algorithm even for k=1. The accuracy of our theoretical analysis is also verified. We then study the problem of selecting a set of representative products, considering both diversity and coverage, based on reverse top-k queries. Since this problem is NP-hard, we employ a greedy algorithm. We adopt MinHash and KMV synopses to assist set operations. Our experimental study demonstrates the performance of the proposed algorithm. We also study the problem of maximizing the spatial influence of a facility bundle based on RkNN queries. We are the first to study this problem. We prove its NP-hardness and propose a branch-and-bound best-first search algorithm that greedily selects the currently best facility until we obtain the required number of facilities. We introduce the concept of a kNN region, which allows us to avoid redundant calculation using a dynamic programming technique. Experiments show that our algorithm is orders of magnitude better than our baseline algorithm.
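As a small illustration of the "reverse" viewpoint described in this abstract, here is a brute-force Python sketch of an RkNN query. It assumes the common monochromatic definition (a data point p is in the result if the query q is among the k nearest neighbours of p, measured against the other data points) and Euclidean distance; the function name `reverse_knn` is hypothetical, and this is not the algorithm proposed in the thesis.

```python
import math


def reverse_knn(points, q, k):
    """Return the data points that have q among their k nearest neighbours.

    Brute force: for each point p, count the other data points that are
    strictly closer to p than q is. If fewer than k are closer, q ranks
    within the top k for p, so p belongs to the result.
    """
    result = []
    for i, p in enumerate(points):
        d_q = math.dist(p, q)
        closer = sum(
            1
            for j, other in enumerate(points)
            if j != i and math.dist(p, other) < d_q
        )
        if closer < k:
            result.append(p)
    return result


points = [(0, 0), (1, 0), (5, 0)]
print(reverse_knn(points, (2, 0), 1))
```

Note that (0, 0) is excluded for k=1 even though it is closer to the query than (5, 0) is: what matters is how each data point ranks the query, not how the query ranks the data points.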

UNSWorks

Recommendation Support for Multi-Attribute Databases

Author: ZHANG Jilian
Publication venue: Singapore Management University
Publication date: 01/06/2014

Institutional Knowledge at Singapore Management University

Differential Privacy in Distributed Settings

Author: Nielsen Nina Mesing Stausholm
Publication venue: IT-Universitetet i København
Publication date: 01/01/2021

The IT University of Copenhagen's Repository

LIPIcs, Volume 251, ITCS 2023, Complete Volume

Author: Tauman Kalai Yael
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 14th Innovations in Theoretical Computer Science Conference (ITCS 2023)
Publication date: 01/01/2023

LIPIcs, Volume 251, ITCS 2023, Complete Volume

Dagstuhl Research Online Publication Server

Advances in database technology - EDBT 2016: 19th International Conference on Extending Database Technology, Bordeaux, France, March 15-18, 2016 : proceedings

Publication venue: University of Konstanz, University Library
Publication date: 01/01/2016

Digitale Bibliothek Thüringen