Search CORE

4,485 research outputs found

Indexing Metric Spaces for Exact Similarity Search

Author: Chen Lu
Gao Yunjun
Jensen Christian S.
Li Zheng
Miao Xiaoye
Song Xuan
Zhu Yifan
Publication venue
Publication date: 07/05/2020
Field of study

With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity and variety. Many studies address volume or velocity, while much fewer studies concern the variety. Metric space is ideal for addressing variety because it can accommodate any type of data as long as its associated distance notion satisfies the triangle inequality. To accelerate search in metric space, a collection of indexing techniques for metric data have been proposed. However, existing surveys each offers only a narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search, by i) summarizing all the existing partitioning, pruning and validation techniques used for metric indexes, ii) providing the time and storage complexity analysis on the index construction, and iii) report on a comprehensive empirical comparison of their similarity query processing performance. Here, empirical comparisons are used to evaluate the index performance during search as it is hard to see the complexity analysis differences on the similarity query processing and the query performance depends on the pruning and validation abilities related to the data distribution. This article aims at revealing different strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting, and directing the future research for metric indexes

arXiv.org e-Print Archive

VBN

An Efficient Representation for Filtrations of Simplicial Complexes

Author: Boissonnat Jean-Daniel
S. Karthik C.
Publication venue
Publication date: 16/01/2017
Field of study

A filtration over a simplicial complex

K

is an ordering of the simplices of

K

such that all prefixes in the ordering are subcomplexes of

K

. Filtrations are at the core of Persistent Homology, a major tool in Topological Data Analysis. In order to represent the filtration of a simplicial complex, the entire filtration can be appended to any data structure that explicitly stores all the simplices of the complex such as the Hasse diagram or the recently introduced Simplex Tree [Algorithmica '14]. However, with the popularity of various computational methods that need to handle simplicial complexes, and with the rapidly increasing size of the complexes, the task of finding a compact data structure that can still support efficient queries is of great interest. In this paper, we propose a new data structure called the Critical Simplex Diagram (CSD) which is a variant of the Simplex Array List (SAL) [Algorithmica '17]. Our data structure allows one to store in a compact way the filtration of a simplicial complex, and allows for the efficient implementation of a large range of basic operations. Moreover, we prove that our data structure is essentially optimal with respect to the requisite storage space. Finally, we show that the CSD representation admits fast construction algorithms for Flag complexes and relaxed Delaunay complexes.Comment: A preliminary version appeared in SODA 201

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

The Flexible Group Spatial Keyword Query

Author: D Papadias
G Cong
GR Hjaltason
K Yao
ME Ali
N Roussopoulos
X Cao
Z Li
Publication venue
Publication date: 24/04/2017
Field of study

We present a new class of service for location based social networks, called the Flexible Group Spatial Keyword Query, which enables a group of users to collectively find a point of interest (POI) that optimizes an aggregate cost function combining both spatial distances and keyword similarities. In addition, our query service allows users to consider the tradeoffs between obtaining a sub-optimal solution for the entire group and obtaining an optimimized solution but only for a subgroup. We propose algorithms to process three variants of the query: (i) the group nearest neighbor with keywords query, which finds a POI that optimizes the aggregate cost function for the whole group of size n, (ii) the subgroup nearest neighbor with keywords query, which finds the optimal subgroup and a POI that optimizes the aggregate cost function for a given subgroup size m (m <= n), and (iii) the multiple subgroup nearest neighbor with keywords query, which finds optimal subgroups and corresponding POIs for each of the subgroup sizes in the range [m, n]. We design query processing algorithms based on branch-and-bound and best-first paradigms. Finally, we provide theoretical bounds and conduct extensive experiments with two real datasets which verify the effectiveness and efficiency of the proposed algorithms.Comment: 12 page

arXiv.org e-Print Archive

Crossref

Intelligent Data Storage and Retrieval for Design Optimisation – an Overview

Author: Bil C.
Drack L.
Peebles C.
Publication venue: 'Czech Technical University in Prague - Central Library'
Publication date: 01/01/2005
Field of study

This paper documents the findings of a literature review conducted by the Sir Lawrence Wackett Centre for Aerospace Design Technology at RMIT University. The review investigates aspects of a proposed system for intelligent design optimisation. Such a system would be capable of efficiently storing (and compressing if required) a range of types of design data into an intelligent database. This database would be accessed by the system during subsequent design processes, allowing for search of relevant design data for re-use in later designs, allowing it to become very efficient in reducing the time for later designs as the database grows in size. Extensive research has been performed, in both theoretical aspects of the project, and practical examples of current similar systems. This research covers the areas of database systems, database queries, representation and compression of design data, geometric representation and heuristic methods for design applications.

Directory of Open Access Journals

CTU Open Journal Systems (Czech Technical University, Prague / České vysoké učení technické v Praze)

Discrete Multi-modal Hashing with Canonical Views for Robust Mobile Landmark Search

Author: He Xiangnan
Huang Zi
Liu Xiaobai
Song Jingkuan
Zhou Xiaofang
Zhu Lei
Publication venue
Publication date: 13/07/2017
Field of study

Mobile landmark search (MLS) recently receives increasing attention for its great practical values. However, it still remains unsolved due to two important challenges. One is high bandwidth consumption of query transmission, and the other is the huge visual variations of query images sent from mobile devices. In this paper, we propose a novel hashing scheme, named as canonical view based discrete multi-modal hashing (CV-DMH), to handle these problems via a novel three-stage learning procedure. First, a submodular function is designed to measure visual representativeness and redundancy of a view set. With it, canonical views, which capture key visual appearances of landmark with limited redundancy, are efficiently discovered with an iterative mining strategy. Second, multi-modal sparse coding is applied to transform visual features from multiple modalities into an intermediate representation. It can robustly and adaptively characterize visual contents of varied landmark images with certain canonical views. Finally, compact binary codes are learned on intermediate representation within a tailored discrete binary embedding model which preserves visual relations of images measured with canonical views and removes the involved noises. In this part, we develop a new augmented Lagrangian multiplier (ALM) based optimization method to directly solve the discrete binary codes. We can not only explicitly deal with the discrete constraint, but also consider the bit-uncorrelated constraint and balance constraint together. Experiments on real world landmark datasets demonstrate the superior performance of CV-DMH over several state-of-the-art methods

arXiv.org e-Print Archive

University of Queensland eSpace