    High-Dimensional Spatio-Temporal Indexing

    There exist numerous indexing methods that handle either spatio-temporal or high-dimensional data well. However, the indexing methods that handle spatio-temporal data well have certain drawbacks when confronted with high-dimensional data. As the most efficient spatio-temporal indexing methods are based on the R-tree and its variants, they face the well-known problems of such trees in high-dimensional space. Furthermore, most high-dimensional indexing methods reduce the number of dimensions being indexed, compressing the information carried by all dimensions into a few, but are unable to store now-relative data. One of the most efficient high-dimensional indexing methods, the Pyramid Technique, can handle high-dimensional point data only. Nonetheless, we take this technique and extend it so that it can handle spatio-temporal data as well, and we introduce a technique for answering spatio-temporal queries on this structure. We compare our technique, the Spatio-Temporal Pyramid Adapter (STPA), to the RST-tree for in-memory and on-disk applications. We show that for high dimensions, the extra query cost for reducing the dimensionality in the Pyramid Technique is clearly exceeded by the rising query cost in the RST-tree. Concluding, we address the main advantages and drawbacks of our technique.
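
    The abstract does not spell out the mapping itself, but the point mapping at the heart of the original Pyramid Technique is well documented: the d-dimensional unit cube is split into 2d pyramids meeting at the centre, and each point is reduced to a single pyramid value that a B+-tree can index. A minimal sketch of that base mapping follows; the spatio-temporal extension of the STPA is not reproduced here.

        import numpy as np

        def pyramid_value(v):
            # Map a point v in the unit cube [0,1]^d to its 1-D pyramid value:
            # the pyramid number i plus the point's height h inside that pyramid.
            v = np.asarray(v, dtype=float)
            d = len(v)
            dist = np.abs(v - 0.5)         # deviation from the centre per dimension
            jmax = int(np.argmax(dist))    # dominant dimension picks the pyramid
            i = jmax if v[jmax] < 0.5 else jmax + d
            return i + dist[jmax]          # pyramid values go into a B+-tree

    Queries against the high-dimensional space are then translated into interval queries over these one-dimensional values.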

    HDIdx: High-Dimensional Indexing for Efficient Approximate Nearest Neighbor Search

    Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia content, which is often of high dimensionality. Instead of using exact NN search, extensive research efforts have focused on approximate NN search algorithms. In this work, we present "HDIdx", an efficient high-dimensional indexing library for fast approximate NN search, which is open-source and written in Python. It offers a family of state-of-the-art algorithms that convert input high-dimensional vectors into compact binary codes, making NN search very efficient and scalable at very low space complexity.
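
    HDIdx's actual API is not shown in the abstract, so the sketch below only illustrates the general scheme the library implements: vectors are mapped to compact binary codes and search proceeds by Hamming distance. Random-hyperplane hashing stands in for the learned encoders HDIdx provides; all names here are illustrative, not the library's API.

        import numpy as np

        rng = np.random.default_rng(0)

        def train_encoder(dim, nbits):
            # Stand-in encoder: random hyperplanes. A learned encoder
            # (e.g., spectral hashing) would give much better codes.
            return rng.standard_normal((dim, nbits))

        def encode(X, W):
            return X @ W > 0                    # one compact binary code per row

        def knn(q, codes, W, k=10):
            qc = encode(q[None, :], W)[0]
            ham = (codes != qc).sum(axis=1)     # Hamming distance to every code
            return np.argsort(ham)[:k]          # indices of the k nearest codes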

    Angle Tree: Nearest Neighbor Search in High Dimensions with Low Intrinsic Dimensionality

    We propose an extension of tree-based space-partitioning indexing structures for data with low intrinsic dimensionality embedded in a high-dimensional space. We call this extension an Angle Tree. Our extension can be applied both to classical kd-trees and to the more recent rp-trees. The key idea of our approach is to store the angle (the "dihedral angle") between the data region (which is a low-dimensional manifold) and the random hyperplane that splits the region (the "splitter"). We show that the dihedral angle can be used to obtain a tight lower bound on the distance between the query point and any point on the opposite side of the splitter. This in turn can be used to efficiently prune the search space. We introduce a novel randomized strategy to efficiently calculate the dihedral angle with a high degree of accuracy. Experiments and analysis on real and synthetic data sets show that the Angle Tree is the most efficient known indexing structure for nearest neighbor queries in terms of preprocessing and space usage while achieving high accuracy and fast search time.
    Comment: To be submitted to IEEE Transactions on Pattern Analysis and Machine Intelligence.
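
    One plausible reading of the pruning rule, assuming the data lies on a flat manifold meeting the splitter at dihedral angle theta: the classical kd-tree bound (the query's margin to the splitting hyperplane) inflates to margin / sin(theta), since both the query and any far-side point sit on the manifold. The exact bound in the paper may differ; this is a sketch under that assumption.

        import math

        def can_prune_far_child(margin, dihedral_angle, best_dist):
            # A classical kd-tree uses `margin` (the query's distance to the
            # splitting hyperplane) as the lower bound on any far-side point.
            # If the data lies on a manifold meeting the splitter at angle
            # theta, that bound tightens to margin / sin(theta).
            lower_bound = margin / max(math.sin(dihedral_angle), 1e-12)
            return lower_bound >= best_dist     # True: skip the far child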

    Incremental dimension reduction of tensors with random index

    We present an incremental, scalable and efficient dimension reduction technique for tensors that is based on sparse random linear coding. Data is stored in a compactified representation of fixed size, which makes memory requirements low and predictable. Component encoding and decoding are performed on-line, without computationally expensive re-analysis of the data set. The range of tensor indices can be extended dynamically without modifying the component representation. This idea originates from a mathematical model of semantic memory and from a method known as random indexing in natural language processing. We generalize the random-indexing algorithm to tensors and present signal-to-noise-ratio simulations for representations of vectors and matrices. We also present a mathematical analysis of the approximate orthogonality of high-dimensional ternary vectors, a property that underpins this and other similar random-coding approaches to dimension reduction. To further demonstrate the properties of random indexing, we present results of a synonym identification task. The method presented here has some similarities with random projection and Tucker decomposition, but it performs well only at high dimensionality (n > 10^3). Random indexing is useful for a range of complex practical problems, e.g., in natural language processing, data mining, pattern recognition, event detection, graph searching and search engines. Prototype software is provided; it supports encoding and decoding of tensors of order >= 1 in a unified framework, i.e., vectors, matrices and higher-order tensors.
    Comment: 36 pages, 9 figures.
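
    A minimal sketch of the vector (order-1) case of random indexing as described in the abstract: every component gets a sparse ternary index vector, data is accumulated into one fixed-size vector, and components are decoded by inner products, with cross-term noise that shrinks as the dimensionality n grows. Parameter choices here are illustrative, and the tensor generalization is omitted.

        import numpy as np

        rng = np.random.default_rng(0)

        def index_vector(n=10_000, k=20):
            # Sparse ternary index vector: k/2 entries +1, k/2 entries -1,
            # the rest 0. High-dimensional ternary vectors like these are
            # approximately orthogonal, which is what makes decoding work.
            v = np.zeros(n)
            pos = rng.choice(n, size=k, replace=False)
            v[pos[:k // 2]] = 1.0
            v[pos[k // 2:]] = -1.0
            return v

        def encode(components, n=10_000):
            # components: dict mapping index -> value. The representation has
            # fixed size n regardless of how many indices are in use, and new
            # indices can be added later without re-encoding old ones.
            ivs = {i: index_vector(n) for i in components}
            s = sum(val * ivs[i] for i, val in components.items())
            return s, ivs

        def decode(s, ivs, i, k=20):
            # Approximate recovery of component i; cross-terms from the other
            # index vectors contribute the noise analysed in the paper.
            return s @ ivs[i] / k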

    HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces

    Nearest neighbor searching of large databases in high-dimensional spaces is inherently difficult due to the curse of dimensionality. A flavor of approximation is, therefore, necessary to practically solve the problem of nearest neighbor search. In this paper, we propose a novel yet simple indexing scheme, HD-Index, to solve the problem of approximate k-nearest neighbor queries in massive high-dimensional databases. HD-Index consists of a set of novel hierarchical structures called RDB-trees built on Hilbert keys of database objects. The leaves of the RDB-trees store distances of database objects to reference objects, thereby allowing efficient pruning using distance filters. In addition to the triangle inequality, we also use the Ptolemaic inequality to produce better lower bounds. Experiments on massive (up to billion-scale) high-dimensional (more than 1000 dimensions) datasets show that HD-Index is effective, efficient, and scalable.
    Comment: PVLDB 11(8):906-919, 2018.
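
    The two distance filters named in the abstract are standard. For a query q, object o, and reference objects r1, r2, the triangle inequality gives d(q,o) >= |d(q,r1) - d(o,r1)|, and the Ptolemaic inequality gives d(q,o) >= |d(q,r1) d(o,r2) - d(q,r2) d(o,r1)| / d(r1,r2). A sketch of how such bounds prune candidates; the surrounding RDB-tree machinery is not reproduced, and the function names are illustrative.

        def triangle_lb(d_qr, d_or):
            # Lower bound on d(q,o) from one reference r.
            return abs(d_qr - d_or)

        def ptolemaic_lb(d_qr1, d_qr2, d_or1, d_or2, d_r1r2):
            # Lower bound on d(q,o) from two references r1, r2; valid in
            # Ptolemaic metric spaces such as Euclidean space.
            return abs(d_qr1 * d_or2 - d_qr2 * d_or1) / d_r1r2

        def can_skip(d_q, d_o, d_r1r2, best_dist):
            # d_q, d_o: precomputed distances of query/object to (r1, r2).
            # Skip the exact (expensive) distance computation whenever a
            # stored lower bound already exceeds the current k-th distance.
            lb = max(triangle_lb(d_q[0], d_o[0]),
                     ptolemaic_lb(d_q[0], d_q[1], d_o[0], d_o[1], d_r1r2))
            return lb >= best_dist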

    Searching in one billion vectors: re-rank with source coding

    Recent indexing techniques inspired by source coding have been shown to successfully index billions of high-dimensional vectors in memory. In this paper, we propose an approach that re-ranks the neighbor hypotheses obtained by these compressed-domain indexing methods. In contrast to the usual post-verification scheme, which performs exact distance calculations on the short-list of hypotheses, the estimated distances are refined based on short quantization codes, to avoid reading the full vectors from disk. We have released a new public dataset of one billion 128-dimensional vectors and propose an experimental setup to evaluate high-dimensional indexing algorithms at a realistic scale. Experiments show that our method accurately and efficiently re-ranks the neighbor hypotheses using little memory compared to the full-vector representation.
    Comment: International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic (2011).
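
    A minimal sketch of the re-ranking idea under simplifying assumptions: each database vector is stored only as a coarse code plus a short refinement code for its residual, and the shortlist is re-scored against the refined reconstructions instead of reading full vectors from disk. The paper uses product-quantization-style codes; a single refinement codebook is used here for brevity, and all names are illustrative.

        import numpy as np

        def rerank(q, shortlist, coarse_cb, coarse_code, refine_cb, refine_code, k=10):
            # Vector i is approximated by coarse_cb[coarse_code[i]] plus
            # refine_cb[refine_code[i]], a quantization of its residual.
            # Refining the estimated distances from these short codes
            # replaces exact post-verification against the full vectors.
            scored = []
            for i in shortlist:
                approx = coarse_cb[coarse_code[i]] + refine_cb[refine_code[i]]
                scored.append((float(np.sum((q - approx) ** 2)), i))
            scored.sort()
            return [i for _, i in scored[:k]]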