503 research outputs found

Intrinsic Dimensionality

Author: Pestov Vladimir
Publication date: 01/01/2010

This entry for the SIGSPATIAL Special July 2010 issue on Similarity Searching in Metric Spaces discusses the notion of intrinsic dimensionality of data in the context of similarity search. Comment: 4 pages, 4 figures, LaTeX; diagram (c) has been correcte

arXiv.org e-Print Archive

CiteSeerX

A hybrid data structure for searching in metric spaces

Author: Chávez Edgar
Herrera Norma Edith
Reyes Nora Susana
Publication date: 01/05/2004

The concept of “approximate” searching has applications in a vast number of fields. Some examples are non-traditional databases (e.g. storing images, fingerprints or audio clips, where the concept of exact search is of no use and we search instead for similar objects), text searching, information retrieval, machine learning and classification, image quantization and compression, computational biology, and function prediction. Track: Databases. Red de Universidades con Carreras en Informática (RedUNCI

New Approaches to Similarity Searching in Metric Spaces

Author: Celik Cengiz
Publication date: 24/04/2006

The complex and unstructured nature of many types of data, such as multimedia objects, text documents, and protein sequences, requires the use of similarity search techniques for retrieval of information from databases. One popular approach for similarity searching is mapping database objects into feature vectors, which introduces an undesirable element of indirection into the process. A more direct approach is to define a distance function directly between objects. Typically such a function is a metric, which satisfies a number of properties, such as the triangle inequality. Index structures that can work for metric spaces have been shown to provide satisfactory performance, and were reported to outperform vector-based counterparts in many applications. Metric spaces also provide a more general framework, and for some domains defining a distance between objects can be accomplished more intuitively than mapping objects to feature vectors. In this thesis we will investigate new efficient methods for similarity searching in metric spaces. We will first show that current solutions to indexing in metric spaces have several drawbacks. Tree-based solutions do not provide the best tradeoffs between construction time and query performance. Tree structures are also difficult to make dynamic without further degrading their performance. There is also a family of flat structures that address some of the deficiencies of tree-based indices, but they introduce their own unique problems in terms of higher construction cost, higher space usage, and extra CPU overhead. In this thesis a new family of flat structures will be introduced, which are very flexible and simple. We will show that dynamic operations can easily be performed, and that they can be customized to work under different performance requirements. They also address many of the general drawbacks of flat structures as outlined above.
A new framework, composite metrics, will also be introduced, which provides a more flexible similarity searching process by allowing several metrics to be combined in one search structure. Two indexing structures will be introduced that can handle similarity queries in this setting, and it will be shown that they provide competitive query performance with respect to data structures for standard metrics

Digital Repository at the University of Maryland

A hybrid data structure for searching in metric spaces

Author: Chávez Edgar
Herrera Norma Edith
Reyes Nora Susana
Publication date: 20/09/2012

Servicio de Difusión de la Creación Intelectual

Properties of embedding methods for similarity searching in metric spaces

Author: G.R. Hjaltason
H. Samet
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Searching in metric spaces with user-defined and approximate distances

Author: Ankerst M.
Bartolini I.
Berchtold S.
Berretti S.
Brinkhoff T.
Börzsönyi S.
Chakrabarti K.
Chiueh T.
Ciaccia P.
Figueira Santos Filho R.
Goldstein J.
Hjaltason G. R.
Ishikawa Y.
Kahveci T.
Kießling W.
Marco Patella
Ortega-Binderberger M.
Paolo Ciaccia
Rubner Y.
Sakurai Y.
Seeger B.
Seidl T.
Traina C. Jr.
Yi B.-K.
Çetintemel U.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Fully dynamic and memory-adaptative spatial approximation trees

Author: Arroyuelo Diego
Navarro Gonzalo
Reyes Nora Susana
Publication date: 01/10/2003

Hybrid dynamic spatial approximation trees are recently proposed data structures for searching in metric spaces, based on combining the concepts of spatial approximation and pivot-based algorithms. These data structures are hybrid schemes, with the full features of dynamic spatial approximation trees and able to use the available memory to improve the query time. It has been shown that they compare favorably against alternative data structures in spaces of medium difficulty. In this paper we complete and improve hybrid dynamic spatial approximation trees, by presenting a new search alternative, an algorithm to remove objects from the tree, and an improved way of managing the available memory. The result is a fully dynamic and optimized data structure for similarity searching in metric spaces. Track: Theory (TEOR). Red de Universidades con Carreras en Informática (RedUNCI

Efficient Document Indexing Using Pivot Tree

Author: Piwowarski Benjamin
Singh Gaurav
Publication date: 01/05/2016

We present a novel method for efficiently searching for the top-k neighbors of documents represented in a high-dimensional space of terms, based on cosine similarity. Documents are mostly stored in a bag-of-words tf-idf representation. One of the most widely used ways of computing similarity between a pair of documents is the cosine similarity between their vector representations, but cosine similarity is not a metric distance measure because it does not satisfy the triangle inequality, so most metric searching methods cannot be applied directly. We propose an efficient method for indexing documents using a pivot tree that leads to efficient retrieval. We also study the relation between precision and efficiency for the proposed method and compare it with the state of the art in document searching based on inner product. Comment: 6 Pages, 2 Figure
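The triangle-inequality remark in this abstract can be checked on a small example. The sketch below is not taken from the paper: it assumes the common dissimilarity 1 - cos(u, v) (named `cosine_distance` here for illustration) and three hand-picked 2-D vectors, and shows that going through an intermediate vector can be "shorter" than the direct route. The angle between vectors (`angular_distance`, also an illustrative name) is included for contrast, since it does satisfy the inequality on this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed dissimilarity: 1 - cosine similarity (not a metric)."""
    return 1.0 - cosine_similarity(u, v)


def angular_distance(u, v):
    """Angle in radians; clamped to guard against rounding outside [-1, 1]."""
    s = max(-1.0, min(1.0, cosine_similarity(u, v)))
    return math.acos(s)


# Hand-picked vectors: b lies halfway (45 degrees) between a and c.
a, b, c = (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)

direct = cosine_distance(a, c)                          # 1 - 0 = 1
detour = cosine_distance(a, b) + cosine_distance(b, c)  # 2 * (1 - 1/sqrt(2))

# The triangle inequality would require direct <= detour.
print("1 - cos violates the triangle inequality:", direct > detour)
print("angle satisfies it here:",
      angular_distance(a, c) <= angular_distance(a, b) + angular_distance(b, c) + 1e-9)
```

On these vectors the direct dissimilarity is 1 while the detour through b sums to about 0.586, so pruning rules that rely on the triangle inequality would be unsafe for 1 - cos.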

arXiv.org e-Print Archive

HAL Descartes

Hal-Diderot

Indexing Metric Spaces for Exact Similarity Search

Author: Chen Lu
Gao Yunjun
Jensen Christian S.
Li Zheng
Miao Xiaoye
Song Xuan
Zhu Yifan
Publication date: 07/05/2020

With the continued digitalization of societal processes, we are seeing an explosion in available data. This is referred to as big data. In a research setting, three aspects of the data are often viewed as the main sources of challenges when attempting to enable value creation from big data: volume, velocity, and variety. Many studies address volume or velocity, while far fewer concern variety. Metric spaces are ideal for addressing variety because they can accommodate any type of data as long as the associated distance notion satisfies the triangle inequality. To accelerate search in metric spaces, a collection of indexing techniques for metric data has been proposed. However, existing surveys each offer only narrow coverage, and no comprehensive empirical study of those techniques exists. We offer a survey of all the existing metric indexes that can support exact similarity search, by i) summarizing all the existing partitioning, pruning, and validation techniques used for metric indexes, ii) providing the time and storage complexity analysis of index construction, and iii) reporting on a comprehensive empirical comparison of their similarity query processing performance. Here, empirical comparisons are used to evaluate index performance during search, because differences in similarity query processing are hard to see from complexity analysis alone and query performance depends on pruning and validation abilities, which in turn depend on the data distribution. This article aims to reveal the strengths and weaknesses of different indexing techniques in order to offer guidance on selecting an appropriate indexing technique for a given setting, and to direct future research on metric indexes
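As a rough illustration of what "pruning" and "validation" can mean for a pivot-based metric index, the sketch below answers a range query using only the triangle inequality and precomputed object-to-pivot distances. It is a generic textbook-style pivot filter, not one of the surveyed indexes; the function names, the 1-D toy data, and the choice of pivots are all assumptions made for illustration.

```python
def build_pivot_table(objects, pivots, dist):
    """Precompute d(o, p) for every object o and pivot p."""
    return [[dist(o, p) for p in pivots] for o in objects]


def range_query(objects, pivots, table, dist, q, r):
    """Return (objects within distance r of q, number of d(q, o) computations)."""
    q_to_pivots = [dist(q, p) for p in pivots]
    results, computed = [], 0
    for o, row in zip(objects, table):
        # Pruning: |d(q,p) - d(o,p)| is a lower bound on d(q,o).
        if any(abs(dq - do) > r for dq, do in zip(q_to_pivots, row)):
            continue
        # Validation: d(q,p) + d(o,p) is an upper bound on d(q,o).
        if any(dq + do <= r for dq, do in zip(q_to_pivots, row)):
            results.append(o)
            continue
        # Neither bound decides, so compute the real distance.
        computed += 1
        if dist(q, o) <= r:
            results.append(o)
    return results, computed


def abs_diff(x, y):
    """Toy metric on numbers (assumed for the example)."""
    return abs(x - y)


objects = [0, 2, 5, 9, 14, 20]
pivots = [0, 20]
table = build_pivot_table(objects, pivots, abs_diff)

hits, computed = range_query(objects, pivots, table, abs_diff, q=6, r=3)
print(hits, computed)  # [5, 9] 2
```

In this toy run four of the six objects are discarded by the lower bound alone, so only two object distances are evaluated, compared with six for a linear scan. How much is saved in practice depends on the data distribution and on how the pivots are chosen.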

arXiv.org e-Print Archive

VBN