2,213 research outputs found
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these were not so far linked to general purpose, signature file based,
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and from the theoretical perspective it positions the file
signatures model in the class of Vector Space retrieval models.Comment: 12 pages, 8 figures, CIKM 201
Revisiting Kernelized Locality-Sensitive Hashing for Improved Large-Scale Image Retrieval
We present a simple but powerful reinterpretation of kernelized
locality-sensitive hashing (KLSH), a general and popular method developed in
the vision community for performing approximate nearest-neighbor searches in an
arbitrary reproducing kernel Hilbert space (RKHS). Our new perspective is based
on viewing the steps of the KLSH algorithm in an appropriately projected space,
and has several key theoretical and practical benefits. First, it eliminates
the problematic conceptual difficulties that are present in the existing
motivation of KLSH. Second, it yields the first formal retrieval performance
bounds for KLSH. Third, our analysis reveals two techniques for boosting the
empirical performance of KLSH. We evaluate these extensions on several
large-scale benchmark image retrieval data sets, and show that our analysis
leads to improved recall performance of at least 12%, and sometimes much
higher, over the standard KLSH method.Comment: 15 page
- …