
    Scalable Image Retrieval by Sparse Product Quantization

    Fast Approximate Nearest Neighbor (ANN) search for high-dimensional feature indexing and retrieval is the crux of large-scale image retrieval. A recent promising technique is Product Quantization, which indexes high-dimensional image features by decomposing the feature space into a Cartesian product of low-dimensional subspaces and quantizing each subspace separately. Despite the promising results reported, this approach follows the hard assignment of traditional quantization methods, which can result in large quantization errors and thus inferior search performance. Unlike existing approaches, in this paper we propose a novel approach called Sparse Product Quantization (SPQ) that encodes high-dimensional feature vectors as sparse representations. We optimize the sparse representations of the feature vectors by minimizing their quantization errors, so that the resulting representations stay close to the original data in practice. Experiments show that the proposed SPQ technique is not only able to compress data but is also an effective encoding technique. We obtain state-of-the-art results for ANN search on four public image datasets, and the promising content-based image retrieval results further validate the efficacy of the proposed method.
    Comment: 12 pages
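    For context, here is a minimal sketch of the baseline hard-assignment product quantization that SPQ improves on: split each vector into subspaces, quantize each subvector against its own small codebook, and answer queries through per-subspace lookup tables. The subspace and codeword counts and helper names are illustrative, not the paper's implementation.

```python
# A minimal hard-assignment product quantization sketch (the baseline that
# SPQ improves on), with illustrative sizes: 4 subspaces, 16 codewords each.
import numpy as np
from sklearn.cluster import KMeans

def pq_train(X, n_subspaces=4, n_codewords=16):
    """Train one k-means codebook per subspace; d must divide evenly."""
    return [KMeans(n_clusters=n_codewords, n_init=4).fit(sub).cluster_centers_
            for sub in np.split(X, n_subspaces, axis=1)]

def pq_encode(X, codebooks):
    """Hard assignment: each subvector is replaced by one codeword index."""
    codes = [np.argmin(((sub[:, None, :] - cb[None]) ** 2).sum(-1), axis=1)
             for sub, cb in zip(np.split(X, len(codebooks), axis=1), codebooks)]
    return np.stack(codes, axis=1)          # shape (n, n_subspaces)

def pq_search(q, codes, codebooks):
    """Asymmetric distance computation: per-subspace lookup tables give
    approximate squared distances from the query to every encoded vector."""
    tables = [((cb - sq) ** 2).sum(-1)
              for sq, cb in zip(np.split(q, len(codebooks)), codebooks)]
    return sum(t[codes[:, m]] for m, t in enumerate(tables))

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32)).astype(np.float32)
books = pq_train(X)
codes = pq_encode(X, books)
print("top-5 by PQ distance:", np.argsort(pq_search(X[0], codes, books))[:5])
```

    The abstract's proposed SPQ replaces the single argmin above with a small sparse combination of codewords chosen to minimize quantization error, which is what keeps the encoded representation close to the original data.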

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres) if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology.
    Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
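    A hedged sketch of the coarse-to-fine idea behind this kind of search: cover the dataset with balls of fixed radius, scan only the ball centers (whose count plays the role of the metric entropy), and descend into a ball only when the triangle inequality says it might contain a hit. Radii and helper names are illustrative, not the paper's tools.

```python
# A coarse-to-fine range search over a ball covering of the dataset: scan
# only ball centers, then use the triangle inequality to prune whole balls.
import numpy as np

def build_cover(X, radius):
    """Greedy covering: every point ends up within `radius` of some center."""
    centers, members, unassigned = [], [], np.ones(len(X), bool)
    while unassigned.any():
        c = np.flatnonzero(unassigned)[0]
        d = np.linalg.norm(X - X[c], axis=1)
        ball = np.flatnonzero(unassigned & (d <= radius))
        centers.append(c)
        members.append(ball)
        unassigned[ball] = False
    return np.array(centers), members

def range_search(X, centers, members, q, r, cover_radius):
    """All points within r of q. A ball can contain a hit only if its
    center lies within r + cover_radius of q (triangle inequality)."""
    hits = []
    center_dists = np.linalg.norm(X[centers] - q, axis=1)
    for dist, ball in zip(center_dists, members):
        if dist <= r + cover_radius:                  # coarse stage
            dd = np.linalg.norm(X[ball] - q, axis=1)  # fine stage
            hits.extend(ball[dd <= r].tolist())
    return sorted(hits)

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 8))
centers, members = build_cover(X, radius=2.0)
print(len(centers), "covering balls")                 # proxy for metric entropy
print(range_search(X, centers, members, X[0], r=1.0, cover_radius=2.0))
```

    The coarse stage costs one distance per covering ball, so when the fractal dimension is low and few balls survive the pruning test, total time tracks the metric entropy, mirroring the scaling result stated above.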

    Postprocessing can speed up general quantum search algorithms

    A general quantum search algorithm aims to evolve a quantum system from a known source state $|s\rangle$ to an unknown target state $|t\rangle$. It uses a diffusion operator $D_s$, which has the source state as one of its eigenstates, and $I_t$, where $I_\psi$ denotes the selective phase inversion of the state $|\psi\rangle$. It evolves $|s\rangle$ to a particular state $|w\rangle$, call it the w-state, in $O(B/\alpha)$ time steps, where $\alpha = |\langle t|s\rangle|$ and $B$ is a characteristic of the diffusion operator. Measuring the w-state gives the target state with success probability $O(1/B^2)$, and $O(B^2)$ applications of the algorithm can boost it from $O(1/B^2)$ to $O(1)$, making the total time complexity $O(B^3/\alpha)$. In the special case of Grover's algorithm, $D_s$ is $I_s$ and $B$ is very close to $1$. A more efficient way to boost the success probability is quantum amplitude amplification, provided we can efficiently implement $I_w$; no such efficient implementation is known so far. In this paper, we present an efficient algorithm to approximate selective phase inversions of the unknown eigenstates of an operator using the phase estimation algorithm. This algorithm is used to efficiently approximate $I_w$, which reduces the time complexity of the general algorithm to $O(B/\alpha)$. Though $O(B/\alpha)$ algorithms are known to exist, our algorithm offers physical implementation advantages.
    Comment: Accepted for publication in Physical Review A. arXiv admin note: substantial text overlap with arXiv:1210.464
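    A small numerical illustration of the special case cited above (Grover's algorithm, where $D_s = I_s$ and $B$ is close to $1$): simulating the iteration on a state vector shows the success probability climbing to $O(1)$ after roughly $\pi/(4\alpha)$ steps, with $\alpha = 1/\sqrt{N}$ for a uniform source state. This is a sketch of the textbook special case, not the paper's phase-estimation construction.

```python
# State-vector simulation of Grover iterations (the special case D_s = I_s,
# B ~ 1). Illustrative only; N and the target index are arbitrary.
import numpy as np

N, target = 256, 7
s = np.full(N, 1 / np.sqrt(N))          # uniform source state |s>
alpha = abs(s[target])                  # alpha = <t|s> = 1/sqrt(N)

def grover_step(psi):
    psi = psi.copy()
    psi[target] *= -1                   # I_t: selective phase inversion of |t>
    return 2 * s * (s @ psi) - psi      # inversion about the mean, 2|s><s| - I

psi = s.copy()
steps = int(round(np.pi / (4 * alpha)))
for _ in range(steps):
    psi = grover_step(psi)
print(f"alpha = {alpha:.4f}, steps = {steps}, "
      f"success probability = {abs(psi[target])**2:.4f}")
```

    After about $\pi/(4\alpha) \approx 13$ steps for $N = 256$, the printed success probability is close to $1$, consistent with the $O(B/\alpha)$ scaling at $B \approx 1$.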

    Revisiting Kernelized Locality-Sensitive Hashing for Improved Large-Scale Image Retrieval

    We present a simple but powerful reinterpretation of kernelized locality-sensitive hashing (KLSH), a general and popular method developed in the vision community for performing approximate nearest-neighbor searches in an arbitrary reproducing kernel Hilbert space (RKHS). Our new perspective is based on viewing the steps of the KLSH algorithm in an appropriately projected space, and has several key theoretical and practical benefits. First, it eliminates the problematic conceptual difficulties that are present in the existing motivation of KLSH. Second, it yields the first formal retrieval performance bounds for KLSH. Third, our analysis reveals two techniques for boosting the empirical performance of KLSH. We evaluate these extensions on several large-scale benchmark image retrieval data sets, and show that our analysis leads to improved recall performance of at least 12%, and sometimes much higher, over the standard KLSH method.
    Comment: 15 pages
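    A minimal sketch of the original KLSH construction that the paper reinterprets: sample a set of anchor points, whiten their kernel matrix, and use random anchor subsets to approximate Gaussian hyperplanes in the RKHS, so that each hash bit is the sign of a kernel-weighted sum. The kernel choice and parameters below are illustrative, and the paper's two boosting techniques are not shown.

```python
# A sketch of the original KLSH hash construction (Kulis & Grauman): random
# anchor subsets, whitened by K^{-1/2}, approximate Gaussian hyperplanes in
# the implicit RKHS feature space.
import numpy as np

def rbf_kernel(A, B, gamma=0.05):
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def klsh_fit(anchors, n_bits=32, t=30, seed=0):
    """One weight vector per bit: w = K^{-1/2} e_S for a random size-t
    subset S of the p anchors (the sign is invariant to positive scaling)."""
    rng = np.random.default_rng(seed)
    p = len(anchors)
    K = rbf_kernel(anchors, anchors) + 1e-8 * np.eye(p)   # jitter for stability
    vals, vecs = np.linalg.eigh(K)
    K_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    W = np.zeros((n_bits, p))
    for b in range(n_bits):
        e_S = np.zeros(p)
        e_S[rng.choice(p, size=t, replace=False)] = 1.0
        W[b] = K_inv_sqrt @ e_S
    return W

def klsh_hash(X, anchors, W):
    """Each bit is the sign of a kernel-weighted sum over the anchors."""
    return rbf_kernel(X, anchors) @ W.T > 0

rng = np.random.default_rng(2)
data = rng.normal(size=(2000, 16))
anchors = data[rng.choice(len(data), size=300, replace=False)]
W = klsh_fit(anchors)
codes = klsh_hash(data, anchors, W)
hamming = (codes != codes[0]).sum(axis=1)       # query with the first point
print("top-5 by Hamming distance:", np.argsort(hamming)[:5])
```

    The paper's reinterpretation views these steps as ordinary LSH carried out in an appropriately projected space, which is what makes the formal retrieval bounds and the two performance-boosting extensions possible.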