
    HDIdx: High-Dimensional Indexing for Efficient Approximate Nearest Neighbor Search

    Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia content, which is often of high dimensionality. Instead of using exact NN search, extensive research efforts have focused on approximate NN search algorithms. In this work, we present "HDIdx", an efficient high-dimensional indexing library for fast approximate NN search, which is open-source and written in Python. It offers a family of state-of-the-art algorithms that convert input high-dimensional vectors into compact binary codes, making them very efficient and scalable for NN search with very low space complexity.
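
The idea of searching over compact binary codes can be illustrated with a minimal sketch. It does not use the HDIdx API and is not one of the library's algorithms: it swaps in plain random-hyperplane hashing (one sign bit per random projection) and ranks stored codes by Hamming distance. The function names `make_hyperplanes`, `encode`, `hamming`, and `search` are hypothetical and chosen for illustration only.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in a dim-dimensional space."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vec, planes):
    """Pack the sign of each projection into one integer (one bit per plane)."""
    code = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vec))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def search(query_code, codes, k=1):
    """Indices of the k stored codes closest to query_code in Hamming distance."""
    order = sorted(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))
    return order[:k]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(64)] for _ in range(1000)]
    planes = make_hyperplanes(dim=64, n_bits=32)
    codes = [encode(v, planes) for v in data]  # 32 bits stored per vector
    print(search(encode(data[0], planes), codes, k=5))
```

Each 64-dimensional vector is reduced to a 32-bit integer here, which is where the low space cost comes from; the price is that the ranking by Hamming distance only approximates the ranking by the original distance.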

    Off the Beaten Path: Let's Replace Term-Based Retrieval with k-NN Search

    Retrieval pipelines commonly rely on a term-based search to obtain candidate records, which are subsequently re-ranked. Some candidates are missed by this approach, e.g., due to a vocabulary mismatch. We address this issue by replacing the term-based search with a generic k-NN retrieval algorithm, where a similarity function can take into account subtle term associations. While an exact brute-force k-NN search using this similarity function is slow, we demonstrate that an approximate algorithm can be nearly two orders of magnitude faster at the expense of only a small loss in accuracy. A retrieval pipeline using an approximate k-NN search can be more effective and efficient than the term-based pipeline. This opens up new possibilities for designing effective retrieval pipelines. Our software (including data-generating code) and derivative data based on the Stack Overflow collection are available online.

    Hybrid LSH: Faster Near Neighbors Reporting in High-dimensional Space

    We study the $r$-near neighbors reporting problem ($r$-NN), i.e., reporting all points in a high-dimensional point set $S$ that lie within a radius $r$ of a given query point $q$. Our approach builds on the locality-sensitive hashing (LSH) framework due to its appealing asymptotic sublinear query time for near neighbor search problems in high-dimensional space. A bottleneck of the traditional LSH scheme for solving $r$-NN is that its performance is sensitive to data- and query-dependent parameters. On datasets whose data distributions have diverse local density patterns, LSH with inappropriate tuning parameters can sometimes be outperformed by a simple linear search. In this paper, we introduce a hybrid search strategy between LSH-based search and linear search for $r$-NN in high-dimensional space. By integrating an auxiliary data structure into LSH hash tables, we can efficiently estimate the computational cost of LSH-based search for a given query regardless of the data distribution. This means that we are able to choose the appropriate search strategy between LSH-based search and linear search to achieve better performance. Moreover, the integrated data structure is time efficient and fits well with many recent state-of-the-art LSH-based approaches. Our experiments on real-world datasets show that the hybrid search approach outperforms (or is comparable to) both LSH-based search and linear search for a wide range of search radii and data distributions in high-dimensional space. Comment: Accepted as a short paper in EDBT 201
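
The per-query choice between LSH-based search and linear search can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: the cost estimate is just the total size of the buckets the query falls into (a crude upper bound on the number of candidates), standing in for the paper's auxiliary data structure, and the hash family is a basic random-projection scheme for Euclidean distance. The class `HybridLSH` and all of its parameters are hypothetical names.

```python
import math
import random


class HybridLSH:
    """Toy r-near-neighbor index that falls back to a linear scan when
    the LSH candidate set looks at least as large as the dataset."""

    def __init__(self, points, n_tables=4, n_proj=2, width=1.0, seed=0):
        rng = random.Random(seed)
        self.points = points
        self.width = width
        dim = len(points[0])
        # One list of (direction, offset) pairs per hash table.
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(n_proj)]
            for _ in range(n_tables)
        ]
        self.tables = []
        for t in range(n_tables):
            table = {}
            for i, p in enumerate(points):
                table.setdefault(self._key(t, p), []).append(i)
            self.tables.append(table)

    def _key(self, t, p):
        """Bucket key: quantized projections of p for table t."""
        return tuple(
            math.floor((sum(a * x for a, x in zip(vec, p)) + b) / self.width)
            for vec, b in self.funcs[t]
        )

    def _buckets(self, q):
        return [self.tables[t].get(self._key(t, q), [])
                for t in range(len(self.tables))]

    def estimate_cost(self, q):
        """Assumed cost model: total size of the buckets that q hits."""
        return sum(len(b) for b in self._buckets(q))

    def query(self, q, r):
        """Return (strategy, sorted indices of reported points within r of q)."""
        if self.estimate_cost(q) >= len(self.points):
            strategy, candidates = "linear", range(len(self.points))
        else:
            strategy, candidates = "lsh", set().union(*self._buckets(q))
        hits = sorted(i for i in candidates
                      if math.dist(q, self.points[i]) <= r)
        return strategy, hits
```

In a dense region the query's buckets hold many points, the estimate exceeds the dataset size, and the scan is used instead; in a sparse region the LSH path inspects only a few candidates, at the cost of possibly missing some true neighbors.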

    Optimized Neural Networks to Search for Higgs Boson Production at the Tevatron

    An optimal choice of proper kinematical variables is one of the main steps in using neural networks (NN) in high energy physics. Our method of variable selection is based on an analysis of the structure of the Feynman diagrams (singularities and spin correlations) contributing to the signal and background processes. An application of this method to the Higgs boson search at the Tevatron leads to an improvement in the NN efficiency by a factor of 1.5-2 in comparison to previous NN studies. Comment: 4 pages, 4 figures, partially presented in proceedings of ACAT'02 conferenc

    Analysis of the low-energy $\eta NN$-dynamics within a three-body formalism

    The interaction of an $\eta$-meson with two nucleons is studied within a three-body approach. The major features of the $\eta NN$-system in the low-energy region are accounted for by using an s-wave separable ansatz for the two-body $\eta N$- and $NN$-amplitudes. The calculation is confined to the $(J^\pi;T)=(0^-;1)$ and $(1^-;0)$ configurations, which are assumed to be the most promising candidates for virtual or resonant $\eta NN$-states. The eigenvalue three-body equation is continued analytically into the nonphysical sheets by contour deformation. The position of the poles of the three-body scattering matrix as a function of the $\eta N$-interaction strength is investigated. The corresponding trajectory, starting on the physical sheet, moves around the $\eta NN$ three-body threshold and continues away from the physical area, giving rise to virtual $\eta NN$-states. The search for poles on the nonphysical sheets adjacent directly to the upper rim of the real energy axis gives a negative result. Thus no low-lying s-wave $\eta NN$-resonances were found. The possible influence of virtual poles on the low-energy $\eta NN$-scattering is discussed. Comment: 16 pages revtex including 10 figure