12,039 research outputs found

    Scalable Similarity Search for Molecular Descriptors

    Full text link
    Similarity search over chemical compound databases is a fundamental task in the discovery and design of novel drug-like molecules. Such databases often encode molecules as non-negative integer vectors, called molecular descriptors, which represent rich information on various molecular properties. While there exist efficient indexing structures for searching databases of binary vectors, solutions for more general integer vectors are in their infancy. In this paper we present a time- and space- efficient index for the problem that we call the succinct intervals-splitting tree algorithm for molecular descriptors (SITAd). Our approach extends efficient methods for binary-vector databases, and uses ideas from succinct data structures. Our experiments, on a large database of over 40 million compounds, show SITAd significantly outperforms alternative approaches in practice.Comment: To be appeared in the Proceedings of SISAP'1

    Efficient Large-scale Approximate Nearest Neighbor Search on the GPU

    Full text link
    We present a new approach for efficient approximate nearest neighbor (ANN) search in high dimensional spaces, extending the idea of Product Quantization. We propose a two-level product and vector quantization tree that reduces the number of vector comparisons required during tree traversal. Our approach also includes a novel highly parallelizable re-ranking method for candidate vectors by efficiently reusing already computed intermediate values. Due to its small memory footprint during traversal, the method lends itself to an efficient, parallel GPU implementation. This Product Quantization Tree (PQT) approach significantly outperforms recent state of the art methods for high dimensional nearest neighbor queries on standard reference datasets. Ours is the first work that demonstrates GPU performance superior to CPU performance on high dimensional, large scale ANN problems in time-critical real-world applications, like loop-closing in videos

    Experiments on the Efficiency of Cluster Searches

    Get PDF
    The efficiency of various cluster based retrieval (CBR) strategies is analyzed. The possibility of combining CBR and inverted index search (11s) is investigated. A method for combining the two approaches is proposed and shown to be cost effective in terms of paging and CPU time. The observations prove that the new method is much more efficient than conventional approaches. In the experiments, the effect of the number of selected clusters, centroid length, page size, and matching function is considered. The experiments show that the storage overhead of the new method would be moderately higher than that of IIS. The paper also examines the question: Is it beneficial to combine CBR and full search in terms of effectiveness

    The NASA Astrophysics Data System: Architecture

    Full text link
    The powerful discovery capabilities available in the ADS bibliographic services are possible thanks to the design of a flexible search and retrieval system based on a relational database model. Bibliographic records are stored as a corpus of structured documents containing fielded data and metadata, while discipline-specific knowledge is segregated in a set of files independent of the bibliographic data itself. The creation and management of links to both internal and external resources associated with each bibliography in the database is made possible by representing them as a set of document properties and their attributes. To improve global access to the ADS data holdings, a number of mirror sites have been created by cloning the database contents and software on a variety of hardware and software platforms. The procedures used to create and manage the database and its mirrors have been written as a set of scripts that can be run in either an interactive or unsupervised fashion. The ADS can be accessed at http://adswww.harvard.eduComment: 25 pages, 8 figures, 3 table

    CUORE and beyond: bolometric techniques to explore inverted neutrino mass hierarchy

    Get PDF
    The CUORE (Cryogenic Underground Observatory for Rare Events) experiment will search for neutrinoless double beta decay of 130^{130}Te. With 741 kg of TeO2_2 crystals and an excellent energy resolution of 5 keV (0.2%) at the region of interest, CUORE will be one of the most competitive neutrinoless double beta decay experiments on the horizon. With five years of live time, CUORE projected neutrinoless double beta decay half-life sensitivity is 1.6×10261.6\times 10^{26} y at 1σ1\sigma (9.5×10259.5\times10^{25} y at the 90% confidence level), which corresponds to an upper limit on the effective Majorana mass in the range 40--100 meV (50--130 meV). Further background rejection with auxiliary light detector can significantly improve the search sensitivity and competitiveness of bolometric detectors to fully explore the inverted neutrino mass hierarchy with 130^{130}Te and possibly other double beta decay candidate nuclei.Comment: Submitted to the Proceedings of TAUP 2013 Conferenc
    • …
    corecore