Scalable Similarity Search for Molecular Descriptors
Similarity search over chemical compound databases is a fundamental task in
the discovery and design of novel drug-like molecules. Such databases often
encode molecules as non-negative integer vectors, called molecular descriptors,
which represent rich information on various molecular properties. While there
exist efficient indexing structures for searching databases of binary vectors,
solutions for more general integer vectors are in their infancy. In this paper,
we present a time- and space-efficient index for the problem that we call the
succinct intervals-splitting tree algorithm for molecular descriptors (SITAd).
Our approach extends efficient methods for binary-vector databases, and uses
ideas from succinct data structures. Our experiments on a large database of
over 40 million compounds show that SITAd significantly outperforms alternative
approaches in practice.
Comment: To appear in the Proceedings of SISAP'1
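The abstract does not spell out SITAd's internals. To illustrate the kind of pruning that similarity indexes over integer vectors rely on, here is a minimal sketch. It assumes similarity is the generalized (min/max) Jaccard coefficient, J(x, q) = Σ min(xᵢ, qᵢ) / Σ max(xᵢ, qᵢ). The class `NormFilterIndex` is hypothetical and implements only a simple L1-norm filter, not SITAd's succinct intervals-splitting tree.

```python
from bisect import bisect_left, bisect_right


def gen_jaccard(x, y):
    """Generalized (min/max) Jaccard similarity of two non-negative integer vectors."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den else 1.0  # two all-zero vectors count as identical


class NormFilterIndex:
    """Hypothetical baseline: sort by L1 norm, prune by a norm bound, then verify."""

    def __init__(self, vectors):
        self.vectors = vectors
        # (norm, id) pairs sorted by norm, so a norm interval is one contiguous slice.
        self.entries = sorted((sum(v), i) for i, v in enumerate(vectors))
        self.norms = [n for n, _ in self.entries]

    def query(self, q, eps):
        """Return sorted ids of all vectors x with gen_jaccard(x, q) >= eps (0 < eps <= 1)."""
        qn = sum(q)
        # Since sum(min) <= min(|x|, |q|) and sum(max) >= max(|x|, |q|),
        # J(x, q) <= min(|x|, |q|) / max(|x|, |q|). A hit therefore needs
        # eps * |q| <= |x| <= |q| / eps.
        lo = bisect_left(self.norms, eps * qn)
        hi = bisect_right(self.norms, qn / eps)
        hits = [i for _, i in self.entries[lo:hi]
                if gen_jaccard(self.vectors[i], q) >= eps]
        return sorted(hits)


idx = NormFilterIndex([[1, 0, 2], [1, 1, 2], [5, 5, 5], [0, 0, 1]])
print(idx.query([1, 0, 2], 0.7))  # [0, 1]
```

Because the bound depends only on the L1 norm, the candidates form a single slice of the sorted list. Every vector outside that slice is skipped without computing a similarity. Only the surviving candidates are verified exactly.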
Efficient Large-scale Approximate Nearest Neighbor Search on the GPU
We present a new approach for efficient approximate nearest neighbor (ANN)
search in high-dimensional spaces, extending the idea of Product Quantization.
We propose a two-level product and vector quantization tree that reduces the
number of vector comparisons required during tree traversal. Our approach also
includes a novel, highly parallelizable re-ranking method for candidate vectors
by efficiently reusing already computed intermediate values. Due to its small
memory footprint during traversal, the method lends itself to an efficient,
parallel GPU implementation. This Product Quantization Tree (PQT) approach
significantly outperforms recent state-of-the-art methods for high-dimensional
nearest neighbor queries on standard reference datasets. Ours is the first work
that demonstrates GPU performance superior to CPU performance on high-dimensional,
large-scale ANN problems in time-critical real-world applications,
such as loop-closing in videos.
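To make the Product Quantization idea that this work extends more concrete, here is a minimal sketch of plain PQ encoding with asymmetric distance computation (ADC). It is standard PQ, not the two-level PQT or its GPU re-ranking. The tiny hand-written codebooks are hypothetical stand-ins for codebooks that would normally be learned with k-means.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(v, m):
    """Split vector v into m contiguous subvectors of equal length."""
    d = len(v) // m
    return [v[j * d:(j + 1) * d] for j in range(m)]


def pq_encode(v, codebooks):
    """Encode v as one centroid index per subspace (nearest centroid in each)."""
    subs = split(v, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_dist(s, cb[k]))
            for s, cb in zip(subs, codebooks)]


def adc_tables(q, codebooks):
    """Per query: squared distance from each query subvector to every centroid."""
    subs = split(q, len(codebooks))
    return [[sq_dist(s, c) for c in cb] for s, cb in zip(subs, codebooks)]


def adc_distance(code, tables):
    """Approximate squared distance of an encoded vector: one lookup per subspace."""
    return sum(tables[j][k] for j, k in enumerate(code))


# Hypothetical codebooks: 4-D vectors split into 2 subspaces of 2 dims, 3 centroids each.
CODEBOOKS = [
    [[0, 0], [1, 1], [4, 4]],
    [[0, 0], [2, 0], [0, 2]],
]

database = {"x": [1, 1, 2, 0], "y": [4, 3, 0, 2]}
codes = {name: pq_encode(v, CODEBOOKS) for name, v in database.items()}
tables = adc_tables([1, 0, 2, 1], CODEBOOKS)  # built once per query, reused for every code
ranking = sorted(codes, key=lambda name: adc_distance(codes[name], tables))
print(ranking)  # ['x', 'y']
```

The lookup tables are computed once per query and then reused for every database code. This is loosely in the spirit of the abstract's reuse of already computed intermediate values, though the actual PQT re-ranking scheme is not reproduced here.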
Experiments on the Efficiency of Cluster Searches
The efficiency of various cluster-based retrieval (CBR) strategies is analyzed. The possibility of combining CBR and inverted index search (IIS) is investigated. A method for combining the two approaches is proposed and shown to be cost-effective in terms of paging and CPU time. The observations show that the new method is much more efficient than conventional approaches. In the experiments, the effects of the number of selected clusters, centroid length, page size, and
matching function are considered. The experiments show that the storage overhead of the new method would be moderately higher than that of IIS. The paper also examines the question: is it
beneficial to combine CBR and full search in terms of effectiveness?
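One way to picture combining CBR with an inverted index search is in two steps. First, select the best clusters by centroid score. Then walk only the query terms' postings lists, accumulating scores for documents that belong to the selected clusters. This is a hypothetical sketch: the function names, the inner-product matching function, and the in-memory data layout are assumptions, not the method evaluated in the paper. An in-memory toy also cannot show the paging costs the paper measures.

```python
from collections import defaultdict


def dot(query, vec):
    """Inner-product matching function over sparse {term: weight} dicts."""
    return sum(w * vec.get(t, 0.0) for t, w in query.items())


def build_inverted_index(docs):
    """Map each term to a postings list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, w in vec.items():
            index[term].append((doc_id, w))
    return index


def centroid(vecs):
    """Average {term: weight} vector of a cluster's member documents."""
    acc = defaultdict(float)
    for vec in vecs:
        for term, w in vec.items():
            acc[term] += w / len(vecs)
    return dict(acc)


def cbr_with_index(query, clusters, centroids, index, n_clusters=1):
    """Select the n best clusters by centroid score, then score only their
    members by walking the postings lists of the query terms."""
    best = sorted(centroids, key=lambda c: dot(query, centroids[c]), reverse=True)[:n_clusters]
    allowed = {d for c in best for d in clusters[c]}
    scores = defaultdict(float)
    for term, qw in query.items():
        for doc_id, dw in index.get(term, []):
            if doc_id in allowed:
                scores[doc_id] += qw * dw
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": {"gpu": 1.0, "search": 0.5},
    "d2": {"gpu": 0.8},
    "d3": {"neutrino": 1.0},
    "d4": {"neutrino": 0.7, "search": 0.2},
}
clusters = {"A": ["d1", "d2"], "B": ["d3", "d4"]}
centroids = {c: centroid([docs[d] for d in members]) for c, members in clusters.items()}
index = build_inverted_index(docs)
print(cbr_with_index({"gpu": 1.0, "search": 1.0}, clusters, centroids, index))  # ['d1', 'd2']
```

Raising `n_clusters` widens the allowed set toward a full inverted index search. With `n_clusters=2`, the example also returns `d4`.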
The NASA Astrophysics Data System: Architecture
The powerful discovery capabilities available in the ADS bibliographic
services are possible thanks to the design of a flexible search and retrieval
system based on a relational database model. Bibliographic records are stored
as a corpus of structured documents containing fielded data and metadata, while
discipline-specific knowledge is segregated in a set of files independent of
the bibliographic data itself.
The creation and management of links to both internal and external resources
associated with each bibliography in the database is made possible by
representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites
have been created by cloning the database contents and software on a variety of
hardware and software platforms.
The procedures used to create and manage the database and its mirrors have
been written as a set of scripts that can be run in either an interactive or
unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 25 pages, 8 figures, 3 tables
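The design described above stores records as fielded documents, attaches links as properties with attributes, and keeps discipline-specific knowledge in files separate from the bibliographic data. The following is a minimal sketch of that separation. The classes `Record` and `Link`, the property names, and the synonym dict are all hypothetical illustrations, not the ADS schema or software.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    """A resource attached to a record: a property name plus its attributes."""
    prop: str       # hypothetical property names, e.g. "full_text" or "data"
    url: str
    external: bool  # True if the resource lives outside the database


@dataclass
class Record:
    """A bibliographic record stored as a structured document of fielded data."""
    record_id: str
    fields: dict
    links: list = field(default_factory=list)


def search(records, field_name, term, knowledge=None):
    """Substring match on one field, with the term expanded by discipline-specific
    synonyms kept outside the records (a dict standing in for separate files)."""
    synonyms = (knowledge or {}).get(term.lower(), [])
    terms = {term.lower()} | {s.lower() for s in synonyms}
    hits = []
    for rec in records:
        value = rec.fields.get(field_name, "")
        values = value if isinstance(value, list) else [value]
        if any(t in str(v).lower() for v in values for t in terms):
            hits.append(rec.record_id)
    return hits


def links_with(record, prop):
    """URLs of a record's links that carry the given property."""
    return [link.url for link in record.links if link.prop == prop]


records = [
    Record("r1", {"title": "Dark matter halos", "author": ["Doe, J."]},
           [Link("full_text", "https://example.org/r1.pdf", external=True)]),
    Record("r2", {"title": "Galaxy clusters", "author": ["Roe, R."]}),
]
astro_synonyms = {"missing mass": ["dark matter"]}  # hypothetical knowledge-file contents
print(search(records, "title", "missing mass", astro_synonyms))  # ['r1']
print(links_with(records[0], "full_text"))  # ['https://example.org/r1.pdf']
```

Because the synonym table is passed in rather than stored in the records, it can be swapped per discipline without touching the bibliographic data.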
CUORE and beyond: bolometric techniques to explore inverted neutrino mass hierarchy
The CUORE (Cryogenic Underground Observatory for Rare Events) experiment will
search for neutrinoless double beta decay of ¹³⁰Te. With 741 kg of TeO₂
crystals and an excellent energy resolution of 5 keV (0.2%) in the region of
interest, CUORE will be one of the most competitive neutrinoless double beta
decay experiments on the horizon. With five years of live time, CUORE's projected
neutrinoless double beta decay half-life sensitivity is … y
at … (… y at the 90% confidence level), which
corresponds to an upper limit on the effective Majorana mass in the range
40–100 meV (50–130 meV). Further background rejection with auxiliary light
detectors can significantly improve the search sensitivity and competitiveness
of bolometric detectors to fully explore the inverted neutrino mass hierarchy
with ¹³⁰Te and possibly other double beta decay candidate nuclei.
Comment: Submitted to the Proceedings of the TAUP 2013 Conference
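The abstract converts a half-life sensitivity into a range of effective Majorana masses. Under the standard assumption of light Majorana neutrino exchange, that conversion follows the relation below. Here G^{0ν} is the phase-space factor and M^{0ν} is the nuclear matrix element. The width of each quoted mass range presumably reflects the spread of matrix-element calculations for a single half-life value. The last line checks the quoted relative resolution, taking the ¹³⁰Te Q-value as about 2528 keV, a number not stated in the abstract.

```latex
% Light Majorana neutrino exchange (one common convention; G^{0\nu} is the
% phase-space factor, M^{0\nu} the nuclear matrix element, m_e the electron mass):
\left(T^{0\nu}_{1/2}\right)^{-1} = G^{0\nu}\,\left|M^{0\nu}\right|^{2}\,\frac{m_{\beta\beta}^{2}}{m_e^{2}}
\quad\Longrightarrow\quad
m_{\beta\beta} = \frac{m_e}{\sqrt{G^{0\nu}\,\left|M^{0\nu}\right|^{2}\,T^{0\nu}_{1/2}}}
\;\propto\; \left(T^{0\nu}_{1/2}\right)^{-1/2}

% Relative energy resolution, taking Q_{\beta\beta}(^{130}\mathrm{Te}) \approx 2528 keV:
\frac{\Delta E}{Q_{\beta\beta}} \approx \frac{5\ \mathrm{keV}}{2528\ \mathrm{keV}} \approx 0.2\%
```

Because the mass limit scales only as the inverse square root of the half-life, a fourfold gain in half-life sensitivity halves the mass limit. This is why background rejection, such as the light-detector approach mentioned above, matters for reaching the inverted hierarchy.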