Scalable Similarity Search for Molecular Descriptors
Similarity search over chemical compound databases is a fundamental task in
the discovery and design of novel drug-like molecules. Such databases often
encode molecules as non-negative integer vectors, called molecular descriptors,
which represent rich information on various molecular properties. While
efficient indexing structures exist for searching databases of binary vectors,
solutions for more general integer vectors are still in their infancy. In this
paper, we present a time- and space-efficient index for this problem, which we
call the succinct intervals-splitting tree algorithm for molecular descriptors
(SITAd). Our approach extends efficient methods for binary-vector databases and
uses ideas from succinct data structures. Our experiments on a large database
of over 40 million compounds show that SITAd significantly outperforms
alternative approaches in practice.
Comment: To appear in the Proceedings of SISAP'1
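The abstract does not describe SITAd's internals, so the following is only a reference point: a plain brute-force baseline, not SITAd. It assumes the generalized (min-max) Jaccard similarity J(x, y) = Σ min(x_i, y_i) / Σ max(x_i, y_i), a common choice for integer descriptors, and prunes candidates with the bound J(x, y) ≤ min(|x|, |y|) / max(|x|, |y|), where |x| is the sum of a vector's entries. The names `jaccard` and `LengthFilteredIndex` are hypothetical.

```python
import bisect


def jaccard(x, y):
    """Generalized (min-max) Jaccard similarity of two non-negative integer vectors."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den else 1.0


class LengthFilteredIndex:
    """Hypothetical baseline: brute force restricted by the total-count bound.

    Since sum(min) <= min(|x|, |y|) and sum(max) >= max(|x|, |y|),
    J(q, y) >= eps implies eps * |q| <= |y| <= |q| / eps.
    """

    def __init__(self, vectors):
        # Sort database entries by their total count |y| = sum(y).
        self.entries = sorted((sum(v), i, v) for i, v in enumerate(vectors))
        self.totals = [t for t, _, _ in self.entries]

    def search(self, q, eps):
        """Return the sorted ids of all vectors y with jaccard(q, y) >= eps (0 < eps <= 1)."""
        tq = sum(q)
        # Widen the window slightly so float rounding (e.g. 0.7 * 10 > 7)
        # never drops a valid candidate; the exact check below decides.
        lo = bisect.bisect_left(self.totals, eps * tq - 1e-9)
        hi = bisect.bisect_right(self.totals, tq / eps + 1e-9)
        return sorted(i for _, i, v in self.entries[lo:hi] if jaccard(q, v) >= eps)


db = [[1, 0, 2], [1, 1, 2], [5, 5, 5], [0, 0, 1]]
idx = LengthFilteredIndex(db)
print(idx.search([1, 0, 2], 0.7))  # [0, 1]
```

Sorting by total lets each query scan only the slice whose totals fall in [ε|q|, |q|/ε], a window that narrows as ε approaches 1; the exact similarity is still computed for every survivor.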
Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases
This paper reports an evaluation of both graph-based and fingerprint-based measures of structural similarity when used for virtual screening of sets of 2D molecules drawn from the MDDR and ID Alert databases. The graph-based measures employ a new maximum common edge subgraph (MCES) isomorphism algorithm, called RASCAL, together with several previously described similarity coefficients for quantifying the similarity between pairs of graphs. The effectiveness of these graph-based searches is compared with that of similarity searches using BCI, Daylight and Unity 2D fingerprints. Our results suggest that graph-based approaches provide an effective complement to existing fingerprint-based approaches to virtual screening.
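To make the two families of measures concrete, here is a minimal sketch computing the Tanimoto coefficient on binary fingerprints (stored as Python ints) and, on the graph side, Johnson's similarity from the atom and bond counts of a maximum common edge subgraph: (|V(C)| + |E(C)|)² / ((|V1| + |E1|)(|V2| + |E2|)). Finding the MCES itself is RASCAL's job and is out of scope, so its size is passed in. The abstract does not say which coefficients were used, so Johnson's index here is one illustrative choice, and the function names are hypothetical.

```python
def tanimoto(fp_a: int, fp_b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two bit-string fingerprints."""
    common = bin(fp_a & fp_b).count("1")
    union = bin(fp_a | fp_b).count("1")
    return common / union if union else 1.0


def johnson_similarity(mces_atoms: int, mces_bonds: int,
                       g1_atoms: int, g1_bonds: int,
                       g2_atoms: int, g2_bonds: int) -> float:
    """Graph similarity from the size of a precomputed maximum common edge subgraph."""
    common = mces_atoms + mces_bonds
    return common ** 2 / ((g1_atoms + g1_bonds) * (g2_atoms + g2_bonds))


print(tanimoto(0b1011, 0b0011))                # 0.6666666666666666
print(johnson_similarity(5, 4, 6, 6, 5, 4))    # 0.75
```

Both scores lie in [0, 1] and equal 1 for identical inputs, which is what lets the two kinds of search be ranked and compared side by side.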
Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures
This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings from each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters possessing a non-trivial substructural commonality. The methods that employ adjustable parameters were tested to determine the stability of each parameter for datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively for generating chemical clusterings. They also indicate that the CAST and Yin–Chen methods, recently proposed for clustering gene expression patterns, may prove effective for clustering 2D chemical structures.
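CAST (Cluster Affinity Search Technique) grows one cluster at a time from a similarity matrix and a threshold t: it adds unassigned items whose average similarity to the current cluster is at least t and drops members whose average falls below t. Below is a generic sketch of that loop, assuming a symmetric similarity matrix with 1.0 on the diagonal and 0 < t ≤ 1. The `max_rounds` cap and the rule never to remove a cluster's last member are safeguards added here, and nothing reflects the paper's parameter settings.

```python
def cast(sim, t, max_rounds=100):
    """Cluster items 0..n-1 with CAST, given a similarity matrix and threshold t."""
    n = len(sim)
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        cluster = set()
        # affinity[x] = sum of sim[x][y] over current members y of the cluster
        affinity = [0.0] * n
        for _ in range(max_rounds):
            changed = False
            # ADD: pull in the unassigned item of highest affinity while its
            # average similarity to the cluster is at least t.
            while unassigned:
                u = max(unassigned, key=lambda x: affinity[x])
                if cluster and affinity[u] < t * len(cluster):
                    break
                cluster.add(u)
                unassigned.remove(u)
                for x in range(n):
                    affinity[x] += sim[x][u]
                changed = True
            # REMOVE: drop the weakest member while its average similarity is
            # below t (never the last member, so every cluster is non-empty).
            while len(cluster) > 1:
                v = min(cluster, key=lambda x: affinity[x])
                if affinity[v] >= t * len(cluster):
                    break
                cluster.remove(v)
                unassigned.add(v)
                for x in range(n):
                    affinity[x] -= sim[x][v]
                changed = True
            if not changed:
                break
        clusters.append(sorted(cluster))
    return clusters


sim = [
    [1.0, 0.9, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.9, 0.1, 0.1],
    [0.9, 0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 1.0],
]
print(cast(sim, 0.5))  # [[0, 1, 2], [3, 4]]
```

The threshold t is the method's single adjustable parameter; raising it toward 1 yields more, tighter clusters, which is the kind of parameter sensitivity the paper tested across datasets.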
Fingerprint databases for theorems
We discuss the advantages of searchable, collaborative, language-independent
databases of mathematical results, indexed by "fingerprints" of small and
canonical data. Our motivating example is Neil Sloane's massively influential
On-Line Encyclopedia of Integer Sequences. We hope to encourage the greater
mathematical community to search for the appropriate fingerprints within each
discipline, and to compile fingerprint databases of results wherever possible.
The benefits of these databases are broad: advancing the state of knowledge,
enhancing experimental mathematics, enabling researchers to discover unexpected
connections between areas, and even improving the refereeing process for
journal publication.
Comment: to appear in Notices of the AM
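The indexing idea can be seen at toy scale: a sequence's first few terms act as its fingerprint, and a lookup returns every stored result whose terms contain the query as a contiguous run. The dictionary below is a hypothetical three-entry stand-in for a database such as the OEIS (it is not the OEIS search interface), and it shows why fingerprints need enough terms: the Catalan and Bell numbers share the prefix 1, 1, 2, 5.

```python
# Hypothetical toy fingerprint database: OEIS-style labels mapped to initial terms.
TOY_DB = {
    "A000045 Fibonacci numbers": [0, 1, 1, 2, 3, 5, 8, 13, 21, 34],
    "A000108 Catalan numbers": [1, 1, 2, 5, 14, 42, 132, 429],
    "A000110 Bell numbers": [1, 1, 2, 5, 15, 52, 203, 877],
}


def contains_run(terms, query):
    """True if query occurs as a contiguous run inside terms."""
    k = len(query)
    return any(terms[i:i + k] == query for i in range(len(terms) - k + 1))


def lookup(query):
    """Return the labels of all entries whose stored terms contain the query."""
    query = list(query)
    return [name for name, terms in TOY_DB.items() if contains_run(terms, query)]


print(lookup([1, 2, 5]))      # ['A000108 Catalan numbers', 'A000110 Bell numbers']
print(lookup([1, 2, 5, 14]))  # ['A000108 Catalan numbers']
```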
Systematic methods for the computation of the directional fields and singular points of fingerprints
The first subject of the paper is the estimation of a high-resolution directional field of fingerprints. Traditional methods are discussed, and a method based on principal component analysis is proposed. The method computes not only the direction at any pixel location but also its coherence. It is proven that this method provides exactly the same results as the "averaged square-gradient method" known from the literature. The existence of a completely different but equivalent solution gives further insight into the nature of the problem. The second subject of the paper is singular point detection. A very efficient algorithm is proposed that extracts singular points from the high-resolution directional field. The algorithm is based on the Poincaré index and provides a consistent binary decision that does not rely on postprocessing steps such as thresholding a continuous resemblance measure for singular points. Furthermore, a method is presented to estimate the orientation of the extracted singular points. The accuracy of the methods is illustrated by experiments on a live-scanned fingerprint database.
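A compact NumPy sketch of both steps follows. It implements the averaged square-gradient formula (which the abstract says is equivalent to the PCA method) over a box window of hypothetical size `w`, and a Poincaré index summed over the 8-neighbour loop of one pixel. Orientations are in radians modulo π, with x along columns and y along rows. This is not the authors' implementation: it omits their singular-point orientation estimate and any noise handling, and it assumes NumPy 1.20 or later for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_sum(a, w):
    """Sum over each w x w neighbourhood (w odd), padding by edge replication."""
    p = w // 2
    padded = np.pad(a, p, mode="edge")
    return sliding_window_view(padded, (w, w)).sum(axis=(-2, -1))


def directional_field(img, w=5, eps=1e-12):
    """Averaged square-gradient ridge orientation (mod pi) and coherence per pixel."""
    gy, gx = np.gradient(img.astype(float))  # axis 0 = rows (y), axis 1 = cols (x)
    gxx = box_sum(gx * gx, w)
    gyy = box_sum(gy * gy, w)
    gxy = box_sum(gx * gy, w)
    # Doubled-angle averaging: opposite gradient vectors reinforce instead of cancelling.
    phi = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy)
    # Ridges run perpendicular to the dominant gradient direction.
    theta = np.mod(phi + np.pi / 2, np.pi)
    coherence = np.sqrt((gxx - gyy) ** 2 + 4.0 * gxy ** 2) / (gxx + gyy + eps)
    return theta, coherence


# Closed loop of (row, col) offsets around the centre pixel.
LOOP = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def poincare_index(theta, r, c):
    """Poincare index of an orientation field at interior pixel (r, c)."""
    total = 0.0
    for k in range(len(LOOP)):
        dr0, dc0 = LOOP[k]
        dr1, dc1 = LOOP[(k + 1) % len(LOOP)]
        d = theta[r + dr1, c + dc1] - theta[r + dr0, c + dc0]
        # Orientations are only defined mod pi, so wrap each step into [-pi/2, pi/2).
        d = (d + np.pi / 2) % np.pi - np.pi / 2
        total += d
    return total / (2 * np.pi)  # about +0.5 at a core, -0.5 at a delta, 0 elsewhere


rr, cc = np.mgrid[0:7, 0:7]
core = 0.5 * np.arctan2(rr - 3, cc - 3)
print(round(poincare_index(core, 3, 3), 6))   # 0.5
print(round(poincare_index(-core, 3, 3), 6))  # -0.5
```

Because each Poincaré sum is a multiple of π up to rounding, the index snaps to 0 or ±½ without a tuned threshold, in line with the consistent binary decision the abstract describes.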
- …