11,520 research outputs found

    Scalable Similarity Search for Molecular Descriptors

    Full text link
    Similarity search over chemical compound databases is a fundamental task in the discovery and design of novel drug-like molecules. Such databases often encode molecules as non-negative integer vectors, called molecular descriptors, which represent rich information on various molecular properties. While there exist efficient indexing structures for searching databases of binary vectors, solutions for more general integer vectors are in their infancy. In this paper we present a time- and space- efficient index for the problem that we call the succinct intervals-splitting tree algorithm for molecular descriptors (SITAd). Our approach extends efficient methods for binary-vector databases, and uses ideas from succinct data structures. Our experiments, on a large database of over 40 million compounds, show SITAd significantly outperforms alternative approaches in practice.Comment: To be appeared in the Proceedings of SISAP'1

    Effectiveness of graph-based and fingerprint-based similarity measures for virtual screening of 2D chemical structure databases

    Get PDF
    This paper reports an evaluation of both graph-based and fingerprint-based measures of structural similarity, when used for virtual screening of sets of 2D molecules drawn from the MDDR and ID Alert databases. The graph-based measures employ a new maximum common edge subgraph isomorphism algorithm, called RASCAL, with several similarity coefficients described previously for quantifying the similarity between pairs of graphs. The effectiveness of these graph-based searches is compared with that resulting from similarity searches using BCI, Daylight and Unity 2D fingerprints. Our results suggest that graph-based approaches provide an effective complement to existing fingerprint-based approaches to virtual screening

    Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures

    Get PDF
    This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings from each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters possessing a non-trivial substructural commonality. The methods which employ adjustable parameters were tested to determine the stability of each parameter for datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively for generating chemical clusterings; it is also suggested that the CAST and Yin–Chen methods, suggested recently for the clustering of gene expression patterns, may also prove effective for the clustering of 2D chemical structures

    Fingerprint databases for theorems

    Full text link
    We discuss the advantages of searchable, collaborative, language-independent databases of mathematical results, indexed by "fingerprints" of small and canonical data. Our motivating example is Neil Sloane's massively influential On-Line Encyclopedia of Integer Sequences. We hope to encourage the greater mathematical community to search for the appropriate fingerprints within each discipline, and to compile fingerprint databases of results wherever possible. The benefits of these databases are broad - advancing the state of knowledge, enhancing experimental mathematics, enabling researchers to discover unexpected connections between areas, and even improving the refereeing process for journal publication.Comment: to appear in Notices of the AM

    Systematic methods for the computation of the directional fields and singular points of fingerprints

    Get PDF
    The first subject of the paper is the estimation of a high resolution directional field of fingerprints. Traditional methods are discussed and a method, based on principal component analysis, is proposed. The method not only computes the direction in any pixel location, but its coherence as well. It is proven that this method provides exactly the same results as the "averaged square-gradient method" that is known from literature. Undoubtedly, the existence of a completely different equivalent solution increases the insight into the problem's nature. The second subject of the paper is singular point detection. A very efficient algorithm is proposed that extracts singular points from the high-resolution directional field. The algorithm is based on the Poincare index and provides a consistent binary decision that is not based on postprocessing steps like applying a threshold on a continuous resemblance measure for singular points. Furthermore, a method is presented to estimate the orientation of the extracted singular points. The accuracy of the methods is illustrated by experiments on a live-scanned fingerprint databas
    corecore