Search CORE

6 research outputs found

A tree-based method for the rapid screening of chemical fingerprints

Author: Kristensen Thomas G
Nielsen Jesper
Pedersen Christian NS
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Abstract Background The fingerprint of a molecule is a bitstring based on its structure, constructed such that structurally similar molecules will have similar fingerprints. Molecular fingerprints can be used in an initial phase of drug development for identifying novel drug candidates by screening large databases for molecules with fingerprints similar to a query fingerprint. Results In this paper, we present a method which efficiently finds all fingerprints in a database with Tanimoto coefficient to the query fingerprint above a user defined threshold. The method is based on two novel data structures for rapid screening of large databases: the <it>k</it>D grid and the Multibit tree. The <it>k</it>D grid is based on splitting the fingerprints into <it>k </it>shorter bitstrings and utilising these to compute bounds on the similarity of the complete bitstrings. The Multibit tree uses hierarchical clustering and similarity within each cluster to compute similar bounds. We have implemented our method and tested it on a large real-world data set. Our experiments show that our method yields approximately a three-fold speed-up over previous methods. Conclusions Using the novel <it>k</it>D grid and Multibit tree significantly reduce the time needed for searching databases of fingerprints. This will allow researchers to (1) perform more searches than previously possible and (2) to easily search large databases.</p

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Speeding Up Chemical Database Searches Using a Proximity Filter Based on the Logical Exclusive OR

Author: Baldi P.
Bender A.
Bloom B. H.
Chen J.
Chen J.
Daniel S. Hirschberg
Fligner M. A.
Flower D. R.
Hassan M.
Hert J.
Holliday J. D.
Pierre Baldi
Ramzi J. Nasr
Rouvray D. H.
Swamidass S.
Swamidass S.
Tversky A.
Xue L.
Xue L.
Publication venue: 'American Chemical Society (ACS)'
Publication date
Field of study

Crossref

Software for supporting large scale data processing for High Throughput Screening

Author: Tai David
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2011
Field of study

High Throughput Screening for is a valuable data generation technique for data driven knowledge discovery. Because the rate of data generation is so great, it is a challenge to cope with the demands of post experiment data analysis. This thesis presents three software solutions that I implemented in an attempt to alleviate this problem. The first is K-Screen, a Laboratory Information Management System designed to handle and visualize large High Throughput Screening datasets. K-Screen is being successfully used by the University of Kansas High Throughput Screening Laboratory to better organize and visualize their data. The next two algorithms are designed to accelerate the search times for chemical similarity searches using 1-dimensional fingerprints. The first algorithm balances information content in bit strings to attempt to find more optimal ordering and segmentation patterns for chemical fingerprints. The second algorithm eliminates redundant pruning calculations for large batch chemical similarity searches and shows a 250% improvement for the fastest current fingerprint search algorithm for large batch queries

KU ScholarWorks

Algorithms for Constructing Exact Nearest Neighbor Graphs

Author: Anastasiu David
Publication venue
Publication date: 01/06/2016
Field of study

University of Minnesota Ph.D. dissertation.June 2016. Major: Computer Science. Advisor: George Karypis. 1 computer file (PDF); xi, 151 pages.Nearest neighbor graphs (NNGs) contain the set of closest neighbors, and their similarities, for each of the objects in a set of objects. They are widely used in many real-world applications, such as clustering, online advertising, recommender systems, data cleaning, and query refinement. A brute-force method for constructing the graph requires O(n^2) similarity comparisons for a set of n objects. One way to reduce the number of comparisons is to ignore object pairs with low similarity, which are unimportant in many domains. Current methods for construction of the graph tackle the problem by either pruning the similarity search space, avoiding comparisons of objects that can be determined to not meet the similarity bounding conditions, or they solve the problem approximately, which can miss some of the neighbors. This thesis addresses the problem of efficiently constructing the exact nearest neighbor graph for a large set of objects, i.e., the graph that would be found by comparing each object against all other objects in the set. In this context, we address two specific problems. The epsilon-nearest neighbor graph (epsilon-NNG) construction problem, also known as all-pairs similarity search (APSS), seeks to find, for each object, all other objects with a similarity of at least some threshold epsilon. On the other hand, the k-nearest neighbor graph (k-NNG) construction problem seeks to find the k closest other objects to each object in the set. For both problems, we propose filtering techniques that are more effective than previous ones, and efficient serial and parallel algorithms to construct the graph. Our methods are ideally suited for sparse high dimensional data

University of Minnesota Digital Conservancy