Identification of diverse database subsets using property-based and fragment-based molecular descriptions

Ashton, M.; Barnard, J.; Casset, F.; Charlton, M.; Downs, G.; Gorse, D.; Holliday, J.D.; Lahana, R.; Willett, P.


oai:eprints.whiterose.ac.uk:3570


Publication date: 1 January 2003
Publisher: Wiley

Abstract

This paper reports a comparison of calculated molecular properties and 2D fragment bit-strings when used to select structurally diverse subsets of a file of 44295 compounds. MaxMin dissimilarity-based selection and k-means cluster-based selection are used to select subsets containing between 1% and 20% of the file. Investigation of the numbers of bioactive molecules in the selected subsets suggests that the MaxMin subsets are noticeably superior to the k-means subsets; that the property-based descriptors are marginally superior to the fragment-based descriptors; and that both approaches are noticeably superior to random selection.
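As an illustration of the MaxMin dissimilarity-based selection named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each compound is represented by the set of "on" positions in its fragment bit-string, that dissimilarity is measured as 1 minus the Tanimoto coefficient, and that the first pick is a fixed seed compound; the abstract specifies none of these details. The names `tanimoto_distance`, `maxmin_select` and `seed_index` are hypothetical.

```python
from typing import FrozenSet, List, Sequence


def tanimoto_distance(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Return 1 - Tanimoto similarity for two sets of 'on' bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - len(a & b) / union


def maxmin_select(
    fps: Sequence[FrozenSet[int]], k: int, seed_index: int = 0
) -> List[int]:
    """Pick k compound indices by MaxMin: each new pick is the compound
    whose nearest already-selected neighbour is as far away as possible."""
    if not 0 < k <= len(fps):
        raise ValueError("k must be between 1 and the number of compounds")
    if not 0 <= seed_index < len(fps):
        raise ValueError("seed_index is out of range")

    selected = [seed_index]  # assumption: the seed is chosen by the caller
    chosen = {seed_index}
    # nearest[i] = distance from compound i to its closest selected compound
    nearest = [tanimoto_distance(fp, fps[seed_index]) for fp in fps]

    while len(selected) < k:
        # Maximise the minimum distance to the current subset.
        candidate = max(
            (i for i in range(len(fps)) if i not in chosen),
            key=lambda i: nearest[i],
        )
        selected.append(candidate)
        chosen.add(candidate)
        # Only the newly added compound can lower a stored minimum.
        for i, fp in enumerate(fps):
            nearest[i] = min(nearest[i], tanimoto_distance(fp, fps[candidate]))
    return selected


# Toy data: four hypothetical fingerprints, not taken from the paper.
toy_fps = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({7, 8, 9}),
    frozenset({1, 2, 3, 4}),
]
print(maxmin_select(toy_fps, 3))
```

Keeping a running minimum distance for every compound means each new pick costs one pass over the file instead of a comparison against the whole selected subset, which matters for a file of tens of thousands of compounds.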


Last time updated on 28/06/2012

This paper was published in White Rose Research Online.
