
jCompoundMapper: An open source Java library and command-line tool for chemical fingerprints

Authors: Fechner Nikolas; Hinselmann Georg; Jahn Andreas; Rosenbaum Lars; Zell Andreas
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The decomposition of a chemical graph is a convenient approach to encode information of the corresponding organic compound. While several commercial toolkits exist to encode molecules as so-called fingerprints, only a few open source implementations are available. The aim of this work is to introduce a library for exactly defined molecular decompositions, with a strong focus on the application of these features in machine learning and data mining. It provides several options such as search depth, distance cut-offs, and atom and pharmacophore typing. Furthermore, it provides the functionality to combine, compare, or export the fingerprints into several formats.

Results: We provide a Java 1.6 library for the decomposition of chemical graphs based on the open source Chemistry Development Kit toolkit. We reimplemented popular fingerprinting algorithms such as depth-first search fingerprints, extended connectivity fingerprints, autocorrelation fingerprints (e.g., CATS2D), radial fingerprints (e.g., Molprint2D), geometrical Molprint, atom pairs, and pharmacophore fingerprints. We also implemented custom fingerprints such as the all-shortest path fingerprint, which includes only the subset of shortest paths from the full set of paths of the depth-first search fingerprint. As an application of jCompoundMapper, we provide a command-line executable binary. We measured the conversion speed and number of features for each encoding and described the composition of the features in detail. The quality of the encodings was tested using the default parametrizations in combination with a support vector machine on the Sutherland QSAR data sets. Additionally, we benchmarked the fingerprint encodings on the large-scale Ames toxicity benchmark using a large-scale linear support vector machine. The results were promising and could often compete with literature results. On the large Ames benchmark, for example, we obtained an AUC ROC performance of 0.87 with a reimplementation of the extended connectivity fingerprint. This result is comparable to the performance achieved by a non-linear support vector machine using state-of-the-art descriptors. On the Sutherland QSAR data set, the best fingerprint encodings showed a comparable or better performance on 5 of the 8 benchmarks when compared against the results of the best descriptors published in the paper of Sutherland et al.

Conclusions: jCompoundMapper is a library for chemical graph fingerprints with several tweaking possibilities and exporting options for open source data mining toolkits. The quality of the data mining results, the conversion speed, the LGPL software license, the command-line interface, and the exporters should be useful for many applications in cheminformatics, such as benchmarks against literature methods, comparison of data mining algorithms, similarity searching, and similarity-based data mining.
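The all-shortest path idea described in the abstract can be illustrated with a small, self-contained Python sketch. This is not jCompoundMapper's code or API (the library is Java and builds on the Chemistry Development Kit); the function names, the toy molecules, the element-symbol-only atom typing, and the use of Tanimoto similarity for comparison are all assumptions made for illustration. Bond orders and hydrogens are ignored.

```python
from collections import deque


def shortest_paths(adjacency, source, max_depth):
    """Breadth-first search from `source`; returns {atom: [paths]} holding
    every shortest path (tuple of atom indices) of at most max_depth bonds."""
    dist = {source: 0}
    paths = {source: [(source,)]}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        if dist[atom] == max_depth:
            continue  # do not grow paths beyond the search depth
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                paths[nbr] = []
                queue.append(nbr)
            if dist[nbr] == dist[atom] + 1:
                # atom is a predecessor of nbr on some shortest path
                paths[nbr].extend(p + (nbr,) for p in paths[atom])
    return paths


def asp_features(symbols, adjacency, max_depth=3):
    """Hypothetical all-shortest-path feature set: one string per shortest
    path, written in whichever direction sorts first."""
    features = set()
    for source in range(len(symbols)):
        found = shortest_paths(adjacency, source, max_depth)
        for target, path_list in found.items():
            if target < source:
                continue  # visit each unordered atom pair once
            for path in path_list:
                forward = "-".join(symbols[i] for i in path)
                backward = "-".join(symbols[i] for i in reversed(path))
                features.add(min(forward, backward))
    return features


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy heavy-atom graphs (assumed examples): ethanol C-C-O, dimethyl ether C-O-C.
chain = {0: [1], 1: [0, 2], 2: [1]}
ethanol = asp_features(["C", "C", "O"], chain)
ether = asp_features(["C", "O", "C"], chain)
print(sorted(ethanol), sorted(ether), tanimoto(ethanol, ether))
```

In a ring, two atoms can be joined by more than one shortest path, which is why `shortest_paths` keeps a list of paths per target rather than a single one. A depth-first search fingerprint would additionally enumerate the longer, non-shortest paths.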

Application of 3D Zernike descriptors to shape-based ligand similarity searching

Background: The identification of promising drug leads from a large database of compounds is an important step in the preliminary stages of drug design. Although shape is known to play a key role in the molecular recognition process, its application to virtual screening poses significant hurdles in terms of both the encoding scheme and speed. Results: In this study, we have examined the efficacy of the alignment-independent three-dimensional Zernike descriptor (3DZD) for fast shape-based similarity searching. Performance of this approach was compared with several other methods, including the statistical-moments-based ultrafast shape recognition scheme (USR) and SIMCOMP, a graph matching algorithm that compares atom environments. Three benchmark datasets are used to thoroughly test the methods in terms of their ability for molecular classification, retrieval rate, and performance under conditions that simulate actual virtual screening tasks over a large pharmaceutical database. The 3DZD performed better than or comparable to the other methods examined, depending on the datasets and evaluation metrics used. Reasons for the success and the failure of the shape-based methods for specific cases are investigated. Based on the results for the three datasets, general conclusions are drawn with regard to their efficiency and applicability.

CiteSeerX

Crossref

Public Library of Science (PLOS)

Purdue E-Pubs

Systematic Exploitation of Multiple Receptor Conformations for Virtual Ligand Screening

Authors: Abagyan Ruben; Bottegoni Giovanni; Cavalli Andrea; Rocchia Walter; Rueda Manuel
Publication venue: Public Library of Science
Publication date: 01/01/2011

The role of virtual ligand screening in modern drug discovery is to mine large chemical collections and to prioritize for experimental testing a comparatively small and diverse set of compounds with expected activity against a target. Several studies have pointed out that the performance of virtual ligand screening can be improved by taking into account receptor flexibility. Here, we systematically assess how multiple crystallographic receptor conformations, a powerful way of discretely representing protein plasticity, can be exploited in screening protocols to separate binders from non-binders. Our analyses encompass 36 targets of pharmaceutical relevance and are based on actual molecules with reported activity against those targets. The results suggest that an ensemble receptor-based protocol displays a stronger discriminating power between active and inactive molecules than its standard single rigid receptor counterpart. Moreover, such a protocol can be engineered not only to enrich a higher number of active compounds, but also to enhance their chemical diversity. Finally, some clear indications can be gathered on how to select a subset of receptor conformations that is most likely to provide the best performance in a real-life scenario.
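The abstract does not say how scores from the individual receptor conformations are combined. One simple possibility, shown here purely as an assumption and not as the authors' protocol, is to keep each ligand's best score over all conformations and rank on that merged score. The conformation names, ligand names, and score values below are invented toy data, and lower scores are assumed to be better.

```python
def merge_best_score(scores_by_conformation):
    """Keep, for each ligand, its best (lowest) score across all
    receptor conformations. Input: {conformation: {ligand: score}}."""
    merged = {}
    for scores in scores_by_conformation.values():
        for ligand, score in scores.items():
            merged[ligand] = min(score, merged.get(ligand, score))
    return merged


def actives_in_top(scores, actives, top_n):
    """Count known actives among the top_n best-scored ligands."""
    ranked = sorted(scores, key=scores.get)  # ascending: best score first
    return sum(1 for ligand in ranked[:top_n] if ligand in actives)


# Hypothetical docking scores for two conformations of one receptor.
scores_by_conformation = {
    "confA": {"act1": -9.0, "act2": -4.0, "dec1": -6.0, "dec2": -5.0},
    "confB": {"act1": -5.0, "act2": -8.5, "dec1": -5.5, "dec2": -6.5},
}
actives = {"act1", "act2"}

ensemble = merge_best_score(scores_by_conformation)
single = scores_by_conformation["confA"]
print(actives_in_top(single, actives, 2), actives_in_top(ensemble, actives, 2))
```

In this toy case the second active only scores well against `confB`, so the merged ranking recovers it while the single-conformation ranking does not. Real data need not behave this way, since adding conformations can also promote decoys.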

Archivio istituzionale della ricerca - Università di Urbino

Crossref

eScholarship - University of California

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Estimation of the applicability domain of kernel-based machine learning models for virtual screening

Authors: Fechner Nikolas; Hinselmann Georg; Jahn Andreas; Zell Andreas
Publication venue: BioMed Central
Publication date: 01/01/2010