Noncontiguous Atom Matching Structural Similarity Function
Abstract
Measuring similarity between molecules is a fundamental problem in
cheminformatics. Given that similar molecules tend to have similar
physical, chemical, and biological properties, the notion of molecular
similarity plays an important role in the exploration of molecular
data sets, query-retrieval in molecular databases, and in structure–property/activity
modeling. Various methods to define structural similarity between
molecules are available in the literature, but so far none has been
used with consistent and reliable results for all situations. We propose
a new similarity method based on atom alignment for the analysis of
structural similarity between molecules. The method compares the
bonding profiles of atoms in comparable molecules, including
features seldom found in other structural or graph matching approaches,
such as chirality and double bond stereoisomerism.
The similarity measure is then defined on the annotated molecular
graph, based on an iterative directed graph similarity procedure and
an optimal alignment between atoms obtained with a pairwise matching algorithm.
With the proposed approach the similarities detected are more intuitively
understood because similar atoms in the molecules are explicitly shown.
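The optimal atom alignment step can be pictured as an assignment problem over atom-to-atom similarity scores. The following is only a minimal sketch under stated assumptions: the function name `best_alignment` and the score matrix are hypothetical, the scores are made up, and exhaustive search stands in for whichever pairwise matching algorithm NAMS actually uses, which is not specified here.

```python
from itertools import permutations


def best_alignment(sim):
    """Return (total, pairs) maximizing the summed similarity.

    sim[i][j] is a hypothetical similarity between atom i of molecule A
    and atom j of molecule B. Molecule A must not have more atoms than B.
    """
    n_a, n_b = len(sim), len(sim[0])
    if n_a > n_b:
        raise ValueError("expected no more rows than columns")
    best_total, best_pairs = float("-inf"), []
    # Exhaustive search: try every one-to-one assignment of A atoms to B atoms.
    # This is only practical for very small molecules.
    for cols in permutations(range(n_b), n_a):
        total = sum(sim[i][j] for i, j in enumerate(cols))
        if total > best_total:
            best_total, best_pairs = total, list(enumerate(cols))
    return best_total, best_pairs


# Made-up scores for a 3-atom and a 4-atom molecule (illustrative only).
sim = [
    [0.9, 0.1, 0.3, 0.2],
    [0.2, 0.8, 0.4, 0.1],
    [0.1, 0.3, 0.2, 0.7],
]
total, pairs = best_alignment(sim)
```

The returned `pairs` list is what makes the result easy to inspect: each entry names which atom of one molecule was matched to which atom of the other.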
This noncontiguous atom matching structural similarity method (NAMS)
was tested and compared with one of the most widely used similarity
methods (fingerprint-based similarity) using three difficult data
sets with different characteristics. Despite its higher computational
cost, the method performed well: it distinguished hydrocarbons, whether
different or very similar, that were indistinguishable using a fingerprint-based
approach. NAMS also verified the similarity principle using a data
set of structurally similar steroids with differences in the binding
affinity to the corticosteroid binding globulin receptor by showing
that pairs of steroids with a high degree of similarity (>80%) tend
to have smaller differences in the absolute value of binding activity.
Using a highly diverse set of compounds with information about the
monoamine oxidase inhibition level, the method also recovered
a significantly higher average fraction of active compounds than the
fingerprint-based approach when the seed compound is active, across
different similarity cutoff thresholds.
In particular, for cutoff thresholds of 86%, 93%, and 96.5%,
NAMS recovered a fraction of actives of 0.57, 0.63, and
0.83, respectively, while the fingerprint-based approach
recovered 0.41, 0.40, and 0.39, respectively.
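One way to read the "average fraction of actives" figures is sketched below. It assumes the metric is computed as follows: for each active seed, take the other compounds whose similarity to the seed meets the cutoff, compute the share of them that are active, and average over seeds. That definition, the function name `mean_active_fraction`, and the toy data are all assumptions made for illustration, not the authors' evaluation code.

```python
def mean_active_fraction(sim, active, cutoff):
    """Average, over active seeds, of the fraction of actives among the
    other compounds whose similarity to the seed is >= cutoff.

    Seeds with no neighbour at or above the cutoff are skipped.
    Returns None if no seed has any such neighbour.
    """
    fractions = []
    for seed, is_active in enumerate(active):
        if not is_active:
            continue
        hits = [j for j in range(len(active))
                if j != seed and sim[seed][j] >= cutoff]
        if hits:
            fractions.append(sum(active[j] for j in hits) / len(hits))
    return sum(fractions) / len(fractions) if fractions else None


# Toy symmetric similarity matrix in [0, 1] and made-up activity labels.
sim = [
    [1.00, 0.95, 0.90, 0.50],
    [0.95, 1.00, 0.88, 0.40],
    [0.90, 0.88, 1.00, 0.60],
    [0.50, 0.40, 0.60, 1.00],
]
active = [True, True, False, False]
result = mean_active_fraction(sim, active, cutoff=0.86)
```

In this toy example, raising the cutoff excludes the inactive third compound from the neighbour lists, so the recovered fraction rises.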
NAMS is freely available to the community as a simple
Web-based tool, together with its Python source code, at http://nams.lasige.di.fc.ul.pt/