Reading PDB: Perception of Molecules from 3D Atomic Coordinates
Abstract
The analysis of small-molecule crystal structures is a common way
to gather valuable information for drug development. The necessary
structural data is usually provided in specific file formats containing
only element identities and three-dimensional atomic coordinates as
reliable chemical information. Consequently, the automated perception
of molecular structures from atomic coordinates has become a standard
task in cheminformatics. The molecules generated by such methods must
be both chemically valid and reasonable to provide a reliable basis
for subsequent calculations. This can be a difficult task since the
provided coordinates may deviate from ideal molecular geometries due
to experimental uncertainties or low resolution. Additionally, the
quality of the input data often varies significantly, making
it difficult to distinguish between actual structural features and
mere geometric distortions. We present a method for the generation
of molecular structures from atomic coordinates based on the recently
published NAOMI model. By making use of this consistent chemical description,
our method is able to generate reliable results even with input data
of low quality. Molecules from 363 Protein Data Bank (PDB) entries
could be perceived with a success rate of 98%, a result that could
not be achieved with previously described methods. The robustness
of our approach has been assessed by processing all small molecules
from the PDB and comparing them to reference structures. The complete
data set can be processed in less than 3 min, showing that our
approach is suitable for large-scale applications.