Characterization of the Chemical Space of Known and
Readily Obtainable Natural Products
Abstract
Natural products remain one of the most productive sources of chemical
inspiration for the development of new drugs. The structures of more
than 250 000 natural products are available from public databases.
At least 10% of these compounds are readily obtainable for experimental
testing from commercial vendors and public research institutions.
While the physicochemical properties of known natural products have
been thoroughly studied and compared to those of drugs and other types
of small molecules, the information available on the content, coverage,
and relevance of individual virtual and physical natural product libraries
remains limited. The aim of this study was to develop a
detailed understanding of the coverage of chemical space by known
and readily obtainable natural products and by individual natural
product databases. For this purpose, we compiled comprehensive data
sets of known and readily obtainable natural products from 18 virtual
databases (including the Dictionary of Natural Products), nine physical
libraries, and the Protein Data Bank (PDB). We also developed and
employed an algorithm (“SugarBuster”) that removes from natural products
the sugars and sugar-like moieties, which are generally not the
focus of drug discovery. In addition,
we devised a rule-based approach for the automated classification
of natural products into natural product classes (alkaloids, steroids,
flavonoids, etc.). Among the most important results of this study
is the finding that the readily obtainable natural products are highly
diverse and populate regions of chemical space that are of high relevance
to drug discovery. In some cases, substantial differences in the coverage
of natural product classes and chemical space by the individual databases
are observed. More than 2000 natural products are identified for which
at least one X-ray crystal structure of the compound in complex with
a biomacromolecule is available from the PDB.
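
As a rough illustration of what comparing the coverage of individual databases can mean in practice, the sketch below treats each database as a set of compound identifiers and computes how much of a reference set it covers, plus the pairwise overlap between databases. It is not the procedure used in the study: the function names, the database names, and the `NP-…` identifiers are all hypothetical placeholders, and it assumes that compounds have already been standardized to identifiers that can be compared directly.

```python
from itertools import combinations


def coverage(library, reference):
    """Fraction of the reference set that is also present in the library."""
    if not reference:
        return 0.0
    return len(library & reference) / len(reference)


def pairwise_overlap(databases):
    """Jaccard overlap (intersection / union) for every pair of databases."""
    result = {}
    # Sort by name so that the key of each pair is deterministic.
    for (name_a, set_a), (name_b, set_b) in combinations(sorted(databases.items()), 2):
        union = set_a | set_b
        result[(name_a, name_b)] = len(set_a & set_b) / len(union) if union else 0.0
    return result


# Hypothetical placeholder identifiers, not real compounds or databases.
databases = {
    "virtual_db": {"NP-001", "NP-002", "NP-003", "NP-004"},
    "vendor_library": {"NP-002", "NP-004", "NP-005"},
}

# Reference set: every compound seen in any of the databases.
known = set().union(*databases.values())

vendor_coverage = coverage(databases["vendor_library"], known)
overlaps = pairwise_overlap(databases)
```

Set-based comparison of this kind depends entirely on the identifiers being consistent across sources, so any real analysis would need a structure standardization step before it.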