Comparison of Confirmed Inactive and Randomly Selected Compounds as Negative Training Examples in Support Vector Machine-Based Virtual Screening
The choice of negative training data for machine learning is a
little-explored issue in chemoinformatics. In this study, the influence
of alternative sets of negative training data and different background
databases on support vector machine (SVM) modeling and virtual screening
has been investigated. Target-directed SVM models have been derived
on the basis of differently composed training sets containing confirmed
inactive molecules or randomly selected database compounds as negative
training instances. These models were then applied to search background
databases consisting of biological screening data or randomly assembled
compounds for available hits. Negative training data were found to
systematically influence compound recall in virtual screening. In
addition, different background databases had a strong influence on
the search results. Our findings also indicated that typical benchmark
settings lead to an overestimation of SVM-based virtual screening
performance compared to search conditions that are more relevant for
practical applications.
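Compound recall, the measure by which the models above are compared, is the fraction of known hits in a background database that appear among the top-ranked compounds. The following is a minimal sketch. It assumes that a model's output is a list of hypothetical compound IDs sorted by decreasing SVM score. It also assumes that recall is read off at a cutoff `n` chosen for illustration, because the abstract does not name one.

```python
def recall_at_n(ranked_ids, hit_ids, n):
    """Fraction of known hits found among the top-n ranked compounds."""
    hits = set(hit_ids)
    if not hits:
        raise ValueError("hit_ids must not be empty")
    return len(set(ranked_ids[:n]) & hits) / len(hits)


# Hypothetical ranking of a background database by an SVM model
# (highest score first) and the known hits it contains
ranking = ["c2", "c7", "c4", "c1", "c8", "c5"]
known_hits = {"c2", "c4", "c8"}
for n in (1, 3, 5):
    print(n, recall_at_n(ranking, known_hits, n))
```

To compare models, you compute the same measure for models trained with different negative sets and searched against different background databases.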
Introduction of Target Cliffs as a Concept To Identify and Describe Complex Molecular Selectivity Patterns
The study of target specificity or selectivity of small molecules
is an important task in drug design. In an ideal situation, a compound
would exclusively interact with an individual target and hence be
target specific. However, such exclusive binding events are likely
to be rare, as increasing evidence suggests. Because many compounds
are active against more than one target, apparent selectivity often
results from potency differences, i.e., a compound that is highly
potent against a given target and weakly potent against one or more
others displays target selectivity. In a simple case, a compound might
have known activity against a pair of targets and be selective for
one over the other. Then, selectivity is straightforward to rationalize.
However, there are many more complex selectivity relationships associated
with multi-target activities of compounds that are difficult to analyze
and compare
in a consistent manner. For this purpose, we introduce herein target
cliffs as a concept to describe complex selectivity patterns. A target
cliff is defined as a pair of targets against which at least one compound
displays a large difference in potency. As such, target cliffs are
distinct from activity cliffs. However, qualifying target pairs (target
cliffs) and compound pairs (activity cliffs) can be systematically
extracted from the same data structure, termed a target-compound matrix.
Furthermore, these two types of cliffs can be compared to identify
and prioritize compounds that are selective and reveal structure–activity
relationship (SAR) information.
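The target cliff definition above can be applied directly to a target-compound matrix. The following is a minimal sketch. It assumes that the matrix is stored as a dictionary of per-compound potency values in logarithmic units (e.g., pKi). It also assumes that a "large" potency difference means at least two log units (100-fold). That threshold is an assumption and does not come from the abstract.

```python
from itertools import combinations


def target_cliffs(matrix, min_diff=2.0):
    """Find target pairs against which at least one compound shows a potency
    difference of at least min_diff log units.

    matrix: dict mapping compound ID -> {target ID: potency (e.g. pKi)}.
    Returns a dict mapping (target_a, target_b) -> [(compound ID, difference)].
    """
    cliffs = {}
    for compound, row in matrix.items():
        # Only targets with a measured potency for this compound are compared
        for t1, t2 in combinations(sorted(row), 2):
            diff = abs(row[t1] - row[t2])
            if diff >= min_diff:  # assumed threshold: 2 log units = 100-fold
                cliffs.setdefault((t1, t2), []).append((compound, round(diff, 2)))
    return cliffs


# Hypothetical target-compound matrix; absent entries mean no measurement
tcm = {
    "cpd1": {"T1": 8.5, "T2": 6.0, "T3": 8.1},
    "cpd2": {"T1": 7.0, "T2": 6.8},
    "cpd3": {"T2": 5.2, "T3": 7.9},
}

for pair, compounds in sorted(target_cliffs(tcm).items()):
    print(pair, compounds)
```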
Similarity Searching for Potent Compounds Using Feature Selection
In similarity searching, compound potency is usually not taken into account.
Given a set of active reference compounds, similarity to database
molecules is calculated using different metrics without considering
compound potency as a search parameter. Herein, we introduce a feature
selection method for fingerprint similarity searching to maximize
compound recall and preferentially detect potent compounds. On the
basis of training examples, fingerprint features are selected that
identify potent compounds and produce high recall. Using the reduced
fingerprint representations, potent hits are preferentially detected,
even if reference compounds have only moderate or low potency. Small
sets of simple chemical features were found to yield high search performance.
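Searching with a reduced fingerprint can be illustrated with fingerprints stored as sets of feature indices. The following is a minimal sketch with invented data. The selection criterion is a hypothetical stand-in and is not the selection method of the study: a feature is kept if its frequency among potent training compounds exceeds its background frequency by at least `min_gain`.

```python
def select_features(potent_fps, background_fps, min_gain=0.3):
    """Keep features that are more frequent among potent training compounds
    than among background compounds by at least min_gain.
    (Hypothetical criterion for illustration.)"""
    features = set().union(*potent_fps, *background_fps)

    def frequency(feature, fps):
        return sum(feature in fp for fp in fps) / len(fps)

    return {f for f in features
            if frequency(f, potent_fps) - frequency(f, background_fps) >= min_gain}


def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def reduced_similarity(query, candidate, selected):
    """Tanimoto similarity computed over the selected features only."""
    return tanimoto(query & selected, candidate & selected)


# Invented fingerprints (sets of "on" feature indices)
potent_training = [{1, 2, 5}, {1, 2, 7}, {1, 2, 3, 5}]
background = [{3, 4, 7}, {2, 3, 4}, {4, 6, 7}, {3, 5, 6}]
selected = select_features(potent_training, background)

query = {1, 3, 5}          # reference compound of moderate potency
candidate = {1, 2, 4, 5}   # database compound
print(sorted(selected), tanimoto(query, candidate),
      reduced_similarity(query, candidate, selected))
```

In this toy example, features 1, 2, and 5 are selected. The query and the candidate share features 1 and 5, so their Tanimoto similarity is 0.4 on the full fingerprint. Restricting the comparison to the selected features raises it to 2/3.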
Application of a New Scaffold Concept for Computational Target Deconvolution of Chemical Cancer Cell Line Screens
Target deconvolution of phenotypic assays is a hot topic in chemical
biology and drug discovery. The ultimate goal is the identification
of targets for compounds that produce interesting phenotypic readouts.
A variety of experimental and computational strategies have been devised
to aid this process. A widely applied computational approach infers
putative targets of new active molecules on the basis of their chemical
similarity to compounds with activity against known targets. Herein,
we introduce a molecular scaffold-based variant for similarity-based
target deconvolution from chemical cancer cell line screens that were
used as a model system for phenotypic assays. A new scaffold type,
termed the analog series-based (ASB) scaffold, was used for
substructure-based similarity assessment. Compared with conventional
scaffolds and compound-based similarity calculations, target assignment
centered on ASB scaffolds derived from screening hits and bioactive
reference compounds restricted the number of target hypotheses in a
meaningful way and led to a significant enrichment of known cancer
targets among candidates.
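Scaffold-based target assignment reduces to grouping reference compounds by scaffold and then looking up the scaffold of each screening hit. The following is a minimal sketch. It assumes that scaffolds (ASB or otherwise) have already been computed and are represented as hypothetical string keys. It also assumes that a target hypothesis requires an exact scaffold match. Deriving ASB scaffolds from analog series is not shown.

```python
from collections import defaultdict


def assign_targets(hit_scaffolds, reference_scaffolds):
    """Map each screening hit to the targets of reference compounds that share
    its scaffold (exact match of precomputed scaffold keys).

    hit_scaffolds: dict hit ID -> scaffold key
    reference_scaffolds: iterable of (scaffold key, target) pairs
    """
    targets_by_scaffold = defaultdict(set)
    for scaffold, target in reference_scaffolds:
        targets_by_scaffold[scaffold].add(target)
    # Hits whose scaffold has no annotated reference compound get no hypothesis
    return {hit: sorted(targets_by_scaffold.get(scaffold, ()))
            for hit, scaffold in hit_scaffolds.items()}


# Hypothetical scaffold keys and targets
hits = {"hit1": "scaffold_A", "hit2": "scaffold_C"}
references = [("scaffold_A", "target_1"), ("scaffold_A", "target_2"),
              ("scaffold_B", "target_3")]
print(assign_targets(hits, references))
```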
Composition and Topology of Activity Cliff Clusters Formed by Bioactive Compounds
The assessment of activity cliffs has thus far mostly focused on
compound pairs, although the majority of activity cliffs are not formed
in isolation but in a coordinated manner involving multiple active
compounds and cliffs. However, the composition of coordinated activity
cliff configurations and their topologies are unknown. Therefore,
we have identified all activity cliff configurations formed by currently
available bioactive compounds and analyzed them in network representations
where activity cliff configurations occur as clusters. The composition,
topology, frequency of occurrence, and target distribution of activity
cliff clusters have been determined. A limited number of large cliff
clusters with unique topologies were identified that were centers
of activity cliff formation. These clusters originated from a small
number of target sets. However, most clusters were of small to moderate
size. Three basic topologies were sufficient to describe recurrent
activity cliff cluster motifs. For example, frequently
occurring clusters with star topology determined the scale-free character
of the global activity cliff network and represented a characteristic
activity cliff configuration. Large clusters with complex topology
were often found to contain different combinations of basic topologies.
Our study provides a first view of activity cliff configurations formed
by currently available bioactive compounds and of the recurrent topologies
of activity cliff clusters. Activity cliff clusters of defined topology
can be selected, and from compounds forming the clusters, SAR information
can be obtained. The SAR information of activity cliff clusters sharing
a specific activity and topology can be compared.
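Activity cliff clusters are the connected components of a network in which compounds are nodes and activity cliffs are edges. The following is a minimal sketch that assumes the cliff pairs are already known. The star test checks for one hub linked to every other compound with no further edges. It is one simple way to recognize the star topology mentioned above.

```python
from collections import defaultdict


def cliff_clusters(cliff_pairs):
    """Build the activity cliff network and return it with its connected
    components (clusters) as sets of compound IDs."""
    graph = defaultdict(set)
    for a, b in cliff_pairs:
        graph[a].add(b)
        graph[b].add(a)
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node] - component)
        seen |= component
        clusters.append(component)
    return graph, clusters


def is_star(graph, cluster):
    """True if one hub compound is linked to all others and no other cliffs
    exist; clusters of a single pair (two compounds) are excluded."""
    n = len(cluster)
    n_edges = sum(len(graph[v]) for v in cluster) // 2
    return n >= 3 and n_edges == n - 1 and any(len(graph[v]) == n - 1 for v in cluster)


# Hypothetical activity cliff pairs
pairs = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F"), ("F", "G"), ("G", "E")]
graph, clusters = cliff_clusters(pairs)
for cluster in clusters:
    print(sorted(cluster), "star" if is_star(graph, cluster) else "other")
```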
Current Compound Coverage of the Kinome
Publicly available kinase inhibitors
have been analyzed in detail. Nearly
19,000 inhibitors have been identified with activity against 266 different
kinases. Thus, about half of the human kinome is currently covered
with active small molecules. The distribution of inhibitors across
the kinome is uneven. Most available kinase inhibitors are likely
to be type I inhibitors. By contrast, type II inhibitors are rare
but usually have high potency. Kinase inhibitors generally display
high scaffold diversity. Activity cliffs with at least a 100-fold
difference in potency are only found for inhibitors of 106 kinases,
which is partly due to the small number of compounds available for
many kinases and partly due to scaffold diversity. Moreover, kinase
inhibitors are less promiscuous than is often thought. More than 70%
of available inhibitors are annotated with only a single kinase activity,
and only ∼1% of the inhibitors are active against five or more
kinases.
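The coverage statements above follow from simple arithmetic. The following worked sketch assumes the commonly cited size of the human protein kinome, about 518 kinases, a figure not given in the abstract. It also assumes potency values on a logarithmic scale.

```latex
\begin{aligned}
\text{kinome coverage} &\approx \frac{266}{518} \approx 0.51 \quad (\text{about half}) \\
\text{100-fold potency difference} &\;\Leftrightarrow\; \Delta pK_i = \log_{10} 100 = 2 \\
\text{inhibitors active against} \ge 5 \text{ kinases} &\approx 0.01 \times 19{,}000 \approx 190
\end{aligned}
```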
Matched Molecular Pair Analysis of Small Molecule Microarray Data Identifies Promiscuity Cliffs and Reveals Molecular Origins of Extreme Compound Promiscuity
The study of compound promiscuity is a hot topic in medicinal chemistry
and drug discovery research. Promiscuous compounds are increasingly
identified, but the molecular basis of promiscuity is still poorly
understood. Utilizing the matched molecular pair formalism,
we have analyzed patterns of compound promiscuity in a publicly available
small molecule microarray data set. On the basis of our analysis,
we introduce “promiscuity cliffs” as pairs of structural
analogs with single-site substitutions that lead to large-magnitude
differences in apparent compound promiscuity involving between 50
and 97 unrelated targets. No substructures or substructure transformations
have been detected that are generally responsible for introducing
promiscuity. However, within a given structural context, small chemical
replacements were found to lead to dramatic promiscuity effects. On
the basis of our analysis, promiscuity is not an inherent feature
of molecular scaffolds but can be induced by small chemical substitutions.
Promiscuity cliffs provide immediate access to such modifications.
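Once matched molecular pairs have been generated, flagging promiscuity cliffs comes down to comparing target counts. The following is a minimal sketch with three assumptions. The pairs are precomputed, and MMP fragmentation is not shown. Promiscuity is measured as the number of targets a compound is active against. The cutoff of 50 targets is a hypothetical choice that echoes the range quoted above and is not the study's definition.

```python
def promiscuity_cliffs(mmp_pairs, target_counts, min_diff=50):
    """Return matched molecular pairs whose apparent promiscuity (number of
    targets a compound is active against) differs by at least min_diff."""
    cliffs = []
    for a, b in mmp_pairs:
        diff = abs(target_counts[a] - target_counts[b])
        if diff >= min_diff:  # hypothetical cutoff
            cliffs.append((a, b, diff))
    return cliffs


# Hypothetical precomputed MMPs (single-site substitutions) and target counts
mmps = [("m1", "m2"), ("m1", "m3"), ("m4", "m5")]
counts = {"m1": 3, "m2": 61, "m3": 5, "m4": 90, "m5": 12}
print(promiscuity_cliffs(mmps, counts))
```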
Prediction of Individual Compounds Forming Activity Cliffs Using Emerging Chemical Patterns
Activity cliffs are formed by structurally similar or analogous
compounds having large potency differences. In medicinal chemistry,
pairs or groups of compounds forming activity cliffs are of interest
for structure–activity relationship (SAR) analysis and compound
optimization. Thus far, activity cliff assessment has mostly been
descriptive, i.e., compound data sets and activity landscape
representations have been searched for activity cliffs in the context
of SAR analysis. Only recently have first attempts been made to move
beyond descriptive analysis and predict activity cliffs. This has been
done by building computational models that distinguish compound pairs
forming activity cliffs from non-cliff pairs. However, it is in
principle more challenging to predict single compounds that
participate in activity cliffs. Here, we show that individual
compounds having high or low potency can be accurately predicted to
form activity cliffs on the basis of emerging chemical patterns.
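Emerging patterns are commonly defined as patterns whose support rises sharply from one class to another. This rise is quantified by the growth rate, which is the support in the positive class divided by the support in the negative class. The following is a minimal sketch for numerical descriptors. It assumes that a pattern is a set of value ranges that a compound must satisfy simultaneously. The descriptor names, ranges, and thresholds are hypothetical, and pattern mining itself is not shown.

```python
import math


def matches(compound, pattern):
    """A compound matches if every descriptor lies in the pattern's range."""
    return all(low <= compound[name] <= high for name, (low, high) in pattern.items())


def support(pattern, compounds):
    """Fraction of compounds matching the pattern."""
    return sum(matches(c, pattern) for c in compounds) / len(compounds)


def growth_rate(pattern, positives, negatives):
    """Support in the positive class divided by support in the negative class."""
    sp, sn = support(pattern, positives), support(pattern, negatives)
    if sn == 0:
        return math.inf if sp > 0 else 0.0
    return sp / sn


def is_emerging(pattern, positives, negatives, min_growth=5.0, min_support=0.5):
    """Hypothetical thresholds for calling a pattern an ECP."""
    return (support(pattern, positives) >= min_support
            and growth_rate(pattern, positives, negatives) >= min_growth)


# Invented descriptor values: positives form activity cliffs, negatives do not
positives = [{"logP": 3.1, "hba": 4}, {"logP": 3.4, "hba": 5}, {"logP": 2.9, "hba": 4}]
negatives = [{"logP": 1.0, "hba": 2}, {"logP": 3.2, "hba": 8},
             {"logP": 0.5, "hba": 3}, {"logP": 4.5, "hba": 1}]

two_ranges = {"logP": (2.5, 3.5), "hba": (4, 6)}
one_range = {"logP": (2.5, 3.5)}
for p in (two_ranges, one_range):
    print(p, growth_rate(p, positives, negatives), is_emerging(p, positives, negatives))
```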
Classification of Compounds with Distinct or Overlapping Multi-Target Activities and Diverse Molecular Mechanisms Using Emerging Chemical Patterns
The emerging chemical patterns (ECP) approach has been introduced
for compound classification. Thus far, only very few ECP applications
have been reported. Here, we further investigate the ECP methodology
by studying complex classification problems. The analysis involves
multi-target data sets with systematically organized subsets of compounds
having distinct or overlapping target activities and, in addition,
data sets containing classes of specifically active compounds with
different mechanisms of action. In systematic classification trials
focusing on individual compound subsets or mechanistic classes, ECP
calculations utilizing numerical descriptors achieve moderate to high
sensitivity, depending on the data set, and consistently high specificity.
Accurate ECP predictions are already obtained on the basis of very
small learning sets with only three positive training instances, which
distinguishes the ECP approach from many other machine learning techniques.
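Sensitivity and specificity are the two measures reported above. They are the fractions of positive and negative test compounds that are classified correctly. The following is a minimal sketch with invented labels. It assumes that 1 marks a member of the compound subset or mechanistic class being predicted.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Labels: 1 = member of the predicted class, 0 = non-member."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Invented labels for ten test compounds
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
print(sensitivity_specificity(y_true, y_pred))
```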
Compound Pathway Model To Capture SAR Progression: Comparison of Activity Cliff-Dependent and -Independent Pathways
A compound pathway model is introduced
to monitor SAR progression
in compound data sets. Pathways are formed by sequences of structurally
analogous compounds with stepwise increasing potency that ultimately
yield highly potent compounds. Hence, the model was designed to mimic
compound optimization efforts. Different pathway categories were defined.
Pathways originating from any active compound in a data set, including
compounds forming activity cliffs, were systematically identified. The relative
frequency of activity cliff-dependent and -independent pathways was
determined and compared. In 23 of 39 different compound data sets
that qualified for our analysis, significant differences in the relative
frequency of activity cliff-dependent and -independent pathways were
observed. In 17 of these 23 data sets, activity cliff-dependent pathways
occurred with higher relative frequency than cliff-independent pathways.
In addition, pathways originating from the majority of activity cliff
compounds displayed the desired SAR progression, reflecting the SAR
information gain associated with activity cliffs.
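The pathway model can be sketched as a search over an analog network in which each step must increase potency. The following is a minimal sketch with several assumptions. Analog relationships and potency values (log units) are given. A pathway ends once a compound reaches a hypothetical `target_potency` of 8.0. A pathway counts as activity cliff-dependent if any step spans at least two log units. These thresholds are assumptions, not the study's settings.

```python
def pathways(start, analogs, potency, target_potency=8.0):
    """Enumerate analog sequences from start in which every step increases
    potency, stopping when a compound reaches target_potency."""
    found = []

    def extend(path):
        last = path[-1]
        if potency[last] >= target_potency:
            found.append(list(path))
            return
        for nxt in analogs.get(last, ()):
            # Strictly increasing potency also rules out cycles
            if potency[nxt] > potency[last]:
                path.append(nxt)
                extend(path)
                path.pop()

    extend([start])
    return found


def is_cliff_dependent(path, potency, cliff_diff=2.0):
    """True if any step of the pathway spans at least cliff_diff log units."""
    return any(potency[b] - potency[a] >= cliff_diff for a, b in zip(path, path[1:]))


# Hypothetical analog relationships and potency values (log units)
analogs = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "e"], "d": ["b"], "e": ["c"]}
potency = {"a": 5.0, "b": 5.8, "c": 6.6, "d": 8.2, "e": 8.0}

for path in pathways("a", analogs, potency):
    print(path, "cliff-dependent" if is_cliff_dependent(path, potency) else "cliff-independent")
```

Stopping at the first compound that reaches the potency goal is one design choice among several. Continuing to extend pathways past that point would yield longer, overlapping sequences.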