Bioturbo Similarity Searching: Combining Chemical and Biological Similarity To Discover Structurally Diverse Bioactive Molecules
Virtual screening using bioactivity
profiles has become an integral
part of currently applied hit-finding methods in the pharmaceutical industry.
However, a significant drawback of this approach is that it is only
applicable to compounds that have been biologically tested in the
past and have sufficient activity annotations for meaningful profile
comparisons. Although bioactivity data generated in pharmaceutical
institutions are growing on an unprecedented scale, the number of
biologically annotated compounds still covers only a minuscule fraction
of chemical space. For a newly synthesized compound or an isolated
natural product to be biologically characterized across multiple assays,
it may take a considerable amount of time. Consequently, this chemical
matter will not be included in virtual screening campaigns based on
bioactivity profiles. To overcome this problem, we herein introduce
bioturbo similarity searching that uses chemical similarity to map
molecules without biological annotations into bioactivity space and
then searches for biologically similar compounds in this reference
system. In benchmark calculations on primary screening data, we demonstrate
that our approach generally achieves higher hit rates and identifies
structurally more diverse compounds than approaches using chemical
information only. Furthermore, our method is able to discover hits
with novel modes of inhibition that traditional 2D and 3D similarity
approaches are unlikely to discover. Test calculations on a set of
natural products reveal the practical utility of the approach for
identifying novel and synthetically more accessible chemical matter.
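The two-step idea in this abstract (map an unannotated molecule into bioactivity space through its chemical neighbours, then search by biological similarity) can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: the bit-set fingerprints, the Tanimoto and cosine measures, the max-fusion rule, the parameter `k`, and the toy compounds A, B, and C are all assumptions made for the example.

```python
from math import sqrt
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical layout: name -> (chemical fingerprint bits, bioactivity profile)
Annotated = Dict[str, Tuple[FrozenSet[int], Sequence[float]]]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two sets of fingerprint bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two bioactivity profiles."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bioturbo_rank(query_fp: FrozenSet[int], annotated: Annotated, k: int = 1) -> List[str]:
    """Rank annotated compounds for a query that has no bioactivity data."""
    # Step 1: chemical similarity picks the k nearest annotated neighbours,
    # whose profiles stand in for the query in bioactivity space.
    neighbours = sorted(
        annotated,
        key=lambda name: tanimoto(query_fp, annotated[name][0]),
        reverse=True,
    )[:k]
    # Step 2: score every annotated compound by its best profile similarity
    # to any of those neighbours (max fusion, an assumed choice).
    scores = {
        name: max(cosine(profile, annotated[n][1]) for n in neighbours)
        for name, (_, profile) in annotated.items()
    }
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Toy data: B shares no fingerprint bits with the query but has a profile
# close to A, the query's nearest chemical neighbour.
ANNOTATED: Annotated = {
    "A": (frozenset({1, 2, 3}), [1.0, 0.0, 1.0]),
    "B": (frozenset({7, 8, 9}), [1.0, 0.0, 0.9]),
    "C": (frozenset({1, 2, 4}), [0.0, 1.0, 0.0]),
}
QUERY_FP = frozenset({1, 2, 3, 5})

ranking = bioturbo_rank(QUERY_FP, ANNOTATED, k=1)
print(ranking)  # ['A', 'B', 'C']
```

In this toy case, compound B is ranked above C even though C is chemically closer to the query, which mirrors the abstract's claim that searching in bioactivity space can surface structurally diverse compounds.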
Experimental Design Strategy: Weak Reinforcement Leads to Increased Hit Rates and Enhanced Chemical Diversity
High-throughput screening (HTS) is a common approach in the life sciences
to discover chemical matter that modulates a biological target or
phenotype. However, low assay throughput, reagent costs, or a flowchart
that can deal with only a limited number of hits may impair screening
large numbers of compounds. In this case, a subset of compounds is
assayed, and in silico models are utilized to aid
in iterative screening design, usually to expand around the found
hits and enrich subsequent rounds for relevant chemical matter. However,
this may lead to an overly narrow focus, and the diversity of compounds
sampled in subsequent iterations may suffer. Active learning has recently
been applied successfully in drug discovery with the goal of sampling
diverse chemical space to improve model performance. Here we introduce
a robust and straightforward iterative screening protocol based on
naïve Bayes models. Instead of following up on the compounds
with the highest scores in the in silico model, we
pursue compounds with very low but positive values. This includes
unique chemotypes of weakly active compounds that enhance the applicability
domain of the model and increase the cumulative hit rates. We show
in a retrospective application to 81 Novartis assays that this protocol
leads to consistently higher compound and scaffold hit rates compared
to a standard expansion around hits or an active learning approach.
We recommend using the weak reinforcement strategy introduced herein
for iterative screening workflows.
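The selection rule described in this abstract, following up on compounds with very low but positive model scores rather than the top-scoring ones, can be illustrated as follows. This is a sketch under stated assumptions: the function names, the dictionary of scores, and the simple "lowest strictly positive scores first" cutoff are hypothetical, and the abstract does not specify how the score window was chosen.

```python
from typing import Dict, List


def select_top(scores: Dict[str, float], n: int) -> List[str]:
    """Standard expansion around hits: take the n highest-scoring compounds."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [compound for compound, _ in ranked[:n]]


def select_weak_positive(scores: Dict[str, float], n: int) -> List[str]:
    """Weak reinforcement (assumed form): take the n compounds whose
    model scores are positive but closest to zero."""
    positive = [(score, compound) for compound, score in scores.items() if score > 0]
    positive.sort()  # ascending, so the weakest positive scores come first
    return [compound for _, compound in positive[:n]]


# Hypothetical naive Bayes scores for five untested compounds.
model_scores = {"c1": 12.5, "c2": 0.3, "c3": -4.0, "c4": 1.1, "c5": 0.05}

print(select_top(model_scores, 2))            # ['c1', 'c4']
print(select_weak_positive(model_scores, 2))  # ['c5', 'c2']
```

In an iterative workflow, the selected compounds would be assayed, the model retrained on the enlarged data set, and the selection repeated for the next round.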
Public Domain HTS Fingerprints: Design and Evaluation of Compound Bioactivity Profiles from PubChem's Bioassay Repository
Molecular profiling efforts aim at
characterizing the biological
actions of small molecules by screening them in hundreds of different
biochemical and/or cell-based assays. Together, these assays yield
a rich data landscape of target-based and phenotypic effects of the
tested compounds. However, submitting an entire compound library to
a molecular profiling panel can easily become cost-prohibitive. Here,
we make use of historical screening assays to create comprehensive
bioactivity profiles for more than 300,000 small molecules.
These bioactivity profiles, termed PubChem high-throughput
screening fingerprints (PubChem HTSFPs), report small molecule
activities in 243 different PubChem bioassays. Although the assays
originate from independently pursued drug or probe discovery
projects, we demonstrate their value as molecular signatures when
used in combination. We use these PubChem HTSFPs as molecular descriptors
in hit expansion experiments for 33 different targets and phenotypes,
showing that, on average, they lead to 27 times as many hits in a
set of 1000 chosen molecules as a random screening subset of the same
size (average ROC score: 0.82). Moreover, we demonstrate that PubChem
HTSFPs retrieve hits that are structurally diverse and distinct from
active compounds retrieved by chemical similarity-based hit expansion
methods. PubChem HTSFPs are made freely available for the chemical
biology research community.
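The "27 times as many hits" figure in this abstract is a fold enrichment over random selection. A minimal sketch of that arithmetic is given below. The function name is hypothetical and the numbers in the usage example are invented to produce a round result; they are not taken from the paper.

```python
def fold_enrichment(hits_selected: int, n_selected: int,
                    hits_total: int, n_total: int) -> float:
    """Ratio of hits found in a selected subset to the hits expected
    in a random subset of the same size."""
    expected_random = n_selected * hits_total / n_total
    if expected_random == 0:
        raise ValueError("no hits in the library, so enrichment is undefined")
    return hits_selected / expected_random


# Invented example: a 300,000-compound library with 600 actives (0.2%).
# A random pick of 1000 compounds is expected to contain 2 actives, so a
# selection of 1000 that contains 54 actives is enriched 27-fold.
print(fold_enrichment(hits_selected=54, n_selected=1000,
                      hits_total=600, n_total=300_000))  # 27.0
```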
Inside the Mind of a Medicinal Chemist: The Role of Human Bias in Compound Prioritization during Drug Discovery
Medicinal chemists' "intuition" is critical for success in modern drug discovery. Early in the discovery process, chemists select a subset of compounds for further research, often from many viable candidates. These decisions determine the success of a discovery campaign, and ultimately what kind of drugs are developed and marketed to the public. Surprisingly little is known about the cognitive aspects of chemists' decision-making when they prioritize compounds. We investigate 1) how and to what extent chemists simplify the problem of identifying promising compounds, 2) whether chemists agree with each other about the criteria used for such decisions, and 3) how accurately chemists report the criteria they use for these decisions. Chemists were surveyed and asked to select chemical fragments that they would be willing to develop into a lead compound from a set of ~4,000 available fragments. Based on each chemist's selections, computational classifiers were built to model each chemist's selection strategy. Results suggest that chemists greatly simplified the problem, typically using only 1–2 of many possible parameters when making their selections. Although chemists tended to use the same parameters to select compounds, differing value preferences for these parameters led to an overall lack of consensus in compound selections. Moreover, what little agreement there was among the chemists was largely in what fragments were undesirable. Furthermore, chemists were often unaware of the parameters (such as compound size) which were statistically significant in their selections, and overestimated the number of parameters they employed. A critical evaluation of the problem space faced by medicinal chemists and cognitive models of categorization were especially useful in understanding the low consensus between chemists.
Ring topology SNB classifier comparison between chemists.
The most favorable and unfavorable keys for the RingBonds_AromaticBonds_RingAssemblies (RB_AB_RA) descriptor model, which measures the number of ring bonds (RB), aromatic bonds (AB), and ring assemblies (RA) present in a compound, were examined. Representative scaffolds that correspond to these keys are depicted, and are clustered based on how chemists viewed them. The Bayes score of each key, in the model built for each individual chemist, is reported in a heat map. The favorable keys receive a positive score, while unfavorable keys receive a negative score.
The SNB classifier built using a descriptor subsumed by the functional group parameter is illustrated for chemist 1.
Keys that represent the presence (black) or absence (white) of chemical substructures are ordered from negative (bad) on the left to positive (good) values on the right (A). The worst and best substructure keys are zoomed in on (B). Specific chemical substructures (tertiary amine in blue, aromatic heteroatom in violet, hydroxyl in aqua, and carboxylic acid in orange) are highlighted for one of the worst keys and two of the best keys, and illustrative examples of fragments that would be described by these keys are depicted (C).
Predictive accuracy of Semi-Naïve Bayesian (SNB) and Random Forest (RF) classifiers trained on medicinal chemists' selections.
The average ROC score for a 4-fold cross validation of each classifier is reported. A: SNB classifier built with medicinal chemistry relevant descriptors (red) is compared to a benchmark Naïve Bayesian classifier that uses extended connectivity fingerprints and physical chemical properties as descriptors (black). B: RF classifier built with medicinal chemistry relevant descriptors (blue) is compared to a benchmark RF classifier that uses extended connectivity fingerprints and physical chemical properties as descriptors (black).
The parameters extracted from the SNB (red) and RF (blue) classifiers are compared with parameters designated as important in chemists' self-reports (grey).
The primary parameters for the classifiers are depicted as stars, and the secondary parameters are depicted as circles. The one-tailed Fisher exact probability test (p) is reported for each parameter (except chains and charge), indicating that the SNB and RF parameters show agreement with each other, while the self-reported parameters are independent of either classifier's parameters.
The selection characteristics of chemists with high estimated consensus.
The cultural consensus model was applied to a subset of fragments (311) with >75% agreement by chemists. The estimated consensus obtained by this method is plotted against the fraction of fragments passed by chemists for the entire survey. Each shape describes the primary SNB parameter used to reproduce chemists' selections, and the color depicts the ROC score of naïve Bayesian classifiers built using ECFP4 as a descriptor for each chemist. A subset of high consensus chemists is above the dashed grey line.