
28 research outputs found

New Strategy for Receptor-Based Pharmacophore Query Construction: A Case Study for 5‑HT7 Receptor Ligands

Authors: Andrzej J. Bojarski (499184), Rafał Kurczab (1312479)

This paper presents a new approach for generating receptor-based 3D pharmacophore models for rapid in silico virtual screening (VS). The method combines information from docking poses of structurally diverse known ligands with an analysis of the resulting ligand–receptor complexes using structural interaction fingerprints (SIFts). Next, the best linear combination of three-, four-, and five-feature pharmacophores is constructed with respect to a selected performance parameter (i.e., recall, F-score, and MCC). The resulting queries showed significantly better VS performance and new scaffold recognition than the known ligand- and receptor-based pharmacophore models. The approach was developed and validated on 5-HT7 receptor homology models built on available crystal structure templates. The efficiency of the obtained linear combinations exhibited only a minor dependence on the template selection.
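The abstract names recall, F-score, and MCC as the performance parameters used to rank pharmacophore combinations. As a minimal sketch of how these three values follow from a screening confusion matrix, the hypothetical helper below (`screening_metrics` is not from the paper) uses the standard textbook definitions and assumes the F-score is the balanced F1 variant; it does not reproduce the authors' query-combination procedure.

```python
import math


def screening_metrics(tp, fp, tn, fn):
    """Recall, F1-score and MCC from confusion-matrix counts.

    tp/fn: actives retrieved / missed by the query;
    fp/tn: inactives retrieved / correctly rejected.
    Assumption: F-score means F1 (beta = 1).
    """
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # MCC is conventionally set to 0 when any marginal sum is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"recall": recall, "f1": f1, "mcc": mcc}


# Illustrative counts only (not data from the paper).
example = screening_metrics(tp=80, fp=20, tn=880, fn=20)
```

Unlike recall, MCC uses all four cells of the confusion matrix, so it stays informative when inactives heavily outnumber actives, which is the usual situation in virtual screening.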

The Francis Crick Institute

The influence of the negative-positive ratio and screening database size on the performance of machine learning-based virtual screening

Authors: Andrzej J. Bojarski (499184), Rafał Kurczab (1312479)
Publication date: 01/01/2017

The machine learning-based virtual screening of molecular databases is a commonly used approach to identify hits. However, many aspects associated with training predictive models can influence the final performance and, consequently, the number of hits found. Thus, we performed a systematic study of the simultaneous influence of the proportion of negatives to positives in the testing set, the size of screening databases and the type of molecular representations on the effectiveness of classification. The results obtained for eight protein targets, five machine learning algorithms (SMO, Naïve Bayes, Ibk, J48 and Random Forest), two types of molecular fingerprints (MACCS and CDK FP) and eight screening databases with different numbers of molecules confirmed our previous findings that increases in the ratio of negative to positive training instances greatly influenced most of the investigated parameters of the ML methods in simulated virtual screening experiments. However, the performance of screening was shown to also be highly dependent on the molecular library dimension. Generally, with the increasing size of the screened database, the optimal training ratio also increased, and this ratio can be rationalized using the proposed cost-effectiveness threshold approach. To increase the performance of machine learning-based virtual screening, the training set should be constructed in a way that considers the size of the screening database.
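The abstract's central variable is the ratio of negative to positive training instances. The sketch below shows one simple way such a ratio could be imposed on a training set, by randomly subsampling the negatives. The function name `sample_training_set`, the `neg_per_pos` parameter, and the compound identifiers are all hypothetical, and the authors' actual sampling procedure is not described in this abstract.

```python
import random


def sample_training_set(positives, negatives, neg_per_pos, seed=0):
    """Build a labelled training set with a chosen negative:positive ratio.

    All positives are kept; negatives are randomly subsampled so that there
    are about `neg_per_pos` negatives per positive (capped at the number of
    negatives available). Returns a list of (item, label) pairs.
    """
    n_neg = min(len(negatives), round(neg_per_pos * len(positives)))
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sampled = rng.sample(negatives, n_neg)
    return [(x, 1) for x in positives] + [(x, 0) for x in sampled]


# Illustrative identifiers only (not data from the study).
actives = [f"active_{i}" for i in range(10)]
decoys = [f"decoy_{i}" for i in range(100)]
train = sample_training_set(actives, decoys, neg_per_pos=5)
```

Repeating the call over a range of `neg_per_pos` values and retraining a classifier each time is one way to explore the kind of dependence on training ratio that the abstract reports.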

Directory of Open Access Journals

PubMed Central

The Francis Crick Institute

Example illustration of CScore model generation using component models.

Authors: Andrzej J. Bojarski (499184), Paweł Zajdel (2606617), Rafał Kurczab (1312479), Vittorio Canale (2606626)

The ROC curves were used to show the performance of the component and CScore models and the thresholds (red circles) used to determine the classification (A: MACCS FP and a training ratio of 0.40; B: SIFt-p generated using the 5-HT1BR template with loops and a training ratio of 0.40).
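The caption refers to ROC curves and to classification thresholds marked on them. As a rough sketch of how a ROC curve, its AUC, and one candidate threshold can be derived from model scores, the code below sweeps the score threshold from high to low. The threshold rule shown (maximising Youden's J, i.e. TPR minus FPR) is an assumption chosen for illustration; the caption does not say how the red-circle thresholds were selected, and all function names are hypothetical.

```python
def roc_points(scores, labels):
    """Return (threshold, fpr, tpr) triples, threshold swept high to low.

    An item is predicted active when its score is >= threshold.
    Assumes labels are 0/1 and both classes are present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(float("inf"), 0.0, 0.0)]  # nothing predicted active
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def youden_threshold(points):
    """Threshold maximising TPR - FPR (an assumed selection rule)."""
    return max(points, key=lambda p: p[2] - p[1])[0]


# Illustrative scores and labels only (not data from the study).
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25]
labels = [1, 1, 1, 0, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auc(pts), youden_threshold(pts))  # prints 0.875 0.75
```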

The Francis Crick Institute

Heat maps comparing the performance of CScore models obtained for all the studied cases, i.e., representation of the data (three molecular fingerprints and SIFt-p for models with and without loops) and strategy for component models selection (AUC, MCC).

Authors: Andrzej J. Bojarski (499184), Paweł Zajdel (2606617), Rafał Kurczab (1312479), Vittorio Canale (2606626)

Heat maps comparing the performance of CScore models obtained for all the studied cases, i.e., representation of the data (three molecular fingerprints and SIFt-p for models with and without loops) and strategy for component models selection (AUC, MCC).

The Francis Crick Institute