2 research outputs found
Benchmarking Study of Parameter Variation When Using Signature Fingerprints Together with Support Vector Machines
QSAR modeling using molecular signatures and support vector machines
with a radial basis function kernel is increasingly used for virtual screening
in the drug discovery field. This method has three free parameters: <i>C</i>, γ, and signature height. <i>C</i> is
a penalty parameter that limits overfitting, γ controls the
width of the radial basis function kernel, and the signature height
determines how much of the molecule is described by each atom signature.
Determination of optimal values for these parameters is time-consuming.
Good default values could therefore save considerable computational
cost. The goal of this project was to investigate whether such default
values could be found by using seven public QSAR data sets spanning
a wide range of end points and using both a bit version and a count
version of the molecular signatures. On the basis of the experiments
performed, we recommend a parameter set of heights 0 to 2 for the
count version of the signature fingerprints and heights 0 to 3 for
the bit version. These are in combination with a support vector machine
using <i>C</i> in the range of 1 to 100 and γ in the
range of 0.001 to 0.1. When data sets are small or longer run times
are acceptable, it may be worth adding height 3 to the count
fingerprint and widening the grid search. However, marked improvements
should not be expected.
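
As a minimal sketch of the recommended search space, the snippet below enumerates a small grid over <i>C</i> and γ and evaluates the radial basis function kernel for two vectors. The abstract gives only the ranges, so the power-of-ten spacing and the names `C_VALUES`, `GAMMA_VALUES`, `parameter_grid`, and `rbf_kernel` are illustrative assumptions, not the authors' code.

```python
import itertools
import math

# Assumed grid points: powers of ten inside the recommended ranges.
# The abstract states only the ranges (C: 1 to 100, gamma: 0.001 to 0.1).
C_VALUES = [1, 10, 100]
GAMMA_VALUES = [0.001, 0.01, 0.1]

# Recommended signature heights from the abstract.
HEIGHTS_COUNT = [0, 1, 2]     # count version: heights 0 to 2
HEIGHTS_BIT = [0, 1, 2, 3]    # bit version: heights 0 to 3


def rbf_kernel(x, y, gamma):
    """Return exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def parameter_grid(c_values=C_VALUES, gamma_values=GAMMA_VALUES):
    """Return every (C, gamma) combination as a list of dicts."""
    return [
        {"C": c, "gamma": g}
        for c, g in itertools.product(c_values, gamma_values)
    ]


if __name__ == "__main__":
    for params in parameter_grid():
        print(params)
```

A larger γ makes the kernel value fall off faster with distance, which is why γ is searched on a logarithmic scale here. Each grid point would then be scored by cross-validation with an SVM implementation of choice.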
Ligand-Based Target Prediction with Signature Fingerprints
When evaluating a potential drug candidate, it is desirable to predict
target interactions in silico prior to synthesis in order to assess,
e.g., secondary pharmacology. This can be done by looking at known
target binding profiles of similar compounds using chemical similarity
searching. The purpose of this study was to construct and evaluate
the performance of chemical fingerprints based on the molecular signature
descriptor for performing target binding predictions. For the comparison
we used the area under the receiver operating characteristic curve
(AUC) complemented with net reclassification improvement (NRI). We
created two open source signature fingerprints, a bit and a count
version, and evaluated their performance compared to a set of established
fingerprints with regard to predictions of binding targets using
Tanimoto-based similarity searching on publicly available data sets
extracted from ChEMBL. The results showed that the count version of
the signature fingerprint performed on par with well-established fingerprints
such as ECFP. The count version outperformed the bit version slightly;
however, the count version is more complex and takes more computing
time and memory to run, so its usage should probably be evaluated on
a case-by-case basis. The NRI-based tests complemented the AUC-based
ones and showed signs of higher power.
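
Tanimoto-based similarity searching over bit and count fingerprints can be sketched as follows. This is an illustration under stated assumptions: a bit fingerprint is modeled as a set of signature strings, a count fingerprint as a dict mapping signatures to occurrence counts, and the count variant uses the min/max generalization of the Tanimoto coefficient, which the abstract does not specify. All function names and the example data are hypothetical.

```python
def tanimoto_bit(a, b):
    """Tanimoto coefficient for two bit fingerprints given as sets."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def tanimoto_count(a, b):
    """Min/max Tanimoto generalization for count fingerprints (dicts).

    Assumption: the source does not state which count-based variant
    was used; this is one common choice.
    """
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0


def rank_by_similarity(query, references, similarity=tanimoto_bit):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, similarity(query, fp)) for name, fp in references.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical signature identifiers, for illustration only.
    references = {"cpd1": {"s1", "s2"}, "cpd2": {"s3"}}
    print(rank_by_similarity({"s1", "s2", "s4"}, references))
```

The known targets of the top-ranked reference compounds would then serve as the predicted targets for the query. The count version has to store and compare an integer per signature rather than a single bit, which is consistent with the higher time and memory cost noted above.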