Molecular Descriptor Subset Selection in Theoretical Peptide Quantitative Structure–Retention Relationship Model Development Using Nature-Inspired Optimization Algorithms
Abstract
In this work, the performance of five nature-inspired optimization
algorithms, genetic algorithm (GA), particle swarm optimization (PSO),
artificial bee colony (ABC), firefly algorithm (FA), and flower pollination
algorithm (FPA), was compared in molecular descriptor selection for
development of quantitative structure–retention relationship
(QSRR) models for 83 peptides that originate from eight model proteins.
A matrix of 423 descriptors was used as input. QSRR models
based on the selected descriptors were built using partial least squares
(PLS), with the root mean square error of prediction (RMSEP) used
as the fitness function for descriptor selection. Three performance criteria,
prediction accuracy, computational cost, and the number of selected
descriptors, were used to evaluate the developed QSRR models. The
results show that all five variable selection methods outperform interval
PLS (iPLS), sparse PLS (sPLS), and the full PLS model, and that GA
is superior because it has the lowest computational cost and higher accuracy
(RMSEP of 5.534%) with a smaller number of variables (nine descriptors).
The GA-QSRR model was first validated through Y-randomization.
In addition, it was successfully validated with an external test
set of 102 peptides originating from Bacillus subtilis proteomes (RMSEP of 22.030%). Its applicability domain was defined,
from which it was evident that the developed GA-QSRR exhibited strong
robustness. All the sources of the model’s error were identified,
thus allowing for further application of the developed methodology
in proteomics.
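
As a rough illustration of the selection scheme summarized above, the following Python sketch runs a simple genetic algorithm over binary descriptor masks and scores each mask by the RMSEP of a PLS model on a held-out set. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the GA operators, population size, number of PLS components, or data split, so the function names (`pls1_fit`, `rmsep`, `ga_select`), all parameter values, and the synthetic data are hypothetical. RMSEP is returned here in the units of the response, not as a percentage.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Fit single-response PLS by NIPALS; returns (coefficients, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        norm = np.linalg.norm(w)
        if norm < 1e-12:              # no covariance left to extract
            break
        w = w / norm
        t = Xk @ w                    # scores
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings
        c = yk @ t / tt               # y loading
        Xk = Xk - np.outer(t, p)      # deflate X and y
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    if not W:
        return np.zeros(X.shape[1]), x_mean, y_mean
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, x_mean, y_mean


def rmsep(mask, X_tr, y_tr, X_val, y_val, n_comp=3):
    """Fitness: RMSEP on the held-out set for the descriptors where mask is True."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return float("inf")           # an empty subset cannot be modelled
    beta, xm, ym = pls1_fit(X_tr[:, cols], y_tr, min(n_comp, cols.size))
    pred = (X_val[:, cols] - xm) @ beta + ym
    return float(np.sqrt(np.mean((y_val - pred) ** 2)))


def ga_select(X_tr, y_tr, X_val, y_val, pop_size=30, n_gen=40,
              p_init=0.1, p_mut=0.01, seed=0):
    """Return (best_mask, best_rmsep) from a GA with elitism (assumed settings)."""
    rng = np.random.default_rng(seed)
    n_desc = X_tr.shape[1]
    pop = rng.random((pop_size, n_desc)) < p_init      # sparse random masks

    def score(m):
        return rmsep(m, X_tr, y_tr, X_val, y_val)

    fit = np.array([score(m) for m in pop])
    for _ in range(n_gen):
        children = [pop[np.argmin(fit)].copy()]        # elitism: keep the best
        while len(children) < pop_size:
            # binary tournament selection for each parent
            i = rng.integers(pop_size, size=2)
            j = rng.integers(pop_size, size=2)
            pa = pop[i[np.argmin(fit[i])]]
            pb = pop[j[np.argmin(fit[j])]]
            child = np.where(rng.random(n_desc) < 0.5, pa, pb)  # uniform crossover
            child ^= rng.random(n_desc) < p_mut                 # bit-flip mutation
            children.append(child)
        pop = np.array(children)
        fit = np.array([score(m) for m in pop])
    best = int(np.argmin(fit))
    return pop[best], float(fit[best])


if __name__ == "__main__":
    # Synthetic stand-in data: 100 samples, 40 descriptors, 3 of them informative.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100, 40))
    y = 2 * X[:, 0] - 3 * X[:, 5] + X[:, 9] + 0.1 * rng.normal(size=100)
    mask, err = ga_select(X[:60], y[:60], X[60:], y[60:])
    print("selected descriptors:", np.flatnonzero(mask), "RMSEP:", err)
```

Because the fitness is computed on the same held-out set that guides the search, the reported value is optimistic. This is one reason an independent external test set, such as the one described in the abstract, is needed to judge the selected model.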