
Evaluation of the EVA Descriptor for QSAR Studies: 3. The use of a Genetic Algorithm to Search for Models with Enhanced Predictive Properties (EVA_GA)

Abstract

The EVA structural descriptor, based on calculated fundamental molecular vibrational frequencies, has proved to be an effective descriptor for both QSAR and database similarity calculations. The descriptor is sensitive to 3D structure but has an advantage over field-based 3D-QSAR methods in that structural superposition is not required. The original technique uses a standardisation method in which uniform Gaussians of fixed standard deviation (σ) smear out frequencies projected onto a linear scale. This smearing function allows proximal frequencies to overlap, and hence a descriptor of fixed dimension can be extracted regardless of the number and precise values of the frequencies. It is proposed here that optimal localised values of σ exist in different spectral regions. That is, at certain points in the spectrum, the overlap produced by uniform Gaussians may either be insufficient to capture relationships that exist or mix information to such an extent that significant correlations are obscured by noise. A genetic algorithm is used to search for optimal localised σ values, with cross-validated PLS regression scores serving as the fitness function to be optimised. The resultant models are then validated against a previously unseen test set of compounds. The performance of EVA_GA is compared with that of EVA and with analogous CoMFA studies.
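The smearing step described above can be illustrated with a minimal sketch. This is not the authors' implementation, and several details are assumptions not stated in this abstract: the normalised Gaussian form, the sampling range of 1 to 4000 cm⁻¹, the 5 cm⁻¹ sampling step, and the default σ of 10 cm⁻¹. The function name `eva_descriptor` and the region `boundaries` argument are hypothetical. To mimic the localised-σ idea, each frequency takes the σ of the spectral region it falls in. In a GA search, the list of per-region σ values would play the role of the chromosome, scored by cross-validated PLS. That search is not implemented here.

```python
import math
from bisect import bisect_right


def eva_descriptor(frequencies, sigmas, boundaries, lower=1.0, upper=4000.0, step=5.0):
    """Hypothetical sketch: project vibrational frequencies (cm^-1) onto a fixed grid.

    sigmas[k] is the Gaussian standard deviation applied to any frequency lying in
    spectral region k, and boundaries holds the sorted edges between regions, so
    len(sigmas) == len(boundaries) + 1. A single-element sigmas list with no
    boundaries reproduces the uniform-sigma EVA case.
    Assumed defaults: 1-4000 cm^-1 range, 5 cm^-1 step.
    """
    if len(sigmas) != len(boundaries) + 1:
        raise ValueError("need exactly one sigma per spectral region")
    # Fixed sampling grid: the descriptor length depends only on the grid,
    # never on how many frequencies a molecule has.
    n_points = int((upper - lower) // step) + 1
    grid = [lower + i * step for i in range(n_points)]
    descriptor = [0.0] * n_points
    for f in frequencies:
        # Pick the sigma of the region containing this frequency (assumed scheme).
        sigma = sigmas[bisect_right(boundaries, f)]
        norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
        # Add this frequency's normalised Gaussian to every grid point.
        for i, x in enumerate(grid):
            descriptor[i] += norm * math.exp(-((x - f) ** 2) / (2.0 * sigma ** 2))
    return descriptor


if __name__ == "__main__":
    # Two hypothetical molecules with different numbers of frequencies.
    a = eva_descriptor([500.0, 1650.0, 3000.0], [10.0], [])
    b = eva_descriptor([500.0, 1200.0, 1650.0, 2900.0, 3000.0], [10.0], [])
    print(len(a), len(b))  # both descriptors have the same fixed length
    # Localised sigma: narrow below 2000 cm^-1, broad above it (illustrative values).
    local = eva_descriptor([500.0, 1650.0, 3000.0], [10.0, 40.0], [2000.0])
```

Both hypothetical molecules yield vectors of the same length, which is what lets EVA feed a PLS regression without structural alignment. Widening σ in one region spreads each peak there over more grid points and lowers its height, which increases overlap between neighbouring frequencies in that region only.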
