We propose a specialized string kernel for small bio-molecules, peptides and
pseudo-sequences of binding interfaces. The kernel incorporates
physico-chemical properties of amino acids and elegantly generalize eight
kernels, such as the Oligo, the Weighted Degree, the Blended Spectrum, and the
Radial Basis Function. We provide a low complexity dynamic programming
algorithm for the exact computation of the kernel and a linear time algorithm
for it's approximation. Combined with kernel ridge regression and SupCK, a
novel binding pocket kernel, the proposed kernel yields biologically relevant
and good prediction accuracy on the PepX database. For the first time, a
machine learning predictor is capable of accurately predicting the binding
affinity of any peptide to any protein. The method was also applied to both
single-target and pan-specific Major Histocompatibility Complex class II
benchmark datasets and three Quantitative Structure Affinity Model benchmark
datasets.
On all benchmarks, our method significantly (p-value < 0.057) outperforms the
current state-of-the-art methods at predicting peptide-protein binding
affinities. The proposed approach is flexible and can be applied to predict any
quantitative biological activity. The method should be of value to a large
segment of the research community with the potential to accelerate
peptide-based drug and vaccine development.Comment: 22 pages, 4 figures, 5 table