    Tertiary structure-based prediction of conformational B-cell epitopes through B factors

    Motivation: A B-cell epitope is a small area on the surface of an antigen that binds to an antibody. Accurately locating epitopes is of critical importance for vaccine development. Compared with wet-lab methods, computational methods have strong potential for efficient, large-scale epitope prediction for antigen candidates at much lower cost. However, it is still not clear which features are good determinants for accurate epitope prediction, which leads to the unsatisfactory performance of existing prediction methods. Method and results: We propose a much more accurate B-cell epitope prediction method. Our method uses a new feature, the B factor (obtained from X-ray crystallography), combined with other basic physicochemical, statistical, evolutionary and structural features of each residue. These basic features are extended by a sequence window and a structure window. All these features are then learned by a two-stage random forest model to identify clusters of antigenic residues and to remove isolated outliers. Tested on a dataset of 55 epitopes from 45 tertiary structures, our method significantly outperforms all three existing structure-based epitope predictors. A comprehensive analysis finds that features such as B factor, relative accessible surface area and protrusion index play an important role in characterizing B-cell epitopes. Our detailed case studies on an HIV antigen and an influenza antigen confirm that our second-stage learning is effective for clustering true antigenic residues and for eliminating the prediction errors introduced by the first-stage learning. © 2014 The Author. Published by Oxford University Press. All rights reserved.
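The sequence-window extension and the removal of isolated predictions described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `window_features` and `drop_isolated`, the zero padding at the termini, and the fixed neighbourhood rule are all assumptions made for illustration. In particular, the paper learns its second stage with a random forest, whereas the rule below is a simple hand-written stand-in.

```python
from typing import List, Sequence


def window_features(values: Sequence[float], half_width: int) -> List[List[float]]:
    """Extend a per-residue feature (e.g. B factor) with a sequence window.

    Each residue gets the values of its neighbours within `half_width`
    positions along the chain. Positions beyond the termini are padded
    with 0.0 (an assumption; other padding schemes are possible).
    """
    n = len(values)
    rows = []
    for i in range(n):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            row.append(values[j] if 0 <= j < n else 0.0)
        rows.append(row)
    return rows


def drop_isolated(pred: Sequence[int], half_width: int = 1) -> List[int]:
    """Clear positive predictions with no other positive nearby.

    A hand-written stand-in for a learned second stage: a residue
    predicted as antigenic (1) is kept only if at least one other
    residue within `half_width` sequence positions is also predicted 1.
    """
    n = len(pred)
    out = []
    for i, p in enumerate(pred):
        if p == 1:
            lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
            other_positives = sum(pred[lo:hi]) - 1  # exclude the residue itself
            out.append(1 if other_positives > 0 else 0)
        else:
            out.append(0)
    return out


# Hypothetical B factors for a five-residue fragment.
b_factors = [12.0, 30.5, 28.1, 15.2, 40.3]
print(window_features(b_factors, half_width=1))

# Hypothetical first-stage predictions; the lone positive at index 4 is cleared.
print(drop_isolated([1, 1, 0, 0, 1, 0, 0]))
```

A structure window would follow the same pattern, with neighbours chosen by spatial distance between residues rather than by sequence position.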

    The Empirical Comparison of Machine Learning Algorithm for the Class Imbalanced Problem in Conformational Epitope Prediction

    A conformational epitope is a part of a protein-based vaccine. It is challenging to identify experimentally, so computational models are developed to support identification. However, class imbalance is one of the constraints on achieving optimal performance in conformational B-cell epitope prediction. In this paper, we compare several conformational B-cell epitope prediction models built with non-ensemble and ensemble approaches. A sampling method (random undersampling, SMOTE, or cluster-based undersampling) is combined with a decision tree or SVM to build each non-ensemble model. A random forest model and several variants of the bagging method are used to construct the ensemble models. 10-fold cross-validation is used to validate the models. The experimental results show that the combination of cluster-based undersampling and a decision tree outperformed the other sampling methods when combined with the non-ensemble and the ensemble methods. This study provides a baseline for improving existing models that deal with class imbalance in conformational epitope prediction.
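Of the sampling methods named in this abstract, random undersampling is the simplest to sketch. The version below is an illustration under stated assumptions, not the paper's code: the function name `random_undersample`, the `seed` parameter, and the choice to shrink every class to the size of the smallest one are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple


def random_undersample(
    X: Sequence, y: Sequence[int], seed: int = 0
) -> Tuple[List, List[int]]:
    """Balance classes by randomly discarding samples from larger classes.

    Every class is reduced to the size of the smallest class. Samples are
    drawn without replacement and the original order is preserved.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)

    n_min = min(len(idxs) for idxs in by_class.values())

    # Keep a random subset of n_min indices from each class.
    keep: List[int] = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, n_min))
    keep.sort()

    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical data: 2 epitope residues (1) versus 8 non-epitope residues (0).
X = list(range(10))
y = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
X_bal, y_bal = random_undersample(X, y, seed=0)
print(y_bal.count(1), y_bal.count(0))  # 2 2
```

When resampling is combined with cross-validation, it is usually applied to the training folds only, so that each held-out fold keeps the original class distribution.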

    Differences in antigenic sites and other functional regions between genotype A and G mumps virus surface proteins

    The surface proteins of the mumps virus, the fusion protein (F) and haemagglutinin-neuraminidase (HN), are key factors in mumps pathogenesis and are important targets for the immune response during mumps virus infection. We compared the predicted amino acid sequences of the F and HN genes from Dutch mumps virus samples from the pre-vaccine era (1957–1982) with mumps virus genotype G strains (from 2004 onwards). Genotype G is the most frequently detected mumps genotype in recent outbreaks in vaccinated communities, especially in Western Europe, the USA and Japan. Amino acid differences between the Jeryl Lynn vaccine strains (genotype A) and genotype G strains were predominantly located in known B-cell epitopes and in N-linked glycosylation sites on the HN protein. There were eight variable amino acid positions specific to genotype A or genotype G sequences in five known B-cell epitopes of the HN protein. These differences may account for the reported antigenic differences between Jeryl Lynn and genotype G strains. We also found amino acid differences in and near sites on the HN protein that have been reported to play a role in mumps virus pathogenesis. These differences may contribute to the occurrence of genotype G outbreaks in vaccinated communities