474 research outputs found

    Prediction of peptides binding to MHC class I and II alleles by temporal motif mining

    Background: MHC (Major Histocompatibility Complex) is a key player in the immune response of most vertebrates. The computational prediction of whether a given antigenic peptide will bind to a specific MHC allele is important in the development of vaccines for emerging pathogens, the creation of possibilities for controlling immune response, and for the applications of immunotherapy. One of the problems that make this computational prediction difficult is the detection of the binding core region in peptides, coupled with the presence of bulges and loops causing variations in the total sequence length. Most machine learning methods require the sequences to be of the same length to successfully discover the binding motifs, ignoring the length variance in both motif mining and prediction steps. In order to overcome this limitation, we propose the use of time-based motif mining methods that work position-independently. Results: The prediction method was tested on a benchmark set of 28 different alleles for MHC class I and 27 different alleles for MHC class II. The obtained results are comparable to the state-of-the-art methods for both MHC classes, surpassing the published results for some alleles. The average prediction AUC values are 0.897 for class I, and 0.858 for class II. Conclusions: Temporal motif mining using partial periodic patterns can capture information about the sequences well enough to predict the binding of the peptides and is comparable to state-of-the-art methods in the literature. Unlike neural networks or matrix-based predictors, our proposed method does not depend on peptide length and can work with both short and long fragments. This advantage allows better use of the available training data and the prediction of peptides of uncommon lengths.
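The AUC values quoted in this abstract summarize how well predicted scores separate binders from non-binders. As a minimal sketch of what the metric measures (not the authors' evaluation code), AUC can be computed as the fraction of binder/non-binder pairs in which the binder receives the higher score, with ties counted as half. The function name `auc` and the example scores are assumptions made up for illustration.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Return the probability that a random positive outscores a random negative.

    Ties contribute 0.5. This is the pairwise (Mann-Whitney) form of ROC AUC.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one score in each class")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical prediction scores, for illustration only.
binders = [0.9, 0.8, 0.4]
non_binders = [0.7, 0.3, 0.2]
print(auc(binders, non_binders))
```

The double loop is quadratic in the number of peptides, which is acceptable for a small illustration; a rank-based implementation would be preferable for large benchmarks.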


    Prediction of peptides binding to MHC class I alleles by partial periodic pattern mining

    MHC (Major Histocompatibility Complex) is a key player in the immune response of an organism. It is important to be able to predict which antigenic peptides will bind to a specific MHC allele and which will not, creating possibilities for controlling immune response and for the applications of immunotherapy. However, a problem for MHC class I is the presence of bulges and loops in the peptides, which change the total length. Most machine learning methods in use today require the sequences to be of the same length to successfully mine the binding motifs. We propose the use of time-based data mining methods in motif mining so that motifs can be mined position-independently. In addition, information from both binding and non-binding peptides is used, in contrast to other methods, which rely only on binding peptides. The prediction results range from 60% to 95% for the tested alleles.
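To illustrate what "position-independent" matching with both binding and non-binding peptides can look like, here is a minimal sketch. It is not the partial periodic pattern mining algorithm from the paper: it only checks whether a gapped motif (fixed residues separated by wildcard positions) occurs at any offset in peptides of any length, and compares how often it occurs in each class. The wildcard symbol, the function names, and the toy peptides are all assumptions for illustration.

```python
from typing import Iterable

WILDCARD = "*"  # assumed symbol for "any residue at this position"


def matches_at(motif: str, peptide: str, start: int) -> bool:
    """Check whether the gapped motif matches the peptide at a given offset."""
    if start + len(motif) > len(peptide):
        return False
    return all(m == WILDCARD or m == peptide[start + i]
               for i, m in enumerate(motif))


def contains(motif: str, peptide: str) -> bool:
    """Position-independent match: try every offset, whatever the peptide length."""
    return any(matches_at(motif, peptide, s)
               for s in range(len(peptide) - len(motif) + 1))


def support(motif: str, peptides: Iterable[str]) -> float:
    """Fraction of peptides that contain the motif at some offset."""
    peptides = list(peptides)
    if not peptides:
        return 0.0
    return sum(contains(motif, p) for p in peptides) / len(peptides)


# Toy peptides of different lengths, invented for illustration.
binders = ["ALAVK", "GGLKVTT", "LQV"]
non_binders = ["AAAAA", "LV", "KLVA"]

motif = "L*V"  # L, then any residue, then V
# A motif is discriminative when its support differs strongly between classes.
print(support(motif, binders), support(motif, non_binders))
```

Because the match is tried at every offset, peptides do not need to be aligned or trimmed to a common length before the motif is evaluated.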

    Human antibody immune responses are personalized by selective removal of MHC-II peptide epitopes [preprint]

    Human antibody responses are established by the generation of combinatorial sequence diversity in antibody variable domains, followed by iterative rounds of mutation and selection via T cell recognition of antigen peptides presented on MHC-II. Here, we report that MHC-II peptide epitope deletion from B cell receptors (BCRs) correlates with antibody development in vivo. Large-scale antibody sequence analysis and experimental validation of peptide binding revealed that MHC-II epitope removal from BCRs is linked to genetic signatures of T cell help, and donor-specific antibody repertoire modeling demonstrated that somatic hypermutation selectively targets the personalized MHC-II epitopes in antibody variable regions. Mining of class-switched sequences and serum proteomic data revealed that MHC-II epitope deletion is associated with antibody class switching and long-term secretion into serum. These data suggest that the MHC-II peptide epitope content of a BCR is an important determinant of antibody maturation that shapes the composition and durability of humoral immunity

    Emerging Vaccine Informatics

    Vaccine informatics is an emerging research area that focuses on development and applications of bioinformatics methods that can be used to facilitate every aspect of the preclinical, clinical, and postlicensure vaccine enterprises. Many immunoinformatics algorithms and resources have been developed to predict T- and B-cell immune epitopes for epitope vaccine development and protective immunity analysis. Vaccine protein candidates are predictable in silico from genome sequences using reverse vaccinology. Systematic transcriptomics and proteomics gene expression analyses facilitate rational vaccine design and identification of gene responses that are correlates of protection in vivo. Mathematical simulations have been used to model host-pathogen interactions and improve vaccine production and vaccination protocols. Computational methods have also been used for development of immunization registries or immunization information systems, assessment of vaccine safety and efficacy, and immunization modeling. Computational literature mining and databases effectively process, mine, and store large amounts of vaccine literature and data. Vaccine Ontology (VO) has been initiated to integrate various vaccine data and support automated reasoning

    Analysis of cytotoxic T cell epitopes in relation to cancer


    Concept and application of a computational vaccinology workflow

    BACKGROUND: Recent years have seen a renaissance of the vaccine area, driven by clinical needs in infectious diseases but also chronic diseases such as cancer and autoimmune disorders. Equally important are technological improvements involving nano-scale delivery platforms as well as third generation adjuvants. In parallel, immunoinformatics routines have reached essential maturity for supporting central aspects in vaccinology going beyond prediction of antigenic determinants. On this basis computational vaccinology has emerged as a discipline aimed at ab-initio rational vaccine design. Here we present a computational workflow for implementing computational vaccinology covering aspects from vaccine target identification to functional characterization and epitope selection supported by a Systems Biology assessment of central aspects in host-pathogen interaction. We exemplify the procedures for Epstein Barr Virus (EBV), a clinically relevant pathogen causing chronic infection and suspected of triggering malignancies and autoimmune disorders. RESULTS: We introduce pBone/pView as a computational workflow supporting design and execution of immunoinformatics workflow modules, additionally involving aspects of results visualization, knowledge sharing and re-use. Specific elements of the workflow involve identification of vaccine targets in the realm of a Systems Biology assessment of host-pathogen interaction for identifying functionally relevant targets, as well as various methodologies for delineating B- and T-cell epitopes with particular emphasis on broad coverage of viral isolates as well as MHC alleles. Applying the workflow on EBV specifically proposes sequences from the viral proteins LMP2, EBNA2 and BALF4 as vaccine targets holding specific B- and T-cell epitopes promising broad strain and allele coverage.
CONCLUSION: Based on advancements in the experimental assessment of genomes, transcriptomes and proteomes for both pathogen and (human) host, the foundations for rational design of vaccines have been laid out. In parallel, immunoinformatics modules have been designed and successfully applied for supporting specific aspects in vaccine design. Joining these advancements, further complemented by novel vaccine formulation and delivery aspects, has paved the way for implementing computational vaccinology for rational vaccine design tackling presently unmet vaccine challenges.

    Discovering discriminative and class-specific sequence and structural motifs in proteins

    Finding recurring motifs is an important problem in bioinformatics. Such motifs can be used for any number of problems including sequence classification, label prediction, knowledge discovery and biological engineering of proteins fit for a specific purpose. Our motivation is to create a better foundation for the research and development of novel motif mining and machine learning methods that can extract class-specific and discriminative motifs using both sequence and structural features. We propose the building blocks of a general machine learning framework to act on a biological input. This thesis presents a combination of elements that are aimed to be applicable to a variety of biological problems. Ideally, the learner should only require a number of biological data instances as input that are classified into a number of different classes as defined by the researchers. The output should be the factors and motifs that discriminate between those classes (for reasonable, non-random class definitions). This ideal workflow requires two main steps. The first step is the representation of the biological input with features that contain the significant information the researcher is looking for. Due to the complexity of the macromolecules, abstract representations are required to convert the real world representation into quantifiable descriptors that are suitable for motif mining and machine learning. The second step of the proposed workflow is the motif mining and knowledge discovery step. Using these informative representations, an algorithm should be able to find discriminative, class-specific motifs that are over-represented in one class and under-represented in the other. This thesis presents novel procedures for representation of the proteins to be used in a variety of machine learning algorithms, and two separate motif mining algorithms, one based on temporal motif mining, and the other on deep learning, that can work with the given biological data.
The descriptors and the learners are applied to a wide range of computational problems encountered in life sciences.

    Sequence analysis methods for the design of cancer vaccines that target tumor-specific mutant antigens (neoantigens)

    The human adaptive immune system is programmed to distinguish between self and non-self proteins and if trained to recognize markers unique to a cancer, it may be possible to stimulate the selective destruction of cancer cells. Therapeutic cancer vaccines aim to boost the immune system by selectively increasing the population of T cells specifically targeted to the tumor-unique antigens, thereby initiating cancer cell death. In the past, this approach has primarily focused on targeted selection of ‘shared’ tumor antigens, found across many patients. The advent of massively parallel sequencing and specialized analytical approaches has enabled more efficient characterization of tumor-specific mutant antigens, or neoantigens. Specifically, methods to predict which tumor-specific mutant peptides (neoantigens) can elicit anti-tumor T cell recognition improve predictions of immune checkpoint therapy response and identify one or more neoantigens as targets for personalized vaccines. Selecting the best/most immunogenic neoantigens from a large number of mutations is an important challenge, in particular in cancers with a high mutational load, such as melanomas and smoker-associated lung cancers. To address such a challenging task, Chapter 1 of this thesis describes a genome-guided in silico approach to identifying tumor neoantigens that integrates tumor mutation and expression data (DNA- and RNA-Seq). The cancer vaccine design process, from read alignment to variant calling and neoantigen prediction, typically assumes that the genotype of the Human Reference Genome sequence surrounding each somatic variant is representative of the patient’s genome sequence, and does not account for the effect of nearby variants (somatic or germline) in the neoantigenic peptide sequence.
Because the accuracy of neoantigen identification has important implications for many clinical trials and studies of basic cancer immunology, Chapter 2 describes and supports the need for patient-specific inclusion of proximal variants to address this previously oversimplified assumption in the identification of neoantigens. The method of neoantigen identification described in Chapter 1 was subsequently extended (Chapter 3) and improved by the addition of a modular workflow that aids in each component of the neoantigen prediction process, from neoantigen identification and prioritization to data visualization and DNA vaccine design. These chapters describe massively parallel sequence analysis methods that will help in the identification and subsequent refinement of patient-specific antigens for use in personalized immunotherapy.
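One step that neoantigen prediction pipelines of this kind generally need is enumerating the candidate peptides that span a somatic amino acid change. The sketch below is an illustration under simplifying assumptions, not the thesis's workflow: it handles a single substitution, assumes the flanking sequence equals the reference (the very simplification that Chapter 2 argues against), and uses window lengths of 8 to 11 residues as an assumed default. The function name and the toy protein sequence are hypothetical.

```python
from typing import List, Sequence


def mutant_windows(protein: str, pos: int, alt: str,
                   lengths: Sequence[int] = (8, 9, 10, 11)) -> List[str]:
    """Return every k-mer of the mutated protein that covers 0-based position `pos`.

    Assumes a single-residue substitution and reference flanking sequence.
    """
    if not 0 <= pos < len(protein):
        raise ValueError("pos is outside the protein")
    if len(alt) != 1:
        raise ValueError("alt must be a single residue")
    mutated = protein[:pos] + alt + protein[pos + 1:]
    windows = []
    for k in lengths:
        first = max(0, pos - k + 1)        # leftmost start that still covers pos
        last = min(pos, len(mutated) - k)  # rightmost start that fits in the protein
        for start in range(first, last + 1):
            windows.append(mutated[start:start + k])
    return windows


# Toy protein and substitution (Q -> L at index 10), invented for illustration.
toy_protein = "MKTAYIAKQRQISFVKSHFSRQ"
candidates = mutant_windows(toy_protein, pos=10, alt="L")
print(len(candidates), candidates[:3])
```

Accounting for proximal variants would mean applying all nearby patient-specific changes to `mutated` before the windows are cut, rather than only the one substitution shown here.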

    Structure and Evolution of Lizard Immunity Genes

    One of the most important gene families to play a role in adaptive immunity is the major histocompatibility complex (MHC). MHC class II loci are considered to be the most variable loci in the vertebrate genome, and studies have shown that this variability can be maintained through complex co-evolutionary dynamics between host and parasite. Despite the rich body of research into the MHC, there is comparatively little understanding of its genomic architecture in reptiles. Similarly, loci associated with innate immunity have received little attention in reptiles compared to other vertebrates. In the first chapter, we investigated the structure and organization of the MHC in the Anolis carolinensis genome by sequencing and annotating five bacterial artificial chromosomes (BAC) from the green anole genome library. We were able to identify three mhc2a, four mhc2b, and up to 15 mhc1 loci in A. carolinensis. Furthermore, we were able to link 17 scaffolds and provide sequence data to fill two significant gaps in the genome assembly. In the second chapter, we investigated the relative importance of drift and selection in shaping mhc2 variability in the reptile Podarcis erhardii. We sequenced the mhc2 gene from lizard populations from 14 islands in the Aegean that have experienced bottlenecks of differing duration and intensity. Despite signals of balancing selection, patterns of mhc2 variation were similar to microsatellites, providing evidence that the dominant evolutionary force in this system is drift. In the third chapter, we investigated how parasite infection rates impact innate immune variability in A. sabanus, a lizard indigenous to Saba Island where natural fluctuations in Plasmodium infection rates have been documented. We developed primers and sequenced part of the peptide binding region of three Toll-like receptors (TLRs) - tlr4, tlr6, and tlr13 and several beta-defensin (BD) loci. 
Although we were unable to characterize BD variability, we found three different haplotypes in tlr4, and five in tlr6. However, nucleotide variability was low (π < 0.005) and was not associated with infection status. We nevertheless present primers for multiple TLR genes and two BDs that could be of use in future studies of reptile innate immunity.
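The π reported in this abstract is nucleotide diversity, the average number of pairwise differences per site among sampled sequences. A minimal sketch of that calculation follows, assuming equal-length, already aligned sequences with no gaps or missing data; the function name and the toy sequences are made up for illustration and are not data from the thesis.

```python
from itertools import combinations
from typing import Sequence


def nucleotide_diversity(seqs: Sequence[str]) -> float:
    """Average pairwise differences per site across all pairs of aligned sequences."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    total_diff = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) / 2
    return total_diff / n_pairs / length


# Toy alignment of four 10-bp sequences, invented for illustration.
toy_alignment = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
print(nucleotide_diversity(toy_alignment))
```

This version weights every sampled sequence equally; estimators based on haplotype frequencies give the same value when each sampled copy is listed individually.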