3,853 research outputs found

    Enhancing in silico protein-based vaccine discovery for eukaryotic pathogens using predicted peptide-MHC binding and peptide conservation scores

    © 2014 Goodswen et al. Given thousands of proteins constituting a eukaryotic pathogen, the principal objective for a high-throughput in silico vaccine discovery pipeline is to select those proteins worthy of laboratory validation. Accurate prediction of T-cell epitopes on protein antigens is one crucial piece of evidence that would aid in this selection. Prediction of peptides recognised by T-cell receptors has to date proved to be of insufficient accuracy. The in silico approach is consequently reliant on an indirect method, which involves the prediction of peptides binding to major histocompatibility complex (MHC) molecules. There is no guarantee, nevertheless, that predicted peptide-MHC complexes will be presented by antigen-presenting cells and/or recognised by cognate T-cell receptors. The aim of this study was to determine if predicted peptide-MHC binding scores could provide contributing evidence to establish a protein's potential as a vaccine. Using T-Cell MHC class I binding prediction tools provided by the Immune Epitope Database and Analysis Resource, peptide binding affinities to 76 common MHC I alleles were predicted for 160 Toxoplasma gondii proteins: 75 taken from published studies represented proteins known or expected to induce T-cell immune responses and 85 considered less likely vaccine candidates. The results show there is no universal set of rules that can be applied directly to binding scores to distinguish a vaccine from a non-vaccine candidate. We present, however, two proposed strategies exploiting binding scores that provide supporting evidence that a protein is likely to induce a T-cell immune response: one using random forest (a machine learning algorithm) with a 72% sensitivity and 82.4% specificity, and the other using amino acid conservation scores with a 74.6% sensitivity and 70.5% specificity when applied to the 160 benchmark proteins. More importantly, the binding score strategies are valuable evidence contributors to the overall in silico vaccine discovery pool of evidence

    A gene-based positive selection detection approach to identify vaccine candidates using Toxoplasma gondii as a test case protozoan pathogen

    © 2018 Goodswen, Kennedy and Ellis. Over the last two decades, various in silico approaches have been developed and refined that attempt to identify protein and/or peptide vaccine candidates from informative signals encoded in protein sequences of a target pathogen. To date, no signal has been identified that clearly indicates a protein will effectively contribute to a protective immune response in a host. The premise for this study is that proteins under positive selection from the immune system are more likely suitable vaccine candidates than proteins exposed to other selection pressures. Furthermore, our expectation is that protein sequence regions encoding major histocompatibility complex (MHC)-binding peptides will contain consecutive positive selection sites. Using freely available data and bioinformatic tools, we present a high-throughput approach through a pipeline that predicts positive selection sites, protein subcellular locations, and sequence locations of medium to high T-Cell MHC class I binding peptides. Positive selection sites are estimated from a sequence alignment by comparing rates of synonymous (dS) and non-synonymous (dN) substitutions among protein coding sequences of orthologous genes in a phylogeny. The main pipeline output is a list of protein vaccine candidates predicted to be naturally exposed to the immune system and containing sites under positive selection. Candidates are ranked with respect to the number of consecutive sites located on protein sequence regions encoding MHCI-binding peptides. Results are constrained by the reliability of prediction programs and quality of input data. Protein sequences from Toxoplasma gondii ME49 strain (TGME49) were used as a case study. Surface antigen (SAG), dense granule (GRA), microneme (MIC), and rhoptry (ROP) proteins are considered worthy T. gondii candidates. Given 8263 TGME49 protein sequences processed anonymously, the top 10 predicted candidates were all worthy candidates. In particular, the top ten included ROP5 and ROP18, which are T. gondii virulence determinants. The chance of randomly selecting a ROP protein was 0.2% given 8263 sequences. We conclude that the approach described is a valuable addition to other in silico approaches to identify vaccine candidates worthy of laboratory validation and could be adapted for other apicomplexan parasite species (with appropriate data)

    Predicting Protein Therapeutic Candidates for Bovine Babesiosis Using Secondary Structure Properties and Machine Learning

    Bovine babesiosis causes significant annual global economic loss in the beef and dairy cattle industry. It is a disease instigated from infection of red blood cells by haemoprotozoan parasites of the genus Babesia in the phylum Apicomplexa. Principal species are Babesia bovis, Babesia bigemina, and Babesia divergens. There is no subunit vaccine. Potential therapeutic targets against babesiosis include members of the exportome. This study investigates the novel use of protein secondary structure characteristics and machine learning algorithms to predict exportome membership probabilities. The premise of the approach is to detect characteristic differences that can help classify one protein type from another. Structural properties such as a protein's local conformational classification states, backbone torsion angles ϕ (phi) and ψ (psi), solvent-accessible surface area, contact number, and half-sphere exposure are explored here as potential distinguishing protein characteristics. The presented methods that exploit these structural properties via machine learning are shown to have the capacity to distinguish exportome from non-exportome Babesia bovis proteins with an 86-92% accuracy (based on 10-fold cross validation and independent testing). These methods are encapsulated in freely available Linux pipelines set up for automated, high-throughput processing. Furthermore, proposed therapeutic candidates for laboratory investigation are provided for B. bovis, B. bigemina, and two other haemoprotozoan species, Babesia canis and Plasmodium falciparum

    Compilation of parasitic immunogenic proteins from 30 years of published research using machine learning and natural language processing.

    The World Health Organisation reported in 2020 that six of the top 10 sources of death in low-income countries are parasites. Parasites are microorganisms in a relationship with a larger organism, the host. They acquire all benefits at the host's expense. A disease develops if the parasitic infection disrupts normal functioning of the host. This disruption can range from mild to severe, including death. Humans and livestock continue to be challenged by established and emerging infectious disease threats. Vaccination is the most efficient tool for preventing current and future threats. Immunogenic proteins sourced from the disease-causing parasite are worthwhile vaccine components (subunits) due to reliable safety and manufacturing capacity. Publications with 'subunit vaccine' in their title have accumulated to thousands over the last three decades. However, there are possibly thousands more reporting immunogenicity results without mentioning 'subunit' and/or 'vaccine'. The exact number is unclear given the non-standardised keywords in publications. The aim of this study is to identify parasite proteins that induce a protective response in an animal model, as reported in the scientific literature within the last 30 years, using machine learning and natural language processing. The source code to fulfil this aim and the vaccine candidate list obtained are made available

    A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms

    Background: An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of laboratory investigation. A major challenge is that these predictions are inherent with hidden inaccuracies and contradictions. This study focuses on how to reduce the number of false candidates using machine learning algorithms rather than relying on expensive laboratory validation. Proteins from Toxoplasma gondii, Plasmodium sp., and Caenorhabditis elegans were used as training and test datasets. Results: The results show that machine learning algorithms can effectively distinguish expected true from expected false vaccine candidates (with an average sensitivity and specificity of 0.97 and 0.98 respectively), for proteins observed to induce immune responses experimentally. Conclusions: Vaccine candidates from an in silico approach can only be truly validated in a laboratory. Given any in silico output and appropriate training data, the number of false candidates allocated for validation can be dramatically reduced using a pool of machine learning algorithms. This will ultimately save time and money in the laboratory. © 2013 Goodswen et al.; licensee BioMed Central Ltd

    Computational Antigen Discovery for Eukaryotic Pathogens Using Vacceed.

    Bioinformatics programs have been developed that exploit informative signals encoded within protein sequences to predict protein characteristics. Unfortunately, there is no program as yet that can predict whether a protein will induce a protective immune response to a pathogen. Nonetheless, distinguishing those pathogen proteins most likely from those least likely to induce an immune response is feasible when collectively using predicted protein characteristics. Vacceed is a computational pipeline that manages different standalone bioinformatics programs to predict various protein characteristics, which offer supporting evidence on whether a protein is secreted or membrane-associated. A set of machine learning algorithms predicts the most likely pathogen proteins to induce an immune response given the supporting evidence. This chapter provides step-by-step descriptions of how to configure and operate Vacceed for a eukaryotic pathogen of the user's choice

    Extracting and explaining biological knowledge in microarray data

    © Springer-Verlag Berlin Heidelberg 2004. This paper describes a method of clustering lists of genes mined from a microarray dataset using functional information from the Gene Ontology. The method uses relationships between terms in the ontology both to build clusters and to extract meaningful cluster descriptions. The approach is general and may be applied to assist explanation of other datasets associated with ontologies

    The structure of latherin, a surfactant allergen protein from horse sweat and saliva

    Latherin is a highly surface-active allergen protein found in the sweat and saliva of horses and other equids. Its surfactant activity is intrinsic to the protein in its native form, and is manifest without associated lipids or glycosylation. Latherin probably functions as a wetting agent in evaporative cooling in horses, but it may also assist in mastication of fibrous food as well as inhibition of microbial biofilms. It is a member of the PLUNC family of proteins abundant in the oral cavity and saliva of mammals, one of which has also been shown to be a surfactant and capable of disrupting microbial biofilms. How these proteins work as surfactants while remaining soluble and cell membrane-compatible is not known. Nor have their structures previously been reported. We have used protein nuclear magnetic resonance spectroscopy to determine the conformation and dynamics of latherin in aqueous solution. The protein is a monomer in solution with a slightly curved cylindrical structure exhibiting a ‘super-roll’ motif comprising a four-stranded anti-parallel β-sheet and two opposing α-helices which twist along the long axis of the cylinder. One end of the molecule has prominent, flexible loops that contain a number of apolar amino acid side chains. This, together with previous biophysical observations, leads us to a plausible mechanism for surfactant activity in which the molecule is first localized to the non-polar interface via these loops, and then unfolds and flattens to expose its hydrophobic interior to the air or non-polar surface. Intrinsically surface-active proteins are relatively rare in nature, and this is the first structure of such a protein from mammals to be reported. Both its conformation and proposed method of action are different from other, non-mammalian surfactant proteins investigated so far

    Kernel-based visualisation of genes with the gene ontology

    With the development of microarray-based high-throughput technologies for examining genetic and biological information en masse, biologists are now faced with making sense of large lists of genes identified from their biological experiments. There is a vital need for "system biology" approaches which can allow biologists to see new or unanticipated potential relationships which will lead to new hypotheses and eventual new knowledge. Finding and understanding relationships in this data is a problem well suited to visualisation. We augment genes with their associated terms from the Gene Ontology and visualise them using kernel Principal Component Analysis with both specialised linear and Gaussian kernels. Our results show that this method can correctly visualise genes by their functional relationships and we describe the difference between using the linear and Gaussian kernels on the problem. © 2008, Australian Computer Society, Inc