743 research outputs found

    Ontology for Biomedical Investigations

    Generating quantitative models describing the sequence specificity of biological processes with the stabilized matrix method

    BACKGROUND: Many processes in molecular biology involve the recognition of short sequences of nucleic or amino acids, such as the binding of immunogenic peptides to major histocompatibility complex (MHC) molecules. From experimental data, a model of the sequence specificity of these processes can be constructed, such as a sequence motif, a scoring matrix or an artificial neural network. The purpose of these models is two-fold. First, they can provide a summary of experimental results, allowing for a deeper understanding of the mechanisms involved in sequence recognition. Second, such models can be used to predict the experimental outcome for yet untested sequences. In the past we reported the development of a method to generate such models called the Stabilized Matrix Method (SMM). This method has been successfully applied to predicting peptide binding to MHC molecules, peptide transport by the transporter associated with antigen presentation (TAP) and proteasomal cleavage of protein sequences. RESULTS: Herein we report the implementation of the SMM algorithm as a publicly available software package. Specific features determining the type of problems the method is most appropriate for are discussed. Advantageous features of the package are: (1) the output generated is easy to interpret, (2) input and output are both quantitative, (3) specific computational strategies to handle experimental noise are built in, (4) the algorithm is designed to effectively handle bounded experimental data, (5) experimental data from randomized peptide libraries and conventional peptides can easily be combined, and (6) it is possible to incorporate pair interactions between positions of a sequence. CONCLUSION: Making the SMM method publicly available enables bioinformaticians and experimental biologists to easily access it, to compare its performance to other prediction methods, and to extend it to other applications.
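
The abstract above mentions scoring matrices as one form of sequence-specificity model. As a rough illustration of that general idea only (this is not the SMM package or its fitting procedure), a scoring matrix can be applied to a peptide by summing one contribution per position. The matrix values, the offset, and the function name `score_peptide` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical 3-position scoring matrix: one dict per peptide position,
# mapping amino acid -> contribution to the predicted value.
# The numbers are invented for illustration, not taken from a trained model.
MATRIX: List[Dict[str, float]] = [
    {"A": -0.5, "L": 0.2, "K": 0.9},
    {"A": 0.1, "L": -0.8, "K": 0.3},
    {"A": 0.0, "L": -0.4, "K": 0.6},
]
OFFSET = 4.0  # hypothetical constant term added to every prediction


def score_peptide(
    peptide: str,
    matrix: List[Dict[str, float]] = MATRIX,
    offset: float = OFFSET,
    default: float = 0.0,
) -> float:
    """Return offset + sum of per-position contributions for the peptide.

    Residues missing from a column contribute `default`.
    """
    if len(peptide) != len(matrix):
        raise ValueError("peptide length must match the number of matrix columns")
    return offset + sum(col.get(aa, default) for col, aa in zip(matrix, peptide))


if __name__ == "__main__":
    for pep in ("ALK", "KKK", "AXA"):
        print(pep, round(score_peptide(pep), 3))
```

Because each position contributes independently, the columns can be read directly as per-residue preferences, which is one reason matrix models are described as easy to interpret.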

    Ontology for Biomedical Investigations

    The goal of OBI is to enable a formal representation of biomedical investigations that captures the experimental evidence on which their findings are based. The scope of OBI includes: materials made in and produced for investigations, research objectives, experimental protocols, roles of people in investigations and processing and publication of data gathered in investigations. Use of OBI will allow comparison of experimental data from the wide array of scientific disciplines represented by domain experts in the OBI consortium. OBI follows the principles laid out by the OBO foundry, and integrates tightly with other foundry candidate ontologies, such as GO (www.geneontology.org) and ChEBI (www.ebi.ac.uk/chebi/) whose terms are used to describe biological reality. The use of OBI by the scientific community to represent or annotate their investigations within electronic data resources will facilitate interdisciplinary data synthesis, enable access to their data on the semantic web and improve third-party understanding of information related to life-science and clinical investigations.

    FAIR principles and the IEDB: short-term improvements and a long-term vision of OBO-foundry mediated machine-actionable interoperability.

    The Immune Epitope Database (IEDB), at www.iedb.org, has the mission to make published experimental data relating to the recognition of immune epitopes easily available to the scientific public. By presenting curated data in a searchable database, we have liberated it from the tables and figures of journal articles, making it more accessible and usable by immunologists. Recently, the principles of Findability, Accessibility, Interoperability and Reusability have been formulated as goals that data repositories should meet to enhance the usefulness of their data holdings. Here we examine how the IEDB complies with these principles and identify broad areas of success, but also areas for improvement. We describe short-term improvements to the IEDB that are being implemented now, as well as a long-term vision of true 'machine-actionable interoperability', which we believe will require community agreement on standardization of knowledge representation that can be built on top of the shared use of ontologies.

    Epitopes in ChEBI - A Collaboration with the IEDB

    *ChEBI background:* Chemical Entities of Biological Interest (ChEBI) is a curated database of small chemical entities important in biosystems. As well as a description of entities, it provides a semantically rich knowledge base; and an internal hierarchy that organises the entities by their molecular structure types and potential rôles.

*The ChEBI-IEDB collaboration:* The Immune Epitope Database and Analysis Resource (IEDB) is a project supported by a contract from the National Institute of Allergy and Infectious Diseases (NIAID). Its goal is to make epitope-related data on infectious diseases and immune disorders freely available to researchers worldwide. In June 2009, ChEBI began working with the IEDB on a project aimed at incorporating into ChEBI, by manual curation, a pilot subset of immunologically important chemicals identified as immune epitopes.

*The significance of the project:* Numerous reports attest to an increasing global prevalence of immune-related diseases, with a multiplicity of contributing factors. This situation underscores the need for cross-talk among the various scientific disciplines, and makes ChEBI involvement in this project particularly relevant. 

*Collaboration outcome:* That collaboration among curators working on different databases can be reciprocally beneficial has been amply demonstrated by the ChEBI-IEDB teamwork described: while the incorporated IEDB items have substantially enriched ChEBI, the latter’s multiplicity of synonyms, structure tree lay-out and expertise in describing non-peptidic epitopes have been equally useful to the IEDB in facilitating the search process.

*Status quo and plans:* We continue to refine our task of assisting the identification, understanding and utilisation of biologically meaningful chemical entities by engaging in further joint projects.

    The Immune Epitope Database and Analysis Resource Program 2003–2018: reflections and outlook

    The Immune Epitope Database and Analysis Resource (IEDB) contains information related to antibodies and T cells across an expansive scope of research fields (infectious diseases, allergy, autoimmunity, and transplantation). Capture and representation of the data to reflect growing scientific standards and techniques have required continual refinement of our rigorous curation and query and reporting processes beginning with the automated classification of over 28 million PubMed abstracts, and resulting in easily searchable data from over 20,000 published manuscripts. Data related to MHC binding and elution, nonpeptidics, natural processing, receptors, and 3D structure is first captured through manual curation and subsequently maintained through recuration to reflect evolving scientific standards. Upon promotion to the free, public database, users can query and export records of specific relevance via the online web portal which undergoes iterative development to best enable efficient data access. In parallel, the companion Analysis Resource site hosts a variety of tools that assist in the bioinformatic analyses of epitopes and related structures, which can be applied to IEDB-derived and independent datasets alike. Available tools are classified into two categories: analysis and prediction. Analysis tools include epitope clustering, sequence conservancy, and more, while prediction tools cover T and B cell epitope binding, immunogenicity, and TCR/BCR structures. In addition to these tools, benchmarking servers which allow for unbiased performance comparison are also offered. In order to expand and support the user-base of both the database and Analysis Resource, the research team actively engages in community outreach through publication of ongoing work, conference attendance and presentations, hosting of user workshops, and the provision of online help. 
    This review provides a description of the IEDB database infrastructure, curation and recuration processes, query and reporting capabilities, the Analysis Resource, and our Community Outreach efforts, including assessment of the impact of the IEDB across the research community.
    Affiliation: Martini, Sheridan. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina. Technical University of Denmark; Denmark
    Affiliation: Peters, Bjoern. La Jolla Institute for Allergy and Immunology; United States. University of California at San Diego; United States
    Affiliation: Sette, Alessandro. La Jolla Institute for Allergy and Immunology; United States. University of California at San Diego; United States

    Determination of a predictive cleavage motif for eluted major histocompatibility complex class II ligands

    CD4+ T cells have a major role in regulating immune responses. They are activated by recognition of peptides mostly generated from exogenous antigens through the major histocompatibility complex (MHC) class II pathway. Identification of epitopes is important and computational prediction of epitopes is used widely to save time and resources. Although there are algorithms to predict binding affinity of peptides to MHC II molecules, no accurate methods exist to predict which ligands are generated as a result of natural antigen processing. We utilized a dataset of around 14,000 naturally processed ligands identified by mass spectrometry of peptides eluted from MHC class II expressing cells to investigate the existence of sequence signatures potentially related to the cleavage mechanisms that liberate the presented peptides from their source antigens. This analysis revealed preferred amino acids surrounding both N- and C-terminuses of ligands, indicating sequence-specific cleavage preferences. We used these cleavage motifs to develop a method for predicting naturally processed MHC II ligands, and validated that it had predictive power to identify ligands from independent studies. We further confirmed that prediction of ligands based on cleavage motifs could be combined with predictions of MHC binding, and that the combined prediction had superior performance. However, when attempting to predict CD4+ T cell epitopes, either alone or in combination with MHC binding predictions, predictions based on the cleavage motifs did not show predictive power. Given that peptides identified as epitopes based on CD4+ T cell reactivity typically do not have well-defined termini, it is possible that motifs are present but outside of the mapped epitope. Our attempts to take that into account computationally did not show any sign of an increased presence of cleavage motifs around well-characterized CD4+ T cell epitopes. 
    While it is possible that our attempts to translate the cleavage motifs in MHC II ligand elution data into T cell epitope predictions were suboptimal, other possible explanations are that the cleavage signal is too diluted to be detected, or that elution data are enriched for ligands generated through an antigen processing and presentation pathway that is less frequently utilized for T cell epitopes.
    Affiliation: Paul, Sinu. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Karosiene, Edita. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Dhanda, Sandeep Kumar. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Jurtz, Vanessa. Technical University of Denmark; Denmark
    Affiliation: Edwards, Lindy. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina. Technical University of Denmark; Denmark
    Affiliation: Sette, Alessandro. University of California at San Diego; United States. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Peters, Bjoern. La Jolla Institute for Allergy and Immunology; United States. University of California at San Diego; United States
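
The abstract above describes looking for preferred amino acids around the N- and C-termini of eluted ligands. A minimal sketch of how such terminal windows could be collected and tallied is given below. It is an illustration under stated assumptions, not the authors' method: the function names, the flank width of 3, and the example sequences are hypothetical, and only the first occurrence of a ligand in its source protein is used.

```python
from collections import Counter
from typing import List, Optional, Tuple


def terminal_windows(source: str, ligand: str, flank: int = 3) -> Optional[Tuple[str, str]]:
    """Return (N-terminal window, C-terminal window) around a ligand.

    Each window spans `flank` residues on either side of the cleavage point.
    Returns None if the ligand is absent or lies too close to a protein end
    for a full-width window (so that all windows stay position-aligned).
    """
    start = source.find(ligand)  # first occurrence only
    if start == -1:
        return None
    end = start + len(ligand)
    if start < flank or end + flank > len(source):
        return None
    return source[start - flank:start + flank], source[end - flank:end + flank]


def position_counts(windows: List[str]) -> List[Counter]:
    """Count amino acids at each aligned window position."""
    width = max((len(w) for w in windows), default=0)
    counts = [Counter() for _ in range(width)]
    for window in windows:
        for i, aa in enumerate(window):
            counts[i][aa] += 1
    return counts


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVK"  # made-up sequence
    print(terminal_windows(protein, "AYIAKQ"))
```

Comparing such per-position counts against background amino acid frequencies is one simple way to look for enrichment, though the statistics used in the study are not given in the abstract.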

    NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data

    Major histocompatibility complex (MHC) molecules are expressed on the cell surface, where they present peptides to T cells, which gives them a key role in the development of T-cell immune responses. MHC molecules come in two main variants: MHC Class I (MHC-I) and MHC Class II (MHC-II). MHC-I predominantly presents peptides derived from intracellular proteins, whereas MHC-II predominantly presents peptides from extracellular proteins. In both cases, the binding between MHC and antigenic peptides is the most selective step in the antigen presentation pathway. Therefore, the prediction of peptide binding to MHC is a powerful utility to predict the possible specificity of a T-cell immune response. Commonly, MHC binding prediction tools are trained on binding affinity or mass spectrometry-eluted ligands. Recent studies have, however, demonstrated how the integration of both data types can boost predictive performances. Inspired by this, we here present NetMHCpan-4.1 and NetMHCIIpan-4.0, two web servers created to predict binding between peptides and MHC-I and MHC-II, respectively. Both methods exploit tailored machine learning strategies to integrate different training data types, resulting in state-of-the-art performance and outperforming their competitors. The servers are available at http://www.cbs.dtu.dk/services/NetMHCpan-4.1/ and http://www.cbs.dtu.dk/services/NetMHCIIpan-4.0/.
    Affiliation: Reynisson, Birkir. Technical University of Denmark; Denmark
    Affiliation: Alvarez, Bruno. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina
    Affiliation: Paul, Sinu. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Peters, Bjoern. La Jolla Institute for Allergy and Immunology; United States. University of California at San Diego; United States
    Affiliation: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina. Technical University of Denmark; Denmark

    Dataset size and composition impact the reliability of performance benchmarks for peptide-MHC binding predictions

    BACKGROUND: It is important to accurately determine the performance of peptide:MHC binding predictions, as this enables users to compare and choose between different prediction methods and provides estimates of the expected error rate. Two common approaches to determine prediction performance are cross-validation, in which all available data are iteratively split into training and testing data, and the use of blind sets generated separately from the data used to construct the predictive method. In the present study, we have compared cross-validated prediction performances generated on our last benchmark dataset from 2009 with prediction performances generated on data subsequently added to the Immune Epitope Database (IEDB) which served as a blind set. RESULTS: We found that cross-validated performances systematically overestimated performance on the blind set. This was found not to be due to the presence of similar peptides in the cross-validation dataset. Rather, we found that small size and low sequence/affinity diversity of either training or blind datasets were associated with large differences in cross-validated vs. blind prediction performances. We use these findings to derive quantitative rules of how large and diverse datasets need to be to provide generalizable performance estimates. CONCLUSION: It has long been known that cross-validated prediction performance estimates often overestimate performance on independently generated blind set data. We here identify and quantify the specific factors contributing to this effect for MHC-I binding predictions. An increasing number of peptides for which MHC binding affinities are measured experimentally have been selected based on binding predictions and thus are less diverse than historic datasets sampling the entire sequence and affinity space, making them more difficult benchmark data sets. 
    This has to be taken into account when comparing performance metrics between different benchmarks, and when deriving error estimates for predictions based on benchmark performance.
    Affiliation: Kim, Yohan. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Sidney, John. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Buus, Søren. University of Copenhagen; Denmark
    Affiliation: Sette, Alessandro. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina. Technical University of Denmark; Denmark
    Affiliation: Peters, Bjoern. La Jolla Institute for Allergy and Immunology; United States
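
The abstract above contrasts two evaluation set-ups: cross-validation, where the available data are iteratively split into training and testing parts, and blind sets generated separately from the training data. The sketch below illustrates only the data-splitting side of each, under simplifying assumptions: `k_fold_splits` and `blind_set` are hypothetical helper names, folds are assigned deterministically by index rather than at random, and a blind set is approximated by dropping any item already seen in training.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs so that every item is tested exactly once."""
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of items")
    for fold in range(k):
        test = [x for i, x in enumerate(items) if i % k == fold]
        train = [x for i, x in enumerate(items) if i % k != fold]
        yield train, test


def blind_set(new_items: Sequence[T], training_items: Sequence[T]) -> List[T]:
    """Keep only newly added items that never appeared in the training data."""
    seen = set(training_items)
    return [x for x in new_items if x not in seen]


if __name__ == "__main__":
    peptides = list("ABCDEFG")  # placeholders standing in for peptide records
    for train, test in k_fold_splits(peptides, 3):
        print("train:", train, "test:", test)
    print("blind:", blind_set(["A", "X", "Y"], peptides))
```

In practice one would fit a predictor on each training part and compare the averaged cross-validated score with the score on the blind set; the abstract reports that the former tended to be the higher of the two.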

    Machine Learning Reveals a Non-Canonical Mode of Peptide Binding to MHC class II Molecules

    MHC class II molecules play a fundamental role in the cellular immune system: they load short peptide fragments derived from extracellular proteins and present them on the cell surface. It is currently thought that the peptide binds lying more or less flat in the MHC groove, with a fixed distance of nine amino acids between the first and last residue in contact with the MHCII. While confirming that the great majority of peptides bind to the MHC using this canonical mode, we report evidence for an alternative, less common mode of interaction. A fraction of observed ligands were shown to have an unconventional spacing of the anchor residues that directly interact with the MHC, which could only be accommodated to the canonical MHC motif either by imposing a more stretched out peptide backbone (an 8mer core) or by the peptide bulging out of the MHC groove (a 10mer core). We estimated that on average 2% of peptides bind with a core deletion, and 0.45% with a core insertion, but the frequency of such non-canonical cores was as high as 10% for certain MHCII molecules. A mutational analysis and experimental validation of a number of these anomalous ligands demonstrated that they could only fit to their MHC binding motif with a non-canonical binding core of length different from nine. This previously undescribed mode of peptide binding to MHCII molecules gives a more complete picture of peptide presentation by MHCII and allows us to model more accurately this event.
    Affiliation: Andreatta, Massimo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina
    Affiliation: Jurtz, Vanessa I. Technical University of Denmark; Denmark
    Affiliation: Kaever, Thomas. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Sette, Alessandro. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Peters, Bjoern. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina. Technical University of Denmark; Denmark