64 research outputs found

    GibbsCluster: unsupervised clustering and alignment of peptide sequences

    Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 presented here is an improved version that incorporates insertions and deletions to account for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry.
    Affiliation: Andreatta, Massimo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentina
    Affiliation: Alvarez, Bruno. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentina
    Affiliation: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentina. Technical University of Denmark; Denmark

    Discovering sequence motifs in quantitative and qualitative peptide data


    NetMHCpan-3.0; improved prediction of binding to MHC class I molecules integrating information from multiple receptor and peptide length datasets

    Allele-specific length preference for 24 MHC molecules characterized by 20 or more ligand data points for the allmer and 9mer prediction methods compared to the length preference in the SYFPEITHI data. Length profiles for the allmer and 9mer methods were estimated as described in the text. (XLSX 50 kb)

    In Silico Prediction of Human Pathogenicity in the gamma-Proteobacteria

    BACKGROUND: Although the majority of bacteria are innocuous or even beneficial for their host, others are highly infectious pathogens that can cause widespread and deadly diseases. When investigating the relationships between bacteria and other living organisms, it is therefore essential to be able to separate pathogenic organisms from non-pathogenic ones. Using traditional experimental methods for this purpose can be very costly and time-consuming, and also uncertain since animal models are not always good predictors for pathogenicity in humans. Bioinformatics-based methods are therefore strongly needed to mine the fast growing number of genome sequences and assess in a rapid and reliable way the pathogenicity of novel bacteria. METHODOLOGY/PRINCIPAL FINDINGS: We describe a new in silico method for the prediction of bacterial pathogenicity, based on the identification in microbial genomes of features that appear to correlate with virulence. The method does not rely on identifying genes known to be involved in pathogenicity (for instance virulence factors), but rather it inherently builds families of proteins that, irrespective of their function, are consistently present in only one of the two kinds of organisms, pathogens or non-pathogens. Whether a new bacterium carries proteins contained in these families determines its prediction as pathogenic or non-pathogenic. The application of the method on a set of known genomes correctly classified the virulence potential of 86% of the organisms tested. An additional validation on an independent test-set assigned correctly 22 out of 24 bacteria. CONCLUSIONS: The proposed approach was demonstrated to go beyond the species bias imposed by evolutionary relatedness, and performs better than predictors based solely on taxonomy or sequence similarity. 
A set of protein families that differentiate pathogenic and non-pathogenic strains was identified, including families of yet uncharacterized proteins that are suggested to be involved in bacterial pathogenicity.
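The family-based classification idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the strict "present in only one class" rule, the majority vote, the tie-breaking choice, and all names and toy data (`exclusive_families`, `predict_pathogenic`, `genome_A`, `fam1`, and so on) are assumptions made here for illustration only.

```python
from typing import Dict, Set, Tuple


def exclusive_families(
    genomes: Dict[str, Set[str]], is_pathogen: Dict[str, bool]
) -> Tuple[Set[str], Set[str]]:
    """Return (pathogen_only, nonpathogen_only) protein family sets.

    Simplifying assumption: a family counts as discriminating only if it
    never occurs in the other class of training genomes.
    """
    seen_in_pathogens: Set[str] = set()
    seen_in_nonpathogens: Set[str] = set()
    for name, families in genomes.items():
        if is_pathogen[name]:
            seen_in_pathogens |= families
        else:
            seen_in_nonpathogens |= families
    return (
        seen_in_pathogens - seen_in_nonpathogens,
        seen_in_nonpathogens - seen_in_pathogens,
    )


def predict_pathogenic(
    families: Set[str], pathogen_only: Set[str], nonpathogen_only: Set[str]
) -> bool:
    """Predict by comparing hits to each discriminating family set.

    Ties are resolved as non-pathogenic, an arbitrary choice for this sketch.
    """
    pathogen_hits = len(families & pathogen_only)
    nonpathogen_hits = len(families & nonpathogen_only)
    return pathogen_hits > nonpathogen_hits


# Hypothetical toy training data: genome name -> protein families it carries.
TRAIN = {
    "genome_A": {"fam1", "fam2", "fam3"},
    "genome_B": {"fam1", "fam4"},
    "genome_C": {"fam2", "fam5"},
    "genome_D": {"fam5", "fam6"},
}
LABELS = {"genome_A": True, "genome_B": True, "genome_C": False, "genome_D": False}

PATH_ONLY, NON_ONLY = exclusive_families(TRAIN, LABELS)
print(predict_pathogenic({"fam1", "fam3", "fam5"}, PATH_ONLY, NON_ONLY))
```

In this toy setting `fam2` occurs in both classes and therefore carries no weight, which mirrors the abstract's point that families are selected by their distribution across organisms rather than by known function.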

    Accurate pan-specific prediction of peptide-MHC class II binding affinity with improved binding core identification

    A key event in the generation of a cellular response against malicious organisms through the endocytic pathway is binding of peptidic antigens by major histocompatibility complex class II (MHC class II) molecules. The bound peptide is then presented on the cell surface where it can be recognized by T helper lymphocytes. NetMHCIIpan is a state-of-the-art method for the quantitative prediction of peptide binding to any human or mouse MHC class II molecule of known sequence. In this paper, we describe an updated version of the method with improved peptide binding register identification. Binding register prediction is concerned with determining the minimal core region of nine residues directly in contact with the MHC binding cleft, a crucial piece of information both for the identification and design of CD4+ T cell antigens. When applied to a set of 51 crystal structures of peptide-MHC complexes with known binding registers, the new method NetMHCIIpan-3.1 significantly outperformed the earlier 3.0 version. We illustrate the impact of accurate binding core identification for the interpretation of T cell cross-reactivity using tetramer double staining with a CMV epitope and its variants mapped to the epitope binding core. NetMHCIIpan is publicly available at http://www.cbs.dtu.dk/services/NetMHCIIpan-3.1.
    Affiliation: Andreatta, Massimo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas (subsede Chascomús) | Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas (subsede Chascomús); Argentina
    Affiliation: Karosiene, Edita. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Rasmussen, Michael. University of Copenhagen; Denmark
    Affiliation: Stryhn, Anette. University of Copenhagen; Denmark
    Affiliation: Buus, Søren. University of Copenhagen; Denmark
    Affiliation: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas (subsede Chascomús) | Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas (subsede Chascomús); Argentina. Technical University of Denmark; Denmark
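To make the notion of a nine-residue binding register concrete, the sketch below slides a 9-residue window along a peptide and keeps the best-scoring offset. NetMHCIIpan itself is a neural network method; this example swaps in a plain position-specific scoring matrix purely for illustration, and the function name `best_core`, the matrix `TOY_PSSM`, and its weights are hypothetical.

```python
from typing import Dict, List, Tuple

CORE_LENGTH = 9  # the abstract describes a minimal core of nine residues


def best_core(peptide: str, pssm: List[Dict[str, float]]) -> Tuple[int, str, float]:
    """Return (offset, core, score) of the highest-scoring 9-residue window.

    `pssm` holds one dict per core position mapping residue -> score;
    residues missing from a dict contribute 0. The first window wins ties.
    """
    if len(pssm) != CORE_LENGTH:
        raise ValueError("scoring matrix must have one column per core position")
    if len(peptide) < CORE_LENGTH:
        raise ValueError("peptide is shorter than the binding core")

    best: Tuple[int, str, float] = (-1, "", float("-inf"))
    for offset in range(len(peptide) - CORE_LENGTH + 1):
        core = peptide[offset:offset + CORE_LENGTH]
        score = sum(pssm[i].get(residue, 0.0) for i, residue in enumerate(core))
        if score > best[2]:
            best = (offset, core, score)
    return best


# Hypothetical toy matrix: rewards only the first and last core positions.
TOY_PSSM: List[Dict[str, float]] = [{} for _ in range(CORE_LENGTH)]
TOY_PSSM[0] = {"F": 2.0, "Y": 1.5}
TOY_PSSM[8] = {"L": 2.0, "V": 1.0}

print(best_core("AAFKQRTGESLAA", TOY_PSSM))  # → (2, 'FKQRTGESL', 4.0)
```

The returned offset is the predicted register: it says which residues of a longer ligand sit in the binding cleft and which ones flank it.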

    Machine Learning Reveals a Non-Canonical Mode of Peptide Binding to MHC class II Molecules

    MHC class II molecules play a fundamental role in the cellular immune system: they load short peptide fragments derived from extracellular proteins and present them on the cell surface. It is currently thought that the peptide binds lying more or less flat in the MHC groove, with a fixed distance of nine amino acids between the first and last residue in contact with the MHCII. While confirming that the great majority of peptides bind to the MHC using this canonical mode, we report evidence for an alternative, less common mode of interaction. A fraction of observed ligands were shown to have an unconventional spacing of the anchor residues that directly interact with the MHC, which could only be accommodated to the canonical MHC motif either by imposing a more stretched out peptide backbone (an 8mer core) or by the peptide bulging out of the MHC groove (a 10mer core). We estimated that on average 2% of peptides bind with a core deletion, and 0.45% with a core insertion, but the frequency of such non-canonical cores was as high as 10% for certain MHCII molecules. A mutational analysis and experimental validation of a number of these anomalous ligands demonstrated that they could only fit to their MHC binding motif with a non-canonical binding core of length different from nine. This previously undescribed mode of peptide binding to MHCII molecules gives a more complete picture of peptide presentation by MHCII and allows us to model this event more accurately.
    Affiliation: Andreatta, Massimo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina
    Affiliation: Jurtz, Vanessa I. Technical University of Denmark; Denmark
    Affiliation: Kaever, Thomas. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Sette, Alessandro. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Peters, Bjoern. La Jolla Institute for Allergy and Immunology; United States
    Affiliation: Nielsen, Morten. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina. Technical University of Denmark; Denmark

    NNAlign: A Web-Based Prediction Method Allowing Non-Expert End-User Discovery of Sequence Motifs in Quantitative Peptide Data

    Recent advances in high-throughput technologies have made it possible to generate both gene and protein sequence data at an unprecedented rate and scale, thereby enabling entirely new “omics”-based approaches towards the analysis of complex biological processes. However, the amount and complexity of data that even a single experiment can produce seriously challenges researchers with limited bioinformatics expertise, who need to handle, analyze and interpret the data before it can be understood in a biological context. Thus, there is an unmet need for tools allowing non-bioinformatics users to interpret large data sets. We have recently developed a method, NNAlign, which is generally applicable to any biological problem where quantitative peptide data is available. This method efficiently identifies underlying sequence patterns by simultaneously aligning peptide sequences and identifying motifs associated with quantitative readouts. Here, we provide a web-based implementation of NNAlign allowing non-expert end-users to submit their data (optionally adjusting method parameters), and in return receive a trained method (including a visual representation of the identified motif) that can subsequently be used as a prediction method and applied to unknown proteins/peptides. We have successfully applied this method to several different data sets, including peptide microarray-derived sets containing more than 100,000 data points.

    Gapped sequence alignment using artificial neural networks: application to the MHC class I system

    Motivation: Many biological processes are guided by receptor interactions with linear ligands of variable length. One such receptor is the MHC class I molecule. The length preferences vary depending on the MHC allele, but are generally limited to peptides of length 8–11 amino acids. On this relatively simple system, we developed a sequence alignment method based on artificial neural networks that allows insertions and deletions in the alignment. Results: We show that prediction methods based on alignments that include insertions and deletions have significantly higher performance than methods trained on peptides of single lengths. Also, we illustrate how the location of deletions can aid the interpretation of the modes of binding of the peptide-MHC, as in the case of long peptides bulging out of the MHC groove or protruding at either terminus. Finally, we demonstrate that the method can learn the length profile of different MHC molecules, and quantify the reduction of the experimental effort required to identify potential epitopes using our prediction algorithm. Availability and implementation: The NetMHC-4.0 method for the prediction of peptide-MHC class I binding affinity using gapped sequence alignment is publicly available at: http://www.cbs.dtu.dk/services/NetMHC-4.0.
    Affiliation: Andreatta, Massimo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina
    Affiliation: Nielsen, Morten. Technical University of Denmark; Denmark. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas; Argentina
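One way to picture an alignment with insertions and deletions is to map every peptide onto a fixed-length core and enumerate the possible gap placements. The sketch below does only that enumeration, under assumptions that are not taken from the abstract: a core length of nine, a single contiguous gap, a wildcard `X` for inserted positions, and the hypothetical function name `candidate_cores`. In a full method, a trained model would then score each candidate and keep the best one.

```python
from typing import List, Tuple

CORE_LENGTH = 9   # assumed fixed core length for this sketch
WILDCARD = "X"    # assumed placeholder for inserted positions


def candidate_cores(peptide: str) -> List[Tuple[str, int]]:
    """Enumerate (core, gap_start) pairs mapping a peptide onto CORE_LENGTH.

    Longer peptides get one contiguous deletion, shorter peptides one
    contiguous wildcard insertion; gap_start is -1 when no gap is needed.
    """
    n = len(peptide)
    if n == CORE_LENGTH:
        return [(peptide, -1)]

    cores: List[Tuple[str, int]] = []
    if n > CORE_LENGTH:
        gap = n - CORE_LENGTH
        # Delete `gap` consecutive residues starting at each possible position.
        for start in range(n - gap + 1):
            cores.append((peptide[:start] + peptide[start + gap:], start))
    else:
        gap = CORE_LENGTH - n
        # Insert `gap` wildcards before each residue and after the last one.
        for pos in range(n + 1):
            cores.append((peptide[:pos] + WILDCARD * gap + peptide[pos:], pos))
    return cores


for core, gap_start in candidate_cores("SLYNTVATLYK"):  # an 11-residue example
    print(gap_start, core)
```

The position of the best-scoring gap is what carries the structural interpretation mentioned in the abstract: a deletion in the middle corresponds to a bulging peptide, one at either end to a protruding terminus.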

    Bioprospecting of Artemisia genus: from artemisinin to other potentially bioactive compounds

    Species from genus Artemisia are widely distributed throughout temperate regions of the northern hemisphere and many cultures have a long-standing traditional use of these plants as herbal remedies, liquors, cosmetics, spices, etc. Nowadays, the discovery of new plant-derived products to be used as food supplements or drugs has been pushed by the exploitation of bioprospection approaches. Often driven by the knowledge derived from the ethnobotanical use of plants, bioprospection explores the existing biodiversity through integration of modern omics techniques with targeted bioactivity assays. In this work we set up a bioprospection plan to investigate the phytochemical diversity and the potential bioactivity of five Artemisia species with recognized ethnobotanical tradition (A. absinthium, A. alba, A. annua, A. verlotiorum and A. vulgaris), growing wild in the natural areas of the Verona province. We characterized the specialized metabolomes of the species (including sesquiterpenoids from the artemisinin biosynthesis pathway) through an LC-MS based untargeted approach and, in order to identify potential bioactive metabolites, we correlated their composition with the in vitro antioxidant activity. We propose as potential bioactive compounds several isomers of caffeoyl and feruloyl quinic acid esters (e.g. dicaffeoylquinic acids, feruloylquinic acids and caffeoylferuloylquinic acids), which strongly characterize the most antioxidant species A. verlotiorum and A. annua. Moreover, in this study we report for the first time the occurrence of sesquiterpenoids from the artemisinin biosynthesis pathway in the species A. alba.