    Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate

    In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive identification of peptide spectrum matches (PSMs) after database searches is a major issue for proteogenomic studies that use liquid chromatography and mass spectrometry for large-scale proteomic profiling. Here we developed a simple strategy for protein identification, with a controlled false discovery rate (FDR) at the protein level, using an integrated proteomic pipeline (IPP) that consists of four sequential steps. First, individual proteomic searches were performed against the neXtProt database using three different search engines: SEQUEST, MASCOT, and MS-GF+. Second, the PSM search results were combined using statistical evaluation tools including DTASelect and Percolator. Third, the peptide search scores were converted into normalized E-scores using an in-house program. Last, ProteinInferencer was used to filter the proteins containing two or more peptides with a controlled FDR of 1.0% at the protein level. We then compared the performance of the IPP to a conventional proteomic pipeline (CPP) for protein identification using a controlled FDR of <1% at the protein level. Using the IPP, a total of 5756 proteins (vs 4453 using the CPP), including 477 alternative splicing variants (vs 182 using the CPP), were identified from human hippocampal tissue. In addition, a total of 10 missing proteins (vs 7 using the CPP) were identified with two or more unique peptides, and their tryptic peptides were validated using MS/MS spectral patterns from a repository database or from their corresponding synthetic peptides. This study shows that the IPP effectively improved the identification of proteins, including alternative splicing variants and missing proteins, in human hippocampal tissues for the C-HPP. All RAW files used in this study were deposited in ProteomeXchange (PXD000395).
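
The final filtering step described above (two or more peptides per protein, FDR controlled at the protein level) can be illustrated with a generic target-decoy estimate. This is only a minimal sketch, not the authors' method: the abstract names ProteinInferencer for this step and does not describe its algorithm. The function name `filter_proteins`, the `DECOY_` prefix, the input tuple layout, and the "higher score is better" convention are all assumptions made for illustration.

```python
from collections import defaultdict


def filter_proteins(psms, max_fdr=0.01, min_peptides=2, decoy_prefix="DECOY_"):
    """Return target proteins passing a peptide-count rule and a decoy-based FDR cut.

    psms: iterable of (protein_id, peptide_sequence, score); higher score = better
    (an assumed convention). Decoy proteins are recognized by `decoy_prefix`.
    """
    peptides = defaultdict(set)   # protein -> distinct peptide sequences
    best_score = {}               # protein -> best PSM score seen
    for protein, peptide, score in psms:
        peptides[protein].add(peptide)
        best_score[protein] = max(best_score.get(protein, float("-inf")), score)

    # Keep only proteins supported by at least `min_peptides` distinct peptides.
    candidates = [p for p in peptides if len(peptides[p]) >= min_peptides]
    candidates.sort(key=lambda p: best_score[p], reverse=True)

    # Walk down the ranked list and remember the deepest position at which
    # the estimated FDR (decoys / targets) is still within the threshold.
    targets = decoys = cutoff = 0
    for rank, protein in enumerate(candidates, start=1):
        if protein.startswith(decoy_prefix):
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = rank

    return [p for p in candidates[:cutoff] if not p.startswith(decoy_prefix)]
```

In this sketch the FDR at each rank is estimated as the number of decoy proteins divided by the number of target proteins accepted so far; other estimators exist, and a real pipeline would also need to handle peptides shared between proteins.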

    Chromosome-Based Proteomic Study for Identifying Novel Protein Variants from Human Hippocampal Tissue Using Customized neXtProt and GENCODE Databases

    The goal of the Chromosome-Centric Human Proteome Project (C-HPP) is to provide complete proteomic information for each human chromosome, including novel proteoforms such as novel protein-coding variants expressed from noncoding genomic regions, alternative splicing variants (ASVs), and single amino acid variants (SAAVs). In 144 LC/MS/MS raw files from human hippocampal tissues of control, epilepsy, and Alzheimer’s disease subjects, we identified novel proteoforms with a workflow that includes an integrated proteomic pipeline using three different search engines: MASCOT, SEQUEST, and MS-GF+. With a <1% false discovery rate (FDR) at the protein level, 11 detected peptides mapped to four translated long noncoding RNA variants in the customized GENCODE lncRNA database; these peptides also mapped to coding proteins at different chromosomal sites. We also identified four novel ASVs using the customized GENCODE transcript database. The target peptides from the variants were validated against tandem MS fragmentation patterns from their corresponding synthetic peptides. Additionally, a total of 128 SAAVs paired with their wild-type peptides were identified with FDR <1% at the peptide level using a customized neXtProt database that includes nonsynonymous single nucleotide polymorphism (nsSNP) information. Among these results, several novel variants related to neurodegenerative disease were identified using the workflow, which could be applicable to C-HPP studies. All raw files used in this study were deposited in ProteomeXchange (PXD000395).
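
The idea of pairing a SAAV peptide with its wild-type counterpart can be sketched as a check that two peptide sequences differ at exactly one residue. This is a hypothetical illustration only: the abstract does not describe how the pairing was implemented, and the function name and the example sequences below are invented for the sketch.

```python
def single_substitution(wild_type, variant):
    """Return (position, wild_residue, variant_residue) if the two peptides
    differ at exactly one position, otherwise None. Position is 1-based."""
    if len(wild_type) != len(variant):
        return None
    diffs = [
        (i, a, b)
        for i, (a, b) in enumerate(zip(wild_type, variant), start=1)
        if a != b
    ]
    if len(diffs) != 1:
        return None
    return diffs[0]


# Example with made-up sequences: valine at position 5 replaced by alanine.
print(single_substitution("LVNEVTEFAK", "LVNEATEFAK"))
```

One limitation of this simple comparison is that a substitution which creates or removes a tryptic cleavage site (K or R) changes the peptide length, so such pairs would not be matched here.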

    Characterization of Site-Specific N‑Glycopeptide Isoforms of α‑1-Acid Glycoprotein from an Interlaboratory Study Using LC–MS/MS

    Glycoprotein conformations are complex and heterogeneous. Currently, site-specific characterization of glycopeptides is a challenge. We sought to establish an efficient method of N-glycoprotein characterization using mass spectrometry (MS). Using alpha-1-acid glycoprotein (AGP) as a model N-glycoprotein, we identified its tryptic N-glycopeptides and examined the data reproducibility in seven laboratories running different LC–MS/MS platforms. We used three test samples and one blind sample to evaluate instrument performance across the entire sample preparation workflow. A total of 165 site-specific N-glycopeptides representative of all N-glycosylation sites were identified from the AGP 1 and AGP 2 isoforms. The glycopeptide fragmentations by collision-induced dissociation or higher-energy collisional dissociation (HCD) varied based on the MS analyzer. The Orbitrap Elite identified the greatest number of AGP N-glycopeptides, followed by the Triple TOF and Q-Exactive Plus. Reproducible generation of oxonium ions, glycan-cleaved glycopeptide fragment ions, and peptide backbone fragment ions was essential for successful identification. Laboratory proficiency affected the number of identified N-glycopeptides. The relative quantities of the 10 major N-glycopeptide isoforms of AGP detected in four laboratories were compared to assess reproducibility. Quantitative analysis showed that the coefficient of variation was <25% for all test samples. Our analytical protocol yielded identification and quantification of site-specific N-glycopeptide isoforms of AGP from control and disease plasma samples.
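
The reproducibility criterion mentioned above, a coefficient of variation (CV) below 25%, is the sample standard deviation divided by the mean, expressed as a percentage. The following is a minimal sketch of that calculation; the function name and the example values are assumptions for illustration and are not data from the study.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical relative quantities of one glycopeptide isoform across laboratories.
relative_quantities = [90.0, 100.0, 110.0]
cv = coefficient_of_variation(relative_quantities)
print(f"CV = {cv:.1f}%  (passes <25% criterion: {cv < 25.0})")
```

Note that `statistics.stdev` computes the sample standard deviation (n - 1 denominator) and requires at least two values; whether the study used the sample or population form is not stated in the abstract.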