Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate
In
the Chromosome-Centric Human Proteome Project (C-HPP), false-positive
identification by peptide spectrum matches (PSMs) after database searches
is a major issue for proteogenomic studies that use large-scale
proteomic profiling by liquid chromatography and mass spectrometry. Here we developed
a simple strategy for protein identification, with a controlled false
discovery rate (FDR) at the protein level, using an integrated proteomic
pipeline (IPP) that consists of the following four steps. First,
individual proteomic searches were performed against the neXtProt
database using three different search engines: SEQUEST, MASCOT, and
MS-GF+. Second, the PSM search results were combined using statistical
evaluation tools, including DTASelect and Percolator. Third, the
peptide search scores were converted into normalized E-scores using
an in-house program. Fourth, ProteinInferencer was used to retain
proteins supported by two or more peptides, with the FDR controlled
at 1.0% at the protein level. Finally, we compared the performance
of the IPP with that of a conventional proteomic pipeline (CPP) for
protein identification at a controlled protein-level FDR of <1%.
Using the IPP, a total of 5756 proteins (vs 4453 using the CPP), including
477 alternative splicing variants (vs 182 using the CPP), were identified
from human hippocampal tissue. In addition, a total of 10 missing
proteins (vs 7 using the CPP) were identified with two or more unique
peptides, and their tryptic peptides were validated using MS/MS spectral
patterns from a repository database or from their corresponding synthetic
peptides. This study shows that the IPP effectively improved the identification
of proteins, including alternative splicing variants and missing proteins,
in human hippocampal tissues for the C-HPP. All RAW files used in
this study were deposited in ProteomeXchange (PXD000395).
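The final filtering step (proteins supported by two or more peptides, at a controlled 1% protein-level FDR) can be illustrated with a generic target-decoy sketch. The abstract does not say how ProteinInferencer estimates FDR, so the decoy-counting estimate, the `ProteinHit` record, and the `filter_proteins` helper below are hypothetical illustrations under that assumption, not the IPP's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinHit:
    """Hypothetical protein-level record; not ProteinInferencer's actual output format."""
    accession: str
    score: float      # assumed: higher score = more confident identification
    n_peptides: int   # number of distinct peptides supporting the protein
    is_decoy: bool    # True if the protein comes from the decoy (e.g. reversed) database


def filter_proteins(hits: List[ProteinHit], max_fdr: float = 0.01,
                    min_peptides: int = 2) -> List[ProteinHit]:
    """Return target proteins passing a simple target-decoy FDR cutoff.

    FDR at a given score threshold is estimated as decoys / targets
    among all candidates scoring at or above that threshold (assumed method).
    """
    # Two-peptide rule: drop proteins supported by fewer than min_peptides peptides.
    candidates = [h for h in hits if h.n_peptides >= min_peptides]
    candidates.sort(key=lambda h: h.score, reverse=True)

    cutoff = -1  # index of the deepest hit whose running FDR is within max_fdr
    targets = decoys = 0
    for i, hit in enumerate(candidates):
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        if fdr <= max_fdr:
            cutoff = i

    # Keep everything above the cutoff, then report only the target proteins.
    return [h for h in candidates[:cutoff + 1] if not h.is_decoy]
```

Taking the deepest position at which the running FDR is still within the threshold keeps the longest acceptable list, which is the usual way a score cutoff is chosen from a target-decoy ranking; score ties are not handled specially in this sketch.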
Chromosome-Based Proteomic Study for Identifying Novel Protein Variants from Human Hippocampal Tissue Using Customized neXtProt and GENCODE Databases
The goal of the Chromosome-Centric
Human Proteome Project (C-HPP)
is to provide complete proteomic information for each human chromosome,
including novel proteoforms, such as novel protein-coding variants
expressed from noncoding genomic regions, alternative splicing variants
(ASVs), and single amino acid variants (SAAVs). From 144 LC–MS/MS
raw files of human hippocampal tissue from control, epilepsy, and
Alzheimer’s disease samples, we identified novel proteoforms with
a workflow that includes an integrated proteomic pipeline using three
different search engines: MASCOT, SEQUEST, and MS-GF+. With a <1% false
discovery rate (FDR) at the protein level, 11 detected peptides mapped
to four translated long noncoding RNA (lncRNA) variants in a customized
GENCODE lncRNA database; these peptides also mapped to coding proteins
at different chromosomal sites. We also identified four novel ASVs
by searching a customized GENCODE transcript database. The target
peptides from the variants were validated against the tandem MS
fragmentation patterns of their corresponding synthetic peptides.
Additionally, a total of 128 SAAVs paired with their wild-type peptides
were identified with FDR <1% at the peptide level using a customized
neXtProt database that includes nonsynonymous single nucleotide
polymorphism (nsSNP) information. Among these results, several novel
variants related to neurodegenerative disease were identified using
this workflow, which could be applicable to C-HPP studies. All raw
files used in this study were deposited in ProteomeXchange (PXD000395).
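Pairing an SAAV peptide with its wild-type counterpart can be sketched as checking that two identified peptides have the same length and differ at exactly one residue. The abstract does not state how the pairing was done, so this criterion and the helpers `substitution_site` and `pair_saavs` are hypothetical. The sketch skips leucine/isoleucine swaps because the two residues have identical masses and cannot be distinguished by mass.

```python
from typing import List, Optional, Tuple

# Leucine and isoleucine have identical masses, so an I<->L swap cannot be detected by mass.
ISOBARIC_PAIRS = {frozenset("IL")}


def substitution_site(variant: str, wild_type: str) -> Optional[Tuple[int, str, str]]:
    """Return (0-based position, wild-type residue, variant residue) when the peptides
    differ by exactly one observable substitution; otherwise return None.
    (Hypothetical pairing rule, assumed for illustration.)"""
    if len(variant) != len(wild_type):
        return None
    diffs = [(i, wt, var) for i, (wt, var) in enumerate(zip(wild_type, variant)) if wt != var]
    if len(diffs) != 1:
        return None
    _, wt, var = diffs[0]
    if frozenset((wt, var)) in ISOBARIC_PAIRS:
        return None
    return diffs[0]


def pair_saavs(variant_peptides: List[str],
               wild_type_peptides: List[str]) -> List[Tuple[str, str, Tuple[int, str, str]]]:
    """Pair each identified variant peptide with every identified wild-type peptide
    that differs from it by a single amino acid substitution."""
    pairs = []
    for v in variant_peptides:
        for w in wild_type_peptides:
            site = substitution_site(v, w)
            if site is not None:
                pairs.append((v, w, site))
    return pairs
```

For example, `pair_saavs(["PEPTKDE"], ["PEPTIDE"])` pairs the two peptides with an I-to-K substitution at position 4; both inputs are made-up sequences, not peptides from the study.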
Characterization of Site-Specific <i>N</i>‑Glycopeptide Isoforms of α‑1-Acid Glycoprotein from an Interlaboratory Study Using LC–MS/MS
Glycoprotein conformations are complex and heterogeneous, and
site-specific characterization of glycopeptides remains a challenge. We
sought to establish an efficient method of <i>N</i>-glycoprotein
characterization using mass spectrometry (MS). Using alpha-1-acid
glycoprotein (AGP) as a model <i>N</i>-glycoprotein, we
identified its tryptic <i>N</i>-glycopeptides and examined
data reproducibility across seven laboratories running different LC–MS/MS
platforms. We used three test samples and one blind sample to evaluate
instrument performance across the entire sample preparation workflow. A total of 165
site-specific <i>N</i>-glycopeptides representing all <i>N</i>-glycosylation sites were identified from the AGP 1 and AGP 2 isoforms. Glycopeptide fragmentation by collision-induced dissociation (CID) or higher-energy collisional dissociation (HCD) varied with the MS analyzer. The Orbitrap Elite identified the greatest number of AGP <i>N</i>-glycopeptides, followed by the Triple TOF and the Q-Exactive Plus. Reproducible generation of oxonium ions, glycan-cleaved glycopeptide fragment ions, and peptide backbone fragment ions was essential for successful identification. Laboratory proficiency affected the number of identified <i>N</i>-glycopeptides. The relative quantities of the 10 major <i>N</i>-glycopeptide isoforms of AGP detected in four laboratories were compared to assess reproducibility. Quantitative analysis showed that the coefficient of variation (CV) was <25% for all test samples. Our analytical protocol enabled identification and quantification of site-specific <i>N</i>-glycopeptide isoforms of AGP in control and disease plasma samples.
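The reproducibility criterion (CV < 25%) is a simple ratio, CV = 100 × SD / mean, computed per isoform over the relative quantities reported by the participating laboratories. The sketch below assumes the sample standard deviation (the abstract does not specify which), and the helper names and any input values are hypothetical, not data from the study.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple


def coefficient_of_variation(values: List[float]) -> float:
    """CV in percent, using the sample standard deviation (assumed).

    statistics.stdev needs at least two values, i.e. at least two laboratories.
    """
    return 100.0 * stdev(values) / mean(values)


def check_reproducibility(quantities: Dict[str, List[float]],
                          max_cv: float = 25.0) -> Dict[str, Tuple[float, bool]]:
    """For each isoform, return (CV %, whether the CV is below max_cv).

    `quantities` maps a (hypothetical) isoform label to its relative quantity
    measured in each laboratory.
    """
    result = {}
    for isoform, values in quantities.items():
        cv = coefficient_of_variation(values)
        result[isoform] = (cv, cv < max_cv)
    return result
```

With made-up quantities of 10, 12, and 11 from three laboratories, the mean is 11 and the sample SD is 1, giving a CV of about 9.1%, well under the 25% threshold.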