Next Generation Proteomic Pipeline for Chromosome-Based Proteomic Research Using NeXtProt and GENCODE Databases
The Human Proteome Project aims to map all human proteins, including
missing proteins as well as proteoforms with post translational modifications,
alternative splicing variants (ASVs), and single amino acid variants
(SAAVs). neXtProt and Ensembl databases are usually used to provide
curated information on human coding genes. However, to find these
proteoforms, we (Chr #11 team) first introduce a streamlined pipeline
using customized and concatenated neXtProt and GENCODE originated
from Ensembl, with a controlled false discovery rate (FDR). Because
the databases used in this pipeline are large, we found that more stringent
FDR filtering (0.1% at the peptide level and 1% at the protein level)
was needed to claim novel findings, such as GENCODE ASVs and missing proteins,
from a human hippocampus data set (MSV000081385) and ProteomeXchange
(PXD007166). Using our next generation proteomic pipeline (nextPP)
with neXtProt and GENCODE databases, two missing proteins,
activity-regulated cytoskeleton-associated protein (ARC, Chr 8) and
glutamate receptor ionotropic, kainate 5 (GRIK5, Chr 19), were additionally
identified with two or more unique peptides from human brain tissues.
Additionally, by applying the pipeline to human brain related data
sets such as cortex (PXD000067 and PXD000561), spinal cord, and fetal
brain (PXD000561), seven GENCODE ASVs, namely ACTN4-012 (Chr. 19),
DPYSL2-005 (Chr. 8), MPRIP-003 (Chr. 17), NCAM1-013 (Chr. 11),
EPB41L1-017 (Chr. 20), AGAP1-004 (Chr. 2), and CPNE5-005
(Chr. 6), were identified from two or more data sets. The identified
peptides of GENCODE ASVs were mapped onto novel exon insertions, alternative
translations at the 5′-untranslated region, or novel protein coding
sequence. Applying the pipeline to male reproductive organ related
data sets, 52 GENCODE ASVs were identified from two testis data sets (PXD000561
and PXD002179) and one spermatozoa data set (PXD003947). Four of the
52 GENCODE ASVs, RAB11FIP5-008 (Chr. 2), RP13-347D8.7-001
(Chr. X), PRDX4-002 (Chr. X), and RP11-666A8.13-001
(Chr. 17), were identified in all three samples.
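The abstract above reports ASVs only when they recur in two or more data sets, and separately those seen in all three samples. A minimal sketch of that recurrence rule follows. The function name, the input layout, and the placeholder identifiers are assumptions made for illustration and are not taken from the nextPP pipeline or its results.

```python
from collections import defaultdict


def recurrent_identifications(hits_by_dataset, min_datasets=2):
    """Return identifiers found in at least `min_datasets` data sets, sorted.

    hits_by_dataset maps a data set name to the identifiers (for example
    ASV names) that passed FDR filtering in that data set.
    """
    seen_in = defaultdict(set)
    for dataset, identifiers in hits_by_dataset.items():
        for identifier in identifiers:
            # A set ensures each data set is counted once per identifier.
            seen_in[identifier].add(dataset)
    return sorted(
        identifier
        for identifier, datasets in seen_in.items()
        if len(datasets) >= min_datasets
    )


# Toy input with placeholder names; these are not results from the study.
toy_hits = {
    "dataset_1": ["ASV_A", "ASV_B"],
    "dataset_2": ["ASV_A", "ASV_C"],
    "dataset_3": ["ASV_A", "ASV_B"],
}

print(recurrent_identifications(toy_hits))                  # seen in >= 2 data sets
print(recurrent_identifications(toy_hits, min_datasets=3))  # seen in all 3 data sets
```

Requiring recurrence across independent data sets is a conservative complement to FDR filtering, since a false identification is less likely to reappear in a second data set.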
Integrated Proteomic Pipeline Using Multiple Search Engines for a Proteogenomic Study with a Controlled Protein False Discovery Rate
In the Chromosome-Centric Human Proteome Project (C-HPP), false-positive
identification by peptide spectrum matches (PSMs) after database searches
is a major issue for proteogenomic studies using liquid-chromatography
and mass-spectrometry-based large proteomic profiling. Here we developed
a simple strategy for protein identification, with a controlled false
discovery rate (FDR) at the protein level, using an integrated proteomic
pipeline (IPP) that consists of four steps, as follows. First,
using three different search engines, SEQUEST, MASCOT, and MS-GF+,
individual proteomic searches were performed against the neXtProt
database. Second, the search results from the PSMs were combined using
statistical evaluation tools including DTASelect and Percolator. Third,
the peptide search scores were converted into E-scores normalized
using an in-house program. Last, ProteinInferencer was used to filter
the proteins containing two or more peptides with a controlled FDR
of 1.0% at the protein level. Finally, we compared the performance
of the IPP to a conventional proteomic pipeline (CPP) for protein
identification using a controlled FDR of <1% at the protein level.
Using the IPP, a total of 5756 proteins (vs 4453 using the CPP) including
477 alternative splicing variants (vs 182 using the CPP) were identified
from human hippocampal tissue. In addition, a total of 10 missing
proteins (vs 7 using the CPP) were identified with two or more unique
peptides, and their tryptic peptides were validated using MS/MS spectral
patterns from a repository database or from their corresponding synthetic
peptides. This study shows that the IPP effectively improved the identification
of proteins, including alternative splicing variants and missing proteins,
in human hippocampal tissues for the C-HPP. All RAW files used in
this study were deposited in ProteomeXchange (PXD000395).
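The last IPP step, keeping proteins with two or more peptides at a protein-level FDR of 1%, can be illustrated with a generic target-decoy estimate. This is only a sketch under stated assumptions: it is not the algorithm used by ProteinInferencer, and the record fields (`id`, `score`, `n_peptides`, `is_decoy`), the higher-is-better score, and the decoys/targets estimator are all hypothetical choices made for the example.

```python
def filter_proteins(proteins, max_fdr=0.01, min_peptides=2):
    """Return target protein ids accepted at an estimated FDR <= max_fdr.

    proteins: list of dicts with the (assumed) keys
      'id', 'score' (higher is better), 'n_peptides', 'is_decoy'.
    """
    # Step 1: keep only proteins supported by enough peptides.
    candidates = [p for p in proteins if p["n_peptides"] >= min_peptides]

    # Step 2: rank the remaining proteins from best to worst score.
    ranked = sorted(candidates, key=lambda p: p["score"], reverse=True)

    # Step 3: walk down the ranking, estimating FDR as decoys / targets,
    # and remember the longest prefix whose estimate stays within max_fdr.
    targets = decoys = cutoff = 0
    for position, protein in enumerate(ranked, start=1):
        if protein["is_decoy"]:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = position

    # Report only target proteins above the cutoff.
    return [p["id"] for p in ranked[:cutoff] if not p["is_decoy"]]


# Toy records with placeholder ids; these are not data from the study.
toy_proteins = [
    {"id": "T1", "score": 9.0, "n_peptides": 3, "is_decoy": False},
    {"id": "T2", "score": 8.0, "n_peptides": 2, "is_decoy": False},
    {"id": "T3", "score": 7.0, "n_peptides": 1, "is_decoy": False},
    {"id": "D1", "score": 6.0, "n_peptides": 2, "is_decoy": True},
    {"id": "T4", "score": 5.0, "n_peptides": 2, "is_decoy": False},
]

print(filter_proteins(toy_proteins))                # strict 1% threshold
print(filter_proteins(toy_proteins, max_fdr=0.5))   # relaxed threshold for comparison
```

In the toy input, T3 is dropped by the peptide-count rule before any FDR estimate is made, and the first decoy in the ranking ends the accepted list at the 1% threshold.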