Multiaspect Examinations of Possible Alternative Mappings of Identified Variant Peptides: A Case Study on the HEK293 Cell Line
Adopting a proteogenomics approach to validate single nucleotide variation events by identifying the corresponding single amino acid variant peptides from mass spectrometry (MS)-based proteomics data facilitates translational and clinical research. Although variant peptides are usually identified from MS data with a stringent false discovery rate (FDR), FDR control can fail to eliminate dubious results caused by several issues; thus, postexamination to eliminate such results is required. However, comprehensive postexaminations of identification results are still lacking. Therefore, we propose a framework of three bottom-up levels (peptide–spectrum match (PSM), peptide, and variant event) that consists of 11 rigorous examinations from the MS perspective to further confirm the reliability of variant events. As a proof of concept, we demonstrate the 11 examinations on the variant peptides identified from an HEK293 cell line data set, where various database search strategies were applied to maximize the number of identified variant PSMs with an FDR <1% for postexaminations. The results showed that the FDR criterion alone is insufficient to validate identified variant peptides and that the 11 postexaminations can reveal low-confidence variant events detected by shotgun proteomics experiments. Therefore, we suggest that postexaminations of identified variant events based on the proposed framework are necessary for proteogenomics studies.
MinProtMaxVP: Generating a minimized number of protein variant sequences containing all possible variant peptides by solving a set covering problem
A C# implementation of MinProtMaxVP.
Identifying single-amino-acid variants (SAVs) from mass spectrometry-based experiments is critical for validating single-nucleotide variants (SNVs) at the protein level to facilitate biomedical research. Currently, two approaches are usually applied to convert SNV annotation into SAV-harboring protein sequences. One approach generates sequences that each contain exactly one SAV, and the other generates one sequence containing all SAVs. Both approaches may neglect the possibility that variant combinations, e.g., haplotypes, exist in biological samples, thereby lowering variant identification. Therefore, we propose a novel approach, called MinProtMaxVP, that considers all combinations of SAVs and minimizes the number of SAV-harboring protein sequences generated while still accommodating all variant peptides for identification.
Choong et al., “MinProtMaxVP: Generating a minimized number of protein variant sequences containing all possible variant peptides by solving a set covering problem” (manuscript submitted), 2019.
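The set-covering idea in the description above can be illustrated with a minimal sketch. It assumes a simple greedy heuristic, which is a standard approximation for set cover and not necessarily the solver used in the C# program. The sequence names and peptide labels are hypothetical toy data.

```python
def greedy_set_cover(candidates):
    """Greedily choose candidate sequences until all variant peptides are covered.

    `candidates` maps a (hypothetical) sequence name to the set of variant
    peptides that sequence would yield after digestion.
    """
    uncovered = set()
    for peptides in candidates.values():
        uncovered |= peptides
    chosen = []
    while uncovered:
        # Take the sequence covering the most still-uncovered peptides;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(candidates),
                   key=lambda name: len(candidates[name] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Toy input (assumption): SAV1 and SAV2 fall in the same peptide, so they can
# produce three distinct variant peptides; SAV3 lies in a distant peptide.
candidates = {
    "SAV1": {"pep1"},
    "SAV2": {"pep2"},
    "SAV3": {"pep3"},
    "SAV1+SAV2": {"pep12"},
    "SAV1+SAV3": {"pep1", "pep3"},
    "SAV2+SAV3": {"pep2", "pep3"},
    "SAV1+SAV2+SAV3": {"pep12", "pep3"},
}
chosen = greedy_set_cover(candidates)
print(chosen)
```

In this toy case seven candidate sequences reduce to three while every variant peptide remains represented. A greedy choice is not guaranteed to be minimal in general; an exact answer would need an integer programming formulation.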
About us:
http://ms.iis.sinica.edu.tw/COmics/software.html
iHPDM: In Silico Human Proteome Digestion Map with Proteolytic Peptide Analysis and Graphical Visualizations
When conducting proteomics experiments to detect missing proteins and protein isoforms in the human proteome, it is desirable to use a protease that yields more unique peptides with properties amenable to mass spectrometry analysis. Although trypsin is currently the most widely used protease, some proteins yield only a limited number of unique peptides upon trypsin digestion. Other proteases and multiple proteases have been applied in reported studies to increase the number of identified proteins and the protein sequence coverage. To facilitate the selection of proteases, we developed a web-based resource, called in silico Human Proteome Digestion Map (iHPDM), which contains a comprehensive proteolytic peptide database constructed from human proteins, including isoforms, in neXtProt digested by 15 protease combinations of one or two proteases. iHPDM provides convenient functions and graphical visualizations for users to examine and compare the digestion results of different proteases. Notably, it also lets users enter filtering criteria on digested peptides, e.g., peptide length and uniqueness, to select suitable proteases. iHPDM can facilitate protease selection for shotgun proteomics experiments to identify missing proteins, protein isoforms, and single amino acid variant peptides.
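As an illustration of in silico digestion with length and uniqueness filtering, the following minimal sketch assumes the commonly used trypsin rule (cleave after K or R unless the next residue is P) with no missed cleavages. The function names, the default length window, and the toy sequences are assumptions for illustration and are not taken from iHPDM.

```python
import re
from collections import defaultdict


def digest_trypsin(sequence):
    """Split a protein after K or R, except when the next residue is P."""
    # Zero-width split points require Python 3.7 or later.
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]


def unique_peptides(proteins, min_len=7, max_len=40):
    """Return, per protein, the peptides in the length window found in no other protein."""
    owners = defaultdict(set)
    for name, sequence in proteins.items():
        for pep in digest_trypsin(sequence):
            if min_len <= len(pep) <= max_len:
                owners[pep].add(name)
    result = {name: [] for name in proteins}
    for pep, names in owners.items():
        if len(names) == 1:
            result[next(iter(names))].append(pep)
    return result


# Hypothetical toy proteins sharing one peptide and differing in another.
proteins = {
    "A": "MKAAAAAAAKCCCCCCCR",
    "B": "MKAAAAAAAKDDDDDDDR",
}
print(digest_trypsin("MKRPAAKGGR"))
print(unique_peptides(proteins))
```

Comparing proteases would amount to swapping the cleavage pattern and comparing how many unique peptides each protein retains after filtering.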
Evaluating the Possibility of Detecting Variants in Shotgun Proteomics via LeTE-Fusion Analysis Pipeline
In proteogenomic studies, many genome-annotated events, for example, single amino acid variations (SAAVs) and short INDELs, are often unobserved in shotgun proteomics. Therefore, we propose an analysis pipeline called LeTE-fusion (Le, peptide length; T, theoretical values; E, experimental data) to first investigate whether peptides of certain lengths are observed more often in mass spectrometry (MS)-based proteomics, a bias that may hinder peptide identification and make genome-annotated events difficult to detect. By applying LeTE-fusion to different MS-based proteome data sets, we found that peptides of 7–20 amino acids are more frequently identified, which is possibly attributable to MS-related factors rather than to proteases. We then extended the use of LeTE-fusion to four variant-containing-sequence data sets (SAAV-only) of varying sample complexity, up to the whole human proteome scale, which showed that theoretically ∼70% of variants are observable in an ideal shotgun proteomics experiment. However, only ∼40% of variants might be detectable in real shotgun proteomic experiments when LeTE-fusion uses the experimentally observed variant-site-containing wild-type peptides in PeptideAtlas to estimate the expected observable coverage of variants. Finally, we conducted a case study on the HEK293 cell line, using variants reported at the genomic level that were also identified in shotgun proteomics, to demonstrate the efficacy of LeTE-fusion in estimating the expected observable coverage of variants. To the best of our knowledge, this is the first study to systematically investigate the detection limits of genome-annotated events via shotgun proteomics using such an analysis pipeline.
Decoding the Effect of Isobaric Substitutions on Identifying Missing Proteins and Variant Peptides in Human Proteome
To confirm the existence of a missing protein, we need to identify at least two of its unique peptides, each 9–40 amino acids long, in bottom-up mass-spectrometry-based proteomic experiments. However, an identified unique peptide of the missing protein, even one identified with a high level of confidence, could coincide with a peptide of a commonly observed protein due to isobaric substitutions, mass modifications, alternative splice isoforms, or single amino acid variants (SAAVs). Besides unique peptides of missing proteins, identified variant peptides (SAAV-containing peptides) could also map alternatively to peptides of other proteins due to the aforementioned issues. Therefore, we conducted a thorough comparative analysis on data sets in the PeptideAtlas Tiered Human Integrated Search Proteome (THISP, 2017-03 release), including neXtProt (2017-01 release), to systematically investigate the possibility that unique peptides in missing proteins (PE2–4), unique peptides in dubious proteins, and variant peptides are affected by isobaric substitutions, causing doubtful identification results. In this study, we considered 11 isobaric substitutions. From our analysis, we found that <5% of the unique peptides of missing proteins and >6% of variant peptides became shared with peptides of PE1 proteins after isobaric substitutions.
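The kind of check described above can be sketched for a single, well-known isobaric pair: isoleucine and leucine have identical residue masses. The 11 substitutions considered in the study are not listed here, so this example covers only I/L; the function names and peptides are hypothetical.

```python
def normalize_il(peptide):
    """Collapse the isobaric residues I and L into a single symbol."""
    return peptide.replace("I", "L")


def shared_after_il(unique_peptides, background_peptides):
    """Return the 'unique' peptides that match a background peptide once I and L are merged."""
    background = {normalize_il(pep) for pep in background_peptides}
    return {pep for pep in unique_peptides if normalize_il(pep) in background}


# Hypothetical peptides: AIGEK differs from the background peptide ALGEK
# only by an I/L exchange, so it cannot be told apart by mass alone.
missing_protein_peptides = {"AIGEK", "TTPSR"}
pe1_peptides = {"ALGEK", "WWWK"}
print(shared_after_il(missing_protein_peptides, pe1_peptides))
```

Extending the sketch to other substitutions would mean generating every substituted form of each peptide before the lookup, which is more involved than the single-character normalization used here.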
Accuracies (%) of various predictors on classification of proteins with and without signal peptides.
Specificities (%) of various predictors on the non-signal peptide protein benchmark datasets.
