27 research outputs found
Proteome Digestion Specificity Analysis for Rational Design of Extended Bottom-up and Middle-down Proteomics Experiments
Mass spectrometry (MS)-based bottom-up
proteomics (BUP) is currently
the method of choice for large-scale identification and characterization
of proteins present in complex samples, such as cell lysates, body
fluids, or tissues. Technically, BUP relies on MS analysis of complex
mixtures of small, <3 kDa, peptides resulting from whole proteome
digestion. Because of the extremely high sample complexity, further
developments of detection methods and sample preparation techniques
are necessary. In recent years, a number of alternative approaches
such as middle-down proteomics (MDP, addressing up to 15 kDa peptides)
and top-down proteomics (TDP, addressing proteins exceeding 15 kDa)
have been gaining particular interest. Here we report on the bioinformatics
study of both common and less frequently employed digestion procedures
for complex protein mixtures specifically targeting the MDP approach.
The aim of this study was to maximize the yield of protein structure
information from MS data by optimizing peptide size distribution and
sequence specificity. We classified peptides into four categories
based on molecular weight: 0.6–3 (classical BUP), 3–7
(extended BUP), 7–15 kDa (MDP), and >15 kDa (TDP). Because
of instrumentation-related considerations, we first advocate for the
extended BUP approach as the potential near-future improvement of
BUP. Therefore, we chose to optimize the number of unique peptides
in the 3–7 kDa range while maximizing the number of represented
proteins. The present study considers human, yeast, and bacterial
proteomes. Results of the study can be further used for designing
extended BUP or MDP experimental workflows
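
The size classification described in this abstract can be illustrated with a minimal in silico digestion sketch. It is not the authors' pipeline: the function names are hypothetical, cleavage is assumed to occur C-terminally to the listed residues with no missed cleavages, and the handling of peptides falling exactly on a class boundary (3, 7, or 15 kDa) is an assumption, since the abstract gives only the ranges.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added once per peptide (Da)


def cleave(sequence, residues):
    """Split C-terminally to every residue in `residues` (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mass_kda(peptide):
    """Monoisotopic mass of an unmodified peptide, in kDa."""
    return (sum(RESIDUE_MASS[aa] for aa in peptide) + WATER) / 1000.0


def size_class(kda):
    """Map a mass to the four classes named in the abstract.

    Boundary handling (e.g. exactly 3 kDa) is an assumption of this sketch.
    """
    if kda < 0.6:
        return "below range"
    if kda < 3:
        return "classical BUP"
    if kda < 7:
        return "extended BUP"
    if kda <= 15:
        return "MDP"
    return "TDP"


def class_counts(sequence, residues):
    """Count unique peptides per size class for one protein sequence."""
    counts = {}
    for pep in set(cleave(sequence, residues)):
        label = size_class(mass_kda(pep))
        counts[label] = counts.get(label, 0) + 1
    return counts
```

Calling `class_counts(protein, "KR")` approximates a trypsin-like rule, while a rarer target such as `"M"` shifts the distribution toward longer peptides. A proteome-wide comparison would repeat this over every sequence in a database and also track how many proteins are represented in the 3–7 kDa class.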
Unbiased False Discovery Rate Estimation for Shotgun Proteomics Based on the Target-Decoy Approach
Target-decoy
approach (TDA) is the dominant strategy for false
discovery rate (FDR) estimation in mass-spectrometry-based proteomics.
One of its main applications is direct FDR estimation based on counting
of decoy matches above a certain score threshold. The corresponding
equations are widely employed for filtering of peptide or protein
identifications. In this work we consider a probability model describing
the filtering process and find that, when decoy counting is used for q-value estimation and subsequent filtering, a correction
has to be introduced into these common equations for TDA-based FDR
estimation. We also discuss the scale of variance of false discovery
proportion (FDP) and propose using confidence intervals for more conservative
FDP estimation in shotgun proteomics. The necessity of both the correction
and the use of confidence intervals is especially pronounced when
filtering small sets (such as in proteogenomics experiments) and when
using very low FDR thresholds
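
Decoy counting above a score threshold can be sketched as follows. The abstract does not state the exact form of the correction, so the second value returned here, which adds one to the decoy count, is shown only as one conservative variant and is an assumption of this sketch, as are the function name and the input layout. Target and decoy databases are assumed to be of equal size.

```python
def fdr_estimates(psms, threshold):
    """Estimate FDR by decoy counting above a score threshold.

    psms: iterable of (score, is_decoy) pairs; higher scores are better.
    Returns (uncorrected, plus_one) estimates, or (None, None) when no
    target match passes the threshold.
    """
    psms = list(psms)  # allow generators, since the data is scanned twice
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        return None, None
    uncorrected = decoys / targets      # common decoy/target ratio
    plus_one = (decoys + 1) / targets   # assumed conservative variant
    return uncorrected, plus_one
```

With few passing targets the two estimates diverge sharply: for 4 targets and 1 decoy they are 0.25 and 0.5. This fits the abstract's remark that the correction matters most for small sets and very low FDR thresholds.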
Method for Identification of Threonine Isoforms in Peptides by Ultraviolet Photofragmentation of Cold Ions
Identification of isomeric amino acid residues in peptides and proteins is challenging but often highly desired in proteomics. One of the practically important cases that require isomeric assignments is that associated with single-nucleotide polymorphism substitutions of Met residues by Thr in cancer-related proteins. These genetically encoded substitutions can yet be confused with the chemical modifications, arising from protein alkylation by iodoacetamide, which is commonly used in the standard procedure of sample preparation for proteomic analysis. Similar to the genetically encoded mutations, the alkylation also induces a conversion of methionine residues, but to the iso-threonine form. Recognition of the mutations therefore requires isoform-sensitive detection techniques. Herein, we demonstrate an analytical method for reliable identification of isoforms of threonine residues in tryptic peptides. It is based on ultraviolet photodissociation mass spectrometry of cryogenically cooled ions and a machine-learning algorithm. The measured photodissociation mass spectra exhibit isoform-specific patterns, which are independent of the residues adjacent to threonine or iso-threonine in a peptide sequence. A comprehensive metric-based evaluation demonstrates that, being calibrated with a set of model peptides, the method allows for isomeric identification of threonine residues in peptides of arbitrary sequence
Comparison of False Discovery Rate Control Strategies for Variant Peptide Identifications in Shotgun Proteogenomics
Proteogenomic
studies aiming at identification of variant peptides
using customized database searches of mass spectrometry data are facing
a dilemma of selecting the most efficient database search strategy:
A choice has to be made between using combined or sequential searches
against reference (wild-type) and mutant protein databases or directly
against the mutant database without the wild-type one. Here we called
these approaches "all-together", "one-by-one",
and "direct", respectively. We share the results of
the comparison of these search strategies obtained for large data
sets of publicly available proteogenomic data. On the basis of the
results of this evaluation, we found that the "all-together"
strategy provided, in general, more variant peptide identifications
compared with the "one-by-one" approach, while showing
similar performance for some specific cases. To validate further the
results of this study, we performed a control comparison of the strategies
in question using publicly available data for a mixture of the annotated
human protein standard UPS1 and E. coli. For these
data, both "all-together" and "one-by-one"
approaches showed similar sensitivity and specificity of the searches,
while the "direct" approach resulted in an increased
number of false identifications
Identification of Alternative Splicing in Proteomes of Human Melanoma Cell Lines without RNA Sequencing Data
Alternative splicing is one of the main regulation pathways in living cells beyond simple changes in the level of protein expression. Most of the approaches proposed in proteomics for the identification of specific splicing isoforms require a preliminary deep transcriptomic analysis of the sample under study, which is not always available, especially in the case of the re-analysis of previously acquired data. Herein, we developed new algorithms for the identification and validation of protein splice isoforms in proteomic data in the absence of RNA sequencing of the samples under study. The bioinformatic approaches were tested on the results of proteome analysis of human melanoma cell lines, obtained earlier by high-resolution liquid chromatography and mass spectrometry (LC-MS). A search for alternative splicing events for each of the cell lines studied was performed against the database generated from all known transcripts (RefSeq) and the one composed of peptide sequences, which included all biologically possible combinations of exons. The identifications were filtered using the prediction of both retention times and relative intensities of fragment ions in the corresponding mass spectra. The fragmentation mass spectra corresponding to the discovered alternative splicing events were additionally examined for artifacts. Selected splicing events were further validated at the mRNA level by quantitative PCR
Empirical multi-dimensional space for scoring peptide spectrum matches in shotgun proteomics
Data-dependent tandem mass spectrometry (MS/MS) is one of the main techniques for protein identification in shotgun proteomics. In a typical LC-MS/MS workflow, peptide product ion mass spectra (MS/MS spectra) are compared with those derived theoretically from a protein sequence database. Scoring of these matches results in peptide identifications. A set of peptide identifications is characterized by false discovery rate (FDR), which determines the fraction of false identifications in the set. The total number of peptides targeted for fragmentation is in the range of 10 000 to 20 000 for a several-hour LC-MS/MS run. Typically, <50% of these MS/MS spectra result in peptide-spectrum matches (PSMs). A small fraction of PSMs pass the preset FDR level (commonly 1%) giving a list of identified proteins, yet a large number of correct PSMs corresponding to the peptides originally present in the sample are left behind in the "grey area" below the identity threshold. Following the numerous efforts to recover these correct PSMs, here we investigate the utility of a scoring scheme based on the multiple PSM descriptors available from the experimental data. These descriptors include retention time, deviation between experimental and theoretical mass, number of missed cleavages upon in-solution protein digestion, precursor ion fraction (PIF), PSM count per sequence, potential modifications, median fragment mass error, C-13 isotope mass difference, charge states, and number of PSMs per protein. The proposed scheme utilizes a set of metrics obtained for the corresponding distributions of each of the descriptors. We found that the proposed PSM scoring algorithm differentiates equally or more efficiently between correct and incorrect identifications compared with existing postsearch validation approaches
Legislative Documents
Also, variously referred to as: House bills; House documents; House legislative documents; legislative documents; General Court documents
Chemical-Mediated Digestion: An Alternative Realm for Middle-down Proteomics?
Protein digestion in mass spectrometry
(MS)-based bottom-up proteomics
targets mainly lysine and arginine residues, yielding primarily 0.6–3
kDa peptides for the proteomes of organisms of all major kingdoms.
Recent advances in MS technology enable analysis of complex mixtures
of increasingly longer (>3 kDa) peptides in a high-throughput manner
supporting the development of a middle-down proteomics (MDP) approach.
Generating longer peptides is a paramount step in launching an MDP
pipeline, but the quest for the selection of a cleaving agent that
would provide the desired 3–15 kDa peptides remains open. Recent
bioinformatics studies have shown that cleavage at the rarely occurring
amino acid residues such as methionine (Met), tryptophan (Trp), or
cysteine (Cys) would be suitable for the MDP approach. Interestingly,
chemical-mediated proteolytic cleavages uniquely allow targeting these
rare amino acids, for which no specific proteolytic enzymes are known.
Herein, as potential candidates for MDP-grade proteolysis, we have
investigated the performance of chemical agents previously reported
to target primarily Met, Trp, and Cys residues: CNBr, BNPS-Skatole
(3-bromo-3-methyl-2-(2-nitrophenyl)sulfanylindole), and NTCB
(2-nitro-5-thiocyanatobenzoic acid), respectively. Figures of merit such
as digestion reproducibility, peptide size distribution, and occurrence
of side reactions are discussed. The NTCB-based MDP workflow has demonstrated
particularly attractive performance, and NTCB is put forward here
as a potential cleaving agent for further MDP development
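
The three cleavage rules discussed in this abstract can be compared in silico with a short sketch. The side of cleavage for each agent is taken from general protein chemistry and not from the abstract: CNBr and BNPS-Skatole are assumed to cleave C-terminally to Met and Trp, and NTCB N-terminally to Cys. The names `AGENTS` and `chemical_digest` are hypothetical, and side reactions and incomplete cleavage, which the abstract lists among its figures of merit, are ignored.

```python
# Agent -> (target residue, side of the residue where the bond is cleaved).
# The cleavage sides are assumptions of this sketch, not stated in the abstract.
AGENTS = {
    "CNBr": ("M", "C-term"),
    "BNPS-Skatole": ("W", "C-term"),
    "NTCB": ("C", "N-term"),
}


def chemical_digest(sequence, agent):
    """Return the peptides produced by complete, specific cleavage."""
    residue, side = AGENTS[agent]
    # A cut position p splits the sequence into sequence[:p] and sequence[p:].
    cuts = [i + 1 if side == "C-term" else i
            for i, aa in enumerate(sequence) if aa == residue]
    # Drop cuts at the termini, which would yield empty peptides.
    bounds = [0] + [c for c in cuts if 0 < c < len(sequence)] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]
```

For example, `chemical_digest("AACGGCKK", "NTCB")` gives `["AA", "CGG", "CKK"]`, with each Cys starting a new peptide. Applying the three rules to the same protein gives a rough view of how the rarity of the target residue lengthens the resulting peptides.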