27 research outputs found

    Proteome Digestion Specificity Analysis for Rational Design of Extended Bottom-up and Middle-down Proteomics Experiments

    No full text
    Mass spectrometry (MS)-based bottom-up proteomics (BUP) is currently the method of choice for large-scale identification and characterization of proteins present in complex samples, such as cell lysates, body fluids, or tissues. Technically, BUP relies on MS analysis of complex mixtures of small, <3 kDa, peptides resulting from whole proteome digestion. Because of the extremely high sample complexity, further developments of detection methods and sample preparation techniques are necessary. In recent years, a number of alternative approaches such as middle-down proteomics (MDP, addressing up to 15 kDa peptides) and top-down proteomics (TDP, addressing proteins exceeding 15 kDa) have been gaining particular interest. Here we report on the bioinformatics study of both common and less frequently employed digestion procedures for complex protein mixtures specifically targeting the MDP approach. The aim of this study was to maximize the yield of protein structure information from MS data by optimizing peptide size distribution and sequence specificity. We classified peptides into four categories based on molecular weight: 0.6–3 (classical BUP), 3–7 (extended BUP), 7–15 kDa (MDP), and >15 kDa (TDP). Because of instrumentation-related considerations, we first advocate for the extended BUP approach as the potential near-future improvement of BUP. Therefore, we chose to optimize the number of unique peptides in the 3–7 kDa range while maximizing the number of represented proteins. The present study considers human, yeast, and bacterial proteomes. Results of the study can be further used for designing extended BUP or MDP experimental workflows
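
    The size classification described in this abstract can be illustrated with a minimal in silico digestion sketch. It is not the authors' pipeline: the function names, the C-terminal cleavage rule without missed cleavages, the use of average residue masses, and the handling of the class boundaries are all assumptions made for illustration.

```python
# Average residue masses (Da) for the 20 standard amino acids.
AVG_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # average mass of H2O added for the free termini


def cleave(sequence, residues):
    """Cut after every residue in `residues` (assumed rule: C-terminal
    cleavage, complete digestion, no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        if aa in residues:
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def mass_kda(peptide):
    """Average molecular weight of an unmodified peptide, in kDa."""
    return (sum(AVG_MASS[aa] for aa in peptide) + WATER) / 1000.0


def size_class(mass):
    """Map a mass in kDa to the four categories named in the abstract.
    Which class owns each boundary value is an assumption."""
    if mass < 0.6:
        return "below 0.6 kDa"
    if mass < 3:
        return "classical BUP"
    if mass < 7:
        return "extended BUP"
    if mass <= 15:
        return "MDP"
    return "TDP"


def class_counts(proteins, residues):
    """Count unique peptides per size class for an iterable of sequences."""
    unique = {p for seq in proteins for p in cleave(seq, residues)}
    counts = {}
    for pep in unique:
        label = size_class(mass_kda(pep))
        counts[label] = counts.get(label, 0) + 1
    return counts
```

    Calling `class_counts(sequences, "KR")` approximates a trypsin-like specificity, while a single rare residue such as `"M"` or `"W"` shifts the distribution toward longer peptides.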

    Unbiased False Discovery Rate Estimation for Shotgun Proteomics Based on the Target-Decoy Approach

    No full text
    Target-decoy approach (TDA) is the dominant strategy for false discovery rate (FDR) estimation in mass-spectrometry-based proteomics. One of its main applications is direct FDR estimation based on counting of decoy matches above a certain score threshold. The corresponding equations are widely employed for filtering of peptide or protein identifications. In this work we consider a probability model describing the filtering process and find that, when decoy counting is used for q value estimation and subsequent filtering, a correction has to be introduced into these common equations for TDA-based FDR estimation. We also discuss the scale of variance of false discovery proportion (FDP) and propose using confidence intervals for more conservative FDP estimation in shotgun proteomics. The necessity of both the correction and the use of confidence intervals is especially pronounced when filtering small sets (such as in proteogenomics experiments) and when using very low FDR thresholds
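
    Decoy counting as described in this abstract can be sketched in a few lines. The abstract does not give the corrected equation, so the "+1" form used below, (decoys + 1) / targets, is an assumption taken from the commonly cited version of this correction, and the function name and input layout are hypothetical.

```python
def tda_fdr(psms, threshold, corrected=True):
    """Estimate FDR by decoy counting for matches scoring >= threshold.

    `psms` is an iterable of (score, is_decoy) pairs from a search against
    a concatenated target-decoy database (assumed setup with equal-sized
    target and decoy parts).
    """
    targets = sum(1 for score, is_decoy in psms
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms
                 if score >= threshold and is_decoy)
    if targets == 0:
        raise ValueError("no target matches pass the threshold")
    # Uncorrected estimate: decoys / targets.
    # Assumed corrected estimate: (decoys + 1) / targets.
    return (decoys + (1 if corrected else 0)) / targets


psms = [(10, False), (9, False), (8, True), (7, False), (6, False)]
naive = tda_fdr(psms, threshold=7, corrected=False)   # 1 decoy / 3 targets
adjusted = tda_fdr(psms, threshold=7, corrected=True)  # 2 / 3 targets
```

    With only three accepted target matches the two estimates differ by a factor of two, which is in line with the abstract's remark that the correction matters most for small sets.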

    Method for Identification of Threonine Isoforms in Peptides by Ultraviolet Photofragmentation of Cold Ions

    No full text
    Identification of isomeric amino acid residues in peptides and proteins is challenging but often highly desired in proteomics. One of the practically important cases that require isomeric assignments is that associated with single-nucleotide polymorphism substitutions of Met residues by Thr in cancer-related proteins. These genetically encoded substitutions can yet be confused with the chemical modifications, arising from protein alkylation by iodoacetamide, which is commonly used in the standard procedure of sample preparation for proteomic analysis. Similar to the genetically encoded mutations, the alkylation also induces a conversion of methionine residues, but to the iso-threonine form. Recognition of the mutations therefore requires isoform-sensitive detection techniques. Herein, we demonstrate an analytical method for reliable identification of isoforms of threonine residues in tryptic peptides. It is based on ultraviolet photodissociation mass spectrometry of cryogenically cooled ions and a machine-learning algorithm. The measured photodissociation mass spectra exhibit isoform-specific patterns, which are independent of the residues adjacent to threonine or iso-threonine in a peptide sequence. A comprehensive metric-based evaluation demonstrates that, being calibrated with a set of model peptides, the method allows for isomeric identification of threonine residues in peptides of arbitrary sequence

    Comparison of False Discovery Rate Control Strategies for Variant Peptide Identifications in Shotgun Proteogenomics

    No full text
    Proteogenomic studies aiming at identification of variant peptides using customized database searches of mass spectrometry data are facing a dilemma of selecting the most efficient database search strategy: A choice has to be made between using combined or sequential searches against reference (wild-type) and mutant protein databases or directly against the mutant database without the wild-type one. Here we called these approaches “all-together”, “one-by-one”, and “direct”, respectively. We share the results of the comparison of these search strategies obtained for large data sets of publicly available proteogenomic data. On the basis of the results of this evaluation, we found that the “all-together” strategy provided, in general, more variant peptide identifications compared with the “one-by-one” approach, while showing similar performance for some specific cases. To validate further the results of this study, we performed a control comparison of the strategies in question using publicly available data for a mixture of the annotated human protein standard UPS1 and E. coli. For these data, both “all-together” and “one-by-one” approaches showed similar sensitivity and specificity of the searches, while the “direct” approach resulted in an increased number of false identifications

    Identification of Alternative Splicing in Proteomes of Human Melanoma Cell Lines without RNA Sequencing Data

    No full text
    Alternative splicing is one of the main regulation pathways in living cells beyond simple changes in the level of protein expression. Most of the approaches proposed in proteomics for the identification of specific splicing isoforms require a preliminary deep transcriptomic analysis of the sample under study, which is not always available, especially in the case of the re-analysis of previously acquired data. Herein, we developed new algorithms for the identification and validation of protein splice isoforms in proteomic data in the absence of RNA sequencing of the samples under study. The bioinformatic approaches were tested on the results of proteome analysis of human melanoma cell lines, obtained earlier by high-resolution liquid chromatography and mass spectrometry (LC-MS). A search for alternative splicing events for each of the cell lines studied was performed against the database generated from all known transcripts (RefSeq) and the one composed of peptide sequences, which included all biologically possible combinations of exons. The identifications were filtered using the prediction of both retention times and relative intensities of fragment ions in the corresponding mass spectra. The fragmentation mass spectra corresponding to the discovered alternative splicing events were additionally examined for artifacts. Selected splicing events were further validated at the mRNA level by quantitative PCR

    Empirical multi-dimensional space for scoring peptide spectrum matches in shotgun proteomics

    No full text
    Data-dependent tandem mass spectrometry (MS/MS) is one of the main techniques for protein identification in shotgun proteomics. In a typical LC-MS/MS workflow, peptide product ion mass spectra (MS/MS spectra) are compared with those derived theoretically from a protein sequence database. Scoring of these matches results in peptide identifications. A set of peptide identifications is characterized by false discovery rate (FDR), which determines the fraction of false identifications in the set. The total number of peptides targeted for fragmentation is in the range of 10 000 to 20 000 for a several-hour LC-MS/MS run. Typically, <50% of these MS/MS spectra result in peptide-spectrum matches (PSMs). A small fraction of PSMs pass the preset FDR level (commonly 1%) giving a list of identified proteins, yet a large number of correct PSMs corresponding to the peptides originally present in the sample are left behind in the "grey area" below the identity threshold. Following the numerous efforts to recover these correct PSMs, here we investigate the utility of a scoring scheme based on the multiple PSM descriptors available from the experimental data. These descriptors include retention time, deviation between experimental and theoretical mass, number of missed cleavages upon in-solution protein digestion, precursor ion fraction (PIF), PSM count per sequence, potential modifications, median fragment mass error, C-13 isotope mass difference, charge states, and number of PSMs per protein. The proposed scheme utilizes a set of metrics obtained for the corresponding distributions of each of the descriptors. We found that the proposed PSM scoring algorithm differentiates equally or more efficiently between correct and incorrect identifications compared with existing postsearch validation approaches

    Legislative Documents

    No full text
    Also, variously referred to as: House bills; House documents; House legislative documents; legislative documents; General Court documents

    Chemical-Mediated Digestion: An Alternative Realm for Middle-down Proteomics?

    No full text
    Protein digestion in mass spectrometry (MS)-based bottom-up proteomics targets mainly lysine and arginine residues, yielding primarily 0.6–3 kDa peptides for the proteomes of organisms of all major kingdoms. Recent advances in MS technology enable analysis of complex mixtures of increasingly longer (>3 kDa) peptides in a high-throughput manner supporting the development of a middle-down proteomics (MDP) approach. Generating longer peptides is a paramount step in launching an MDP pipeline, but the quest for the selection of a cleaving agent that would provide the desired 3–15 kDa peptides remains open. Recent bioinformatics studies have shown that cleavage at the rarely occurring amino acid residues such as methionine (Met), tryptophan (Trp), or cysteine (Cys) would be suitable for MDP approach. Interestingly, chemical-mediated proteolytic cleavages uniquely allow targeting these rare amino acids, for which no specific proteolytic enzymes are known. Herein, as potential candidates for MDP-grade proteolysis, we have investigated the performance of chemical agents previously reported to target primarily Met, Trp, and Cys residues: CNBr, BNPS-Skatole (3-bromo-3-methyl-2-(2-nitrophenyl)sulfanylindole), and NTCB (2-nitro-5-thiobenzoic acid), respectively. Figures of merit such as digestion reproducibility, peptide size distribution, and occurrence of side reactions are discussed. The NTCB-based MDP workflow has demonstrated particularly attractive performance, and NTCB is put forward here as a potential cleaving agent for further MDP development