875 research outputs found

    Positional proteomics reveals differences in N-terminal proteoform stability

    To understand the impact of alternative translation initiation on a proteome, we performed a proteome-wide study on protein turnover using positional proteomics and ribosome profiling to distinguish between N-terminal proteoforms of individual genes. By combining pulsed SILAC with N-terminal COFRADIC, we monitored the stability of 1,941 human N-terminal proteoforms, including 147 N-terminal proteoform pairs that originate from alternative translation initiation, alternative splicing or incomplete processing of the initiator methionine. N-terminally truncated proteoforms were less abundant than canonical proteoforms and often displayed altered stabilities, likely attributable to individual protein characteristics, including intrinsic disorder, but independent of N-terminal amino acid identity or truncation length. We discovered that the removal of initiator methionine by methionine aminopeptidases reduced the stability of processed proteoforms, while susceptibility to N-terminal acetylation did not seem to influence protein turnover rates. Taken together, our findings reveal differences in protein stability between N-terminal proteoforms and point to a role for alternative translation initiation and co-translational initiator methionine removal, next to alternative splicing, in the overall regulation of proteome homeostasis.

    Mass graphs and their applications in top-down proteomics

    Although proteomics has made rapid progress in the past decade, researchers are still in the early stages of exploring the world of complex proteoforms, which are protein products with various primary structure alterations resulting from gene mutations, alternative splicing, post-translational modifications, and other biological processes. Proteoform identification is essential to mapping proteoforms to their biological functions as well as discovering novel proteoforms and new protein functions. Top-down mass spectrometry is the method of choice for identifying complex proteoforms because it provides a "bird's-eye view" of intact proteoforms. The combinatorial explosion of possible proteoforms, which may result in billions of possible proteoforms for one protein, makes proteoform identification a challenging computational problem. Here we propose a new data structure, called the mass graph, for efficiently representing proteoforms. In addition, we design mass graph alignment algorithms for proteoform identification by top-down mass spectrometry. Experiments on a histone H4 mass spectrometry data set showed that the proposed methods outperformed MS-Align-E in identifying complex proteoforms.

    Evaluation of top-down mass spectral identification with homologous protein sequences

    BACKGROUND: Top-down mass spectrometry has unique advantages in identifying proteoforms with multiple post-translational modifications and/or unknown alterations. Most software tools in this area search top-down mass spectra against a protein sequence database for proteoform identification. When the species studied in a mass spectrometry experiment lacks a proteome sequence database, a homologous protein sequence database can be used for proteoform identification. The accuracy of homologous protein sequences affects the sensitivity of proteoform identification and the accuracy of mass shift localization. RESULTS: We tested TopPIC, a commonly used software tool for top-down mass spectral identification, on a top-down mass spectrometry data set of Escherichia coli K12 MG1655, and evaluated its performance using an Escherichia coli K12 MG1655 proteome database and a homologous protein database. The number of identified spectra with the homologous database was about half of that with the Escherichia coli K12 MG1655 database. We also tested TopPIC on a top-down mass spectrometry data set of human MCF-7 cells and obtained similar results. CONCLUSIONS: Experimental results demonstrated that TopPIC is capable of identifying many proteoform spectrum matches and localizing unknown alterations using homologous protein sequences containing no more than 2 mutations.

    N-terminal acetyltransferase Naa40p whereabouts put into N-terminal proteoform perspective

    The evolutionarily conserved N-alpha acetyltransferase Naa40p is among the most selective N-terminal acetyltransferases (NATs) identified to date. Here we identified a conserved N-terminally truncated Naa40p proteoform named Naa40p25 or short Naa40p (Naa40S). Intriguingly, although upon ectopic expression in yeast, both Naa40p proteoforms were capable of restoring N-terminal acetylation of the characterized yeast histone H2A Naa40p substrate, the Naa40p histone H4 substrate remained N-terminally free in human haploid cells specifically deleted for canonical Naa40p27 or 237 amino acid long Naa40p (Naa40L), but expressing Naa40S. Interestingly, human Naa40L and Naa40S displayed differential expression and subcellular localization patterns by exhibiting a principal nuclear and cytoplasmic localization, respectively. Furthermore, Naa40L was shown to be N-terminally myristoylated and to interact with N-myristoyltransferase 1 (NMT1), implicating NMT1 in steering Naa40L nuclear import. Differential interactomics data obtained by biotin-dependent proximity labeling (BioID) further hint at context-dependent roles of Naa40p proteoforms. More specifically, with Naa40S representing the main co-translationally acting actor, the interactome of Naa40L was enriched for nucleolar proteins implicated in ribosome biogenesis and the assembly of ribonucleoprotein particles, overall indicating a proteoform-specific segregation of previously reported Naa40p activities. Finally, the yeast histone variant H2A.Z and the transcriptional regulatory protein Lge1 were identified as novel Naa40p substrates, expanding the restricted substrate repertoire of Naa40p with two additional members and further confirming Lge1 as being the first redundant yNatA and yNatD substrate identified to date.

    Characterization of proteoforms with unknown post-translational modifications using the MIScore

    Various proteoforms may be generated from a single gene due to primary structure alterations (PSAs) such as genetic variations, alternative splicing, and post-translational modifications (PTMs). Top-down mass spectrometry is capable of analyzing intact proteins and identifying patterns of multiple PSAs, making it the method of choice for studying complex proteoforms. In top-down proteomics, proteoform identification is often performed by searching tandem mass spectra against a protein sequence database that contains only one reference protein sequence for each gene or transcript variant in a proteome. Because of the incompleteness of the protein database, an identified proteoform may contain unknown PSAs compared with the reference sequence. Proteoform characterization aims to identify and localize PSAs in a proteoform. Although many software tools have been proposed for proteoform identification by top-down mass spectrometry, the characterization of proteoforms in identified proteoform–spectrum matches still relies mainly on manual annotation. We propose to use the Modification Identification Score (MIScore), which is based on Bayesian models, to automatically identify and localize PTMs in proteoforms. Experiments showed that the MIScore is accurate in identifying and localizing one or two modifications.

    Single-Shot Top-Down Proteomics with Capillary Zone Electrophoresis-Electrospray Ionization-Tandem Mass Spectrometry for Identification of Nearly 600 Escherichia coli Proteoforms

    Capillary zone electrophoresis-electrospray ionization-tandem mass spectrometry (CZE-ESI-MS/MS) has been recognized as an invaluable platform for top-down proteomics. However, the scale of top-down proteomics using CZE-MS/MS is still limited due to the low loading capacity and narrow separation window of CZE. In this work, for the first time, we systematically evaluated the dynamic pH junction method for focusing of intact proteins during CZE-MS. The optimized dynamic pH junction-based CZE-MS/MS approached a 1 μL loading capacity, 90 min separation window, and high peak capacity (∼280) for characterization of an Escherichia coli proteome. The results represent the largest loading capacity and the highest peak capacity of CZE for top-down characterization of complex proteomes. Single-shot CZE-MS/MS identified about 2800 proteoform-spectrum matches, nearly 600 proteoforms, and 200 proteins from the Escherichia coli proteome with spectrum-level false discovery rate (FDR) less than 1%. The number of identified proteoforms in this work is over three times higher than that in previous single-shot CZE-MS/MS studies. Truncations, N-terminal methionine excision, signal peptide removal, and some post-translational modifications including oxidation and acetylation were detected.

    ABRF Proteome Informatics Research Group (iPRG) 2016 Study: Inferring Proteoforms from Bottom-up Proteomics Data.

    This report presents the results from the 2016 Association of Biomolecular Resource Facilities Proteome Informatics Research Group (iPRG) study on proteoform inference and false discovery rate (FDR) estimation from bottom-up proteomics data. For this study, 3 replicate Q Exactive Orbitrap liquid chromatography-tandem mass spectrometry datasets were generated from each of