
    N-terminal proteomics assisted profiling of the unexplored translation initiation landscape in Arabidopsis thaliana

    Proteogenomics is an emerging research field that still lacks a uniform method of analysis. Proteogenomic studies that combine N-terminal proteomics and ribosome profiling suggest that a high number of protein start sites are currently missing from genome annotations. We constructed a proteogenomic pipeline specific for the analysis of N-terminal proteomics data, with the aim of discovering novel translational start sites outside annotated protein-coding regions. In summary, unidentified MS/MS spectra were matched to a specific N-terminal peptide library encompassing protein N termini encoded in the Arabidopsis thaliana genome. After stringent false discovery rate filtering, 117 protein N termini compliant with N-terminal methionine excision specificity and indicative of translation initiation were found. These include N-terminal protein extensions and translation from transposable elements and pseudogenes. Gene prediction provided supporting protein-coding models for approximately half of the protein N termini. Besides the prediction of functional domains (partially) contained within the newly predicted ORFs, further supporting evidence of translation was found in the recently released Araport11 genome re-annotation of Arabidopsis and in computational translations of sequences stored in public repositories. Most interestingly, complementary evidence from ribosome profiling was found for 23 protein N termini. Finally, an in silico analysis of protein N-terminal peptides demonstrates the applicability of our N-terminal proteogenomics strategy in revealing protein-coding potential in species with well- and poorly annotated genomes.

    Current challenges in software solutions for mass spectrometry-based quantitative proteomics

    This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.

    Bacterial riboproteogenomics : the era of N-terminal proteoform existence revealed

    With the rapid increase in the number of sequenced prokaryotic genomes, relying on automated gene annotation became a necessity. Multiple lines of evidence, however, suggest that current bacterial genome annotations may contain inconsistencies and are incomplete, even for so-called well-annotated genomes. Here we discuss underexplored sources of protein diversity and new methodologies for high-throughput genome re-annotation. The expression of multiple molecular forms of proteins (proteoforms) from a single gene, particularly driven by alternative translation initiation, is gaining interest as a prominent contributor to bacterial protein diversity. Consequently, riboproteogenomic pipelines were proposed to comprehensively capture proteoform expression in prokaryotes by the complementary use of (positional) proteomics and the direct readout of translated genomic regions using ribosome profiling. To complement these discoveries, tailored strategies are required for the functional characterization of newly discovered bacterial proteoforms.

    DART-ID increases single-cell proteome coverage.

    Analysis by liquid chromatography and tandem mass spectrometry (LC-MS/MS) can identify and quantify thousands of proteins in microgram-level samples, such as those comprised of thousands of cells. This process, however, remains challenging for smaller samples, such as the proteomes of single mammalian cells, because reduced protein levels reduce the number of confidently sequenced peptides. To alleviate this reduction, we developed Data-driven Alignment of Retention Times for IDentification (DART-ID). DART-ID implements principled Bayesian frameworks for global retention time (RT) alignment and for incorporating RT estimates towards improved confidence estimates of peptide-spectrum-matches. When applied to bulk or to single-cell samples, DART-ID increased the number of data points by 30-50% at 1% FDR, and thus decreased missing data. Benchmarks indicate excellent quantification of peptides upgraded by DART-ID and support their utility for quantitative analysis, such as identifying cell types and cell-type specific proteins. The additional datapoints provided by DART-ID boost the statistical power and double the number of proteins identified as differentially abundant in monocytes and T-cells. DART-ID can be applied to diverse experimental designs and is freely available at http://dart-id.slavovlab.net

    The Drosophila melanogaster PeptideAtlas facilitates the use of peptide data for improved fly proteomics and genome annotation

    Background: Crucial foundations of any quantitative systems biology experiment are correct genome and proteome annotations. Protein databases compiled from high quality empirical protein identifications that are in turn based on correct gene models increase the correctness, sensitivity, and quantitative accuracy of systems biology genome-scale experiments. Results: In this manuscript, we present the Drosophila melanogaster PeptideAtlas, a fly proteomics and genomics resource of unsurpassed depth. Based on peptide mass spectrometry data collected in our laboratory the portal http://www.drosophila-peptideatlas.org allows querying fly protein data observed with respect to gene model confirmation and splice site verification as well as for the identification of proteotypic peptides suited for targeted proteomics studies. Additionally, the database provides consensus mass spectra for observed peptides along with qualitative and quantitative information about the number of observations of a particular peptide and the sample(s) in which it was observed. Conclusion: PeptideAtlas is an open access database for the Drosophila community that has several features and applications that support (1) reduction of the complexity inherently associated with performing targeted proteomic studies, (2) designing and accelerating shotgun proteomics experiments, (3) confirming or questioning gene models, and (4) adjusting gene models such that they are in line with observed Drosophila peptides. While the database consists of proteomic data it is not required that the user is a proteomics expert.

    Addressing the needs of traumatic brain injury with clinical proteomics.

    Background: Neurotrauma, or injuries to the central nervous system (CNS), is a serious public health problem worldwide. Approximately 75% of all traumatic brain injuries (TBIs) are concussions or other mild TBI (mTBI) forms. Evaluation of concussion injury today is limited to an assessment of behavioral symptoms, often with delay and subject to motivation. Hence, there is an urgent need for an accurate chemical measure in biofluids to serve as a diagnostic tool for invisible brain wounds, to monitor severe patient trajectories, and to predict survival chances. Although a number of neurotrauma marker candidates have been reported, the broad spectrum of TBI limits the significance of small cohort studies. Specificity and sensitivity issues complicate the development of a conclusive diagnostic assay, especially for concussion patients. Thus, the neurotrauma field currently has no diagnostic biofluid test in clinical use. Content: We discuss the challenges of discovering new and validating identified neurotrauma marker candidates using proteomics-based strategies, including targeting, selection strategies and the application of mass spectrometry (MS) technologies and their potential impact on the neurotrauma field. Summary: Many studies use TBI marker candidates based on literature reports, yet progress in genomics and proteomics has started to provide neurotrauma protein profiles. Choosing meaningful marker candidates from such 'long lists' is still pending, as only a few can be taken through the process of preclinical verification and large scale translational validation. Quantitative mass spectrometry targeting specific molecules rather than random sampling of the whole proteome, e.g., multiple reaction monitoring (MRM), offers an efficient and effective means to multiplex the measurement of several candidates in patient samples, thereby omitting the need for antibodies prior to clinical assay design. Sample preparation challenges specific to TBI are addressed. A tailored selection strategy combined with a multiplex screening approach is helping to arrive at diagnostically suitable candidates for clinical assay development. A surrogate marker test will be instrumental for critical decisions of TBI patient care and protection of concussion victims from repeated exposures that could result in lasting neurological deficits.

    A nonparametric model for quality control of database search results in shotgun proteomics

    Background: Analysis of complex samples with tandem mass spectrometry (MS/MS) has become routine in proteomic research. However, validation of database search results creates a bottleneck in MS/MS data processing. Recently, methods based on a randomized database have become popular for quality control of database search results. However, a consequent problem is the ignorance of how to combine different database search scores to improve the sensitivity of randomized database methods. Results: In this paper, a multivariate nonlinear discriminate function (DF) based on the multivariate nonparametric density estimation technique was used to filter out false-positive database search results with a predictable false positive rate (FPR). Application of this method to control datasets of different instruments (LCQ, LTQ, and LTQ/FT) yielded an estimated FPR close to the actual FPR. As expected, the method was more sensitive when more features were used. Furthermore, the new method was shown to be more sensitive than two commonly used methods on 3 complex sample datasets and 3 control datasets. Conclusion: Using the nonparametric model, a more flexible DF can be obtained, resulting in improved sensitivity and good FPR estimation. This nonparametric statistical technique is a powerful tool for tackling the complexity and diversity of datasets in shotgun proteomics.

    Visualization and exploration of next-generation proteomics data


    The Impact II, a Very High-Resolution Quadrupole Time-of-Flight Instrument (QTOF) for Deep Shotgun Proteomics

    Hybrid quadrupole time-of-flight (QTOF) mass spectrometry is one of the two major principles used in proteomics. Although based on simple fundamentals, it has over the last decades greatly evolved in terms of achievable resolution, mass accuracy, and dynamic range. The Bruker impact platform of QTOF instruments takes advantage of these developments and here we develop and evaluate the impact II for shotgun proteomics applications. Adaptation of our heated liquid chromatography system achieved very narrow peptide elution peaks. The impact II is equipped with a new collision cell with both axial and radial ion ejection, more than doubling ion extraction at high tandem MS frequencies. The new reflectron and detector improve resolving power by up to 80% compared with the previous model, i.e. to 40,000 at m/z 1222. We analyzed the ion current from the inlet capillary and found very high transmission (>80%) up to the collision cell. Simulation and measurement indicated 60% transfer into the flight tube. We adapted MaxQuant for QTOF data, improving absolute average mass deviations to better than 1.45 ppm. More than 4800 proteins can be identified in a single run of HeLa digest in a 90 min gradient. The workflow achieved high technical reproducibility (R² > 0.99) and accurate fold change determination in spike-in experiments in complex mixtures. Using label-free quantification we rapidly quantified haploid against diploid yeast and characterized overall proteome differences in mouse cell lines originating from different tissues. Finally, after high pH reversed-phase fractionation we identified 9515 proteins in a triplicate measurement of HeLa peptide mixture and 11,257 proteins in single measurements of cerebellum, the highest proteome coverage reported with a QTOF instrument so far.