2,195 research outputs found

    Current challenges in software solutions for mass spectrometry-based quantitative proteomics

    This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.

    Incorporating peak grouping information for alignment of multiple liquid chromatography-mass spectrometry datasets

    Motivation: The combination of liquid chromatography and mass spectrometry (LC/MS) has been widely used for large-scale comparative studies in systems biology, including proteomics, glycomics and metabolomics. In almost all experimental designs, it is necessary to compare chromatograms across biological or technical replicates and across sample groups. Central to this is the peak alignment step, which is one of the most important but challenging preprocessing steps. Existing alignment tools do not take into account the structural dependencies between related peaks that co-elute and are derived from the same metabolite or peptide. We propose a direct matching peak alignment method for LC/MS data that incorporates related peaks information (within each LC/MS run) and investigate its effect on alignment performance (across runs). The groupings of related peaks necessary for our method can be obtained from any peak clustering method and are built into a pairwise peak similarity score function. The similarity score matrix produced is used by an approximation algorithm for the weighted matching problem to produce the actual alignment result. Results: We demonstrate that related peak information can improve alignment performance. The performance is evaluated on a set of benchmark datasets, where our method performs competitively compared to other popular alignment tools. Availability: The proposed alignment method has been implemented as a stand-alone application in Python, available for download at http://github.com/joewandy/peak-grouping-alignment.
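    The idea in this abstract (a pairwise similarity score that includes related-peak groupings, followed by approximate weighted matching) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Peak` fields, the tolerances, the `alpha` weighting and the `group_support` term are all assumptions made for illustration, and a simple greedy matching (a standard 1/2-approximation for maximum weighted matching) stands in for whatever approximation algorithm the paper uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float    # mass-to-charge ratio
    rt: float    # retention time in seconds
    group: int   # id of the related-peak cluster within its own run (assumed given)


def base_similarity(a, b, mz_tol=0.01, rt_tol=30.0):
    """Peak-level similarity in [0, 1]; 0 if outside the (assumed) tolerances."""
    dmz, drt = abs(a.mz - b.mz), abs(a.rt - b.rt)
    if dmz > mz_tol or drt > rt_tol:
        return 0.0
    return 0.5 * (1 - dmz / mz_tol) + 0.5 * (1 - drt / rt_tol)


def group_support(a, b, run_a, run_b):
    """Fraction of a's group-mates that have a plausible partner among b's group-mates."""
    mates_a = [p for p in run_a if p.group == a.group and p is not a]
    mates_b = [q for q in run_b if q.group == b.group and q is not b]
    if not mates_a:
        return 0.0
    hits = sum(any(base_similarity(p, q) > 0 for q in mates_b) for p in mates_a)
    return hits / len(mates_a)


def align(run_a, run_b, alpha=0.5):
    """Greedy approximate weighted matching over the combined similarity scores."""
    candidates = []
    for i, a in enumerate(run_a):
        for j, b in enumerate(run_b):
            s = base_similarity(a, b)
            if s > 0:
                score = (1 - alpha) * s + alpha * group_support(a, b, run_a, run_b)
                candidates.append((score, i, j))
    candidates.sort(reverse=True)  # best-scoring pairs first
    used_a, used_b, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            matches.append((i, j))
    return sorted(matches)


# Hypothetical toy data: the first peak of run_a has two candidates in run_b.
run_a = [Peak(100.000, 60, 0), Peak(101.003, 60, 0), Peak(200.000, 120, 1)]
run_b = [Peak(100.001, 65, 0), Peak(101.004, 66, 0),
         Peak(200.002, 118, 1), Peak(100.002, 61, 2)]

print(align(run_a, run_b, alpha=0.0))  # peak-level similarity only
print(align(run_a, run_b, alpha=0.5))  # with related-peak support
```

    In this toy example the first peak of `run_a` is slightly closer to an isolated peak in `run_b`, but its group-mate also has a partner in the competing group, so the grouping term changes which pair is chosen.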

    Algorithms for Characterizing Peptides and Glycopeptides with Mass Spectrometry

    The emergence of tandem mass spectrometry (MS/MS) technology has significantly accelerated protein identification and quantification in proteomics. It enables high-throughput analysis of proteins and their quantities in a complex protein mixture. A mass spectrometer can easily and rapidly generate large volumes of mass spectral data for a biological sample. This bulk of data makes manual interpretation impossible and has also brought numerous challenges in automated data analysis. Algorithmic solutions have been proposed and provide indispensable analytical support in current proteomic experiments. However, new algorithms are still needed to either improve result accuracy or provide additional data analysis capabilities for both protein identification and quantification. Accurate identification of proteins in a sample is the preliminary requirement of a proteomic study. In many cases, a mass spectrum cannot provide complete information to identify the peptide without ambiguity because of the inefficiency of the peptide fragmentation technique and the prevalent existence of noise. We propose ADEPTS to address this problem using the complementary information provided in different types of mass spectra. Meanwhile, the occurrence of posttranslational modifications (PTMs) on proteins is another major issue that prevents the interpretation of a large portion of spectra. Using current software tools, users have to specify possible PTMs in advance. However, the number of possible PTMs has to be limited since specifying more PTMs to the software leads to a longer running time and lower result accuracy. Thus, we develop DeNovoPTM and PeaksPTM to provide efficient and accurate solutions. Glycosylation is one of the most frequently observed PTMs in proteomics. It plays important roles in many disease processes and thus has attracted growing research interest. However, the lack of algorithms that can identify intact glycopeptides has become the major obstacle that hinders glycoprotein studies. We propose a novel algorithm, GlycoMaster DB, to fulfil this urgent requirement. Additional research is presented on protein quantification, which studies the changes of protein quantity by comparing two or more mass spectral datasets. A crucial problem in the quantification is to correct the retention time distortions between different datasets. Heuristic solutions from previous research have been used in practice but none of them has yet claimed a clear optimization goal. To address this issue, we propose a combinatorial model and practical algorithms for this problem.

    Algorithms for integrated analysis of glycomics and glycoproteomics by LC-MS/MS

    The glycoproteome is an intricate and diverse component of a cell, and it plays a key role in the definition of the interface between that cell and the rest of its world. Methods for studying the glycoproteome have been developed for released glycan glycomics and site-localized bottom-up glycoproteomics using liquid chromatography-coupled mass spectrometry and tandem mass spectrometry (LC-MS/MS), which is itself a complex problem. Algorithms for interpreting these data are necessary to be able to extract biologically meaningful information in a high throughput, automated context. Several existing solutions have been proposed but may be found lacking for larger glycopeptides, for complex samples, different experimental conditions, different instrument vendors, or even because they simply ignore fundamentals of glycobiology. I present a series of open algorithms that approach the problem in an instrument-vendor-neutral, cross-platform fashion to address these challenges, and integrate key concepts from the underlying biochemical context into the interpretation process. In this work, I created a suite of deisotoping and charge state deconvolution algorithms for processing raw mass spectra at an LC scale from a variety of instrument types. These tools performed better than previously published algorithms by enforcing the underlying chemical model more strictly, while maintaining a higher degree of signal fidelity. From this summarized, vendor-normalized data, I composed a set of algorithms for interpreting glycan profiling experiments that can be used to quantify glycan expression. From this I constructed a graphical method to model the active biosynthetic pathways of the sample glycome and dig deeper into those signals than would be possible from the raw data alone. Lastly, I created a glycopeptide database search engine from these components which is capable of identifying the widest array of glycosylation types available, and demonstrate a learning algorithm which can be used to tune the model to better understand the process of glycopeptide fragmentation under specific experimental conditions to outperform a simpler model by between 10% and 15%. This approach can be further augmented with sample-wide or site-specific glycome models to increase depth-of-coverage for glycoforms consistent with prior beliefs.

    Tissue Proteomes: Quantitative Mass Spectrometry of Murine Liver and Ovarian Endometrioma

    A human genome contains more than 20 000 protein-encoding genes. A human proteome, in contrast, has been estimated to be much more complex and dynamic. The most powerful tool to study proteins today is mass spectrometry (MS). MS-based proteomics is based on the measurement of the masses of charged peptide ions in the gas phase. The peptide amino acid sequence can be deduced, and matching proteins can be found, using software to correlate MS data with sequence database information. Quantitative proteomics allows the estimation of the absolute or relative abundance of a certain protein in a sample. The label-free quantification methods use the intrinsic MS peptide signals in the calculation of the quantitative values, enabling the comparison of peptide signals from numerous patient samples. In this work, a quantitative MS methodology was established to study aromatase overexpressing (AROM+) male mouse liver and ovarian endometriosis tissue samples. The workflow of label-free quantitative proteomics was optimized in terms of sensitivity and robustness, allowing the quantification of 1500 proteins with a low coefficient of variance in both sample types. Additionally, five statistical methods were evaluated for use with label-free quantitative proteomics data. The proteome data was integrated with other omics datasets, such as mRNA microarray and metabolite data sets. As a result, an altered lipid metabolism in liver was discovered in male AROM+ mice. The results suggest a reduced beta oxidation of long chain phospholipids in the liver and increased levels of pro-inflammatory fatty acids in the circulation in these mice. Conversely, in the endometriosis tissues, a set of proteins highly specific for ovarian endometrioma was discovered, many of which were under the regulation of the growth factor TGF-β1. This finding supports subsequent biomarker verification in a larger number of endometriosis patient samples.

    True single-cell proteomics using advanced ion mobility mass spectrometry

    In this thesis, I present the development of a novel mass spectrometry (MS) platform and scan modes in conjunction with a versatile and robust liquid chromatography (LC) platform, which addresses current sensitivity and robustness limitations in MS-based proteomics. I demonstrate how this technology benefits high-speed and ultra-high sensitivity proteomics studies on a large scale. This culminated in the first-of-its-kind label-free MS-based single-cell proteomics platform and its application to spatial tissue proteomics. I also investigate the vastly underexplored ‘dark matter’ of the proteome, validating novel microproteins that contribute to human cellular function. First, we developed a novel trapped ion mobility spectrometry (TIMS) platform for proteomics applications, which multiplies sequencing speed and sensitivity by ‘parallel accumulation – serial fragmentation’ (PASEF) and applied it to first high-sensitivity and large-scale projects in the biomedical arena. Next, to explore the collisional cross section (CCS) dimension in TIMS, we measured over 1 million peptide CCS values, which enabled us to train a deep learning model for CCS prediction solely based on the linear amino acid sequence. We also translated the principles of TIMS and PASEF to the field of lipidomics, highlighting parallel benefits in terms of throughput and sensitivity. The core of my PhD is the development of a robust ultra-high sensitivity LC-MS platform for the high-throughput analysis of single-cell proteomes. Improvements in ion transfer efficiency, robust, very low flow LC and a PASEF data independent acquisition scan mode together increased measurement sensitivity by up to 100-fold. We quantified single-cell proteomes to a depth of up to 1,400 proteins per cell. A fundamental result from the comparisons to single-cell RNA sequencing data revealed that single cells have a stable core proteome, whereas the transcriptome is dominated by Poisson noise, emphasizing the need for both complementary technologies. Building on our achievements with the single-cell proteomics technology, we elucidated the image-guided spatial and cell-type resolved proteome in whole organs and tissues from minute sample amounts. We combined clearing of rodent and human organs, unbiased 3D-imaging, target tissue identification, isolation and MS-based unbiased proteomics to describe early-stage β-amyloid plaque proteome profiles in a disease model of familial Alzheimer’s. Automated artificial intelligence driven isolation and pooling of single cells of the same phenotype allowed us to analyze the cell-type resolved proteome of cancer tissues, revealing a remarkable spatial difference in the proteome. Last, we systematically elucidated pervasive translation of noncanonical human open reading frames combining state-of-the-art ribosome profiling, CRISPR screens, imaging and MS-based proteomics. We performed unbiased analysis of small novel proteins and proved their physical existence by LC-MS as HLA peptides, essential interaction partners of protein complexes and cellular function.

    Computer aided manual validation of mass spectrometry-based proteomic data

    Advances in mass spectrometry-based proteomic technologies have increased the speed of analysis and the depth provided by a single analysis. Computational tools to evaluate the accuracy of peptide identifications from these high-throughput analyses have not kept pace with technological advances; currently the most common quality evaluation methods are based on statistical analysis of the likelihood of false positive identifications in large-scale data sets. While helpful, these calculations do not consider the accuracy of each identification, thus creating a precarious situation for biologists relying on the data to inform experimental design. Manual validation is the gold standard approach to confirm accuracy of database identifications, but is extremely time-intensive. To palliate the increasing time required to manually validate large proteomic datasets, we provide computer aided manual validation software (CAMV) to expedite the process. Relevant spectra are collected, catalogued, and pre-labeled, allowing users to efficiently judge the quality of each identification and summarize applicable quantitative information. CAMV significantly reduces the burden associated with manual validation and will hopefully encourage broader adoption of manual validation in mass spectrometry-based proteomics. Funding: National Institutes of Health (U.S.) (Grant R24DK090963); National Institutes of Health (U.S.) (Grant U54CA112967); National Cancer Institute (U.S.), Integrative Cancer Biology Program (Fellowship); Charles S. Krakauer Fellowship; Hugh Hampton Young Fellowship.

    Biological and translational cancer proteomics


    Mass spectrometry-based methods for identifying oxidized proteins in disease: advances and challenges

    Many inflammatory diseases have an oxidative aetiology, which leads to oxidative damage to biomolecules, including proteins. It is now increasingly recognized that oxidative post-translational modifications (oxPTMs) of proteins affect cell signalling and behaviour, and can contribute to pathology. Moreover, oxidized proteins have potential as biomarkers for inflammatory diseases. Although many assays for generic protein oxidation and breakdown products of protein oxidation are available, only advanced tandem mass spectrometry approaches have the power to localize specific oxPTMs in identified proteins. While much work has been carried out using untargeted or discovery mass spectrometry approaches, identification of oxPTMs in disease has benefitted from the development of sophisticated targeted or semi-targeted scanning routines, combined with chemical labeling and enrichment approaches. Nevertheless, many potential pitfalls exist which can result in incorrect identifications. This review explains the limitations, advantages and challenges of all of these approaches to detecting oxidatively modified proteins, and provides an update on recent literature in which they have been used to detect and quantify protein oxidation in disease.

    Spectral counting assessment of protein dynamic range in cerebrospinal fluid following depletion with plasma-designed immunoaffinity columns

    Background: In cerebrospinal fluid (CSF), which is a rich source of biomarkers for neurological diseases, identification of biomarkers requires methods that allow reproducible detection of low-abundance proteins. It is therefore crucial to decrease dynamic range and improve assessment of protein abundance. Results: We applied LC-MS/MS to compare the performance of two CSF enrichment techniques that immunodeplete either albumin alone (IgYHSA) or 14 high-abundance proteins (IgY14). In order to estimate the dynamic range of the proteins identified, we measured protein abundance with the APEX spectral counting method. Both immunodepletion methods improved the number of low-abundance proteins detected (3-fold for IgYHSA, 4-fold for IgY14). The 10 most abundant proteins following immunodepletion accounted for 41% (IgY14) and 46% (IgYHSA) of CSF protein content, whereas they accounted for 64% in non-depleted samples, thus demonstrating significant enrichment of low-abundance proteins. Defined proteomics experiment metrics showed overall good reproducibility of the two immunodepletion methods and MS analysis. Moreover, offline peptide fractionation of the IgYHSA sample allowed a 4-fold increase in proteins identified (520 vs. 131 without fractionation), without hindering reproducibility. Conclusions: The novelty of this study was to show the advantages and drawbacks of these methods side by side. Taking into account the improved detection and potential loss of non-target proteins following extensive immunodepletion, it is concluded that both depletion methods combined with spectral counting may be of interest before further fractionation, when searching for CSF biomarkers. Given the reliable identification and quantitation obtained with the APEX algorithm, it may be considered a cheap and quick alternative for studying sample proteomic content.
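    The dynamic-range figure quoted in this abstract (the share of total protein content taken by the 10 most abundant proteins) is simple arithmetic over per-protein abundance estimates. The sketch below assumes the abundances are already available as a mapping from protein name to a non-negative value; it does not implement the APEX method itself, and the function name and example values are hypothetical.

```python
from typing import Mapping


def top_n_share(abundance: Mapping[str, float], n: int = 10) -> float:
    """Fraction of total abundance contributed by the n most abundant proteins."""
    total = sum(abundance.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    top = sorted(abundance.values(), reverse=True)[:n]
    return sum(top) / total


# Hypothetical abundance values, for illustration only.
example = {"ALB": 60.0, "TF": 20.0, "protein_X": 15.0, "protein_Y": 5.0}
print(top_n_share(example, n=2))
```

    A lower share after depletion, as reported above, indicates that the signal is spread over more proteins and is less dominated by a few abundant ones.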