
    Current challenges in software solutions for mass spectrometry-based quantitative proteomics

    This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.

    "TOF2H": A precision toolbox for rapid, high density/high coverage hydrogen-deuterium exchange mass spectrometry via an LC-MALDI approach, covering the data pipeline from spectral acquisition to HDX rate analysis

    Background: Protein-amide proton hydrogen-deuterium exchange (HDX) is used to investigate protein conformation, conformational changes and surface binding sites for other molecules. To our knowledge, software tools to automate data processing and analysis from sample fractionating (LC-MALDI) mass-spectrometry-based HDX workflows are not publicly available. Results: An integrated data pipeline (Solvent Explorer/TOF2H) has been developed for the processing of LC-MALDI-derived HDX data. Based on an experiment-wide template, and taking an ab initio approach to chromatographic and spectral peak finding, initial data processing is based on accurate mass-matching to fully deisotoped peaklists accommodating, in MS/MS-confirmed peptide library searches, ambiguous mass-hits to non-target proteins. Isotope-shift re-interrogation of library search results allows quick assessment of the extent of deuteration from peaklist data alone. During raw spectrum editing, each spectral segment is validated in real time, consistent with the manageable spectral numbers resulting from LC-MALDI experiments. A semi-automated spectral-segment editor includes a semi-automated or automated assessment of the quality of all spectral segments as they are pooled across an XIC peak for summing, centroid mass determination, building of rates plots on-the-fly, and automated back exchange correction. The resulting deuterium uptake rates plots from various experiments can be averaged, subtracted, re-scaled, error-barred, and/or scatter-plotted from individual spectral segment centroids, compared to solvent exposure and hydrogen bonding predictions and receive a color suggestion for 3D visualization. This software lends itself to a "divorced" HDX approach in which MS/MS-confirmed peptide libraries are built via nano or standard ESI without source modification, and HDX is performed via LC-MALDI using a standard MALDI-TOF. The complete TOF2H package includes additional (e.g., LC analysis) modules. Conclusion: "TOF2H" provides a comprehensive HDX data analysis package that has accelerated the processing of LC-MALDI-based HDX data in the authors' lab from weeks to hours. It runs in a standard MS Windows (XP or Vista) environment, and can be downloaded from http://tof2h.bio.uci.edu or obtained from the authors at no cost.
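The centroid mass determination and back exchange correction mentioned in this abstract can be illustrated with a minimal sketch. This is not the TOF2H implementation: the function names are hypothetical, the ions are assumed to be singly charged (as is typical for MALDI), and the correction uses the commonly cited two-reference form D = N (m_t - m_0) / (m_100 - m_0), where m_0 and m_100 are the centroids of undeuterated and fully deuterated controls.

```python
def centroid(peaks):
    """Intensity-weighted mean m/z of an isotope envelope.

    `peaks` is a list of (mz, intensity) pairs (hypothetical input format).
    """
    total = sum(intensity for _, intensity in peaks)
    if total <= 0:
        raise ValueError("isotope envelope has no signal")
    return sum(mz * intensity for mz, intensity in peaks) / total


def deuterium_uptake(m_t, m_0, m_100, n_exchangeable):
    """Back-exchange-corrected deuterium uptake.

    m_t: centroid at labeling time t
    m_0: centroid of the undeuterated control
    m_100: centroid of the fully deuterated control
    n_exchangeable: number of exchangeable amide hydrogens in the peptide
    Assumes singly charged ions, so m/z shifts equal mass shifts.
    """
    if m_100 <= m_0:
        raise ValueError("fully deuterated control must be heavier than undeuterated control")
    return (m_t - m_0) / (m_100 - m_0) * n_exchangeable
```

For example, `deuterium_uptake(1003.0, 1001.0, 1005.0, 8)` gives 4.0 deuterons: the peptide has reached half of the mass shift seen in the fully deuterated control.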

    Semi-supervised LC/MS alignment for differential proteomics

    Motivation: Mass spectrometry (MS) combined with high-performance liquid chromatography (LC) has received considerable attention for high-throughput analysis of proteomes. Isotopic labeling techniques such as ICAT [5,6] have been successfully applied to derive differential quantitative information for two protein samples, but at the price of significantly increased complexity of the experimental setup. To overcome these limitations, we consider a label-free setting where correspondences between elements of two samples have to be established prior to the comparative analysis. The alignment between samples is achieved by nonlinear robust ridge regression. The correspondence estimates are guided in a semi-supervised fashion by prior information which is derived from sequenced tandem mass spectra. Results: The semi-supervised method for finding correspondences was successfully applied to aligning highly complex protein samples, even if they exhibit large variations due to different biological conditions. A large-scale experiment clearly demonstrates that the proposed method bridges the gap between statistical data analysis and label-free quantitative differential proteomics. Availability: The software will be available on the website. Contact: [email protected]

    Accurate peak list extraction from proteomic mass spectra for identification and profiling studies

    Background: Mass spectrometry is an essential technique in proteomics both to identify the proteins of a biological sample and to compare proteomic profiles of different samples. In both cases, the main phase of the data analysis is the procedure to extract the significant features from a mass spectrum. Its final output is the so-called peak list, which contains the mass, the charge and the intensity of every detected biomolecule. The main steps of the peak list extraction procedure are usually preprocessing, peak detection, peak selection, charge determination and monoisotoping operation. Results: This paper describes an original algorithm for peak list extraction from low and high resolution mass spectra. It has been developed principally to improve the precision of peak extraction in comparison to other reference algorithms. It contains many innovative features, including a sophisticated method for managing overlapping isotopic distributions. Conclusions: The performance of the basic version of the algorithm and of its optional functionalities has been evaluated in this paper on SELDI-TOF, MALDI-TOF and ESI-FTICR ECD mass spectra. Executable files of MassSpec, a MATLAB implementation of the peak list extraction procedure for Windows and Linux systems, can be downloaded free of charge for nonprofit institutions from the following web site: http://aimed11.unipv.it/MassSpec
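The charge determination step listed in this abstract is often based on the spacing between isotopic peaks, which is roughly 1.003355 / z on the m/z axis for an ion of charge z. The sketch below illustrates that general idea only. It is not the MassSpec algorithm, and the function name, tolerance and maximum charge are assumptions chosen for illustration.

```python
C13_C12_DELTA = 1.003355  # Da, approximate mass difference between 13C and 12C


def infer_charge(envelope_mz, max_charge=6, tol=0.01):
    """Guess the charge of an isotope envelope from its mean peak spacing.

    `envelope_mz` is a sorted list of m/z values of consecutive isotopic peaks.
    Returns the charge whose expected spacing (C13_C12_DELTA / z) is closest to
    the observed mean spacing, or None if no charge fits within `tol`.
    """
    if len(envelope_mz) < 2:
        return None  # a single peak carries no spacing information
    mean_gap = (envelope_mz[-1] - envelope_mz[0]) / (len(envelope_mz) - 1)
    best_charge, best_error = None, tol
    for z in range(1, max_charge + 1):
        error = abs(mean_gap - C13_C12_DELTA / z)
        if error <= best_error:
            best_charge, best_error = z, error
    return best_charge
```

A spacing near 0.5 m/z units therefore points to a doubly charged ion, while a spacing that matches no candidate charge within the tolerance is left unassigned rather than forced to the nearest value.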

    Algorithms for integrated analysis of glycomics and glycoproteomics by LC-MS/MS

    The glycoproteome is an intricate and diverse component of a cell, and it plays a key role in the definition of the interface between that cell and the rest of its world. Methods for studying the glycoproteome have been developed for released glycan glycomics and site-localized bottom-up glycoproteomics using liquid chromatography-coupled mass spectrometry and tandem mass spectrometry (LC-MS/MS), which is itself a complex problem. Algorithms for interpreting these data are necessary to be able to extract biologically meaningful information in a high throughput, automated context. Several existing solutions have been proposed but may be found lacking for larger glycopeptides, for complex samples, different experimental conditions, different instrument vendors, or even because they simply ignore fundamentals of glycobiology. I present a series of open algorithms that approach the problem in an instrument-vendor-neutral, cross-platform fashion to address these challenges, and integrate key concepts from the underlying biochemical context into the interpretation process. In this work, I created a suite of deisotoping and charge state deconvolution algorithms for processing raw mass spectra at an LC scale from a variety of instrument types. These tools performed better than previously published algorithms by enforcing the underlying chemical model more strictly, while maintaining a higher degree of signal fidelity. From this summarized, vendor-normalized data, I composed a set of algorithms for interpreting glycan profiling experiments that can be used to quantify glycan expression. From this I constructed a graphical method to model the active biosynthetic pathways of the sample glycome and dig deeper into those signals than would be possible from the raw data alone. 
Lastly, I created a glycopeptide database search engine from these components that is capable of identifying the widest array of glycosylation types available, and I demonstrate a learning algorithm that can be used to tune the model to better capture glycopeptide fragmentation under specific experimental conditions, outperforming a simpler model by between 10% and 15%. This approach can be further augmented with sample-wide or site-specific glycome models to increase depth-of-coverage for glycoforms consistent with prior beliefs.
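Charge state deconvolution, as named in this abstract, can be pictured as mapping each charge-assigned peak to a neutral mass and merging peaks that describe the same species. The sketch below is a deliberately simple illustration under stated assumptions, not the thesis's algorithm: positive-mode protonated ions, a hypothetical (m/z, charge, intensity) input format, and an arbitrary 10 ppm merge tolerance.

```python
PROTON_MASS = 1.007276  # Da, mass of the charge-carrying proton


def neutral_mass(mz, charge):
    """Neutral mass of a protonated ion observed at `mz` with the given charge."""
    return charge * (mz - PROTON_MASS)


def deconvolve(peaks, ppm_tol=10.0):
    """Collapse peaks from different charge states onto the neutral-mass axis.

    `peaks` is a list of (mz, charge, intensity) tuples (hypothetical format).
    Peaks whose neutral masses agree within `ppm_tol` are merged: intensities
    are summed and the mass is the intensity-weighted mean.
    Returns a list of (neutral_mass, intensity) sorted by mass.
    """
    by_mass = sorted((neutral_mass(mz, z), inten) for mz, z, inten in peaks)
    merged = []
    for mass, inten in by_mass:
        if merged and abs(mass - merged[-1][0]) / merged[-1][0] * 1e6 <= ppm_tol:
            prev_mass, prev_inten = merged[-1]
            total = prev_inten + inten
            if total > 0:
                mass = (prev_mass * prev_inten + mass * inten) / total
            else:
                mass = (prev_mass + mass) / 2.0  # no signal to weight by
            merged[-1] = (mass, total)
        else:
            merged.append((mass, inten))
    return merged
```

With this sketch, a 2+ peak at m/z 1001.007276 and a 3+ peak at m/z 667.673943 both map to a neutral mass of about 2000 Da and are reported as one entry with their intensities summed.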

    Laboratory methods to improve SELDI peak detection and quantitation

    Background: Protein profiling with surface-enhanced laser desorption-ionisation time-of-flight mass spectrometry (SELDI-TOF MS) is a promising approach for biomarker discovery. Some candidate biomarkers have been identified using SELDI-TOF, but validation of these can be challenging because of technical parameters that affect reproducibility. Here we describe steps to improve the reproducibility of peak detection. Methods: SELDI-TOF mass spectrometry was performed using a system manufactured by Ciphergen Biosystems along with their ProteinChip System. Serum from 10 donors was pooled and used for all experiments. Serum was fractionated with Expression Difference Mapping kit-Serum Fractionation from the same company and applied to three different ProteinChips. The fractionations were run over a one-month period to examine the contribution of sample batch and time to peak detection variability. Spectra were processed and peaks detected using the Ciphergen Express software and variance measured. Results: Experimental parameters specific to the serum fraction and ProteinChip, including spot protocols (laser intensity and detector sensitivity), were optimized to decrease peak detection variance. Optimal instrument settings and regular calibration, along with controlled sample handling and processing, nearly doubled the number of peaks detected and decreased intensity variance. Conclusion: This report assesses the variation across fractionated sera processed over a one-month period. The optimizations reported decreased the variance and increased the number of peaks detected.

    Automated Analysis of Biomedical Data from Low to High Resolution

    Recent developments of experimental techniques and instrumentation allow life scientists to acquire enormous volumes of data at unprecedented resolution. While this new data brings much deeper insight into cellular processes, it renders manual analysis infeasible and calls for the development of new, automated analysis procedures. This thesis describes how methods of pattern recognition can be used to automate three popular data analysis protocols: Chapter 1 proposes a method to automatically locate bimodal isotope distribution patterns in Hydrogen Deuterium Exchange Mass Spectrometry experiments. The method is based on L1-regularized linear regression and allows for easy quantitative analysis of co-populations with different exchange behavior. The sensitivity of the method is tested on a set of manually identified peptides, while its applicability to exploratory data analysis is validated by targeted follow-up peptide identification. Chapter 2 develops a technique to automate peptide quantification for mass spectrometry experiments, based on 16O/18O labeling of peptides. Two different spectrum segmentation algorithms are proposed: one based on image processing and applicable to low resolution data and one exploiting the sparsity of high resolution data. The quantification accuracy is validated on calibration datasets, produced by mixing a set of proteins in pre-defined ratios. Chapter 3 provides a method for automated detection and segmentation of synapses in electron microscopy images of neural tissue. For images acquired by scanning electron microscopy with nearly isotropic resolution, the algorithm is based on geometric features computed in 3D pixel neighborhoods. For transmission electron microscopy images with poor z-resolution, the algorithm uses additional regularization by performing several rounds of pixel classification with features computed on the probability maps of the previous classification round. 
The validation is performed by comparing the set of synapses detected by the algorithm against a gold standard detection by human experts. For data with nearly isotropic resolution, the algorithm performance is comparable to that of the human experts.

    Computational Analysis of Mass Spectrometric Data for Whole Organism Proteomic Studies

    In recent decades, great breakthroughs have been achieved in the study of genomes, supplying us with vast knowledge of genes and a large number of sequenced organisms. With the availability of genome information, new systematic studies have arisen. One of the most prominent areas is proteomics, a discipline devoted to the study of the organism’s expressed protein content. Proteomics studies are concerned with a wide range of problems. Major focuses include studies of protein expression patterns, the detection of protein-protein interactions, protein quantitation, protein localization analysis, and characterization of post-translational modifications. The emergence of proteomics shows great promise for furthering our understanding of cellular processes and mechanisms of life. One of the main techniques used for high-throughput proteomic studies is mass spectrometry. Capable of detecting masses of biological compounds in complex mixtures, it is currently one of the most powerful methods for protein characterization. New horizons are opening with new developments in mass spectrometry instrumentation, which can now be applied to a variety of proteomic problems. One of the most popular applications of proteomics involves whole organism high-throughput experiments. However, as new instrumentation is developed, followed by the design of new experiments, we find ourselves needing new computational algorithms to interpret the results of the experiments. As the thresholds of current technology are being probed, new algorithmic designs are beginning to emerge to meet the challenges of mass spectrometry data evaluation and interpretation. This dissertation is devoted to computational analysis of mass spectrometric data, involving a combination of different topics and techniques to improve our understanding of biological processes using high-throughput whole organism proteomic studies. 
It consists of the development of new algorithms to improve the data interpretation of the current tools, the introduction of a new algorithmic approach for post-translational modification detection, and the characterization of a set of computational simulations for biological agent detection in a complex organism background. These studies are designed to further our ability to understand the results of high-throughput mass spectrometric experiments and their impact in the field of proteomics.