53 research outputs found

    Algorithms for integrated analysis of glycomics and glycoproteomics by LC-MS/MS

    The glycoproteome is an intricate and diverse component of a cell, and it plays a key role in the definition of the interface between that cell and the rest of its world. Methods for studying the glycoproteome have been developed for released glycan glycomics and site-localized bottom-up glycoproteomics using liquid chromatography-coupled mass spectrometry and tandem mass spectrometry (LC-MS/MS), which is itself a complex problem. Algorithms for interpreting these data are necessary to be able to extract biologically meaningful information in a high throughput, automated context. Several existing solutions have been proposed but may be found lacking for larger glycopeptides, for complex samples, different experimental conditions, different instrument vendors, or even because they simply ignore fundamentals of glycobiology. I present a series of open algorithms that approach the problem from an instrument vendor neutral, cross-platform fashion to address these challenges, and integrate key concepts from the underlying biochemical context into the interpretation process. In this work, I created a suite of deisotoping and charge state deconvolution algorithms for processing raw mass spectra at an LC scale from a variety of instrument types. These tools performed better than previously published algorithms by enforcing the underlying chemical model more strictly, while maintaining a higher degree of signal fidelity. From this summarized, vendor-normalized data, I composed a set of algorithms for interpreting glycan profiling experiments that can be used to quantify glycan expression. From this I constructed a graphical method to model the active biosynthetic pathways of the sample glycome and dig deeper into those signals than would be possible from the raw data alone. 
Lastly, I created a glycopeptide database search engine from these components that is capable of identifying the widest array of glycosylation types available, and I demonstrate a learning algorithm that can be used to tune the model to better capture the process of glycopeptide fragmentation under specific experimental conditions, outperforming a simpler model by between 10% and 15%. This approach can be further augmented with sample-wide or site-specific glycome models to increase depth-of-coverage for glycoforms consistent with prior beliefs
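
The charge state deconvolution step mentioned in this abstract can be illustrated with a minimal sketch. This is not the author's algorithm: it assumes positive-mode protonated ions, uses only the spacing between the first two peaks of an isotopic cluster, and the function names and tolerance are hypothetical.

```python
PROTON_MASS = 1.007276466   # Da, mass of a proton
ISOTOPE_SPACING = 1.0033548  # Da, approximate 13C - 12C mass difference


def infer_charge(mz_peaks, max_charge=6, tol=0.005):
    """Guess the charge of an isotopic cluster from its peak spacing.

    Adjacent isotopic peaks of an ion with charge z are separated by
    roughly ISOTOPE_SPACING / z on the m/z axis. Returns None when no
    candidate charge matches within `tol` (an assumed tolerance).
    """
    spacing = mz_peaks[1] - mz_peaks[0]
    best = min(range(1, max_charge + 1),
               key=lambda z: abs(spacing - ISOTOPE_SPACING / z))
    if abs(spacing - ISOTOPE_SPACING / best) <= tol:
        return best
    return None


def neutral_mass(mz, charge):
    """Convert an m/z value of a protonated ion to its neutral mass."""
    return charge * (mz - PROTON_MASS)


cluster = [500.0, 500.5017, 501.0034]  # made-up example peaks
z = infer_charge(cluster)
mass = neutral_mass(cluster[0], z)
```

A production tool would fit a full theoretical isotopic pattern rather than a single spacing, which is the kind of stricter chemical model the abstract alludes to.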

    Multiple marker abundance profiling: combining selected reaction monitoring and data-dependent acquisition for rapid estimation of organelle abundance in subcellular samples

    Measuring changes in protein or organelle abundance in the cell is an essential, but challenging aspect of cell biology. Frequently-used methods for determining organelle abundance typically rely on detection of a very few marker proteins, so are unsatisfactory. In silico estimates of protein abundances from publicly available protein spectra can provide useful standard abundance values but contain only data from tissue proteomes, and are not coupled to organelle localization data. A new protein abundance score, the normalized protein abundance scale (NPAS), expands on the number of scored proteins and the scoring accuracy of lower-abundance proteins in Arabidopsis. NPAS was combined with subcellular protein localization data, facilitating quantitative estimations of organelle abundance during routine experimental procedures. A suite of targeted proteomics markers for subcellular compartment markers was developed, enabling independent verification of in silico estimates for relative organelle abundance. Estimation of relative organelle abundance was found to be reproducible and consistent over a range of tissues and growth conditions. In silico abundance estimations and localization data have been combined into an online tool, multiple marker abundance profiling, available in the SUBA4 toolbox (http://suba.live)

    The use of recently developed mass spectrometry-based proteomic approaches for the study of Methylocella silvestris BL2

    The study of the protein complement, termed proteomics, has advanced over the last twenty years as a consequence of developments in mass spectrometry. Currently, improvements in mass spectrometry-based approaches are targeted towards achieving information on both the identity and abundance of proteins. Increased numbers of protein identifications are obtained by simplifying the analyte of interest. This can be achieved with the use of separation techniques, including two-dimensional liquid chromatography (2D-LC). Ion mobility coupled to mass spectrometry has recently been shown to be a useful post-ionisation separation tool for proteomic studies. The utility of these technologies for obtaining both qualitative and quantitative information is not extensively addressed in the current literature. The use of a recently developed 2D-LC system, together with a method of ion mobility separation and a label-free quantitative approach for proteomic studies, has been evaluated here for characterising the proteome of the bacterium Methylocella silvestris. This bacterium is the first methane-utilising bacterium also found to grow on substrates containing carbon-carbon bonds, and has great biotechnological potential. The metabolism of this bacterium was studied by obtaining information on its soluble proteome when grown with methane, propane, succinate, acetate, methanol, methylamine or trimethylamine. The benefits and limitations of 2D-LC and ion mobility for profiling and label-free quantitative studies were demonstrated for simple mixtures and complex bacterial extracts. The combination of both 2D-LC and ion mobility was also achieved, resulting in wider proteome coverage when compared to the respective stand-alone approaches. A cluster of expressed genes that were greatly up-regulated under trimethylamine growth and monomethylamine growth was proposed to be involved in the indirect pathway for trimethylamine metabolism. 
It was further verified that one of these genes expresses the previously unidentified trimethylamine monooxygenase. A propane assimilation route was proposed, based on information obtained on the levels of primary oxidation enzymes and downstream central metabolic pathways

    Phosphoproteomics and proteomic phenotyping to assess signal transduction in cancer cells

    This thesis applies quantitative mass spectrometry to research topics in relation to cancer. Proteome-wide quantification at the protein expression level and phosphorylation level was achieved. The technologies developed and used here cover the latest improvements in instrumentation in mass spectrometry, strategies in phosphopeptide enrichment in large scale, algorithms in data analysis and their streamlined implementation, and data mining in downstream bioinformatics. For each of the projects described in this thesis, proteome mapping routinely resulted in identification and quantitation of around 4,000 proteins and phosphoproteome mapping often led to quantitation of more than 5,000 phosphorylation sites. This ‘systems-wide’ quantitation of the proteome and phosphoproteome is a completely novel development, which has not been used in cancer related topics before. Three major biology topics are studied in this thesis. In the first project, the phosphoproteome of a mouse liver cancer cell line Hepa1-6 was analyzed in-depth, by using phosphatase inhibitors (calyculin A, deltamethrin, and Na-pervanadate) to boost phosphorylation. The characterization of the phosphoproteome revealed a broad spectrum of cellular compartmentalization and biological functions. Quantitation of phosphatase inhibitor treatment using the Stable Isotope Labeling by Amino Acids in Cell culture (SILAC) method revealed the quantitative effects of these inhibitor compounds on the whole phosphoproteome. To our surprise, these three broadband phosphatase inhibitors displayed very different efficiency, with tyrosine phosphorylation significantly boosted but serine/threonine phosphorylation much less affected. Additionally, a method to estimate an upper bound of the stoichiometry of phosphorylation was introduced by comparing phosphorylation in three SILAC conditions: non-treated cells, stimulated cells (e.g. with insulin), and only phosphatase inhibitor treated cells. 
The methods developed here can be used directly in development of drugs directed against kinases and phosphatases, key regulators in cancer and other diseases. The second project continues with the application of phosphoproteomics techniques. Kinase inhibitors influence cellular signal transduction processes and therefore are of great potential in rescuing aberrant cellular signaling in tumors. In fact they constitute a significant portion of drug development programs in the pharmaceutical industry. With the aim of quantifying the effect of kinase inhibitors over the entire signaling network, the second project first set out to study two very commonly used kinase inhibitor compounds for MAPKs: U0126 and SB202190. Their effect on epidermal growth factor (EGF) signal transduction was quantified and compared using the HeLa cell system. The study confirmed that the MAPK cascades are the predominant signaling branches for propagating the EGF signaling at early time points of stimulation. These large scale examinations also suggest that U0126 and SB202190 are quite specific inhibitors for MAPKs as the majority of regulated phosphopeptides appears to belong to the MAPK pathways. In the second part of the project, the effect on phosphoproteome changes of the chemical compound dasatinib, which was demonstrated to effectively inhibit the constitutively activated fusion protein BCR-ABL and was recently approved for chronic myelogenous leukemia (CML) therapy, was quantified in the human CML cell line K562. Bioinformatic analysis revealed that the most influenced signal transduction branch was the Erk1/2 cascade. Overall more than 500 phosphorylation sites were found to be regulated by dasatinib, the vast majority not described in the literature yet. The third project compared the proteomes of mouse hepatoma cell line Hepa1-6 with the non-transformed mouse primary hepatocytes. This was performed by combining the SILAC heavy labeled form of Hepa1-6 with the primary hepatocytes. 
To characterize the features of these two proteomes, quantitation information (i.e. protein ratios between the two cell types) was used to divide all proteins into five quantiles. Each quantile was clustered according to the Gene Ontology and KEGG pathway databases to assess their enriched functional groups and signaling pathways. To integrate this information at a higher level, hierarchical clustering based on the p-values from the first Gene Ontology and KEGG clustering was performed. Using this improved bioinformatic algorithm for data mining, the proteomic phenotypes of the primary cells and transformed cells are immediately apparent. Primary hepatocytes are enriched in mitochondrial functions such as metabolic regulation and detoxification, as well as liver functions with tissue context such as secretion of plasma and low-density lipoprotein (LDL). In contrast, the transformed cancer cell line Hepa1-6 is enriched in cell cycle and growth functions. Interestingly, several aspects of the molecular basis of the “Warburg effect” described in many cancer cells became apparent in Hepa1-6, such as increased expression of glycolysis markers and decreased expression of markers for the tricarboxylic acid (TCA) cycle. Studies in this thesis only provide examples of the application of mass spectrometry-based quantitative proteomics and phosphoproteomics in cancer research. The connection to clinical research, especially the assessment of drug effects on a proteome-wide scale, is a specific feature of this thesis. Although this development is only in its infancy, it reflects a trend in the quantitative mass spectrometry field. We believe that more and more clinically related topics can and will be studied by these powerful methods
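
The division of proteins into five quantiles by protein ratio, as described in this abstract, might look like the sketch below. It assumes equal-count bins ranked by the Hepa1-6 to hepatocyte ratio; the thesis may define its quantile boundaries differently, and the function name and protein identifiers are hypothetical.

```python
def assign_quantiles(ratios, n_bins=5):
    """Map each protein to a bin index by the rank of its ratio.

    `ratios` maps a protein identifier to its ratio between the two
    cell types. Bin 0 holds the lowest ratios and bin n_bins - 1 the
    highest; bins are equal-count up to rounding.
    """
    ranked = sorted(ratios, key=ratios.get)
    n = len(ranked)
    return {protein: min(i * n_bins // n, n_bins - 1)
            for i, protein in enumerate(ranked)}


# Made-up ratios for ten hypothetical proteins, from 1/32 up to 16.
example = {f"P{i}": 2.0 ** (i - 5) for i in range(10)}
bins = assign_quantiles(example)
```

Each bin would then be passed to an enrichment test against Gene Ontology or KEGG annotations, which is outside the scope of this sketch.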

    Peak annotation and data analysis software tools for mass spectrometry imaging

    Spatial metabolomics is the discipline that studies the images of the distributions of low-weight chemical compounds (metabolites) on the surface of biological tissues to unveil interactions between molecules. Mass spectrometry imaging (MSI) is currently the principal technique for obtaining molecular imaging information for spatial metabolomics. MSI is a label-free molecular imaging technology that produces mass spectra preserving the spatial structures of tissue samples. This is achieved by ionizing small portions of a sample (a pixel) in a defined raster across its whole surface, which results in a collection of ion distribution images (registered as mass-to-charge ratios (m/z)) over the sample. This thesis aims to develop computational tools for peak annotation in MSI and to design workflows for the statistical and multivariate analysis of MSI data, including spatial segmentation. The work carried out in this thesis can be clearly separated into two parts. Firstly, the development of an isotope and adduct peak annotation tool suited to facilitate the identification of low mass range compounds. We can now easily find monoisotopic ions in our MSI datasets thanks to the rMSIannotation software package. Secondly, the development of software tools for data analysis and spatial segmentation based on soft clustering for MSI data. In this thesis, we have developed tools and methodologies to search for significant ions (rMSIKeyIon software package) and for the soft clustering of tissues (Fuzzy c-means algorithm)
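
The soft clustering named in this abstract, fuzzy c-means, can be sketched with the textbook update rules. This is a plain-Python illustration rather than the thesis software: initial centers are supplied by the caller, the fuzzifier `m` defaults to 2, and treating each pixel as a short intensity vector is an assumption.

```python
def _memberships(points, centers, m, eps):
    """Membership of each point in each cluster; every row sums to 1."""
    power = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        # Euclidean distance to every center, floored to avoid dividing by 0.
        dists = [max(eps, sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5)
                 for c in centers]
        rows.append([1.0 / sum((d_i / d_j) ** power for d_j in dists)
                     for d_i in dists])
    return rows


def fuzzy_c_means(points, centers, m=2.0, n_iter=50, eps=1e-12):
    """Alternate membership and center updates for a fixed iteration count.

    `points` and `centers` are lists of equal-length lists of floats.
    Returns the final centers and the memberships computed from them.
    """
    dim = len(points[0])
    for _ in range(n_iter):
        u = _memberships(points, centers, m, eps)
        new_centers = []
        for i in range(len(centers)):
            weights = [row[i] ** m for row in u]
            total = sum(weights)
            new_centers.append(
                [sum(w * x[d] for w, x in zip(weights, points)) / total
                 for d in range(dim)])
        centers = new_centers
    return centers, _memberships(points, centers, m, eps)


# Made-up one-dimensional "pixel intensities" forming two groups.
pixels = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
centers, memberships = fuzzy_c_means(pixels, [[0.0], [5.0]])
```

Unlike hard clustering, each pixel keeps a graded membership in every cluster, which suits tissue regions whose boundaries are gradual.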

    Mining Deeper into the Proteome: Computational Strategies for Improving Depth and Breadth of Coverage in High-Throughput Protein Identification Studies.

    The proteomics field is driven by the need to develop increasingly high-throughput methods for the identification and characterization of proteins. The overall goal of this research is to improve the success rate of modern high-throughput proteomics studies. The focus is on developing computational strategies for increasing the number of identifications as well as improving the ability to distinguish new forms of proteins and peptides. Several studies are presented, addressing different points in the proteomics analysis pipeline. At the most fundamental data analysis level, methods for using modern machine learning algorithms to improve the ability to distinguish correct from incorrect peptide identifications are presented. These techniques have the potential to minimize the need for manual curation of results, providing a significant increase in throughput in addition to increased identification confidence. Non-standard types of mass spectrometry data are being generated in specific contexts. Specifically, phosphoproteomics often involves the generation of MS3 spectra. These spectra alleviate problems associated with MS2 fragmentation of phosphopeptides, but utilizing the additional information contained in these spectra requires novel informatics. Several strategies for accommodating this additional information are presented. A statistical model is developed for translating the information contained in the coupling of consecutive MS2 and MS3 spectra into a more accurate peptide identification probability score. Also, methods for combining MS2 and MS3 data are explored. A newer mass spectrometry methodology useful for phosphoproteomics has recently been introduced as well, termed multistage activation (MSA). A comparative study of this and other methods is presented aimed at determining an optimal method for generating phosphopeptide identifications, focusing not only on data analysis techniques, but also on the mass spectrometry methodologies themselves. 
A dataset is presented from a differential study of a human cell line infected with the dengue virus. The study explores the complementarity of different fractionation methods in generating more unique protein identifications. A discussion of a statistical mixture model that utilizes relative quantification information to classify identified peptides into two categories based on their membrane topology is given in the final chapter. Finally, a comment on utilizing pI information to enrich for phosphopeptides is provided. Ph.D., Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/58496/1/pulintz_1.pd

    The Origin and Early Evolution of Life

    What is life? How, where, and when did life arise? These questions have remained most fascinating over the last hundred years. Systems chemistry is the way to go to better understand this problem and to try and answer the unsolved question regarding the origin of Life. Self-organization, thanks to the role of lipid boundaries, made possible the rise of protocells. The role of these boundaries is to separate and co-locate micro-environments, and make them spatially distinct; to protect and keep them at defined concentrations; and to enable a multitude of often competing and interfering biochemical reactions to occur simultaneously. The aim of this Special Issue is to summarize the latest discoveries in the field of the prebiotic chemistry of biomolecules, self-organization, protocells and the origin of life. In recent years, thousands of excellent reviews and articles have appeared in the literature and some breakthroughs have already been achieved. However, a great deal of work remains to be carried out. Beyond the borders of the traditional domains of scientific activity, the multidisciplinary character of the present Special Issue leaves space for anyone to creatively contribute to any aspect of these and related relevant topics. We hope that the presented works will be stimulating for a new generation of scientists that are taking their first steps in this fascinating field

    Bioinformatic and Experimental Approaches for Deeper Metaproteomic Characterization of Complex Environmental Samples

    The coupling of high performance multi-dimensional liquid chromatography and tandem mass spectrometry for characterization of microbial proteins from complex environmental samples has paved the way for a new era in scientific discovery. The field of metaproteomics, which is the study of the protein suite of all the organisms in a biological system, has taken a tremendous leap with the introduction of high-throughput proteomics. However, with the corresponding increase in sample complexity, novel challenges have been raised with respect to efficient peptide separation via chromatography and bioinformatic analysis of the resulting high throughput data. In this dissertation, various aspects of metaproteomic characterization, including experimental and computational approaches, have been systematically evaluated. In this study, robust separation protocols employing strong cation exchange and reverse phase have been designed for efficient peptide separation, thus offering excellent orthogonality and ease of automation. These findings will be useful to the proteomics community for obtaining deeper non-redundant peptide identifications, which in turn will improve the overall depth of semi-quantitative proteomics. Secondly, computational bottlenecks associated with screening the vast amount of raw mass spectra generated in these proteomic measurements have been addressed. Computational matching of tandem mass spectra via conventional database search strategies leads to modest peptide/protein identifications. This seriously restricts the amount of information retrieved from these complex samples, mainly because of the high complexity and heterogeneity of a sample containing hundreds of proteins shared between different microbial species, often with a high level of homology. 
Hence, the challenges associated with metaproteomic data analysis have been addressed by utilizing multiple iterative search engines coupled with de novo sequencing algorithms for a comprehensive and in-depth characterization of complex environmental samples. The work presented here utilizes various sample types ranging from isolates and mock microbial mixtures prepared in the laboratory to complex community samples extracted from industrial waste water, acid-mine drainage and methane seep sediments. In a broad perspective, this dissertation aims to provide tools for gaining deeper insights into proteome characterization in complex environmental ecosystems