424 research outputs found

    Algorithms for Glycan Structure Identification with Tandem Mass Spectrometry

    Glycosylation is a frequently observed post-translational modification (PTM) of proteins. It has been estimated that over half of eukaryotic proteins in nature are glycoproteins. Glycoprotein analysis plays a vital role in drug preparation. Thus, characterization of glycans that are linked to proteins has become necessary in glycoproteomics. Mass spectrometry has become an effective analytical technique for glycoproteomics analysis because of its high throughput and sensitivity. The large amount of spectral data collected in a mass spectrometry experiment makes manual interpretation impossible and requires effective computational approaches for automated analysis. Different algorithmic solutions have been proposed to address the challenges in glycoproteomics analysis based on mass spectrometry. However, new algorithms that can identify intact glycopeptides are still needed to improve result accuracy. In this research, a glycan is represented as a rooted unordered labelled tree and we focus on developing effective algorithms to determine glycan structures from tandem mass spectra. Interpreting the tandem mass spectra of glycopeptides with a de novo sequencing method is essential to identifying novel glycan structures. Thus, we mathematically formulate the glycan de novo sequencing problem and propose a heuristic algorithm for glycan de novo sequencing from HCD tandem mass spectra of glycopeptides. Characterizing glycans from MS/MS with a de novo sequencing method requires high-quality mass spectra for accurate results. The database search method usually obtains more reliable results because it is assisted by glycan structural information. Thus, we propose a de novo sequencing assisted database search method, GlycoNovoDB, for mass spectra interpretation.
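
The tree representation mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's own data structure: the class name `GlycanNode`, the helpers `canonical` and `residue_mass`, and the example glycan are hypothetical, and the residue masses are standard monoisotopic values rounded to four decimals. Because the tree is unordered, two trees that differ only in child order should compare equal, which the sketch achieves by sorting child encodings.

```python
from dataclasses import dataclass, field

# Monoisotopic residue masses in Da (standard values, rounded to 4 decimals).
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "Fuc": 146.0579,
    "NeuAc": 291.0954,
}


@dataclass
class GlycanNode:
    """One monosaccharide residue; children are the residues attached to it."""
    label: str
    children: list = field(default_factory=list)


def canonical(node):
    """Order-independent encoding: child encodings are sorted before joining."""
    parts = sorted(canonical(child) for child in node.children)
    if not parts:
        return node.label
    return node.label + "(" + ",".join(parts) + ")"


def residue_mass(node):
    """Sum of residue masses over the whole subtree rooted at node."""
    return RESIDUE_MASS[node.label] + sum(residue_mass(c) for c in node.children)


def make_branch():
    # Hypothetical branch: HexNAc carrying a Hex that carries two more Hex.
    return GlycanNode("HexNAc", [GlycanNode("Hex", [GlycanNode("Hex"), GlycanNode("Hex")])])


# The same glycan written with its two child subtrees in opposite order.
a = GlycanNode("HexNAc", [GlycanNode("Fuc"), make_branch()])
b = GlycanNode("HexNAc", [make_branch(), GlycanNode("Fuc")])
```

Comparing `canonical(a)` with `canonical(b)` treats the two orderings as the same structure, which is the property an unordered tree needs when candidate structures are enumerated or deduplicated.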

    Methods in automated glycosaminoglycan tandem mass spectra analysis

    Glycosylation is the process by which a glycan is enzymatically attached to a protein, and is one of the most common post-translational modifications in nature. One class of glycans is the glycosaminoglycans (GAGs), which are long, linear polysaccharides that are variably sulfated and make up the glycan portion of proteoglycans (PGs). PGs are located on the cellular surface and in the extracellular matrix (ECM), making them important molecules for cell signaling and ligand binding. The GAG sulfation sequence is a determining factor for the signaling capacity of binding complexes, so accurate determination of the sequence is critical. Historically, GAG sequencing using tandem mass spectrometry (MS2) has been a difficult, manual process; however, with the advent of faster computational techniques and higher-resolution MS2, high-throughput GAG sequencing is within reach. Two steps in the pipeline of biomolecule sequencing using MS2 are discovery and interpretation of spectral peaks. The discovery step traditionally is performed using methods that rely on the concept of averagine, or the average molecular building block for the analyte in question. These methods were developed for protein sequencing, but perform considerably worse on GAG sequences, due to the non-uniform distribution of sulfur atoms along the chain and the relatively high isotope abundance of 34S. The interpretation step traditionally is performed manually, which takes time and introduces potential user error. To combat these problems, I developed GAGfinder, the first GAG-specific MS2 peak finding and annotation software. GAGfinder is described in detail in chapter two. Another step in MS2 sequencing is the determination of the sequence using the found MS2 fragments. For a given GAG composition, there are many possible sequences, and peak finding algorithms such as GAGfinder return a list of the peaks in the MS2 mass spectrum. 
The many-to-many relationship between sequences and fragments can be represented using a bipartite network, and node-ranking techniques can be employed to generate likelihood scores for possible sequences. I developed a bipartite network-based sequencing tool, GAGrank, based on a bipartite network extension of Google's PageRank algorithm for ranking websites. GAGrank is described in detail in chapter three.
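
The idea of ranking candidate sequences on a bipartite sequence/fragment network can be sketched as a damped score-passing iteration. This is a generic illustration in the spirit of PageRank, not GAGrank's actual algorithm: the function `bipartite_rank`, its parameters, and the toy sequences and fragments are all hypothetical.

```python
def bipartite_rank(edges, prior, damping=0.85, iters=50):
    """Rank sequences by passing score back and forth across a bipartite graph.

    edges: dict mapping each candidate sequence to the set of fragments it
           could produce.
    prior: dict mapping fragments to non-negative weights (for example,
           observed peak intensity; unobserved fragments get 0).
    Returns a dict of sequence scores that sum to 1.
    """
    seqs = list(edges)
    frags = sorted({f for fs in edges.values() for f in fs})
    frag_deg = {f: sum(f in edges[s] for s in seqs) for f in frags}

    total_prior = sum(prior.get(f, 0.0) for f in frags)
    if total_prior > 0:
        p = {f: prior.get(f, 0.0) / total_prior for f in frags}
    else:
        p = {f: 1.0 / len(frags) for f in frags}  # fall back to uniform

    seq_score = {s: 1.0 / len(seqs) for s in seqs}
    for _ in range(iters):
        # Each sequence splits its score evenly among its fragments.
        passed = {f: 0.0 for f in frags}
        for s in seqs:
            for f in edges[s]:
                passed[f] += seq_score[s] / len(edges[s])
        # Damping mixes in the prior, as the teleport term does in PageRank.
        frag_score = {f: (1 - damping) * p[f] + damping * passed[f] for f in frags}

        # Each fragment splits its score evenly among the sequences explaining it.
        new_seq = {s: 0.0 for s in seqs}
        for f in frags:
            for s in seqs:
                if f in edges[s]:
                    new_seq[s] += frag_score[f] / frag_deg[f]
        total = sum(new_seq.values())
        seq_score = {s: v / total for s, v in new_seq.items()}
    return seq_score


# Toy input: seqA explains three observed fragments; seqB needs an unobserved one.
toy_edges = {"seqA": {"f1", "f2", "f3"}, "seqB": {"f1", "f4"}}
toy_prior = {"f1": 1.0, "f2": 1.0, "f3": 1.0, "f4": 0.0}
scores = bipartite_rank(toy_edges, toy_prior)
```

In this toy case the sequence whose fragments are all supported by observed peaks ends up with the higher score; a real tool would need a more careful treatment of fragment ambiguity and noise.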

    De novo sequencing of heparan sulfate saccharides using high-resolution tandem mass spectrometry

    Heparan sulfate (HS) is a class of linear, sulfated polysaccharides located on the cell surface, in secretory granules, and in extracellular matrices found in all animal organ systems. It consists of alternately repeating disaccharide units, expressed in animal species ranging from hydra to higher vertebrates including humans. HS binds and mediates the biological activities of over 300 proteins, including growth factors, enzymes, chemokines, cytokines, adhesion and structural proteins, lipoproteins and amyloid proteins. The binding events largely depend on the fine structure of HS chains, that is, the arrangement of sulfate groups and other variations. With the activated electron dissociation (ExD) high-resolution tandem mass spectrometry technique, researchers acquire rich structural information about the HS molecule. Using this technique, covalent bonds of the HS oligosaccharide ions are dissociated in the mass spectrometer. However, this information is complex, owing to the large number of product ions, and contains a degree of ambiguity due to the overlapping of product ion masses and the lability of sulfate groups; as a result, there is a serious barrier to manual interpretation of the spectra. The interpretation of such data creates a serious bottleneck to the understanding of the biological roles of HS. To solve this problem, I designed HS-SEQ, the first HS sequencing algorithm using high-resolution tandem mass spectrometry. HS-SEQ allows rapid and confident sequencing of HS chains from millions of candidate structures, and I validated its performance using multiple known pure standards. In many cases, HS oligosaccharides exist as mixtures of sulfation positional isomers. I therefore designed MULTI-HS-SEQ, an extended version of HS-SEQ targeting spectra coming from more than one HS sequence. I also developed several pre-processing and post-processing modules to support the automatic identification of HS structure. 
These methods and tools demonstrated the capacity for large-scale HS sequencing, which should contribute to clarifying the rich information encoded by HS chains as well as developing tailored HS drugs to target a wide spectrum of diseases

    Fucosylated and Sulfated Glycans Investigated using Cryogenic Infrared Spectroscopy

    Unusual monosaccharides (fucose), covalent modifications of glycans (sulfation) and terminal sequences play important biological roles in the physiology and pathology of living organisms. Furthermore, in an evolutionary sense, uncommon structures are often the result of selection pressures and can be a source of deeper understanding of the evolution of glycosylation. At the same time, fucosylated glycans and sulfated glycans still challenge standard mass spectrometry (MS)-based analytical workflows in glycan analysis. MS emerged throughout the last decade as the most widely used analytical technique in glycan analysis. As a stand-alone technique, it is limited in glycan analysis due to the presence of isomers. Isomerism in glycans arises from their composition, connectivity, configuration, and branching. Therefore, MS is often coupled to orthogonal techniques such as liquid chromatography (LC) and ion mobility spectrometry (IM-MS). Most recently, the combination of cryogenic IR spectroscopy in the gas phase with MS proved beneficial for the identification of smaller glycans. At low measurement temperatures, the IR spectrum of small glycans provides a unique fingerprint of the underlying chemical structure and conformation. In this thesis, cryogenic IR spectroscopy as an addition to the MS-based analytical toolbox was used to shed light on the migration of fucose residues in MS experiments. This elusive rearrangement reaction is not restricted to tandem MS workflows but has recently been found to occur in intact ions without extensive activation. Here, the role of the proton in fucose migration reactions was investigated for the two glycan epitopes Lewis x and blood group H type 2. A systematic study of adduct ions and functional groups with competing proton affinities demonstrated that the proton can be selectively mobilized and demobilized. 
Planning MS-based experiments on fucosylated glycan cations requires an effective strategy to circumvent the presence of a mobile proton in order to avoid erroneous sequence assignments. In a multidimensional approach, IR spectroscopy, IM-MS, radical-directed dissociation (RDD) and computational modelling were combined to decode the rearrangement product and the reaction mechanism. The trisaccharides Lewis x and blood group H type 2 were found to migrate to a third chemical structure, in which the fucose moiety is most likely 1,6-linked to galactose. The barrier is much higher for blood group H type 2 compared to Lewis x and it is feasible that the latter is never detected in its original chemical structure in the mass spectrometer. These results generalize fucose migration to a universal issue in any mass spectrometer to which even various orthogonal MS-based techniques can be blind. In the second part of this thesis, cryogenic IR spectroscopy in combination with computational modelling was employed for the structural analysis of sulfated glycosaminoglycans (GAGs). Diversity in the chemical structure of linear and acidic GAGs arises from the GAG class, sulfation, epimerization and acetylation. Using messenger tagging IR spectroscopy, sulfated mono- and disaccharides have recently been characterized successfully. In the present thesis, the prominent anticoagulant pentasaccharide fondaparinux, which carries eight sulfate functional groups, was investigated using cryogenic IR spectroscopy in helium nanodroplets as a proof-of-concept. The spectroscopic fingerprint features unique absorption bands in the mid-IR range for the sulfate functional groups. With this knowledge, a systematic set of all naturally occurring sulfation variations in chondroitin and dermatan sulfate (CS/DS) further demonstrated the capabilities of cryogenic IR spectroscopy for their differentiation. 
Moreover, from their IR fingerprints in combination with computational modelling, conformational diversity arising from sulfation and charge density distribution could be derived. In a different study, the IR fingerprints of four heparan sulfate (HS) diastereomers revealed a modularity in their chemical structure which was explained, using computational modelling, by their unique hydrogen bonding patterns. Knowledge of the preferred hydrogen bonding pattern could aid, for example, the development of labelling strategies in IM-MS. The results show that the high resolution in the optical fingerprints of GAGs allows their diversity arising from GAG class, sulfation and epimerization to be resolved unambiguously. The results exemplify the importance of gas-phase cryogenic IR spectroscopy for enhancing future analytical workflows for GAG sequencing. A fully MS-based workflow could involve the ionization of an intact GAG chain and combine tandem MS with IM-MS and cryogenic IR spectroscopy of the respective fragments to unambiguously characterize a GAG chain in a single MS experiment. In the last part, cryogenic IR spectroscopy was combined with random forest modelling to extract vibrational features that are characteristic of structural features in GAGs. The selected structural features included the GAG class and sulfation and therefore almost fully characterize the underlying chemical structure. In a proof-of-concept study, a prediction score of >97% could be achieved for HS tetra- and hexasaccharides based on a training set of only 21 spectra. Especially for certain marker motifs, such as 3-O-sulfation in cancer cells, this workflow could prove beneficial. With machine learning algorithms, the need for comprehensive spectral databases could be circumvented for the identification of unknowns. 
Overall, the results show that MS-based IR spectroscopy has the potential to move beyond academic basic research and become a valuable addition to the MS-based analytical toolbox.

    MS/MS Analysis and Automated Tool Development for Protein Post-Translational Modifications

    Protein post-translational modifications (PTMs) are important for a variety of reasons. PTMs convert a nascent protein chain into the final protein product and confer its biological functionality. Two common PTMs are glycosylation and disulfide bond formation. Both glycosylation and disulfide bond formation contribute to a variety of biological processes, including protein folding and stabilization. Mass spectrometry (MS) has been shown to be an essential technique to study PTMs, especially when tandem mass spectrometry (MS/MS) experiments are performed. In the characterization of PTMs using MS/MS, different fragmentation techniques are often used. Regardless of the dissociation method that is employed, MS/MS data interpretation is a tedious and lengthy process. To render this analysis more efficient, the use of automated tools is necessary. In this work, collision induced dissociation (CID) MS/MS experiments were carried out in order to create a set of fragmentation rules applicable to any N-linked glycopeptide. These rules were then used to develop an algorithm to power publicly available software that accurately determines glycopeptide composition from MS/MS data. This program greatly reduces the time it takes researchers to manually assign the identity of an N-linked glycopeptide to an acquired CID spectrum. In addition, electron transfer dissociation (ETD) experiments were performed in order to devise a computational approach that determines precursor charge state directly from MS/MS data of peptides containing disulfide bonds. Lastly, alternate fragmentation patterns observed in glycopeptides containing labile monosaccharide residues such as sialic acid are discussed. These patterns, along with other trends noticed after extensive analysis of N-linked glycopeptide CID data, were then used to propose future updates to the GPG analysis tool.

    Spectrum and Retention Time Prediction for N-Glycopeptides Using Deep Learning

    Sequencing proteins and glycans has important clinical applications, as glycosylation has been shown to play a significant role in cellular communication and immune response. Certain glycans are linked to the diagnosis of cancer as well as targeted immunotherapy. Mass spectrometry is a powerful tool that helps us gain insight into peptide sequences and glycan structures, by using database search, spectral library, or de novo sequencing. Spectrum and retention time prediction using deep learning has gained popularity with studies on non-glycosylated peptides and has been shown to improve database search results via rescoring. This thesis proposes deep learning models to predict spectrum and retention time for N-glycopeptides and then discusses the applications of these models with respect to glycopeptide sequencing. Chapter 3 presents a graph deep learning model that predicts fragment ion intensities of observed spectra and defines a spectrum representation for glycan fragments with up to three cleavages. The spectrum prediction model has a median cosine similarity of 0.921, which is 20% higher than previous attempts at glycopeptide spectrum prediction. For retention time prediction in Chapter 4, we propose a model with two parallel encoders for both peptide and glycan input and apply transfer learning for the sequence encoder. The retention time prediction model has a Pearson correlation of 1.0, which is higher than the 0.98 and 0.96 of previous attempts. We also introduce the 95th percentile delta as an evaluation metric and discuss the interpretability of our model. Finally, in Chapter 5, we apply our spectrum and retention time prediction models in glycopeptide sequencing pipelines, including database search and de novo search. We show that our model improves identification by rescoring and has the potential to be used as a filter for false positives. We also demonstrate that our model improves de novo identification when used in the scoring function.
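
The three evaluation metrics named in this abstract can be sketched in a few lines. The function names are hypothetical, and the abstract does not define its "95 percentile delta"; the sketch assumes it means the 95th percentile of the absolute error between predicted and observed retention times, computed with the nearest-rank method. Some published retention-time work uses a different convention, so treat that definition as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two intensity vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for empty spectra
    return dot / (norm_a * norm_b)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def abs_error_percentile(predicted, observed, q=0.95):
    """Nearest-rank q-th percentile of |predicted - observed| (assumed definition)."""
    errors = sorted(abs(p - o) for p, o in zip(predicted, observed))
    rank = max(1, math.ceil(q * len(errors)))
    return errors[rank - 1]
```

Cosine similarity compares the shape of a predicted spectrum with the observed one regardless of overall intensity scale. Pearson correlation and the error percentile describe, respectively, how well predicted retention times track the observed ones and how large the worst typical deviations are.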

    Algorithms for integrated analysis of glycomics and glycoproteomics by LC-MS/MS

    The glycoproteome is an intricate and diverse component of a cell, and it plays a key role in the definition of the interface between that cell and the rest of its world. Methods for studying the glycoproteome have been developed for released glycan glycomics and site-localized bottom-up glycoproteomics using liquid chromatography-coupled mass spectrometry and tandem mass spectrometry (LC-MS/MS), which is itself a complex problem. Algorithms for interpreting these data are necessary to be able to extract biologically meaningful information in a high throughput, automated context. Several existing solutions have been proposed but may be found lacking for larger glycopeptides, for complex samples, different experimental conditions, different instrument vendors, or even because they simply ignore fundamentals of glycobiology. I present a series of open algorithms that approach the problem from an instrument vendor neutral, cross-platform fashion to address these challenges, and integrate key concepts from the underlying biochemical context into the interpretation process. In this work, I created a suite of deisotoping and charge state deconvolution algorithms for processing raw mass spectra at an LC scale from a variety of instrument types. These tools performed better than previously published algorithms by enforcing the underlying chemical model more strictly, while maintaining a higher degree of signal fidelity. From this summarized, vendor-normalized data, I composed a set of algorithms for interpreting glycan profiling experiments that can be used to quantify glycan expression. From this I constructed a graphical method to model the active biosynthetic pathways of the sample glycome and dig deeper into those signals than would be possible from the raw data alone. 
Lastly, I created a glycopeptide database search engine from these components which is capable of identifying the widest array of glycosylation types available, and demonstrate a learning algorithm which can be used to tune the model to better understand the process of glycopeptide fragmentation under specific experimental conditions to outperform a simpler model by between 10% and 15%. This approach can be further augmented with sample-wide or site-specific glycome models to increase depth-of-coverage for glycoforms consistent with prior beliefs
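
Charge state deconvolution, mentioned in this abstract, rests on a simple relationship: neighbouring isotopic peaks of an ion with charge z are spaced about 1.00335/z apart on the m/z axis. The sketch below illustrates only that relationship for positive-mode protonated ions. It is not the thesis's algorithm, which fits full isotopic patterns; the function names, the tolerance, and the example peaks are hypothetical.

```python
C13_DELTA = 1.0033548    # mass difference between 13C and 12C, in Da
PROTON = 1.00727646688   # proton mass, in Da


def infer_charge(mz_peaks, max_charge=6, tol=0.01):
    """Guess the charge from the mean spacing of one isotopic cluster.

    Returns the charge whose expected spacing (C13_DELTA / z) is closest to
    the observed mean spacing, or None if no charge fits within tol.
    """
    peaks = sorted(mz_peaks)
    if len(peaks) < 2:
        return None
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    mean_gap = sum(gaps) / len(gaps)
    best = min(range(1, max_charge + 1),
               key=lambda z: abs(mean_gap - C13_DELTA / z))
    if abs(mean_gap - C13_DELTA / best) > tol:
        return None
    return best


def neutral_mass(mz, z):
    """Neutral monoisotopic mass of a protonated ion observed at mz with charge z."""
    return (mz - PROTON) * z


# Hypothetical cluster: peaks spaced ~0.5017 apart suggest a doubly charged ion.
cluster = [500.0, 500.5017, 501.0034]
z = infer_charge(cluster)
mass = neutral_mass(cluster[0], z)
```

A production deisotoping routine would also score the peak intensities against a theoretical isotopic envelope, which is where the choice of chemical model discussed in the abstract comes in.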

    Computational Methods for Protein Identification from Mass Spectrometry Data

    Protein identification using mass spectrometry is an indispensable computational tool in the life sciences. A dramatic increase in the use of proteomic strategies to understand the biology of living systems generates an ongoing need for more effective, efficient, and accurate computational methods for protein identification. A wide range of computational methods, each with various implementations, are available to complement different proteomic approaches. A solid knowledge of the range of algorithms available and, more critically, the accuracy and effectiveness of these techniques is essential to ensure as many of the proteins as possible, within any particular experiment, are correctly identified. Here, we undertake a systematic review of the currently available methods and algorithms for interpreting, managing, and analyzing biological data associated with protein identification. We summarize the advances in computational solutions as they have responded to corresponding advances in mass spectrometry hardware. The evolution of scoring algorithms and metrics for automated protein identification are also discussed with a focus on the relative performance of different techniques. We also consider the relative advantages and limitations of different techniques in particular biological contexts. Finally, we present our perspective on future developments in the area of computational protein identification by considering the most recent literature on new and promising approaches to the problem as well as identifying areas yet to be explored and the potential application of methods from other areas of computational biology

    Software for Automated Interpretation of Mass Spectrometry Data from Glycans and Glycopeptides

    The purpose of this review is to provide those interested in glycosylation analysis with the most updated information on the availability of automated tools for MS characterization of N-linked and O-linked glycosylation types. Specifically, this review describes software tools that facilitate elucidation of glycosylation from MS data on the basis of mass alone, as well as software designed to speed the interpretation of glycan and glycopeptide fragmentation from MS/MS data. This review focuses equally on software designed to interpret the composition of released glycans and on tools to characterize N-linked and O-linked glycopeptides. Several websites have been compiled and described that will be helpful to the reader who is interested in further exploring the described tools