71 research outputs found

    Studying the effect of parallelization on the performance of Andromeda Search Engine: A search engine for peptides

    The human body is made of proteins, and analysis of their structure and function reveals important information about it. An important technique for protein evaluation is mass spectrometry. The protein data generated by a mass spectrometer are analyzed to detect patterns in proteins. A wide variety of operations are performed on these data, namely visualization, spectral deconvolution, peak alignment, normalization, pattern recognition and significance testing. A number of software packages analyze the huge volume of data generated by a mass spectrometer. One example is MaxQuant, which analyzes high-resolution mass spectrometric data. A search engine called Andromeda is integrated into MaxQuant and is used for peptide identification. One major drawback of the Andromeda Search Engine is its execution time. Identification of peptides involves a number of complex operations and intensive data processing. This research work therefore focuses on parallelization as a way to improve the performance of the Andromeda Search Engine. This is done by partitioning the data and distributing it across multiple cores and nodes, and by executing multiple tasks concurrently on those nodes and cores. A number of bioinformatics applications have been parallelized with significant improvement in execution time over the serial version. For this research work, Task Parallel Library (TPL) and Common Language Runtime (CLR) constructs are used to parallelize the application. The aim is to apply these techniques to the Andromeda Search Engine and improve its execution time by leveraging multi-core architectures.
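
    The partition-and-distribute idea described in this abstract can be sketched as follows. This is a minimal illustration in Python, not the thesis's TPL/CLR implementation: `partition`, `score_chunk` and `score_all` are hypothetical names, the "scoring" is a stand-in that merely sums peak intensities, and a thread pool is used only to keep the sketch self-contained (CPU-bound scoring in Python would more plausibly use a process pool).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split items into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def score_chunk(spectra):
    """Hypothetical stand-in for peptide scoring: sum each spectrum's peaks."""
    return [sum(peaks) for peaks in spectra]


def score_all(spectra, n_workers=4):
    """Partition the spectra, score each chunk on a worker, merge in order."""
    chunks = partition(spectra, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(score_chunk, chunks)  # map preserves chunk order
    return [score for chunk in results for score in chunk]


# Toy input: each spectrum is just a list of peak intensities.
spectra = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0], [7.0], [8.0, 9.0]]
scores = score_all(spectra, n_workers=2)
```

    Because each chunk is scored independently and the results are merged in chunk order, the output matches what a serial loop over the same spectra would produce.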

    An analysis of techniques for obtaining speedup in software solutions for proteomics

    Proteomics is the study of the structure and behavior of proteins. To perform this kind of analysis, proteins are broken down into their constituent peptides, and the values of different parameters are recorded. The amount of data collected in this way is very large, so its analysis becomes time-consuming. To overcome this issue, several software packages have been developed for processing protein data. With so many software solutions available, deciding which one to use is hard for the end user, and each user's expectations of such software vary. Apart from end users, there are software developers working to improve the quality of existing software by implementing suitable optimization techniques. This study therefore proposes a comparison framework for proteomics software solutions that gives readers the information they need to decide which software suits their needs.

    Optimization of High Field Asymmetric Waveform Ion Mobility Spectrometry to enhance the comprehensiveness of mass spectrometry-based proteomic analyses

    The high complexity of biological samples can confound protein identification and compromise the depth and coverage of mass spectrometry-based proteomic analyses. Separation techniques that provide improved peak capacity and selectivity of LC-MS/MS analyses are often sought to overcome these limitations. 
High-field asymmetric waveform ion mobility spectrometry (FAIMS), a differential ion mobility device, has shown significant advantages by enhancing the transmission of multiply charged peptide ions while reducing singly charged interferences. In this context, the goal of this thesis was to explore the analytical capabilities of FAIMS to extend both the dynamic range of protein/peptide detection and the precision of quantitative proteomic measurements by mass spectrometry. For this, we systematically integrated FAIMS in standard workflows to monitor the dynamic changes of the human proteome in response to hyperthermia. We first studied the merits of FAIMS to aid isobaric labeling quantification with tandem mass tags (TMT). This approach allows the labeling of peptide ions with different chemical groups of identical nominal masses but differing in their respective distribution of stable isotopes. TMT-labeled peptide ions produce reporter ions of distinct masses once fragmented by MS/MS. Unfortunately, the co-selection of precursor ions often leads to chimeric MS/MS spectra, and a slower MS3-centric approach is needed for precise quantification. Since FAIMS improves peak capacity by selectively transmitting ions based on their compensation voltage (CV), we obtained less peptide co-selection. FAIMS improved TMT quantification at the MS2 level and achieved 68% more quantified peptides compared to regular LC-MS/MS, providing a deeper insight into the dynamic changes of the human proteome in response to heat stress. Further, we investigated stable isotope labeling by amino acids in cell culture (SILAC) quantification. If interferences co-elute with SILAC isotopomers, quantification becomes inaccurate and SILAC counterparts can be misassigned to interfering ions in the highly populated chromatogram, thus skewing the SILAC ratio. 
FAIMS post-ionization fractionation could filter out background ions that can otherwise be attributed to a SILAC pair/triplet for quantification. In this work, FAIMS was especially beneficial for low-abundance species and outperformed the standard strong cation exchange (SCX) fractionation workflow. In addition, FAIMS allowed the separation of phosphoisomers that are frequently observed in complex phosphoproteome extracts. The third aim of this work explored the charge state separation and enhanced transmission of highly charged peptides with FAIMS and its application for SUMOylated peptide analysis. By applying more negative CVs, FAIMS could enhance the transmission of triply charged SUMOylated peptides over typical tryptic peptides, which are predominantly doubly charged. This allowed for gas-phase enrichment of SUMOylated peptide ions. FAIMS is an alternate and more straightforward approach to fractionate SUMOylated peptides that reduced sample loss and avoided sample processing, while increasing peak capacity in a more automated manner and adding one order of magnitude in sensitivity. The last aim of this thesis was to improve the FAIMS instrumentation by interfacing it to the latest state-of-the-art instruments. With a new FAIMS device developed by our collaborators at Thermo Fisher Scientific, we demonstrate the robustness and the improved ion transmission of the new interface. In simple shotgun proteomics, FAIMS extended the dynamic range by one order of magnitude for deeper proteome coverage compared to regular LC-MS/MS. Moreover, fewer MS2 chimeric scans were generated with FAIMS gas-phase fractionation, which garnered more identifications and better quantification. For this, we directly compared LC-FAIMS-MS/MS to LC-MS/MS/MS using synchronous precursor selection (SPS) with and without basic reverse phase fractionation. 
Comparable quantitative measurements were obtained for all methods, except that FAIMS provided a 2.5-fold increase in the number of quantifiable peptides compared with non-FAIMS experiments. Overall, this thesis highlights some of the advantages that FAIMS can provide for proteomics experiments by improving both peptide identification and quantification.

    Quantitative analysis of mass spectrometry proteomics data : Software for improved life science

    The rapid advances in life science, including the sequencing of the human genome and numerous other techniques, have given an extraordinary ability to acquire data on biological systems and human disease. Even so, drug development costs are higher than ever, while the rate of newly approved treatments is historically low. A potential explanation for this discrepancy is the difficulty of understanding the biology underlying the acquired data, that is, the difficulty of refining the data into useful knowledge through interpretation. In this thesis the refinement of the complex data from mass spectrometry proteomics is studied. A number of new algorithms and programs are presented and demonstrated to provide increased analytical ability over previously suggested alternatives. With the higher goal of increasing the scientific output of the mass spectrometry laboratory, pragmatic studies were also performed to create a new set of compression algorithms for reducing the storage requirements of mass spectrometry data, and to characterize instrument stability. The final components of this thesis are a discussion of the technical and instrumental weaknesses of the currently employed mass spectrometry proteomics methodology, and a discussion of the currently lacking quality of academic software and the reasons for it. As a whole, the primary algorithms, the enabling technology, and the weakness discussions all aim to improve the current capability to perform mass spectrometry proteomics. As this technology is crucial for understanding proteins, the main functional components of biology, this work should allow better and higher-quality life science data, and ultimately increase the chances of developing new treatments or diagnostics.

    Mass spectrometry-based high-throughput proteomics and its role in biomedical studies and systems biology

    There are multiple reasons why the next generation of biological and medical studies require increasing numbers of samples. Biological systems are dynamic, and the effect of a perturbation depends on the genetic background and environment. As a consequence, many conditions need to be considered to reach generalizable conclusions. Moreover, human population and clinical studies only reach sufficient statistical power if conducted at scale and with precise measurement methods. Finally, many proteins remain without sufficient functional annotations, because they have not been systematically studied under a broad range of conditions. In this review, we discuss the latest technical developments in mass spectrometry (MS)-based proteomics that facilitate large-scale studies by fast and efficient chromatography, fast scanning mass spectrometers, data-independent acquisition (DIA), and new software. We further highlight recent studies which demonstrate how high-throughput (HT) proteomics can be applied to capture biological diversity, to annotate gene functions or to generate predictive and prognostic models for human diseases. Keywords: biomarker discovery; data-independent acquisition; dynamic biological systems; gene annotation; precision medicine; proteomics

    Development and application of Dissociable Antibody Microarray (DAMA) staining technique

    Dissociable Antibody Microarray (DAMA) staining is a novel technique that integrates protein microarrays with conventional immunostaining techniques. It can simultaneously determine the expression and subcellular localizations (SCLs) of hundreds of proteins in cultured cells. I optimized this technology for protein expression and SCL profiling, and developed the expression profile data analysis program DAMAPEP, the molecular image database management program ChipView and the automatic SCL assignment program DAMASCL. We demonstrated the application of this technique in the identification of potential biomarkers for breast cancer. We compared the expression profiles of 312 proteins among ten breast cell lines and identified 10 differentially expressed proteins. Among those proteins, RAIDD, Rb p107, Rb p130, SRF and Tyk2 were confirmed by western blot and statistical analysis to have higher expression levels in breast cancer cells than in normal breast cells. We also compared the SCL profiles of 325 proteins among nine breast cell lines, and identified one protein, Cyclin B1, with different SCLs between two normal and seven cancer breast cell lines. With individual immunostaining, Cyclin B1 was confirmed to localize in the cytoplasm of the seven cancer cell lines and in both the cytoplasm and nuclei of the two normal cell lines, and to have higher expression levels in the seven cancer cell lines. We expanded the scale of DAMA staining to include 400 antibodies per array and surveyed the SCL profiles of 400 antibodies in five prostate cell lines. Five proteins were identified as having altered SCL patterns between normal and cancer prostate cell lines. So far, GRK2 has been confirmed to localize ubiquitously in the cytosol of three normal and one cancer prostate cell lines, while concentrating in certain regions just beneath the plasma membrane in the other two cancer prostate cell lines. 
We also extended the application of DAMA staining to interrogate protein expression profiles in tissue samples and found 3 proteins with differential expression in two tissue samples from different breast cancer patients, demonstrating the potential use of DAMA staining in carcinoma characterization and classification. A database of annotated protein molecular images obtained from DAMA staining needs to be created and shared for a better understanding of cancer biology.

    Mass spectrometry data mining for cancer detection

    Early detection of cancer is crucial for successful intervention strategies. Mass spectrometry-based high-throughput proteomics is recognized as a major breakthrough in cancer detection. Many machine learning methods have been used to construct classifiers based on mass spectrometry data for discriminating between cancer stages, yet the classifiers so constructed generally lack biological interpretability. To better assist clinical uses, a key step is to discover "biomarker signature profiles", i.e. combinations of a small number of protein biomarkers that strongly discriminate between cancer states. This dissertation introduces two innovative algorithms to automatically search for a signature and to construct a high-performance signature-based classifier for cancer discrimination tasks based on mass spectrometry data, such as data acquired by MALDI or SELDI techniques. Our first algorithm assumes that homogeneous groups of mass spectra can be modeled by (unknown) Gibbs distributions, and generates an optimal signature and an associated signature-based classifier by robust log-likelihood analysis; our second algorithm uses a stochastic optimization algorithm to search for two lists of biomarkers, and then constructs a signature-based classifier. To support these two algorithms theoretically, this dissertation also studies the empirical probability distributions of mass spectrometry data and implements the actual fitting of Markov random fields to these high-dimensional distributions. We have validated our two signature discovery algorithms on several mass spectrometry datasets related to ovarian cancer and colorectal cancer patient groups. For these cancer discrimination tasks, our algorithms have yielded better classification performance than existing machine learning algorithms and, in addition, have generated more interpretable explicit signatures. Mathematics, Department o

    MSQBAT - A Software Suite for LC-MS Protein Quantification

    Assessing the relative changes in protein abundance is essential for a proper understanding of the various processes underlying disease progression and development. Nowadays, mass spectrometry-based proteomics allows for the identification of several thousand proteins in a single analysis. Unfortunately, mass spectrometry is inherently not quantitative, which is why additional techniques for protein quantification have to be developed. To measure quantitative changes in protein abundance, either biological samples need to be labeled using stable isotopes or protein abundances have to be computed using so-called label-free techniques. Label-based quantification approaches are costly, and the number of samples that can be quantified against each other is limited. Furthermore, depending on the sample, the introduction of the labels can be elaborate. Label-free quantification is not confronted with these limitations; in principle, an unlimited number of samples can be quantified without the introduction of isotopes. Yet these advantages have their price: the development of label-free quantification algorithms is not trivial and requires profound knowledge of both bioinformatics and mass spectrometry. In particular, the design of systems flexible enough to quantify data deriving from different mass spectrometric systems and proteomic workflows requires additional experience and time. In order to quantify data acquired by LC-MALDI-MS, a novel software suite termed MSQBAT was developed and evaluated. MSQBAT is a platform-independent software suite for MS1-based, label-free protein quantification. In contrast to other software solutions, MSQBAT is highly flexible and suited for the quantification of mass spectrometric data from various instrumental setups and proteomic workflows, such as (Ge)LC-MALDI-MS and (Ge)LC-ESI-MS. Quantification capabilities were evaluated using spike-in experiments analyzed with different proteomic workflows and instruments. 
Human proteins were spiked in variable concentrations into a complex E. coli background proteome and processed using both an LC-MS and a GeLC-MS approach. Samples were chromatographically separated on a nanoACQUITY UPLC system using a 120-minute gradient and subsequently analyzed by an AB SCIEX TOF/TOF 5800 system and an AB SCIEX QTRAP 6500 system. Furthermore, a publicly available quantification benchmark data set was used to evaluate LC-ESI-MS quantification capabilities. The results show that MSQBAT can be applied to quantify data deriving from both LC-/GeLC-MALDI-MS and LC-/GeLC-ESI-MS workflows with high accuracy. This software suite therefore has a range of application exceeding that of all currently available solutions.

    Methods in automated glycosaminoglycan tandem mass spectra analysis

    Glycosylation is the process by which a glycan is enzymatically attached to a protein, and is one of the most common post-translational modifications in nature. One class of glycans is the glycosaminoglycans (GAGs), which are long, linear polysaccharides that are variably sulfated and make up the glycan portion of proteoglycans (PGs). PGs are located on the cellular surface and in the extracellular matrix (ECM), making them important molecules for cell signaling and ligand binding. The GAG sulfation sequence is a determining factor for the signaling capacity of binding complexes, so accurate determination of the sequence is critical. Historically, GAG sequencing using tandem mass spectrometry (MS2) has been a difficult, manual process; however, with the advent of faster computational techniques and higher-resolution MS2, high-throughput GAG sequencing is within reach. Two steps in the pipeline of biomolecule sequencing using MS2 are discovery and interpretation of spectral peaks. The discovery step traditionally is performed using methods that rely on the concept of averagine, or the average molecular building block for the analyte in question. These methods were developed for protein sequencing, but perform considerably worse on GAG sequences, due to the non-uniform distribution of sulfur atoms along the chain and the relatively high isotope abundance of 34S. The interpretation step traditionally is performed manually, which takes time and introduces potential user error. To combat these problems, I developed GAGfinder, the first GAG-specific MS2 peak finding and annotation software. GAGfinder is described in detail in chapter two. Another step in MS2 sequencing is the determination of the sequence using the found MS2 fragments. For a given GAG composition, there are many possible sequences, and peak finding algorithms such as GAGfinder return a list of the peaks in the MS2 mass spectrum. 
The many-to-many relationship between sequences and fragments can be represented using a bipartite network, and node-ranking techniques can be employed to generate likelihood scores for possible sequences. I developed a bipartite network-based sequencing tool, GAGrank, based on a bipartite network extension of Google’s PageRank algorithm for ranking websites. GAGrank is described in detail in chapter three
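
    The bipartite ranking idea in this abstract can be illustrated with a small sketch. It is a generic PageRank-style iteration with restart on a sequence/fragment graph, written under assumptions of our own; it is not GAGrank's actual algorithm, and the function name, the damping value and the toy sequence and fragment labels are all hypothetical.

```python
def rank_sequences(edges, observed, damping=0.85, iterations=50):
    """Rank candidate sequences on a bipartite sequence/fragment graph.

    edges:    dict mapping each candidate sequence to the set of fragments
              it could produce (hypothetical input format).
    observed: set of fragments actually found in the spectrum.
    """
    fragments = sorted({f for frags in edges.values() for f in frags})
    seeds = [f for f in fragments if f in observed]
    if not seeds:
        raise ValueError("no observed fragment matches any candidate")
    # Restart distribution: uniform over observed fragments only.
    prior = {f: (1.0 / len(seeds) if f in observed else 0.0) for f in fragments}
    # For each fragment, the candidate sequences that can explain it.
    explains = {f: [s for s in edges if f in edges[s]] for f in fragments}

    frag_score = dict(prior)
    seq_score = {s: 0.0 for s in edges}
    for _ in range(iterations):
        # Fragment -> sequence: each fragment splits its score equally
        # among the sequences that can explain it.
        seq_score = {s: 0.0 for s in edges}
        for f, seqs in explains.items():
            for s in seqs:
                seq_score[s] += frag_score[f] / len(seqs)
        # Sequence -> fragment, with restart to the observed fragments.
        new_frag = {f: (1.0 - damping) * prior[f] for f in fragments}
        for s, frags in edges.items():
            for f in frags:
                new_frag[f] += damping * seq_score[s] / len(frags)
        frag_score = new_frag
    return sorted(seq_score.items(), key=lambda kv: kv[1], reverse=True)


# Toy example: seqA explains all three observed fragments.
edges = {
    "seqA": {"f1", "f2", "f3"},
    "seqB": {"f1", "f4"},
    "seqC": {"f4", "f5"},
}
ranking = rank_sequences(edges, observed={"f1", "f2", "f3"})
```

    In this toy case the candidate that accounts for the most observed fragments ends up ranked first; a real tool would also need to weight fragments by intensity or specificity, which this sketch leaves out.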

    Comprehensive Overview of Bottom-up Proteomics using Mass Spectrometry

    Proteomics is the large-scale study of protein structure and function in biological systems through protein identification and quantification. "Shotgun proteomics" or "bottom-up proteomics" is the prevailing strategy, in which proteins are hydrolyzed into peptides that are analyzed by mass spectrometry. Proteomics can be applied to diverse studies ranging from simple protein identification to studies of proteoforms, protein-protein interactions, protein structural alterations, absolute and relative protein quantification, post-translational modifications, and protein stability. To enable this range of experiments, there are diverse strategies for proteome analysis. The nuances of how proteomic workflows differ may be challenging for new practitioners to understand. Here, we provide a comprehensive overview of different proteomics methods to aid both the novice and the experienced researcher. We cover topics from biochemistry basics and protein extraction to biological interpretation and orthogonal validation. We expect this work to serve as a basic resource for new practitioners in the field of shotgun or bottom-up proteomics.