358 research outputs found

    Peptide mass fingerprinting using field-programmable gate arrays

    Get PDF
    The reconfigurable computing paradigm, which exploits the flexibility and versatility of field-programmable gate arrays (FPGAs), has emerged as a powerful solution for speeding up time-critical algorithms. This paper describes a reconfigurable computing solution for processing raw mass spectrometric data generated by MALDI-TOF instruments. The hardware-implemented algorithms for denoising, baseline correction, peak identification, and deisotoping, running on a Xilinx Virtex-2 FPGA at 180 MHz, generate a mass fingerprint that is over 100 times faster than an equivalent algorithm written in C, running on a Dual 3-GHz Xeon server. The results obtained using the FPGA implementation are virtually identical to those generated by a commercial software package MassLynx

    Computational methods and tools for protein phosphorylation analysis

    Get PDF
    Signaling pathways represent a central regulatory mechanism of biological systems where a key event in their correct functioning is the reversible phosphorylation of proteins. Protein phosphorylation affects at least one-third of all proteins and is the most widely studied posttranslational modification. Phosphorylation analysis is still perceived, in general, as difficult or cumbersome and not readily attempted by many, despite the high value of such information. Specifically, determining the exact location of a phosphorylation site is currently considered a major hurdle, thus reliable approaches are necessary for the detection and localization of protein phosphorylation. The goal of this PhD thesis was to develop computation methods and tools for mass spectrometry-based protein phosphorylation analysis, particularly validation of phosphorylation sites. In the first two studies, we developed methods for improved identification of phosphorylation sites in MALDI-MS. In the first study it was achieved through the automatic combination of spectra from multiple matrices, while in the second study, an optimized protocol for sample loading and washing conditions was suggested. In the third study, we proposed and evaluated the hypothesis that in ESI-MS, tandem CID and HCD spectra of phosphopeptides can be accurately predicted and used in spectral library searching. This novel strategy for phosphosite validation and identification offered accuracy that outperformed the other currently existing popular methods and proved applicable to complex biological samples. And finally, we significantly improved the performance of our command-line prototype tool, added graphical user interface, and options for customizable simulation parameters and filtering of selected spectra, peptides or proteins. The new software, SimPhospho, is open-source and can be easily integrated in a phosphoproteomics data analysis workflow. Together, these bioinformatics methods and tools enable confident phosphosite assignment and improve reliable phosphoproteome identification and reportin

    An FPGA Implementation to Detect Selective Cationic Antibacterial Peptides

    Get PDF
    Exhaustive prediction of physicochemical properties of peptide sequences is used in different areas of biological research. One example is the identification of selective cationic antibacterial peptides (SCAPs), which may be used in the treatment of different diseases. Due to the discrete nature of peptide sequences, the physicochemical properties calculation is considered a high-performance computing problem. A competitive solution for this class of problems is to embed algorithms into dedicated hardware. In the present work we present the adaptation, design and implementation of an algorithm for SCAPs prediction into a Field Programmable Gate Array (FPGA) platform. Four physicochemical properties codes useful in the identification of peptide sequences with potential selective antibacterial activity were implemented into an FPGA board. The speed-up gained in a single-copy implementation was up to 108 times compared with a single Intel processor cycle for cycle. The inherent scalability of our design allows for replication of this code into multiple FPGA cards and consequently improvements in speed are possible. Our results show the first embedded SCAPs prediction solution described and constitutes the grounds to efficiently perform the exhaustive analysis of the sequence-physicochemical properties relationship of peptides

    New Statistical Algorithms for the Analysis of Mass Spectrometry Time-Of-Flight Mass Data with Applications in Clinical Diagnostics

    Get PDF
    Mass spectrometry (MS) based techniques have emerged as a standard forlarge-scale protein analysis. The ongoing progress in terms of more sensitive machines and improved data analysis algorithms led to a constant expansion of its fields of applications. Recently, MS was introduced into clinical proteomics with the prospect of early disease detection using proteomic pattern matching. Analyzing biological samples (e.g. blood) by mass spectrometry generates mass spectra that represent the components (molecules) contained in a sample as masses and their respective relative concentrations. In this work, we are interested in those components that are constant within a group of individuals but differ much between individuals of two distinct groups. These distinguishing components that dependent on a particular medical condition are generally called biomarkers. Since not all biomarkers found by the algorithms are of equal (discriminating) quality we are only interested in a small biomarker subset that - as a combination - can be used as a fingerprint for a disease. Once a fingerprint for a particular disease (or medical condition) is identified, it can be used in clinical diagnostics to classify unknown spectra. In this thesis we have developed new algorithms for automatic extraction of disease specific fingerprints from mass spectrometry data. Special emphasis has been put on designing highly sensitive methods with respect to signal detection. Thanks to our statistically based approach our methods are able to detect signals even below the noise level inherent in data acquired by common MS machines, such as hormones. To provide access to these new classes of algorithms to collaborating groups we have created a web-based analysis platform that provides all necessary interfaces for data transfer, data analysis and result inspection. To prove the platform's practical relevance it has been utilized in several clinical studies two of which are presented in this thesis. In these studies it could be shown that our platform is superior to commercial systems with respect to fingerprint identification. As an outcome of these studies several fingerprints for different cancer types (bladder, kidney, testicle, pancreas, colon and thyroid) have been detected and validated. The clinical partners in fact emphasize that these results would be impossible with a less sensitive analysis tool (such as the currently available systems). In addition to the issue of reliably finding and handling signals in noise we faced the problem to handle very large amounts of data, since an average dataset of an individual is about 2.5 Gigabytes in size and we have data of hundreds to thousands of persons. To cope with these large datasets, we developed a new framework for a heterogeneous (quasi) ad-hoc Grid - an infrastructure that allows to integrate thousands of computing resources (e.g. Desktop Computers, Computing Clusters or specialized hardware, such as IBM's Cell Processor in a Playstation 3)

    Computational Strategies for Proteogenomics Analyses

    Full text link
    Proteogenomics is an area of proteomics concerning the detection of novel peptides and peptide variants nominated by genomics and transcriptomics experiments. While the term primarily refers to studies utilizing a customized protein database derived from select sequencing experiments, proteogenomics methods can also be applied in the quest for identifying previously unobserved, or missing, proteins in a reference protein database. The identification of novel peptides is difficult and results can be dominated by false positives if conventional computational and statistical approaches for shotgun proteomics are directly applied without consideration of the challenges involved in proteogenomics analyses. In this dissertation, I systematically distill the sources of false positives in peptide identification and present potential remedies, including computational strategies that are necessary to make these approaches feasible for large datasets. In the first part, I analyze high scoring decoys, which are false identifications with high assigned confidences, using multiple peptide identification strategies to understand how they are generated and develop strategies for reducing false positives. I also demonstrate that modified peptides can cause violations in the target-decoy assumptions, which is a cornerstone for error rate estimation in shotgun proteomics, leading to potential underestimation in the number of false positives. Second, I address computational bottlenecks in proteogenomics workflows through the development of two database search engines: EGADS and MSFragger. EGADS aims to address issues relating to the large sequence space involved in proteogenomics studies by using graphical processing units to accelerate both in-silico digestion and similarity scoring. MSFragger implements a novel fragment ion index and searching algorithm that vastly speeds up spectra similarity calculations. For the identification of modified peptides using the open search strategy, MSFragger is over 150X faster than conventional database search tools. Finally, I will discuss refinements to the open search strategy for detecting modified peptides and tools for improved collation and annotation. Using the speed afforded by MSFragger, I perform open searching on several large-scale proteomics experiments, identifying modified peptides on an unprecedented scale and demonstrating its utility in diverse proteomics applications. The ability to rapidly and comprehensively identify modified peptides allows for the reduction of false positives in proteogenomics. It also has implications in discovery proteomics by allowing for the detection of both common and rare (including novel) biological modifications that are often not considered in large scale proteomics experiments. The ability to account for all chemically modified peptides may also improve protein abundance estimates in quantitative proteomics.PHDBioinformaticsUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/138581/1/andykong_1.pd

    True single-cell proteomics using advanced ion mobility mass spectrometry

    Get PDF
    In this thesis, I present the development of a novel mass spectrometry (MS) platform and scan modes in conjunction with a versatile and robust liquid chromatography (LC) platform, which addresses current sensitivity and robustness limitations in MS-based proteomics. I demonstrate how this technology benefits the high-speed and ultra-high sensitivity proteomics studies on a large scale. This culminated in the first of its kind label-free MS-based single-cell proteomics platform and its application to spatial tissue proteomics. I also investigate the vastly underexplored ‘dark matter’ of the proteome, validating novel microproteins that contribute to human cellular function. First, we developed a novel trapped ion mobility spectrometry (TIMS) platform for proteomics applications, which multiplies sequencing speed and sensitivity by ‘parallel accumulation – serial fragmentation’ (PASEF) and applied it to first high-sensitivity and large-scale projects in the biomedical arena. Next, to explore the collisional cross section (CCS) dimension in TIMS, we measured over 1 million peptide CCS values, which enabled us to train a deep learning model for CCS prediction solely based on the linear amino acid sequence. We also translated the principles of TIMS and PASEF to the field of lipidomics, highlighting parallel benefits in terms of throughput and sensitivity. The core of my PhD is the development of a robust ultra-high sensitivity LC-MS platform for the high-throughput analysis of single-cell proteomes. Improvements in ion transfer efficiency, robust, very low flow LC and a PASEF data independent acquisition scan mode together increased measurement sensitivity by up to 100-fold. We quantified single-cell proteomes to a depth of up to 1,400 proteins per cell. A fundamental result from the comparisons to single-cell RNA sequencing data revealed that single cells have a stable core proteome, whereas the transcriptome is dominated by Poisson noise, emphasizing the need for both complementary technologies. Building on our achievements with the single-cell proteomics technology, we elucidated the image-guided spatial and cell-type resolved proteome in whole organs and tissues from minute sample amounts. We combined clearing of rodent and human organs, unbiased 3D-imaging, target tissue identification, isolation and MS-based unbiased proteomics to describe early-stage β-amyloid plaque proteome profiles in a disease model of familial Alzheimer’s. Automated artificial intelligence driven isolation and pooling of single cells of the same phenotype allowed us to analyze the cell-type resolved proteome of cancer tissues, revealing a remarkable spatial difference in the proteome. Last, we systematically elucidated pervasive translation of noncanonical human open reading frames combining state-of-the art ribosome profiling, CRISPR screens, imaging and MS-based proteomics. We performed unbiased analysis of small novel proteins and prove their physical existence by LC-MS as HLA peptides, essential interaction partners of protein complexes and cellular function

    Bioinformatics solutions for confident identification and targeted quantification of proteins using tandem mass spectrometry

    Get PDF
    Proteins are the structural supports, signal messengers and molecular workhorses that underpin living processes in every cell. Understanding when and where proteins are expressed, and their structure and functions, is the realm of proteomics. Mass spectrometry (MS) is a powerful method for identifying and quantifying proteins, however, very large datasets are produced, so researchers rely on computational approaches to transform raw data into protein information. This project develops new bioinformatics solutions to support the next generation of proteomic MS research. Part I introduces the state of the art in proteomic bioinformatics in industry and academia. The business history and funding mechanisms are examined to fill a notable gap in management research literature, and to explain events at the sponsor, GlaxoSmithKline. It reveals that public funding of proteomic science has yet to come to fruition and exclusively high-tech niche bioinformatics businesses can succeed in the current climate. Next, a comprehensive review of repositories for proteomic MS is performed, to locate and compile a summary of sources of datasets for research activities in this project, and as a novel summary for the community. Part II addresses the issue of false positive protein identifications produced by automated analysis with a proteomics pipeline. The work shows that by selecting a suitable decoy database design, a statistically significant improvement in identification accuracy can be made. Part III describes development of computational resources for selecting multiple reaction monitoring (MRM) assays for quantifying proteins using MS. A tool for transition design, MRMaid (pronounced „mermaid‟), and database of pre-published transitions, MRMaid-DB, are developed, saving practitioners time and leveraging existing resources for superior transition selection. By improving the quality of identifications, and providing support for quantitative approaches, this project brings the field a small step closer to achieving the goal of systems biology.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Mass Spectrometric and Molecular Analyses of Biological Agents In Environmental Compartments

    Get PDF
    abstract: This thesis discusses the use of mass spectrometry and polymerase chain reaction (PCR), among other methods, to detect biomarkers of microorganisms in the environment. These methods can be used to detect bacteria involved in the degradation of environmental pollutants (bioremediation) or various single-celled pathogens, including those posing potential threats as bioterrorism agents. The first chapter introduces the hurdles in detecting in diverse environmental compartments in which they could be found, a select list of single-celled pathogens representing known or potential bioterrorism agents. These hurdles take the form of substances that interfere either directly or indirectly with the detection method. In the case of mass spectrometry-based detection, many of these substances (interferences) can be removed via effective sample pretreatment. Chapters 2 through 4 highlight specific methods developed to detect bioremediation or bioterrorism agents in environmental matrices. These methods are qualitative mass spectrometry, quantitative PCR, and quantitative mass spectrometry, respectively. The targeted organisms in these methods include several bioremediation agents, e.g. Pseudomonas putida F1 and Sphingomonas wittichii RW1, and bioterrorism agents, e.g. norovirus and Cryptosporidium parvum. In Chapter 2, I identify using qualitative mass spectrometry, biomarkers for three bacterial species involved in bioremediation. In Chapter 3, I report on a new quantitative PCR method suitable for monitoring of a key gene in yet another bioremediation agent, Sphingomonas wittichii RW1; furthermore, I apply this method to track the efficacy of bioremediation in bioaugmented environmental microcosms. In Chapter 4, I report on the development of new quantitative mass spectrometry methods for two organisms, S. wittichii RW1 and Cryptosporidium parvum, and evaluate two previously published methods for their applicability to the analysis of complex environmental samples. In Chapter 5, I review state-of-the-art methods for the detection of emerging biological contaminants, specifically viruses, in environmental samples. While this summary deals exclusively with viral pathogens, the advantages and remaining challenges identified are also applicable to all single-celled organisms in environmental settings. The suggestions I make at the end of this chapter are expected to be valid not only for future needs for emerging viruses but also for bacteria, eukaryotic pathogens, and prions. In general, it is advisable to continue the trend towards quantification and to standardize methods to facilitate comparison of results between studies.Dissertation/ThesisPh.D. Biological Design 201
    • …
    corecore