5 research outputs found

    Model-based Biomarker Detection and Systematic Analysis in Translational Science

    This dissertation is concerned with the application of mathematical modeling and statistical signal processing to the rapidly expanding fields of proteomics and genomics. The research is guided by a translational goal which drives the problem formalization and experimental design, and leads to optimization, prediction and control of the underlying system. The dissertation comprises three interconnected subjects. In the first part of the dissertation, two Bayesian peptide detection algorithms are proposed to optimize the feature extraction step, which is the most fundamental step in mass spectrometry-based proteomics. The algorithms are designed to tackle data processing challenges that are not satisfactorily addressed by existing methods. In contrast to most existing methods, the proposed algorithms perform deisotoping and deconvolution of mass spectra simultaneously, which enables better identification of weak peptide signals. Unlike greedy template-matching algorithms, the proposed methods can handle complex spectra in which features overlap. The proposed methods achieve better sensitivity and accuracy than many popular software packages such as msInspect. In the second part of the dissertation, we consider modeling and assessing the entire mass spectrometry-based proteomic data analysis pipeline. Different modules are identified and analyzed, resulting in a framework that captures key factors in system performance. The effects of various model parameters on protein identification rates and quantification errors, differential expression results, and classification performance are examined. The proposed pipeline model can be used to aid experimental design, pinpoint critical bottlenecks, optimize the work flow, and predict biomarker discovery results. Finally, the same system methodology is extended to analyze the work flow in DNA microarray experiments. A model-based approach is developed to explore the relationship among microarray data properties, missing value imputation, and sample classification in a complicated data analysis pipeline. The situations in which missing value imputation is suitable are identified, and recommendations regarding imputation are provided. In addition, a peaking phenomenon related to the missing value rate is uncovered.
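    For readers unfamiliar with the term, missing value imputation means filling unobserved entries of the expression matrix before classification. The sketch below shows row-mean imputation, one of the simplest schemes. It is an illustrative assumption and not necessarily a method evaluated in the dissertation; the function name `impute_row_means` and the toy matrix are hypothetical.

```python
from typing import List, Optional


def impute_row_means(matrix: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace each missing entry (None) with the mean of the observed
    values in the same row (one row per gene, one column per sample).

    Assumption for this sketch: a row with no observed values is filled
    with 0.0, since no row mean can be computed for it.
    """
    imputed = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        fill = sum(observed) / len(observed) if observed else 0.0
        imputed.append([fill if v is None else v for v in row])
    return imputed


# Toy expression matrix: 3 genes x 3 samples, None marks a missing value.
expression = [
    [1.0, None, 3.0],
    [None, None, None],
    [2.0, 4.0, None],
]
filled = impute_row_means(expression)
```

    Whether such a fill helps or hurts downstream classification depends on the data properties and the missing value rate, which is the question the abstract above addresses.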

    Features-Based Deisotoping Method for Tandem Mass Spectra

    For high-resolution tandem mass spectra, the determination of monoisotopic masses of fragment ions plays a key role in the subsequent peptide and protein identification. In this paper, we present a new algorithm for deisotoping bottom-up spectra. Isotopic-cluster graphs are constructed to describe the relationship between all possible isotopic clusters. Based on the relationships in the isotopic-cluster graphs, each possible isotopic cluster is assessed with a score function, which is built by combining non-intensity and intensity features of fragment ions. The non-intensity features are used to prevent fragment ions with low intensity from being removed. Dynamic programming is adopted to find the highest-scoring path, which contains the most reliable isotopic clusters. The experimental results show that the average Mascot scores and F-scores of peptides identified from spectra processed by our deisotoping method are higher than those obtained with the YADA and MS-Deconv software.
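    The dynamic-programming step can be illustrated with a simplified sketch. It assumes that each candidate isotopic cluster is reduced to an m/z interval plus a precomputed score, and that two clusters are compatible only if their intervals do not touch. This is a weighted interval selection stand-in, not the paper's actual graph construction or score function; the name `best_cluster_path` and the example values are hypothetical.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical representation: (start_mz, end_mz, score) per candidate cluster.
Cluster = Tuple[float, float, float]


def best_cluster_path(clusters: List[Cluster]) -> Tuple[float, List[Cluster]]:
    """Select non-overlapping candidate clusters with maximum total score.

    Clusters are sorted by end m/z; best[i] holds the best total score
    achievable using only the first i clusters in that order.
    """
    ordered = sorted(clusters, key=lambda c: c[1])
    ends = [c[1] for c in ordered]
    n = len(ordered)
    best = [0.0] * (n + 1)
    take = [False] * n
    prev = [0] * n
    for i, (start, _end, score) in enumerate(ordered):
        # Number of earlier clusters that end strictly before this one starts.
        p = bisect_left(ends, start, 0, i)
        prev[i] = p
        with_cluster = best[p] + score
        if with_cluster > best[i]:
            best[i + 1] = with_cluster
            take[i] = True
        else:
            best[i + 1] = best[i]
    # Trace back through the table to recover the selected clusters.
    chosen = []
    i = n
    while i > 0:
        if take[i - 1]:
            chosen.append(ordered[i - 1])
            i = prev[i - 1]
        else:
            i -= 1
    chosen.reverse()
    return best[n], chosen


# Made-up candidates: the second overlaps both the first and the third.
candidates = [
    (500.0, 501.0, 2.0),
    (500.5, 502.0, 3.5),
    (501.5, 502.5, 2.0),
    (503.0, 504.0, 1.0),
]
total_score, selected = best_cluster_path(candidates)
```

    In this toy input the single highest-scoring candidate is rejected because the two clusters it overlaps score more together, which is the kind of case a greedy choice would get wrong.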

    A systematic model of the LC-MS proteomics pipeline

    MOTIVATION: Mass spectrometry is a complex technique used for large-scale protein profiling with clinical and pharmaceutical applications. While individual components in the system have been studied extensively, little work has been done to integrate the various modules and evaluate them from a systems point of view. RESULTS: In this work, we investigate this problem by putting together the different modules in a typical proteomics work flow, in order to capture and analyze key factors that impact the number of identified peptides and quantified proteins, protein quantification error, differential expression results, and classification performance. The proposed proteomics pipeline model can be used to optimize the work flow as well as to pinpoint critical bottlenecks that are worth investing time and resources in to improve performance. Using the model-based approach proposed here, one can systematically study the critical problem of proteomic biomarker discovery by means of simulation using ground-truthed synthetic MS data.