383 research outputs found

    MALDI-ToF mass spectrometry biomarker profiling via multivariate data analysis application in the biopharmaceutical bioprocessing industry

    PhD Thesis. Matrix-assisted laser desorption/ionisation time-of-flight mass spectrometry (MALDI-ToF MS) is a technique by which protein profiles can be rapidly produced from biological samples. Proteomic profiling and biomarker identification using MALDI-ToF MS have been utilised widely in microbiology for bacteria identification and in clinical proteomics for disease-related biomarker discovery. To date, the benefits of MALDI-ToF MS have not been realised in the area of mammalian cell culture during bioprocessing. This thesis explores the approach of ‘intact-cell’ MALDI-ToF MS (ICM-MS) combined with projection to latent structures – discriminant analysis (PLS-DA), to discriminate between mammalian cell lines during bioprocessing. Specifically, the industrial collaborator, Lonza Biologics, is interested in adopting this approach to discriminate between IgG monoclonal antibody producing Chinese hamster ovary (CHO) cell lines based on their productivities and to identify protein biomarkers which are associated with the cell line productivities. After classifying cell lines into two categories (high/low producers; Hs/Ls), it is hypothesised that Hs and Ls CHO cells exhibit different metabolic profiles and hence differences in phenotypic expression patterns will be observed. The protein expression patterns correlate to the productivities of the cell lines, and introduce between-class variability. The chemometric method of PLS-DA can use this variability to classify the cell lines as Hs or Ls. A number of differentially expressed proteins were matched and identified as biomarkers after a SwissProt/TrEMBL protein database search. The identified proteins revealed that proteins involved in biological processes such as protein biosynthesis, protein folding, glycolysis and cytoskeleton architecture were upregulated in Hs.
This study demonstrates that ICM-MS combined with PLS-DA and a protein database search can be a rapid and valuable tool for biomarker discovery in the bioprocessing industry. It may help in providing clues to potential cell genetic engineering targets as well as serving as a tool in process development in the bioprocessing industry. With the completion of the sequencing of the CHO genome, this study provides a foundation for rapid biomarker profiling of CHO cell lines in culture during recombinant protein manufacturing. Lonza Biologics

    Genetic Programming for Biomarker Detection in Classification of Mass Spectrometry Data

    Mass spectrometry (MS) is currently the most commonly used technology in biochemical research for proteomic analysis. The primary goal of proteomic profiling using mass spectrometry is the classification of samples from different experimental states. To classify the MS samples, the identification of proteins or peptides that are expressed differently between the classes (biomarker detection) is required. However, due to the high dimensionality of the data and the small number of samples, classification of MS data is extremely challenging. Another important aspect of biomarker detection is the verification of the detected biomarkers, which acts as an intermediate step before passing these biomarkers to the experimental validation stage. Biomarker detection aims at altering the input space of the learning algorithm for improving classification of proteomic or metabolomic data. This task is performed through feature manipulation. Feature manipulation consists of three aspects: feature ranking, feature selection, and feature construction. Genetic programming (GP) is an evolutionary computation algorithm that has the intrinsic capability for the three aspects of feature manipulation. The ability of GP for feature manipulation in proteomic biomarker discovery has not been fully investigated. This thesis, therefore, proposes an embedded methodology for these three aspects of feature manipulation in high dimensional MS data using GP. The thesis also presents a method for biomarker verification, using GP. The thesis investigates the use of GP for both single-objective and multi-objective feature selection and construction. In feature ranking, the thesis proposes a GP-based method for ranking subsets of features by using GP as an ensemble approach. The proposed algorithm uses GP capability to combine the advantages of different feature ranking metrics and evolve a new ranking scheme for the subset of the features selected from the top ranked features.
The capability of GP as a classifier is also investigated by this method. The results show that GP can select a smaller number of features and provide a better ranking of the selected features, which can improve the classification performance of five classifiers. In feature construction, this thesis proposes a novel multiple feature construction method, which uses a single GP tree to generate a new set of high-level features from the original set of selected features. The results show that the proposed new algorithm outperforms two feature selection algorithms. In feature selection, the thesis introduces the first GP multi-objective method for biomarker detection, which simultaneously increases the classification accuracy and reduces the number of detected features. The proposed multi-objective method can obtain better subsets of features than the single-objective algorithm and two traditional multi-objective approaches for feature selection. This thesis also develops the first multi-objective multiple feature construction algorithm for MS data. The proposed method aims at both maximising the classification performance and minimising the cardinality of the constructed new high-level features. The results show that GP can discover the complex relationships between the features and can significantly improve classification performance and reduce the cardinality. For biomarker verification, the thesis proposes the first GP biomarker verification method through measuring the peptide detectability. The method solves the imbalance problem in the data and shows improvement over the benchmark algorithms. Also, the algorithm outperforms a well-known peptide detection method. The thesis also introduces a new GP method for alignment of MS data as a preprocessing stage, which will further help in improving the biomarker detection process.

    Quantitative analysis of mass spectrometry proteomics data : Software for improved life science

    The rapid advances in life science, including the sequencing of the human genome and numerous other techniques, have given an extraordinary ability to acquire data on biological systems and human disease. Even so, drug development costs are higher than ever, while the rate of new approved treatments is historically low. A potential explanation for this discrepancy might be the difficulty of understanding the biology underlying the acquired data; the difficulty of refining the data into useful knowledge through interpretation. In this thesis the refinement of the complex data from mass spectrometry proteomics is studied. A number of new algorithms and programs are presented and demonstrated to provide increased analytical ability over previously suggested alternatives. With the higher goal of increasing the scientific output of the mass spectrometry laboratory, pragmatic studies were also performed, to create a new set of compression algorithms for reduced storage requirements of mass spectrometry data, and also to characterize instrument stability. The final components of this thesis are a discussion of the technical and instrumental weaknesses associated with the currently employed mass spectrometry proteomics methodology, and a discussion of the currently lacking quality of academic software and the reasons for it. As a whole, the primary algorithms, the enabling technology, and the weakness discussions all aim to improve the current capability to perform mass spectrometry proteomics. As this technology is crucial to understanding the main functional components of biology, proteins, this quest should allow better and higher quality life science data, and ultimately increase the chances of developing new treatments or diagnostics.

    Discovering circulating protein biomarkers through in-depth plasma proteomics

    Plasma, i.e., the liquid component of blood, is one of the most clinically used samples for biomarker measurement. Despite that plasma proteins and metabolites are the most frequently analysed biomarkers in practice, identifying and implementing new circulating protein biomarkers for diagnosis, treatment prediction, prognosis, and disease monitoring has been limited. This PhD thesis compiles the discovery of systemic alterations in the blood plasma proteome and potential biomarkers related to disease status, prognosis, or treatment through plasma proteomics. We analysed plasma and serum samples with global proteomics by high-resolution isoelectric focusing (HiRIEF) and liquid chromatography coupled with mass-spectrometry (LC-MS/MS), and targeted proteomics by antibody-based proximity extension assays (PEA) in three diseases that would benefit from blood biomarkers: stage IV metastatic cutaneous melanoma (mCM), glioblastoma (GBM), and coronavirus disease 2019 (COVID-19). Specifically: a.) New treatment options for mCM substantially prolong overall survival (OS), but multiple patients do not respond to treatment or develop treatment resistance, thus having shorter progression free survival (PFS). Corroborated by the presence of multiple metastases, which makes biomarker sampling difficult, circulating proteins derived from the tumour and in response to treatment could serve as predictive and prognostic biomarkers in mCM. b.) GBM is the most malignant primary brain tumour with limited treatment options and notoriously short OS. Sampling biomarkers for GBM requires an invasive surgical intervention on the skull, which makes GBM a good candidate for circulating protein biomarkers for prognosis and monitoring. c.) COVID-19 is an inflammation-driven infectious disease that affects multiple organs and systems, thus making the plasma proteome a good source to explore systemic biological processes occurring in COVID-19. 
In papers I and II, using HiRIEF LC-MS/MS and PEA, we explored the treatment-driven plasma proteome alterations in mCM patients treated with anti-PD-1 immune checkpoint inhibitors (ICI) and MAPK-inhibitors (MAPKi), respectively, and identified potential treatment predictive and monitoring biomarkers. mCM patients treated with anti-PD-1 ICI had a strong increase in soluble PD-1 levels during treatment, and upregulation of proteins involved in T-cell response. BRAF[V600]-mutated mCM patients treated with MAPKi had deregulation in proteins involved in immune response and proteolysis. CPB1 had the highest increase in patients treated with BRAF- and MEK-inhibitors and was associated with longer PFS. Higher levels of several proteins involved in inflammation before treatment were associated with shorter PFS regardless of ICI or MAPKi treatment. In paper III, using HiRIEF LC-MS/MS and PEA, we longitudinally analysed the plasma proteome dynamics of GBM patients, collecting plasma samples before surgery and at three timepoints after surgery. Through consensus clustering, based on treatment-naïve plasma protein levels, we identified two patient clusters that differed in median OS. The association between the cluster membership and OS remained consistent after adjustment for age, sex, and treatment. Through machine learning, we identified protein panels that separated the patient clusters and may serve as prognostic biomarkers. The largest alterations in the plasma proteome of GBM patients occurred within two months after surgery, whereas the plasma protein levels at later timepoints had no difference compared to pre-surgery levels. We observed a decrease in glioma-elevated proteins in the blood after surgery, identifying potential monitoring biomarkers.
In paper IV, using HiRIEF LC-MS/MS, we analysed serum proteome alterations in hospitalised COVID-19 patients in comparison to healthy controls, and identified a strong upregulation in inflammatory, interferon-induced, and proteasomal proteins. Several protein groups showed association with clinical parameters of COVID-19 severity, including proteasomal proteins. Serum proteome alterations were traceable to proteome alterations induced in a lung adenocarcinoma cell line (Calu-3) by infection with SARS-CoV-2. Finally, we performed the first meta-analysis of global proteomics studies of the soluble blood proteome in COVID-19, providing estimates of standardised mean differences and summary receiver operating characteristic curves. We demonstrate the high accuracy and precision of HiRIEF LC-MS/MS when compared to the meta-analysis estimates and pinpoint proteins that may serve as biomarkers of COVID-19. In summary, this thesis postulates that new circulating protein biomarkers would be clinically useful. By combining mass-spectrometry- and antibody-based proteomics, we demonstrate the potential of in-depth analyses of the plasma proteome in capturing systemic alterations related to treatment, survival, and disease status, pinpointing potentially novel biomarkers that require validation in larger cohorts.

    Tutorial: Correction of shifts in single-stage LC-MS(/MS) data

    Label-free LC-MS(/MS) provides accurate quantitative profiling of proteins and metabolites in complex biological samples such as cell lines, tissues and body fluids. A label-free experiment consists of several LC-MS(/MS) chromatograms that might be acquired over several days, across multiple laboratories using different instruments. The single-stage part (MS1 map) of LC-MS(/MS) data contains quantitative information on all compounds that can be detected by LC-MS(/MS) and is the data of choice used by quantitative LC-MS(/MS) data pre-processing workflows. Differences in experimental conditions and fluctuation of analytical parameters influence the overall quality of the MS1 maps and are factors hampering comparative statistical analyses and data interpretation. The quality of the obtained MS1 maps can be assessed based on changes in the two separation dimensions (retention time, mass-to-charge ratio) and the readout (ion intensity) of MS1 maps. In this tutorial we discuss two types of changes, monotonic and non-monotonic shifts, which may occur in the two separation dimensions and the readout of an MS1 map. Monotonic shifts of MS1 maps can be corrected, while non-monotonic ones can only be assessed but not corrected, since correction would require precise modelling of the underlying physicochemical effects, which would require additional parameters and analysis. We discuss reasons for monotonic and non-monotonic shifts in the two separation dimensions and readout of MS1 maps, as well as algorithms that can be used to correct monotonic shifts or to assess the extent of non-monotonic shifts. The relation of non-monotonic shifts to peak elution order inversion and orthogonality, as defined in analytical chemistry, is discussed. This tutorial is aimed at scientists who generate and evaluate data and who want to know the conditions and approaches needed to produce and pre-process comparable MS1 maps.

    Needles in a haystack of protein diversity: Interrogation of complex biological samples through specialized strategies in bottom-up proteomics uncover peptides of interest for diverse applications

    Peptide identification is at the core of bottom-up proteomics measurements. However, even with state-of-the-art mass spectrometric instrumentation, peptide level information is still lost or missing in these types of experiments. Reasons behind missing peptide identifications in bottom-up proteomics include variable peptide ionization efficiencies, ion suppression effects, as well as the occurrence of chimeric spectra that can lower the efficacy of database search strategies. Peptides derived from naturally abundant proteins in a biological system also have better chances of being identified in comparison to the ones produced from less abundant proteins, at least in regular discovery-based proteomics experiments. This dissertation focused on the recovery of the “missing or hidden proteome” information in complex biological matrices by approaching this challenge under a peptide-centric view and implementing different liquid chromatography tandem mass spectrometry (LC-MS/MS) experimental workflows. In particular, the projects presented here covered: (1) the feasibility of applying a liquid chromatography-multiple reaction monitoring MS methodology for the targeted identification of peptides serving as surrogates of protein biomarkers in environmental matrices with unknown microbial diversities; (2) the evaluation of selecting unique tryptic peptides in-silico that can distinguish groups of proteins, instead of individual proteins, for targeted proteomics workflows; (3) maximizing peptide identification in spectral data collected from different LC-MS/MS setups by applying a multi-peptide-spectrum-match algorithm; and (4) showing that LC-MS/MS combined with de novo assisted-database searches is a feasible strategy for the comprehensive identification of peptides derived from native proteolytic mechanisms in biological systems.

    Biomarker Discovery and Validation for Proteomics and Genomics: Modeling And Systematic Analysis

    Discovery and validation of protein biomarkers with high specificity is the main challenge of current proteomics studies. Different mass spectrometry models are used as shotgun tools for discovery of biomarkers, which is usually done on a small number of samples. In the discovery phase, feature selection plays a key role. The first part of this work focuses on the feature selection problem and proposes a new branch-and-bound algorithm based on the U-curve assumption. The U-curve branch-and-bound algorithm (UBB) for optimization was introduced recently by Barrera and collaborators. In this work we introduce an improved algorithm (IUBB) for finding the optimal set of features based on the U-curve assumption. The results for a set of U-curve problems, generated from a cost model, show that the IUBB algorithm makes fewer evaluations and is more robust than the original UBB algorithm. The two algorithms are also compared in finding the optimal features of a real classification problem designed using the data model. The results show that IUBB outperforms UBB in finding the optimal feature sets. On the other hand, the results indicate that the performance of the error estimator is crucial to the success of the feature selection algorithm. To study the effect of error estimation methods, in the next section of the work, we study the effect of the complexity of the decision boundary on the performance of error estimation methods. First, a model is developed which quantifies the complexity of a classification problem purely in terms of the geometry of the decision boundary, without relying on the Bayes error. Then, this model is used in a simulation study to analyze the bias and root-mean-square error (RMS) of a few widely used error estimation methods relative to the complexity of the decision boundary. The results show that all the estimation methods lose accuracy as complexity increases.
Validation of a set of selected biomarkers from a list of candidates is an important stage in the biomarker identification pipeline and is the focus of the next section of this work. This section analyzes the Selected Reaction Monitoring (SRM) pipeline in a systematic fashion, by modelling the main stages of the biomarker validation process. The proposed models for SRM and protein mixture are then used to study the effect of different parameters on the final performance of biomarker validation. We focus on the sensitivity of the SRM pipeline to the working parameters, in order to identify the bottlenecks where time and energy should be spent in designing the experiment.

    Covariance mapping spectroscopy of ultrafast laser induced biomolecular dissociation

    This thesis reports the development of and first results from femtosecond laser-induced ionisation/dissociation (fs-LID) two-dimensional partial covariance mass spectrometry (2D-PC-MS) of biomolecules. Collision induced dissociation (CID) 2D-PC-MS is first extended to oligonucleotides. 2D-PC-MS fragment–fragment correlations are shown to give more sequence-specific information for oligonucleotides than the individual fragment analysis of 1D MS/MS. This is particularly relevant in the important case of modified oligonucleotides, where common sequencing methods can struggle to discover the nature and location of chemical modifications. The experimental development of fs-LID 2D-PC-MS runs in parallel with predictive simulations of the laser–ion cloud overlap in our experimental system, which inform the experimental parameters required for success. The discovery that contaminant-free measurements are possible in negative ion mode, combined with oligonucleotides being more amenable to negative ion mode MS, the relevant experience from oligonucleotide CID 2D-PC-MS and the fact that, to the best of our knowledge, fs-LID activated MS of oligonucleotides was unprecedented, led to a focus on fs-LID 2D-PC-MS of oligonucleotide anions. Novel 2D-PC-MS molecular diagrams are developed to give insights into the fragmentation patterns of phosphorylated peptides and oligonucleotides. These diagrams visually represent the relative probabilities of a molecule taking the fragmentation pathways found to be strongest when correlated fragment pairs are ranked by their 2D-PC-MS significance score. Comparing 2D-PC-MS molecular diagrams for CID and fs-LID activation elucidated mechanistic differences in the fragmentation behaviour between the statistical bond-breaking of CID and the ultrafast fragmentation triggered by fs-LID.
A comparison of the fs-LID fragmentation pathways around different riboses finds that the nature of the ribose greatly influences the local fragmentation behaviour. In a wider context, investigating laser–biomolecule interaction mechanisms could contribute to the understanding of light–biomolecule interactions such as sunlight interacting with living cells.