7 research outputs found

    Calibration of mass spectrometric peptide mass fingerprint data without specific external or internal calibrants

    BACKGROUND: Peptide Mass Fingerprinting (PMF) is a widely used mass spectrometry (MS) method for analysing proteins and peptides. It relies on the comparison between experimentally determined and theoretical mass spectra. The PMF process requires calibration, usually performed with external or internal calibrants of known molecular masses. RESULTS: We have introduced two novel MS calibration methods. The first method utilises the local similarity of peptide maps generated after separation of complex protein samples by two-dimensional gel electrophoresis. It computes a multiple peak-list alignment of the data set using a modified Minimum Spanning Tree (MST) algorithm. The second method exploits the fact that hundreds of MS samples are measured in parallel on one sample support. It improves the calibration coefficients by applying a two-dimensional Thin Plate Splines (TPS) smoothing algorithm. We studied the novel calibration methods using data generated by three different MALDI-TOF-MS instruments. We demonstrate that a PMF data set can be calibrated without resorting to external calibrants or relying on widely occurring internal calibrants. The methods developed here were implemented in R and are part of the BioConductor package mscalib available from . CONCLUSION: The MST calibration algorithm is well suited to calibrating MS spectra of protein samples resulting from two-dimensional gel electrophoretic separation. The TPS-based calibration algorithm might be used to correct systematic mass measurement errors observed for large MS sample supports. Compared to other methods, our combined MS spectra calibration strategy increases the peptide/protein identification rate by an additional 5–15%.

    A method for improving SELDI-TOF mass spectrometry data quality

    BACKGROUND: Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) is a powerful tool for rapidly generating high-throughput protein profiles from a large number of samples. However, the events that occur between the first and last sample run are likely to introduce technical variation in the results. METHODS: We fractionated and analyzed quality control and investigational serum samples on 3 Protein Chips and used statistical methods to identify poor-quality spectra and to identify and reduce technical variation. RESULTS: Using diagnostic plots, we were able to visually depict all spectra and to identify and remove those that were of poor quality. We detected a technical variation associated with when the samples were run (referred to as batch effect) and corrected for this variation using analysis of variance. These corrections increased the number of peaks that were reproducibly detected. CONCLUSION: By removing poor-quality, outlier spectra, we were able to increase peak detection, and by reducing the variance introduced when samples are processed and analyzed in batches, we were able to increase the reproducibility of peak detection.
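
The batch correction mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: it assumes a simple one-way model in which each batch shifts the intensity of a single peak by a constant, and it removes the estimated shift (batch mean minus grand mean). The function name `remove_batch_effect` and the example values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def remove_batch_effect(intensities, batches):
    """Subtract the estimated batch effect from one peak's intensities.

    Assumption (not from the abstract): a one-way additive model,
    intensity = grand mean + batch effect + noise, where the batch
    effect is estimated as (batch mean - grand mean).
    """
    if len(intensities) != len(batches):
        raise ValueError("intensities and batches must have equal length")

    grand_mean = mean(intensities)

    # Group the intensities by batch label.
    by_batch = defaultdict(list)
    for value, batch in zip(intensities, batches):
        by_batch[batch].append(value)

    # Estimated effect of each batch relative to the grand mean.
    effect = {b: mean(vals) - grand_mean for b, vals in by_batch.items()}

    # Remove the batch effect; the grand mean is preserved.
    return [value - effect[batch] for value, batch in zip(intensities, batches)]


# Hypothetical intensities of one peak measured in two batches.
corrected = remove_batch_effect([1.0, 3.0, 5.0, 7.0], ["a", "a", "b", "b"])
```

After correction both batches share the same mean, so the remaining spread reflects within-batch variation only. A real analysis would fit the model per peak and might include further factors.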

    Analytical model of peptide mass cluster centres with applications

    BACKGROUND: The elemental composition of peptides results in the formation of distinct, equidistantly spaced clusters across the mass range. This peptide mass clustering is used to calibrate peptide mass lists, to identify and remove non-peptide peaks, and for data reduction. RESULTS: We developed an analytical model of the peptide mass cluster centres. Inputs to the model included the amino acid frequencies in the sequence database, the average length of the proteins in the database, the cleavage specificity of the proteolytic enzyme used and the cleavage probability. We examined the accuracy of our model by comparing it with a model based on an in silico sequence database digest. To identify the crucial parameters, we analysed how the cluster centre location depends on the inputs. The distance to the nearest cluster was used to calibrate mass spectrometric peptide peak-lists and to identify non-peptide peaks. CONCLUSION: The model introduced here enables us to predict the location of the peptide mass cluster centres. It explains how the location of the cluster centres depends on the input parameters. Fast and efficient calibration and filtering of non-peptide peaks is achieved by a distance measure suggested by Wool and Smilansky.
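
The idea of a distance to the nearest cluster centre can be sketched in Python as follows. This is an illustration only, not the authors' analytical model and not the Wool and Smilansky measure itself: it assumes that cluster centres lie at integer multiples of a fixed spacing, and the default spacing of 1.000495 Da is an approximate value assumed here, not taken from the abstract. The function names and the tolerance are hypothetical.

```python
import math

# Assumed spacing between neighbouring cluster centres, in Da.
# This value is an assumption for illustration, not from the abstract.
ASSUMED_SPACING = 1.000495


def distance_to_nearest_cluster(mass, spacing=ASSUMED_SPACING):
    """Signed distance (Da) from a mass to the nearest multiple of spacing."""
    remainder = math.fmod(mass, spacing)
    # Masses past the midpoint are closer to the next centre up.
    if remainder > spacing / 2:
        remainder -= spacing
    return remainder


def filter_peptide_like(masses, tolerance=0.2, spacing=ASSUMED_SPACING):
    """Keep only masses lying within tolerance of a cluster centre."""
    return [
        m for m in masses
        if abs(distance_to_nearest_cluster(m, spacing)) <= tolerance
    ]
```

For example, with a spacing of 1.0 a mass of 10.75 lies 0.25 below the next centre, so `distance_to_nearest_cluster(10.75, spacing=1.0)` returns -0.25. Peaks far from every centre are candidates for non-peptide signals, and a systematic offset in the distances across a peak-list hints at a calibration error.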

    Regression methods for survival and multistate models.

    A common interest in medical, biological, and engineering research is determining whether certain independent variables are correlated with survival or failure times. Standard statistical techniques usually cannot be applied to failure-time data because the data are incomplete, in other words, because of censoring. From a statistical perspective, the study of time-to-event data is even more challenging when further complexities such as high dimensionality or multivariability are added to the model. In this dissertation, we consider predicting patient survival from the proteomic profile of patient serum using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) data of non-small cell lung cancer patients. Because the number of features in a mass spectrum is much larger than the study sample size, traditional linear regression modeling of survival times with a high number of proteomic features is not feasible. Hence, we consider latent factor and regularized/penalized methods for fitting such models in order to predict patient survival from the mass spectrometry features. Extensive numerical studies involving both simulated and real mass spectrometry data are used to compare four popular regression methods, namely partial least squares (PLS), sparse partial least squares (SPLS), least absolute shrinkage and selection operator (LASSO) and elastic net regularization, on processed spectra. Right censoring is handled through a residual-based multiple imputation. Overall, more complex methods such as the elastic net and SPLS perform better, provided the operational parameters are chosen carefully via cross-validation. For survival time prediction, we recommend using the elastic net based on a selected set of features. As a type of multivariate survival data, multistate models have a wide range of applications. Most of the existing regression approaches to analyzing such data are based on parametric and semi-parametric procedures that rely on specific model structures. In this dissertation, we construct non-parametric regression estimators of a number of temporal functions in a multistate system based on a univariate continuous baseline covariate. These estimators include state occupation probabilities and the distribution functions of state entry, exit and waiting (sojourn) times of a general progressive (e.g. acyclic) multistate model. The data are subject to right censoring, and the censoring mechanism is explainable by observable covariates that could be time dependent. The resulting estimators are valid even if the multistate process is non-Markov. The performance of the estimators is studied using a detailed simulation. We illustrate our estimators using a data set on bone marrow transplant patients. Finally, some extensions of the proposed methods to the more general case with multivariate covariates are presented, along with plans for future developments.

    Exploration of the fundamentals of matrix assisted laser desorption/ionization time-of-flight mass spectrometry

    This thesis focuses on the study of different tools used for preparing samples for MALDI TOFMS and the utilization of these tools to study different ionization processes operating in the MALDI experiment. The electrospray deposition technique was employed to study the effect of the salt/analyte (S/A) ratio, a critical factor for the quantitation of synthetic polymer samples. The analysis was performed with four different matrices (DHB, CHCA, dithranol, and DCTB) and three different alkali ions (lithium, sodium, and potassium). Titrating a PMMA 6800 sample with different alkalis in the presence of different matrices produced varying results, from an “ideal” titration curve using sodium with the DHB matrix to a “non-ideal” titration curve (i.e., increasing analyte signal up to an S/A value of approximately 1, followed by a decreasing analyte signal with further increases in salt) for sodium with the dithranol matrix. A specially designed split probe, in which the segregation of the PEG 1500 analyte and the lithium hydroxide cationization agent was complete, provided unequivocal proof of gas phase cationization. A dual electrospray deposition system was successfully developed in which two solutions are sprayed simultaneously, ensuring that the contents of the independently prepared samples are not mixed in the solution state. This device was used for the further study of both the gas phase cationization reactions of polymers and the counterion exchange reaction observed with inorganic complexes. Four matrices were used to analyze a ruthenium dimer complex; all matrices except DCTB led to dissociation of the non-covalent bonds of the dimer complexes. An analysis of hetero ligand ruthenium complexes with all matrices except DCTB demonstrated a counterion exchange reaction in which the ClO4- or PF6- counterions were substituted by a matrix anion. Performing TOFMS instrument mass calibration using synthetic polymer calibrants with different molecular weight ranges uncovered visual and mathematical methods for evaluating the accuracy of the calibration function. Statistical analysis of systematic errors in the observed mass deviations enabled development of a correction method that yielded corrected mass values acceptable for use in accurate mass measurements. Ph.D., Analytical Chemistry -- Drexel University, 200

    Computational methods for the analysis of mass spectrometry imaging data

    A powerful enhancement to MS-based detection is the addition of spatial information to the chemical data, an approach called mass spectrometry imaging (MSI). MSI enables two- and three-dimensional overviews of hundreds of molecular species over a wide mass range in complex biological samples. In this work, we present two computational methods and a workflow that address three different aspects of MSI data analysis: correction of mass shifts, unsupervised exploration of the data, and the importance of preprocessing and chemometrics for extracting meaningful information from the data. We introduce a new lock mass-free recalibration procedure that significantly reduces mass shift effects in MSI data. Our method exploits similarities amongst peaklist pairs and takes advantage of the spatial context in three different ways to perform mass correction in an iterative manner. As an extension of this work, we also present a Java-based tool, MSICorrect, that implements our recalibration approach and also allows data visualization. In the next part, an unsupervised approach to ranking ion intensity maps based on the abundance of their spatial pattern is presented. Our method assigns every ion intensity map a score based on the abundance of the spatial pattern present in it and then ranks all the maps by this score. To determine which masses exhibit similar spatial distributions, our method uses spatial-similarity-based grouping to provide lists of masses that exhibit similar distribution patterns. In the last part, we demonstrate the application of a data preprocessing and multivariate analysis pipeline to a real-world biological dataset, a high-resolution MSI dataset acquired from the leaf surface of Black cottonwood (Populus trichocarpa). Application of the pipeline helped in highlighting and visualizing the chemical specificity on the leaf surface.

    Analyse von Peptid Massen Fingerabdruck Datensätzen

    Table of contents: 1. Overview; 2. Introduction; 3. Biological Mass Spectrometry; 4. A mathematical model of the peptide mass rule with applications; 5. Calibration of mass spectrometric peptide mass fingerprint data without specific external or internal calibrants; 6. Transformation and other factors of the biological mass spectrometry pairwise peak-list comparison process; 7. Conclusions; 8. References
    Recent advances in genomics, whose outstanding achievements were exemplified by the complete sequencing of the human genome, provided the infrastructure and information enabling the development of several proteomic technologies. Currently no single proteomic analysis strategy can sufficiently address the question of how the proteome is organised in terms of numerical complexity and the complexity generated by the protein-protein interactions that form supramolecular complexes within the cell. In order to obtain a detailed structural and functional picture of these complexes in whole genomes, cells, organelles or in normal and pathological states, several proteomic strategies can be utilised. A combination of technologies will give a more detailed answer to what the components of certain cellular pathways are (e.g. targets of kinases/phosphatases, cytoskeletal proteins, signalling molecules), how they interconnect, how they are modified in the cell and what the roles of several complex components are in normal and disease conditions. These types of studies depend on fast, high-throughput methods of protein identification. One of the most common methods of analysis is the mass spectrometric technique called peptide mapping. Peptide mapping is the comparison of mass spectrometrically determined peptide masses of a sequence-specific digest of a single protein or peptide of interest with peptide masses predicted from genomic databases. In this work several contributions to the computational analysis of mass spectrometric data are presented.
    During the course of my studies I looked at the distribution of peptide masses in sequence-specific protein sequence digests and developed a simple mathematical model dealing with peptide mass cluster centre location. I have introduced and studied methods of calibrating mass spectrometric peak-lists without resorting to internal or external calibration samples. Also of importance is the contribution of this work to the calibration of data produced in high-throughput experiments. In addition, I studied how filtering of non-peptide peaks influences the identification rates in mass spectrometric instruments. Furthermore, I focused my studies on measures of spectra similarity which can be used to acquire supplementary information, increasing the sensitivity and specificity of database searches.
    German abstract, translated: Advances in genome research, whose outstanding achievements were exemplified by the sequencing of the human genome, provided the information and infrastructure that enabled the development of new methods of proteome research. No single method of proteome analysis is able to answer adequately the question of how the proteome is organised, both in terms of numerical complexity and in terms of the complexity arising from the protein-protein interactions that form supramolecular complexes. To obtain a detailed structural and functional picture of these complexes in cells, in organelles, and in normal and pathological states, several proteome analysis techniques can be used. The question of how cellular signalling pathways are interconnected, how they are modified and what the function of protein complexes is in the normal and the diseased state can be answered by combining several proteome analysis techniques. Carrying out such studies requires high-throughput methods for rapid protein identification. One of the most common analytical procedures is the identification of proteins and peptides by mass spectrometric measurement. Mass spectrometrically determined peptide masses of a sequence-specific protein digest are compared with theoretical masses predicted from a protein sequence database. This work presents contributions to the computer-assisted analysis of mass spectrometric data. During my studies I examined the distribution of peptide masses as produced by a sequence-specific protein digest. I developed a mathematical model to predict the empirical properties of the distribution, e.g. the position of peptide mass clusters. In this work I also investigated methods for calibrating mass spectrometric signals that work without internal and external calibration samples. Furthermore, I analysed how removing non-peptide masses influences the identification results. In addition, I focused my studies on measures of spectra similarity. These measures can be used to increase the sensitivity and the accuracy of database searches.