7 research outputs found
Calibration of mass spectrometric peptide mass fingerprint data without specific external or internal calibrants
BACKGROUND: Peptide Mass Fingerprinting (PMF) is a widely used mass spectrometry (MS) method for the analysis of proteins and peptides. It relies on the comparison between experimentally determined and theoretical mass spectra. The PMF process requires calibration, usually performed with external or internal calibrants of known molecular masses. RESULTS: We have introduced two novel MS calibration methods. The first method utilises the local similarity of peptide maps generated after separation of complex protein samples by two-dimensional gel electrophoresis. It computes a multiple peak-list alignment of the data set using a modified Minimum Spanning Tree (MST) algorithm. The second method exploits the idea that hundreds of MS samples are measured in parallel on one sample support. It improves the calibration coefficients by applying a two-dimensional Thin Plate Splines (TPS) smoothing algorithm. We studied the novel calibration methods utilising data generated by three different MALDI-TOF-MS instruments. We demonstrate that a PMF data set can be calibrated without resorting to external calibrants or relying on widely occurring internal calibrants. The methods developed here were implemented in R and are part of the BioConductor package mscalib available from . CONCLUSION: The MST calibration algorithm is well suited to calibrate MS spectra of protein samples resulting from two-dimensional gel electrophoretic separation. The TPS based calibration algorithm might be used to correct systematic mass measurement errors observed for large MS sample supports. As compared to other methods, our combined MS spectra calibration strategy increases the peptide/protein identification rate by an additional 5 – 15%.
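The abstract does not spell out the modified MST algorithm, so the following is only a minimal sketch of the underlying idea, written in Python rather than the R of mscalib. It assumes a precomputed symmetric matrix of pairwise peak-list dissimilarities (the name `dist` and the function `minimum_spanning_tree` are hypothetical) and uses plain Prim's algorithm to pick the pairs of spectra along which a multiple alignment could proceed. It is not the mscalib implementation.

```python
def minimum_spanning_tree(dist):
    """Return MST edges (i, j) for a symmetric dissimilarity matrix.

    dist[i][j] is an assumed, precomputed dissimilarity between
    peak lists i and j (smaller means more similar).
    Plain Prim's algorithm, starting from spectrum 0.
    """
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge that connects the tree to a spectrum outside it.
        weight, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


# Three spectra: 0 and 1 are similar, 2 is closest to 1.
example = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
tree_edges = minimum_spanning_tree(example)
```

Each edge joins a spectrum to its most similar neighbour that is already in the tree, so a pairwise correction estimated along the edges would only ever compare locally similar peptide maps.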
A method for improving SELDI-TOF mass spectrometry data quality
BACKGROUND: Surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS) is a powerful tool for rapidly generating high-throughput protein profiles from a large number of samples. However, the events that occur between the first and last sample run are likely to introduce technical variation in the results. METHODS: We fractionated and analyzed quality control and investigational serum samples on 3 Protein Chips and used statistical methods to identify poor-quality spectra and to identify and reduce technical variation. RESULTS: Using diagnostic plots, we were able to visually depict all spectra and to identify and remove those that were of poor quality. We detected a technical variation associated with when the samples were run (referred to as batch effect) and corrected for this variation using analysis of variance. These corrections increased the number of peaks that were reproducibly detected. CONCLUSION: By removing poor-quality, outlier spectra, we were able to increase peak detection, and by reducing the variance introduced when samples are processed and analyzed in batches, we were able to increase the reproducibility of peak detection.
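The abstract does not state which analysis-of-variance model was used. As a minimal sketch, assuming a one-way model with batch as the only factor, the batch effect for a single peak can be removed by subtracting each batch mean and adding back the grand mean. The function name `remove_batch_effect` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def remove_batch_effect(intensities, batches):
    """Centre one peak's intensities by batch (one-way ANOVA residuals
    plus the grand mean). Assumes batch is the only modelled factor."""
    grand_mean = mean(intensities)
    by_batch = defaultdict(list)
    for value, batch in zip(intensities, batches):
        by_batch[batch].append(value)
    batch_mean = {b: mean(values) for b, values in by_batch.items()}
    return [
        value - batch_mean[batch] + grand_mean
        for value, batch in zip(intensities, batches)
    ]


# Toy data: batch "b" reads 10 units higher than batch "a".
corrected = remove_batch_effect([1.0, 3.0, 11.0, 13.0], ["a", "a", "b", "b"])
```

After correction both batches share the same mean, while the variation within each batch is preserved.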
Analytical model of peptide mass cluster centres with applications
BACKGROUND: The elemental composition of peptides results in the formation of distinct, equidistantly spaced clusters across the mass range. This property of peptide mass clustering is used to calibrate peptide mass lists, to identify and remove non-peptide peaks and for data reduction. RESULTS: We developed an analytical model of the peptide mass cluster centres. Inputs to the model included the amino acid frequencies in the sequence database, the average length of the proteins in the database, the cleavage specificity of the proteolytic enzyme used, and the cleavage probability. We examined the accuracy of our model by comparing it with the model based on an in silico sequence database digest. To identify the crucial parameters, we analysed how the cluster centre location depends on the inputs. The distance to the nearest cluster was used to calibrate mass spectrometric peptide peak-lists and to identify non-peptide peaks. CONCLUSION: The model introduced here enables us to predict the location of the peptide mass cluster centres. It explains how the location of the cluster centres depends on the input parameters. Fast and efficient calibration and filtering of non-peptide peaks is achieved by a distance measure suggested by Wool and Smilansky.
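The distance-to-nearest-cluster idea can be illustrated with a short Python sketch. It assumes equidistant cluster centres at integer multiples of a fixed spacing; the value 1.000495 Da is an assumed, illustrative spacing, whereas the model in the abstract derives the centres from database and digestion parameters. The function names and the 0.2 Da tolerance are hypothetical.

```python
# Assumed spacing (Da) between neighbouring peptide mass cluster centres.
CLUSTER_SPACING = 1.000495


def distance_to_nearest_cluster(mass, spacing=CLUSTER_SPACING):
    """Signed distance (Da) from a mass to the nearest cluster centre,
    with centres assumed to lie at integer multiples of `spacing`."""
    nearest_index = round(mass / spacing)
    return mass - nearest_index * spacing


def is_peptide_like(mass, tolerance=0.2, spacing=CLUSTER_SPACING):
    """Flag a peak as peptide-like if it lies within `tolerance` Da of
    a cluster centre (tolerance chosen only for illustration)."""
    return abs(distance_to_nearest_cluster(mass, spacing)) <= tolerance


on_cluster = is_peptide_like(1000.495)   # lies on an assumed centre
off_cluster = is_peptide_like(1000.995)  # about 0.5 Da from any centre
```

The same signed distance, averaged over a peak list, could serve as the quantity to minimise when searching for calibration coefficients.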
Regression methods for survival and multistate models.
A common interest in medical, biological, and engineering research is determining whether certain independent variables are correlated with survival or failure times. Standard statistical techniques usually cannot be applied to failure-time data because the data are incomplete, in other words, because of censoring. From a statistical perspective, the study of time-to-event data is even more challenging when further complexities such as high dimensionality or multivariability are added to the model. In this dissertation, we consider predicting patient survival from the proteomic profile of patient serum using matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) data of non-small cell lung cancer patients. Because the number of features in a mass spectrum is much larger than the study sample size, traditional linear regression modeling of survival times with a high number of proteomic features is not feasible. Hence, we consider latent factor and regularized/penalized methods for fitting such models in order to predict patient survival from the mass spectrometry features. Extensive numerical studies involving both simulated and real mass spectrometry data are used to compare four popular regression methods, namely, partial least squares (PLS), sparse partial least squares (SPLS), least absolute shrinkage and selection operator (LASSO) and elastic net regularization, on processed spectra. Right censoring is handled through a residual-based multiple imputation. Overall, more complex methods such as the elastic net and SPLS perform better, provided the operational parameters are chosen carefully via cross validation. For survival time prediction, we recommend using the elastic net based on a selected set of features. As a type of multivariate survival data, multistate models have a wide range of applications.
Most of the existing regression approaches for analyzing such data are based on parametric and semi-parametric procedures, which rely on specific model structures. In this dissertation, we construct non-parametric regression estimators of a number of temporal functions in a multistate system based on a univariate continuous baseline covariate. These estimators include state occupation probabilities and the distribution functions of state entry, exit and waiting (sojourn) times of a general progressive (e.g. acyclic) multistate model. The data are subject to right censoring, and the censoring mechanism is explainable by observable covariates that could be time dependent. The resulting estimators are valid even if the multistate process is non-Markov. The performance of the estimators is studied using a detailed simulation. We illustrate our estimators using a data set on bone marrow transplant patients. Finally, some extensions of the proposed methods to the more general case with multivariate covariates are presented along with plans for future developments.
Exploration of the fundamentals of matrix assisted laser desorption/ionization time-of-flight mass spectrometry
This thesis focuses on the study of different tools used for preparing samples for MALDI TOFMS and the utilization of these tools to study different ionization processes operating in the MALDI experiment. The electrospray deposition technique was employed to study the effect of the salt/analyte (S/A) ratio, a critical factor for the quantitation of synthetic polymer samples. The analysis was performed with four different matrices (DHB, CHCA, dithranol, and DCTB) and three different alkali ions (lithium, sodium, and potassium). Titrating a PMMA 6800 sample with different alkalis in the presence of different matrices produced varying results, from an “ideal” titration curve using sodium with the DHB matrix to a “non-ideal” titration curve (i.e., increasing analyte signal up to an S/A value of approximately 1, followed by a decreasing analyte signal with further increases in salt) for sodium with the dithranol matrix. Use of a specially designed split probe, in which the PEG 1500 analyte and the lithium hydroxide cationization agent were completely segregated, provided unequivocal proof of gas phase cationization. A dual electrospray deposition system was successfully developed in which two solutions are sprayed simultaneously, ensuring that the contents of the independently prepared samples are not mixed in the solution state. This device was used for the further study of both the gas phase cationization reactions of polymers and the counterion exchange reaction observed with inorganic complexes. Four matrices were used to analyze a ruthenium dimer complex; all matrices except DCTB led to dissociation of the non-covalent bonds of the dimer complexes.
An analysis of hetero ligand ruthenium complexes with all matrices except DCTB demonstrated a counterion exchange reaction in which the ClO4- or PF6- counterions were substituted by a matrix anion. Performing TOFMS instrument mass calibration using synthetic polymer calibrants with different molecular weight ranges uncovered visual and mathematical methods for evaluating the accuracy of the calibration function. Statistical analysis of systematic errors in the observed mass deviations enabled development of a correction method that yielded corrected mass values acceptable for use in accurate mass measurements. Ph.D., Analytical Chemistry -- Drexel University, 200
Computational methods for the analysis of mass spectrometry imaging data
A powerful enhancement to MS-based detection is the addition of spatial information to the chemical data, an approach called mass spectrometry imaging (MSI). MSI enables two- and three-dimensional overviews of hundreds of molecular species over a wide mass range in complex biological samples. In this work, we present two computational methods and a workflow that address three different aspects of MSI data analysis: correction of mass shifts, unsupervised exploration of the data, and the importance of preprocessing and chemometrics for extracting meaningful information from the data. We introduce a new lock mass-free recalibration procedure that significantly reduces mass shift effects in MSI data. Our method exploits similarities amongst peaklist pairs and takes advantage of the spatial context in three different ways to perform mass correction in an iterative manner. As an extension of this work, we also present a Java-based tool, MSICorrect, that implements our recalibration approach and also allows data visualization. In the next part, an unsupervised approach to rank ion intensity maps based on the abundance of their spatial pattern is presented. Our method assigns a score to every ion intensity map based on the abundance of spatial pattern present in it and then ranks all the maps by this score. To determine which masses exhibit similar spatial distributions, our method uses spatial-similarity based grouping to provide lists of masses that exhibit similar distribution patterns. In the last part, we demonstrate the application of a data preprocessing and multivariate analysis pipeline to a real-world biological dataset, a high-resolution MSI dataset acquired from the leaf surface of Black cottonwood (Populus trichocarpa). Application of the pipeline helped in highlighting and visualizing the chemical specificity on the leaf surface.
Analysis of peptide mass fingerprint data sets (Analyse von Peptid Massen Fingerabdruck Datensätzen)
### Table Of Contents
1. Overview
2. Introduction
3. Biological Mass Spectrometry
4. A mathematical model of the peptide mass rule with applications
5. Calibration of mass spectrometric peptide mass fingerprint data without specific external or internal calibrants
6. Transformation and other factors of the biological mass spectrometry pairwise peak-list comparison process
7. Conclusions
8. References

Recent advances in genomics, whose outstanding achievements were exemplified
by the complete sequencing of the human genome, provided the infrastructure and
information enabling the development of several proteomic technologies.
Currently no single proteomic analysis strategy can sufficiently address the
question of how the proteome is organised in terms of numerical complexity and
complexity generated by the protein-protein interactions forming
supramolecular complexes within the cell.
To obtain a detailed structural and functional picture of these complexes
in whole genomes, cells, organelles or in normal and pathological states,
several proteomic strategies can be utilised. A combination of technologies will
give a more detailed answer to which components make up certain cellular
pathways (e.g. targets of kinases/phosphatases, cytoskeletal proteins,
signalling molecules), how they interconnect, how they are modified in the
cell and what roles the individual complex components play in normal and
disease conditions.
These types of studies depend on fast and high throughput methods of protein
identification. One of the most common methods of analysis is the mass
spectrometric technique called peptide mapping. Peptide mapping is the
comparison of mass spectrometrically determined peptide masses of a sequence
specific digest of a single protein or peptide of interest with peptide masses
predicted from genomic databases. In this work several contributions to the
computational analysis of mass spectrometric data are presented. During the
course of my studies I looked at the distribution of peptide masses in
sequence specific protein sequence digests and developed a simple mathematical
model dealing with peptide mass cluster centre location. I have introduced and
studied methods for calibrating mass spectrometric peak-lists without
resorting to internal or external calibration samples. Also of importance is
the contribution of this work to the calibration of data produced in high
throughput experiments. In addition, I studied how filtering of non-peptide
peaks influences the identification rates in mass spectrometric instruments.
Furthermore, I focused my studies on measures of spectra similarity which can
be used to acquire supplementary information, increasing the sensitivity and
specificity of database searches.

Advances in genome research, whose outstanding achievements were exemplified
by the sequencing of the human genome, provided the information and
infrastructure that enabled the development of new methods of proteome
research.
No single method of proteome analysis is able to sufficiently answer the
question of how the proteome is organised, both in terms of numerical
complexity and of the complexity arising from the protein-protein
interactions that form supramolecular complexes. To obtain a detailed
structural and functional picture of these complexes in cells, organelles,
and in normal and pathological states, several proteome analysis techniques
can be used. The question of how cellular signalling pathways are
interconnected, how they are modified and what the function of protein
complexes is in the normal and the disease state can be answered by combining
several proteome analysis techniques. Carrying out these studies requires
high-throughput methods for fast protein identification.
One of the most common analytical methods is the identification of proteins
and peptides by means of mass spectrometric measurements. Mass
spectrometrically determined peptide masses of a sequence-specific protein
digest are compared with theoretical masses predicted from a protein sequence
database.
In this work, contributions to the computer-aided analysis of mass
spectrometric data are presented. During my studies I examined the
distribution of peptide masses as produced by a sequence-specific protein
digest. I developed a mathematical model to predict the empirical properties
of the distribution, e.g. the position of peptide mass clusters. In this work
I also investigated methods for calibrating mass spectrometric signals that
work without internal and external calibration samples. Furthermore, I
analysed how removing non-peptide masses affects the identification results.
In addition, I focused my studies on measures of spectral similarity. These
measures can be used to increase the sensitivity and accuracy of database
searches.