
    Statistical quality assessment and outlier detection for liquid chromatography-mass spectrometry experiments

    Abstract. Background: Quality assessment methods that are commonplace in engineering and industrial production are not yet widespread in large-scale proteomics experiments. Modern technologies such as multi-dimensional liquid chromatography coupled to mass spectrometry (LC-MS) produce large quantities of proteomic data, and these data are prone to measurement errors and reproducibility problems, so automatic quality assessment and control become increasingly important. Results: We propose a methodology to assess the quality and reproducibility of data generated in quantitative LC-MS experiments. We introduce quality descriptors that capture different aspects of the quality and reproducibility of LC-MS data sets. Our method is based on the Mahalanobis distance and a robust Principal Component Analysis. Conclusion: We evaluate our approach on several data sets of different complexities and show that we are able to precisely detect LC-MS runs of poor signal quality in large-scale studies.
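
    A minimal sketch of this kind of multivariate outlier screen (illustrative, not the authors' implementation): compute robust Mahalanobis distances over a matrix of per-run quality descriptors using a minimum covariance determinant estimator, and flag runs beyond a chi-squared cutoff. The descriptor choices and cutoff below are assumptions.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

def flag_poor_runs(descriptors: np.ndarray, alpha: float = 0.975) -> np.ndarray:
    """Flag LC-MS runs whose quality descriptors are multivariate outliers.

    descriptors: (n_runs, n_descriptors) matrix, e.g. TIC, peak count,
                 median peak width per run (illustrative choices).
    Returns a boolean mask of flagged runs.
    """
    mcd = MinCovDet(random_state=0).fit(descriptors)   # robust location/scatter
    d2 = mcd.mahalanobis(descriptors)                  # squared robust distances
    cutoff = chi2.ppf(alpha, df=descriptors.shape[1])  # chi-squared reference
    return d2 > cutoff

# Example: 40 runs described by 3 quality descriptors; run 3 is shifted
# far from the rest and should be flagged.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
X[3] += 6.0
print(np.where(flag_poor_runs(X))[0])
```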

    Improved quality control processing of peptide-centric LC-MS proteomics data

    Motivation: In the analysis of differential peptide peak intensities (i.e. abundance measures), LC-MS analyses with poor-quality peptide abundance data can bias downstream statistical analyses, and hence the biological interpretation, for an otherwise high-quality dataset. Although considerable effort has been devoted to assuring the quality of peptide identification with respect to spectral processing, to date quality assessment of the subsequent peptide abundance data matrix has been limited to a subjective visual inspection of run-by-run correlation or of individual peptide components. Identifying statistical outliers is a critical step in the processing of proteomics data, as many downstream statistical analyses [e.g. analysis of variance (ANOVA)] rely upon accurate estimates of sample variance, and their results are influenced by extreme values.
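
    One way the subjective run-by-run correlation inspection mentioned above could be automated is sketched below: a simple robust z-score rule over the run correlation matrix. This is an illustration, not the paper's published procedure.

```python
import numpy as np

def flag_low_correlation_runs(abund: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Flag runs whose median run-to-run correlation is unusually low.

    abund: (n_peptides, n_runs) log-abundance matrix, NaNs allowed.
    Returns a boolean mask of flagged runs.
    """
    n_runs = abund.shape[1]
    r = np.ones((n_runs, n_runs))
    for i in range(n_runs):
        for j in range(i + 1, n_runs):
            ok = ~np.isnan(abund[:, i]) & ~np.isnan(abund[:, j])
            r[i, j] = r[j, i] = np.corrcoef(abund[ok, i], abund[ok, j])[0, 1]
    off_diag = np.where(np.eye(n_runs, dtype=bool), np.nan, r)
    med = np.nanmedian(off_diag, axis=1)           # each run's typical agreement
    center = np.median(med)
    mad = np.median(np.abs(med - center)) + 1e-12  # robust spread
    return (center - med) / (1.4826 * mad) > k     # far below the pack: flag
```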

    amsrpm: Robust Point Matching for Retention Time Alignment of LC/MS Data with R

    Proteomics is the study of the abundance, function and dynamics of all proteins present in a living organism, and mass spectrometry (MS) has become its most important tool due to its unmatched sensitivity, resolution and potential for high-throughput experimentation. A frequently used variant of mass spectrometry is coupled with liquid chromatography (LC) and denoted "LC/MS". It produces two-dimensional raw data in which significant distortions along one of the dimensions can occur between different runs on the same instrument and between instruments. Compensating for these distortions is required to allow comparisons between, and inference based on, different experiments. This article introduces the amsrpm software package. It implements a variant of the Robust Point Matching (RPM) algorithm that is tailored to the alignment of LC and LC/MS experiments. Problem-specific enhancements include a specialized dissimilarity measure and means to enforce smoothness and monotonicity of the estimated transformation function. The algorithm does not rely on pre-specified landmarks, is insensitive to outliers and is capable of modeling nonlinear distortions. Its usefulness is demonstrated using both simulated and experimental data. The software is available as an open source package for the statistical programming language R.
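
    amsrpm itself is an R package built on robust point matching; as a much simpler illustration of the smooth, monotone warping it estimates, one can fit an isotonic regression to matched retention-time pairs. The data points below are invented, and isotonic regression is a stand-in for RPM, not the package's algorithm.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Hypothetical matched peak pairs: retention times in a distorted run
# (rt_run) and the corresponding times in a reference run (rt_ref).
rt_run = np.array([1.0, 2.1, 3.3, 4.0, 5.2, 6.8, 8.1])
rt_ref = np.array([1.1, 2.4, 3.9, 4.7, 6.1, 7.9, 9.4])

# Monotone warp from run time to reference time; isotonic regression
# enforces the monotonicity that amsrpm imposes on its transformation.
warp = IsotonicRegression(out_of_bounds="clip").fit(rt_run, rt_ref)

# Apply the estimated warp to all feature retention times of the run.
features_rt = np.array([1.5, 4.5, 7.0])
print(warp.predict(features_rt))
```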

    Machine learning for omics data analysis.

    In proteomics and metabolomics, quantifying changes in the abundance levels of biomolecules in a biological system involves multiple sample analysis steps, including mass spectrum deconvolution and peak list alignment. Each analysis step introduces a certain degree of technical variation in the abundance levels (i.e. peak areas) of those molecules. Some analysis steps introduce technical variations that affect the peak areas of all molecules equally, while others affect the peak areas of a subset of molecules to varying degrees. To correct these technical variations, some existing normalization methods simply scale the peak areas of all molecules detected in one sample using a single normalization factor, or fit a regression model based on different assumptions. As a result, local technical variations are ignored and may even be amplified in some cases. To overcome these limitations, we developed a molecule-specific normalization algorithm, called MSN, which adopts a robust surface-fitting strategy to minimize the molecular profile difference of a group of house-keeping molecules across samples. The house-keeping molecules are those whose abundance levels were not affected by the biological treatment. We also developed an outlier detection algorithm based on the Fisher criterion to detect and remove noisy data points from the experimental data. Applying the MSN method to two different datasets showed that MSN is a highly efficient normalization algorithm that yields the highest sensitivity and accuracy compared to five existing normalization algorithms. The outlier detection algorithm's application to the same datasets also proved efficient and robust.
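
    A simplified sketch in the spirit of such molecule-specific normalization (not the published MSN algorithm): fit a smooth curve to the log-ratios of assumed house-keeping molecules against a reference profile, then subtract the fitted intensity-dependent bias from every molecule.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def normalize_sample(log_sample, log_ref, housekeeping_idx, frac=0.6):
    """Intensity-dependent normalization of one sample against a reference.

    Fits a LOWESS curve to the house-keeping molecules' log-ratios as a
    function of mean log-abundance, then subtracts the fitted bias from
    every molecule. A sketch only; MSN uses a robust surface fit.
    """
    m = log_sample - log_ref                   # per-molecule log-ratio
    a = 0.5 * (log_sample + log_ref)           # mean log-abundance
    fit = lowess(m[housekeeping_idx], a[housekeeping_idx],
                 frac=frac, return_sorted=True)
    bias = np.interp(a, fit[:, 0], fit[:, 1])  # bias at each abundance level
    return log_sample - bias
```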

    Cross-Platform Comparison of Untargeted and Targeted Lipidomics Approaches on Aging Mouse Plasma.

    Lipidomics - the global assessment of lipids - can be performed using a variety of mass spectrometry (MS)-based approaches. However, choosing the optimal approach in terms of lipid coverage, robustness and throughput can be a challenging task. Here, we compare a novel targeted quantitative lipidomics platform known as the Lipidyzer to a conventional untargeted liquid chromatography (LC)-MS approach. We find that both platforms are efficient in profiling more than 300 lipids across 11 lipid classes in mouse plasma, with precision and accuracy below 20% for most lipids. While the untargeted and targeted platforms detect similar numbers of lipids, the former identifies a broader range of lipid classes and can unambiguously identify all three fatty acids in triacylglycerols (TAG). Quantitative measurements from both approaches exhibit a median correlation coefficient (r) of 0.99 using a dilution series of deuterated internal standards and 0.71 using endogenous plasma lipids in the context of aging. Application of both platforms to plasma from aging mice reveals similar changes in total lipid levels across all major lipid classes and in specific lipid species. Interestingly, TAG is the lipid class that exhibits the most changes with age, suggesting that TAG metabolism is particularly sensitive to the aging process in mice. Collectively, our data show that the Lipidyzer platform provides comprehensive profiling of the most prevalent lipids in plasma in a simple and automated manner.
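
    The reported platform agreement can be illustrated with a short sketch: compute a Pearson r per shared lipid across samples, then take the median over lipids. The data and lipid names below are synthetic placeholders, not the study's measurements.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
samples, lipids = 24, 5
true = rng.lognormal(mean=2.0, sigma=0.5, size=(samples, lipids))
cols = [f"TAG_{i}" for i in range(lipids)]   # hypothetical lipid names

# Two platforms measuring the same underlying concentrations with
# different noise levels (targeted assumed more precise here).
targeted = pd.DataFrame(true * rng.normal(1, 0.05, true.shape), columns=cols)
untargeted = pd.DataFrame(true * rng.normal(1, 0.20, true.shape), columns=cols)

# Per-lipid Pearson r across samples, then the median over lipids,
# the summary statistic used to describe platform agreement.
r_per_lipid = {c: targeted[c].corr(untargeted[c]) for c in cols}
print(pd.Series(r_per_lipid).median())
```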

    Characterisation of xenometabolome signatures in complex biomatrices for enhanced human population phenotyping

    Metabolic phenotyping facilitates the analysis of low molecular weight compounds in complex biological samples, with the resulting metabolite profiles providing a window on endogenous processes and xenobiotic exposures. Accurate characterisation of the xenobiotic component of the metabolome (the xenometabolome) is particularly valuable when metabolic phenotyping is used for epidemiological and clinical population studies, where exposure of participants to xenobiotics is unknown or difficult to control or estimate. Additionally, as metabolic phenotyping has increasingly been incorporated into toxicology and drug metabolism research, phenotyping datasets may be exploited to study xenobiotic metabolism at the population level. This thesis describes novel analytical and data-driven strategies for broadening xenometabolome coverage to allow effective partitioning of endogenous and xenobiotic metabolome signatures. The data-driven strategy was multi-faceted, involving the generation of a reference database and the application of statistical methodologies. The database contains profiles of over 100 common xenobiotics, generated using established liquid chromatography-mass spectrometry methods, and provided the basis for an empirically derived screen for human urine and blood samples. The prevalence of these xenobiotics was explored in an exemplar phenotyping dataset (ALZ; n = 650; urine), with 31 xenobiotics detected in an initial screen. Statistically based methods were tailored to extract xenobiotic-related signatures and evaluated using drugs with well-characterised human metabolism. To complement the data-driven strategies for xenometabolome coverage, a more analytically oriented strategy was also developed. A dispersive solid phase extraction sample preparation protocol for blood products was optimised, permitting efficient removal of lipids and proteins with minimal effect on low molecular weight metabolites. The suitability and reproducibility of this method were evaluated in two independent blood sample sets (AZstudy12, n = 171; MARS, n = 285). Finally, these analytical and statistical strategies were applied to two existing large-scale phenotyping study datasets, AIRWAVE (n = 3000 urine, n = 3000 plasma samples) and ALZ (n = 650 urine, n = 449 serum), and used to explore both xenobiotic and endogenous responses to triclosan and polyethylene glycol (PEG) exposure. Exposure to triclosan highlighted affected pathways relating to sulfation, whilst exposure to PEG highlighted a possible perturbation of the glutathione cycle. The analytical and statistical strategies described in this thesis allow for a more comprehensive xenometabolome characterisation and have been used to uncover previously unreported relationships between xenobiotic and endogenous metabolism.
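
    A minimal, hypothetical version of such a database screen: match measured feature m/z values within a ppm window and retention times within a tolerance against reference entries. The tolerances and numbers below are illustrative; the thesis' empirically derived screen is richer than this.

```python
import numpy as np

def screen_features(feat_mz, feat_rt, db_mz, db_rt, ppm=10.0, rt_tol=0.2):
    """Match measured LC-MS features against a xenobiotic reference
    database by m/z (ppm window) and retention time (minutes).

    Returns (feature_index, database_index) pairs of candidate matches.
    """
    hits = []
    for i, (mz, rt) in enumerate(zip(feat_mz, feat_rt)):
        mz_ok = np.abs(db_mz - mz) / mz * 1e6 <= ppm
        rt_ok = np.abs(db_rt - rt) <= rt_tol
        for j in np.where(mz_ok & rt_ok)[0]:
            hits.append((i, j))
    return hits

# Toy example: one measured feature matches the first database entry.
print(screen_features(np.array([289.1445]), np.array([5.02]),
                      np.array([289.1450, 310.2000]), np.array([5.0, 7.1])))
```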

    The metaRbolomics Toolbox in Bioconductor and beyond

    Metabolomics aims to measure and characterise the complex composition of metabolites in a biological system. Metabolomics studies involve sophisticated analytical techniques such as mass spectrometry and nuclear magnetic resonance spectroscopy, and generate large amounts of high-dimensional and complex experimental data. Open source processing and analysis tools are of major interest in light of innovative, open and reproducible science. The scientific community has developed a wide range of open source software, providing freely available advanced processing and analysis approaches. The programming and statistics environment R has emerged as one of the most popular environments for processing and analysing metabolomics datasets. A major benefit of such an environment is the possibility of connecting different tools into more complex workflows. Combining reusable data processing R scripts with the experimental data thus allows for open, reproducible research. This review provides an extensive overview of existing R packages for the different steps of a typical computational metabolomics workflow, including data processing, biostatistics, metabolite annotation and identification, and biochemical network and pathway analysis. Multifunctional workflows, possible user interfaces and integration into workflow management systems are also reviewed. In total, this review summarises more than two hundred metabolomics-specific packages, primarily available on CRAN, Bioconductor and GitHub.

    Current challenges in software solutions for mass spectrometry-based quantitative proteomics

    This work was in part supported by the PRIME-XS project, grant agreement number 262067, funded by the European Union Seventh Framework Programme; The Netherlands Proteomics Centre, embedded in The Netherlands Genomics Initiative; The Netherlands Bioinformatics Centre; and the Centre for Biomedical Genetics (to S.C., B.B. and A.J.R.H.); by NIH grants NCRR RR001614 and RR019934 (to the UCSF Mass Spectrometry Facility, director: A.L. Burlingame, P.B.); and by grants from the MRC, CR-UK, BBSRC and Barts and the London Charity (to P.C.).

    Development and validation of selective and sensitive LC-MS/MS methods for the determination of para-aminosalicylic acid and cycloserine/terizidone applicable to clinical studies for the treatment of tuberculosis

    A method was validated for the quantification of para-aminosalicylic acid (PAS) in human plasma. The technique consisted of a protein precipitation extraction, followed by high performance liquid chromatography with tandem mass spectrometry (LC-MS/MS) detection. Rilmenidine was used as the internal standard (ISTD). The mean extraction yield for the analyte was ~100.3% (CV% = 3.3). The extraction procedure was followed by liquid chromatographic separation using a Phenomenex Synergi Hydro-RP (150 x 2.0 mm, 4 µm) analytical column. An isocratic mobile phase containing methanol, water and formic acid (40:59.8:0.2, v/v/v) was used at a flow rate of 300 µl per minute. The retention times for PAS and rilmenidine were ~2.4 and ~1.6 minutes, respectively. An AB Sciex API 3000 mass spectrometer at unit resolution in multiple reaction monitoring (MRM) mode was used to monitor the transitions of the protonated precursor ions m/z 154.1 and m/z 181.2 to the product ions m/z 80.2 and m/z 95.2 for PAS and the ISTD, respectively. Electrospray ionisation (ESI) was used for ion production. Accuracy and precision were assessed over three consecutive, independent runs. The calibration curve was fitted with a quadratic regression (weighted by 1/x concentration) for PAS over the range 0.391 – 100 µg/ml, based on peak area ratios. A 1:1 and a 1:4 dilution of the QC Dilution sample showed that concentrations of up to 160 µg/ml of PAS in plasma could be analysed reliably when diluted into the calibration range. Endogenous matrix components were found to have an insignificant effect on the reproducibility of the method when human plasma originating from eight different sources was analysed. PAS was found to be stable in human plasma for 21 months kept at ~-80°C, for up to 21 hours at room temperature and when subjected to 3 freeze-thaw cycles. Stock solutions of PAS in methanol were stable for 2 days when stored at ~-80°C and for 24 hours when stored at room temperature, ~4°C and ~-20°C. Analyte/ISTD peak area ratios in plasma extracts were shown to be stable on instrument over a period of ~55 hours. Reinjection reproducibility experiments indicated that an assay batch may be re-injected within 58 hours. Quantification of PAS in plasma was not significantly affected by the presence of haemolysed blood (2%) in plasma, or when lithium heparin was used as anti-coagulant instead of K3EDTA. The best marker for terizidone pharmacokinetics is cycloserine, a small polar drug with limited UV absorbance, which makes it difficult to analyse. A method was validated for the quantification of cycloserine in human plasma; it consisted of a protein precipitation extraction and derivatization, followed by high performance liquid chromatography with MS/MS detection. No ISTD was used, as no suitable match could be found. The mean extraction yield determined was ~77% (CV% = 10.7). The extraction procedure was followed by liquid chromatographic separation using a Gemini NX C18 (50 x 2.0 mm, 5 µm) analytical column. An isocratic mobile phase containing acetonitrile, water and formic acid (30:69.9:0.1, v/v/v) was used at a flow rate of 300 µl per minute. The retention time for cycloserine was ~1.5 minutes. An AB Sciex API 3000 mass spectrometer at unit resolution in MRM mode was used to monitor the transition of the protonated precursor ion m/z 335.9 to the product ion m/z 157.2 for cycloserine. ESI was used for ion production. Accuracy and precision were assessed over three consecutive, independent runs.
The calibration curve was fitted with a quadratic regression (weighted by 1/x concentration) for cycloserine over the range 0.313 – 40.0 µg/ml, based on peak areas. A 1:4 dilution of the QC Dilution sample showed that concentrations of up to 64.0 µg/ml of cycloserine in plasma could be analysed reliably when diluted into the calibration range, and no carry-over peaks were observed. Endogenous matrix components were found to have no effect on the reproducibility of the method when human plasma originating from six different sources was analysed. Cycloserine was found to be stable in human plasma for up to 18 hours at room temperature and when subjected to 3 freeze-thaw cycles. Stock solutions of cycloserine in water and methanol were stable for 10 days when stored at ~-80°C and for 18 hours when stored at room temperature, ~4°C and ~-20°C. Long-term stability in plasma was proven for 17 months at -80°C. Plasma extracts of the analyte were shown to be stable on instrument over a period of ~29 hours. Reinjection reproducibility experiments indicated that an assay batch may be re-injected within 29 hours. Cycloserine is stable in whole blood (on ice) for up to 30 minutes. Both validated methods performed well on clinical samples generated from a multidrug-resistant TB (MDR-TB) research study in children dosed with PAS and terizidone.
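
    The quantification step common to both assays, a 1/x-weighted quadratic calibration with back-calculation from peak response, can be sketched as follows. The standard concentrations and response values are illustrative, not taken from the validation runs.

```python
import numpy as np

# Calibration standards (µg/ml) and instrument response (peak area ratio);
# illustrative values following an approximately quadratic response.
conc = np.array([0.391, 0.781, 1.563, 3.125, 6.25, 12.5, 25.0, 50.0, 100.0])
resp = 0.05 * conc + 1e-4 * conc**2 + 0.001

# Quadratic fit with 1/x weighting of the squared residuals. numpy.polyfit
# multiplies residuals by w before squaring, so pass w = sqrt(1/x).
coef = np.polyfit(conc, resp, deg=2, w=np.sqrt(1.0 / conc))

def back_calculate(area_ratio: float) -> float:
    """Invert the quadratic calibration to recover concentration."""
    a, b, c = coef
    roots = np.roots([a, b, c - area_ratio])
    real = roots[np.isreal(roots)].real
    return float(real[real > 0].min())   # the physically meaningful root

print(back_calculate(0.5))               # concentration in µg/ml
```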

    Quality Control Analysis in Real-time (QC-ART): A Tool for Real-time Quality Control Assessment of Mass Spectrometry-based Proteomics Data

    Liquid chromatography-mass spectrometry (LC-MS)-based proteomics studies of large sample cohorts can easily require months to years to complete. Acquiring consistent, high-quality data in such large-scale studies is challenging because of normal variations in instrument performance over time, as well as artifacts introduced by the samples themselves, such as those arising from collection, storage and processing. Existing quality control methods for proteomics data primarily focus on post-hoc analysis to remove low-quality data that would degrade downstream statistics; they are not designed to evaluate the data in near real-time, which would allow for interventions as soon as deviations in data quality are detected. In addition to flagging analyses that demonstrate outlier behavior, evaluating how the data structure changes over time can aid in understanding typical instrument performance or identify issues such as degradation in data quality due to the need for instrument cleaning and/or re-calibration. To address this gap for proteomics, we developed Quality Control Analysis in Real-Time (QC-ART), a tool for evaluating data as they are acquired, to dynamically flag potential issues with instrument performance or sample quality. QC-ART achieves accuracy similar to that of standard post-hoc analysis methods, with the additional benefit of real-time analysis. We demonstrate the utility and performance of QC-ART in identifying deviations in data quality caused by both instrument and sample issues in near real-time for LC-MS-based plasma proteomics analyses of a sample subset of The Environmental Determinants of Diabetes in the Young cohort. We also present a case in which QC-ART facilitated the identification of oxidative modifications, which are often underappreciated in proteomic experiments.
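
    The real-time idea can be sketched as a streaming check (a minimal illustration, not the QC-ART algorithm): keep a rolling baseline of QC metrics from recent acceptable runs and score each newly acquired run by its robust distance from that baseline. The window size, baseline minimum and metric choices are assumptions.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.covariance import MinCovDet

class StreamingQC:
    """Flag each newly acquired run against a rolling baseline of recent
    runs; a minimal sketch of real-time QC, not the QC-ART algorithm."""

    def __init__(self, window: int = 30, alpha: float = 0.99):
        self.window, self.alpha = window, alpha
        self.baseline: list[np.ndarray] = []

    def score(self, metrics: np.ndarray) -> bool:
        """metrics: QC descriptors for the new run (e.g. mass error,
        peak width, identification rate; illustrative choices)."""
        flagged = False
        if len(self.baseline) >= 10:          # need a minimal baseline first
            X = np.vstack(self.baseline[-self.window:])
            mcd = MinCovDet(random_state=0).fit(X)
            d2 = mcd.mahalanobis(metrics[None, :])[0]
            flagged = d2 > chi2.ppf(self.alpha, df=X.shape[1])
        if not flagged:                       # keep the baseline clean
            self.baseline.append(metrics)
        return bool(flagged)
```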