
    Evaluation of peak-picking algorithms for protein mass spectrometry

    Peak picking is an early key step in MS data analysis. We compare three commonly used approaches to peak picking and discuss their merits by means of statistical analysis. The methods investigated are signal-to-noise ratio, continuous wavelet transform, and a correlation-based approach using a Gaussian template. The functionality of the three methods is illustrated and discussed in a practical context using a mass spectral data set created with MALDI-TOF technology. Sensitivity and specificity are investigated using a manually defined reference set of peaks. As an additional criterion, the robustness of the three methods is assessed by a perturbation analysis and illustrated using ROC curves.
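
    The signal-to-noise approach named in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the implementation evaluated in the paper: the function names `pick_peaks_snr` and `demo_spectrum`, the global-median baseline, the MAD-based noise estimate, and the default threshold of 5 are all assumptions made here for the sake of the example.

```python
import numpy as np


def pick_peaks_snr(intensity, snr_threshold=5.0):
    """Return indices of local maxima whose signal-to-noise ratio meets the threshold.

    Assumptions (not taken from the paper): the baseline is the global median
    of the spectrum, and the noise level is the scaled median absolute
    deviation (MAD) around that baseline.
    """
    y = np.asarray(intensity, dtype=float)
    baseline = np.median(y)
    # 1.4826 scales the MAD to match the standard deviation of Gaussian noise.
    noise = 1.4826 * np.median(np.abs(y - baseline))
    if noise == 0.0:
        # A perfectly flat signal has no measurable noise and no peaks.
        return []
    snr = (y - baseline) / noise
    peaks = []
    for i in range(1, len(y) - 1):
        is_local_max = y[i] > y[i - 1] and y[i] >= y[i + 1]
        if is_local_max and snr[i] >= snr_threshold:
            peaks.append(i)
    return peaks


def demo_spectrum():
    """Build a hypothetical test signal: two Gaussian peaks on a small ripple."""
    x = np.arange(200)
    ripple = 0.1 * np.sin(1.7 * x)  # deterministic stand-in for noise
    peak_a = 5.0 * np.exp(-((x - 50) ** 2) / (2 * 2.0 ** 2))
    peak_b = 5.0 * np.exp(-((x - 140) ** 2) / (2 * 2.0 ** 2))
    return ripple + peak_a + peak_b


if __name__ == "__main__":
    print(pick_peaks_snr(demo_spectrum()))
```

    A single global baseline is the simplest choice; real MALDI-TOF spectra usually have a drifting baseline, so a windowed noise estimate would likely be needed in practice.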

    Statistical Methods in Metabolomics

    Metabolomics lies at the fulcrum of the systems biology ‘omics’. Metabolic profiling offers researchers new insight into genetic and environmental interactions, responses to pathophysiological stimuli and novel biomarker discovery. Metabolomics lacks the simplicity of a single data capturing technique; instead, increasingly sophisticated multivariate statistical techniques are required to tease out useful metabolic features from various complex datasets. In this work, two major metabolomics methods are examined: Nuclear Magnetic Resonance (NMR) Spectroscopy and Liquid Chromatography-Mass Spectrometry (LC-MS). MetAssimulo, a 1H-NMR metabolic-profile simulator, was developed in part by this author and is described in Chapter 2. Peak positional variation is a phenomenon occurring in NMR spectra that complicates metabolomic analysis, so Chapter 3 focuses on modelling the effect of pH on peak position. Analysis of LC-MS data is somewhat more complex given its 2-D structure, so I review existing pre-processing and feature detection techniques in Chapter 4 and then attempt to tackle the issue from a Bayesian viewpoint. A Bayesian Partition Model is developed to distinguish chromatographic peaks representing useful features from chemical and instrumental interference and noise. Another of the LC-MS pre-processing problems, data binning, is also explored as part of H-MS: a pre-processing algorithm incorporating wavelet smoothing and novel Gaussian and Exponentially Modified Gaussian peak detection. The performance of H-MS is compared with that of two existing pre-processing packages: apLC-MS and XCMS.

    Biomarker discovery and redundancy reduction towards classification using a multi-factorial MALDI-TOF MS T2DM mouse model dataset

    Diabetes, like many diseases and biological processes, is not mono-causal. On the one hand, multifactorial studies with complex experimental design are required for its comprehensive analysis. On the other hand, the data from these studies often include a substantial amount of redundancy, such as proteins that are typically represented by a multitude of peptides. Coping simultaneously with both complexities (experimental and technological) makes data analysis a challenge for bioinformatics.

    Nonlinear preprocessing method for detecting peaks from gas chromatograms

    Background: The problem of locating valid peaks in data corrupted by noise frequently arises when analyzing experimental data. In various biological and chemical data analysis tasks, peak detection thus constitutes a critical preprocessing step that greatly affects downstream analysis and the eventual quality of experiments. Many existing techniques require users to adjust parameters by trial and error, which is error-prone, time-consuming and often leads to incorrect analysis results. Worse, conventional approaches tend to report an excessive number of false alarms by finding fictitious peaks generated by mere noise. Results: We have designed a novel peak detection method that can significantly reduce parameter sensitivity while providing excellent peak detection performance and negligible false alarm rates on gas chromatographic data. The key feature of our new algorithm is the successive use of peak enhancement algorithms that are deliberately designed for a gradual improvement of peak detection quality. We tested our approach with real gas chromatograms as well as intentionally contaminated spectra that contain Gaussian or speckle-type noise. Conclusion: Our results demonstrate that the proposed method can achieve near-perfect peak detection performance while maintaining very small false alarm probabilities in the case of gas chromatograms. Given that biological signals appear in the form of peaks in various experimental data and that the proposed method can easily be extended to such data, our approach will be a useful and robust tool that can help researchers highlight valid signals in their noisy measurements.

    Feature Detection Techniques for Preprocessing Proteomic Data

    Numerous gel-based and nongel-based technologies are used to detect protein changes potentially associated with disease. The raw data, however, are abundant with technical and structural complexities, making statistical analysis a difficult task. Low-level analysis issues (including normalization, background correction, gel and/or spectral alignment, feature detection, and image registration) are substantial problems that need to be addressed, because any large-level data analyses are contingent on appropriate and statistically sound low-level procedures. Feature detection approaches are particularly interesting due to the increased computational speed associated with subsequent calculations. Such summary data corresponding to image features provide a significant reduction in overall data size and structure while retaining key information. In this paper, we focus on recent advances in feature detection as a tool for preprocessing proteomic data. This work highlights existing and newly developed feature detection algorithms for proteomic datasets, particularly relating to time-of-flight mass spectrometry, and two-dimensional gel electrophoresis. Note, however, that the associated data structures (i.e., spectral data, and images containing spots) used as input for these methods are obtained via all gel-based and nongel-based methods discussed in this manuscript, and thus the discussed methods are likewise applicable

    Automated cropping intensity extraction from isolines of wavelet spectra

    Timely and accurate monitoring of cropping intensity (CI) is essential to help us understand changes in food production. This paper aims to develop an automatic Cropping Intensity extraction method based on the Isolines of Wavelet Spectra (CIIWS) with consideration of intra-class variability. The CIIWS method involves the following procedures: (1) characterizing vegetation dynamics from time–frequency dimensions through a continuous wavelet transform performed on vegetation index temporal profiles; (2) deriving three main features, the skeleton width, maximum number of strong brightness centers and the intersection of their scale intervals, through computing a series of wavelet isolines from the wavelet spectra; and (3) developing an automatic cropping intensity classifier based on these three features. The proposed CIIWS method improves the understanding of the spectral–temporal properties of vegetation dynamic processes. To test its efficiency, the CIIWS method is applied to China’s Henan province using 250 m 8-day composite Moderate Resolution Imaging Spectroradiometer (MODIS) Enhanced Vegetation Index (EVI) time series datasets. An overall accuracy of 88.9% is achieved when compared with in-situ observation data. The mapping result is also evaluated with 30 m Chinese Environmental Disaster Reduction Satellite (HJ-1)-derived data and an overall accuracy of 86.7% is obtained. At county level, the MODIS-derived sown areas and agricultural statistical data are well correlated (r² = 0.85). The merit and uniqueness of the CIIWS method is the ability to cope with the complex intra-class variability through continuous wavelet transform and efficient feature extraction based on wavelet isolines. As an objective and meaningful algorithm, it guarantees easy applications and greatly contributes to satellite observations of vegetation dynamics and food security efforts.

    DATA ANALYSIS WORKFLOW FOR GAS CHROMATOGRAPHY MASS SPECTROMETRY-BASED METABOLOMICS STUDIES

    Metabolomics has emerged as an integral part of systems biology research that attempts to comprehensively study low molecular weight organic and inorganic metabolites under certain conditions within a biological system. Technological advances in the past decade have made it possible to carry out metabolomics studies in a high-throughput fashion using gas chromatography coupled with mass spectrometry. As a result, large volumes of data are produced from these studies and there is a pressing need for algorithms that can efficiently process and analyze the data in a high-throughput fashion as well. To address this need, we have developed computational algorithms and an associated software tool named Automated Data Analysis Pipeline (ADAP). ADAP allows data to flow seamlessly through the data processing steps, which include denoising, peak detection, deconvolution, alignment, compound identification and quantitation. The development of ADAP started in 2009 and the past four years have witnessed continuous improvements in its performance from ADAP-GC 1.0, to ADAP-GC 2.0, and to the current ADAP-GC 3.0. As part of the performance assessment of ADAP-GC, we have compared it with three other software tools. In this dissertation, I will present the computational details of these three versions of ADAP-GC, the capabilities of the software tool, and the results from the software comparison.