
    Mining whole sample mass spectrometry proteomics data for biomarkers: an overview

    In this paper we aim to provide a concise overview of how to design and conduct an MS proteomics experiment so that it supports statistical analysis that may lead to the discovery of novel biomarkers. We summarise the stages that make up such an experiment, highlighting the need to decide on experimental goals in advance. We discuss issues in experimental design at the sample collection stage, and good practice for standardising protocols within the proteomics laboratory. We then describe approaches to the data mining stage of the experiment, including the processing steps that transform a raw mass spectrum into a usable form. We propose a permutation-based procedure for determining the significance of reported error rates. Finally, because of its general advantages in speed and cost, we suggest that MS proteomics may be a good candidate for early primary screening in disease diagnosis: identifying areas of risk and making referrals for more specific tests without necessarily making a diagnosis in its own right. Our discussion is illustrated with examples drawn from experiments on bovine blood serum conducted in the Centre for Proteomic Research (CPR) at Southampton University.
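The abstract proposes a permutation-based procedure for judging whether a reported error rate is significant, without giving its details. A minimal sketch of the general idea, under our own assumptions: compute the classifier's error with the true labels, then repeatedly shuffle the labels and recompute it; the p-value is the fraction of shuffles that do at least as well. The nearest-centroid classifier, the leave-one-out error, and the function names `nearest_centroid_loo_error` and `permutation_p_value` are hypothetical stand-ins, not the authors' method.

```python
import random
from statistics import mean


def nearest_centroid_loo_error(X, y):
    """Leave-one-out error rate of a nearest-centroid classifier (hypothetical stand-in)."""
    n = len(X)
    errors = 0
    for i in range(n):
        # Build one centroid per class from every sample except the held-out one.
        groups = {}
        for j in range(n):
            if j != i:
                groups.setdefault(y[j], []).append(X[j])
        centroids = {c: [mean(col) for col in zip(*rows)] for c, rows in groups.items()}
        # Predict the class whose centroid is closest (squared Euclidean distance).
        pred = min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(X[i], centroids[c])))
        errors += pred != y[i]
    return errors / n


def permutation_p_value(X, y, error_fn, n_perm=200, seed=0):
    """Compare the observed error with errors obtained after shuffling the labels."""
    rng = random.Random(seed)
    observed = error_fn(X, y)
    as_good = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)
        # Count label permutations that do at least as well as the real labels.
        if error_fn(X, shuffled) <= observed:
            as_good += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return observed, (as_good + 1) / (n_perm + 1)


# Toy data: two well-separated groups of "spectra" with two features each.
X = [[0.1 * i, 0.0] for i in range(10)] + [[5.0 + 0.1 * i, 5.0] for i in range(10)]
y = [0] * 10 + [1] * 10
observed, p = permutation_p_value(X, y, nearest_centroid_loo_error)
```

Shuffling preserves the class sizes, so the null distribution reflects what the same classifier achieves when the labels carry no information about the spectra.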

    Robust Classification of Functional and Quantitative Image Data Using Functional Mixed Models

    This paper describes how to perform classification of complex, high-dimensional functional data using the functional mixed model (FMM) framework. The FMM relates a functional response to a set of predictors through functional fixed and random effects, which allows it to account for various factors and between-function correlations. Classification is performed by training the model with class treated as one of the fixed effects, and then predicting on the test data using posterior predictive probabilities of class. Through a Bayesian scheme, we are able to adjust for factors affecting both the functions and the class designations. While the method we present can be applied to any FMM-based method, we provide details for two specific Bayesian approaches: the Gaussian, wavelet-based functional mixed model (G-WFMM) and the robust, wavelet-based functional mixed model (R-WFMM). Both methods perform modeling in the wavelet space, which yields parsimonious representations for the functions and can naturally adapt to local features and complex nonstationarities in the functions. The R-WFMM allows potentially heavier tails for features of the functions indexed by particular wavelet coefficients, leading to a down-weighting of outliers that makes the method robust to outlying functions or regions of functions. The models are applied to a pancreatic cancer mass spectrometry data set and compared with some other recently developed functional classification methods.
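The abstract states that modeling in the wavelet space yields parsimonious representations that adapt to local features. A minimal sketch of why, assuming an orthonormal Haar basis purely for illustration (the abstract does not say which wavelet G-WFMM or R-WFMM uses, and this is only the transform, not the functional mixed model): a piecewise-constant function with one jump is represented by just two nonzero coefficients. The function names `haar_dwt` and `haar_idwt` are hypothetical.

```python
import math


def haar_dwt(signal):
    """Full multilevel orthonormal Haar transform; the length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = list(signal)
    details = []  # finest scale first
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficients capture local differences; approximations carry local averages.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return approx, details


def haar_idwt(approx, details):
    """Invert haar_dwt, working from the coarsest scale back to the finest."""
    current = list(approx)
    for level in reversed(details):
        nxt = []
        for s, w in zip(current, level):
            nxt.append((s + w) / math.sqrt(2))
            nxt.append((s - w) / math.sqrt(2))
        current = nxt
    return current


# A step function: 8 samples, but only 2 nonzero wavelet coefficients.
step = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
approx, details = haar_dwt(step)
coeffs = approx + [c for level in details for c in level]
n_nonzero = sum(abs(c) > 1e-12 for c in coeffs)
```

Because the transform is orthonormal, it preserves energy and is exactly invertible, so a model fitted to the coefficients can always be mapped back to the original function space.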

    Plant Metabolomics Applications in the Brassicaceae: Added Value for Science and Industry

    Crops from the family Brassicaceae represent a diverse and very interesting group of plants. In addition, their close relationship with the model plant, Arabidopsis thaliana, makes combined research on these species both scientifically valuable and of considerable commercial importance. In the post-genomics era, much effort is being placed on expanding our capacity to use advanced technologies such as proteomics and metabolomics, to broaden our knowledge of the molecular organization of plants and of how genetic differences are translated into phenotypic ones. Metabolomics in particular is gaining much attention, mainly because of both the comprehensiveness of the technology and the potentially close relationship between biochemical composition (including human health-related phytochemicals) and phenotype. In this short review, a brief introduction to the main metabolomics technologies is given, taking examples from research on the Brassicaceae for illustration.

    Matrix Factorization at Scale: a Comparison of Scientific Data Analytics in Spark and C+MPI Using Three Case Studies

    We explore the trade-offs of performing linear algebra using Apache Spark, compared to traditional C and MPI implementations on HPC platforms. Spark is designed for data analytics on cluster computing platforms with access to local disks and is optimized for data-parallel tasks. We examine three widely used and important matrix factorizations: NMF (for physical plausibility), PCA (for its ubiquity) and CX (for data interpretability). We apply these methods to TB-sized problems in particle physics, climate modeling and bioimaging. The data matrices are tall-and-skinny, which enables the algorithms to map conveniently onto Spark's data-parallel model. We perform scaling experiments on up to 1600 Cray XC40 nodes, describe the sources of slowdowns, and provide tuning guidance to obtain high performance.
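Because the matrices are tall-and-skinny (many rows, few columns), PCA maps onto a data-parallel model: each worker reduces its own row block to a small d x d matrix, and only those small matrices are summed. The sketch below illustrates that idea with NumPy on one machine, using a list of row blocks as a hypothetical stand-in for Spark partitions; it is not the paper's Spark or C+MPI code, and `tall_skinny_pca` is a name we made up. Two passes are used because the column means are needed before centring.

```python
import numpy as np


def tall_skinny_pca(row_blocks, k):
    """PCA of a tall-and-skinny matrix supplied as a list of row blocks."""
    blocks = list(row_blocks)
    n_rows = sum(b.shape[0] for b in blocks)
    # Pass 1: each block contributes its column sums; the reduction gives the column means.
    col_mean = sum(b.sum(axis=0) for b in blocks) / n_rows
    # Pass 2: each block contributes the d x d scatter matrix of its centred rows.
    scatter = sum((b - col_mean).T @ (b - col_mean) for b in blocks)
    # The d x d eigenproblem is small enough to solve on a single node.
    eigvals, eigvecs = np.linalg.eigh(scatter)
    order = np.argsort(eigvals)[::-1][:k]  # eigh returns ascending eigenvalues
    return col_mean, eigvals[order], eigvecs[:, order]


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5)) @ np.diag([5.0, 3.0, 1.0, 0.5, 0.1])
blocks = np.array_split(X, 8)  # hypothetical partitions held by different workers
mu, scatter_eigs, components = tall_skinny_pca(blocks, k=2)
```

The returned eigenvalues equal the squared singular values of the centred matrix (divide by n - 1 for variances). Forming the scatter matrix squares the condition number, so a QR-based (TSQR) reduction is the more numerically stable choice when that matters.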

    Chemical fingerprinting of wood sampled along a pith-to-bark gradient for individual comparison and provenance identification

    Background and Objectives: The origin of traded timber is one of the main questions in the enforcement of regulations to combat the illegal timber trade. Substantial efforts are still needed to develop techniques that can determine the exact geographical provenance of timber, and this is vital to counteract the destructive effects of illegal logging, ranging from economic loss to habitat destruction. The potential of chemical fingerprints from pith-to-bark growth rings for individual comparison and geographical provenance determination is explored. Materials and Methods: A wood sliver was sampled per growth ring from four stem disks from four individuals of Pericopsis elata (Democratic Republic of the Congo) and from 14 stem disks from 14 individuals of Terminalia superba (Côte d'Ivoire and Democratic Republic of the Congo). Chemical fingerprints were obtained by analyzing these wood slivers with Direct Analysis in Real Time Time-Of-Flight Mass Spectrometry (DART TOFMS). Results: Individual distinction for both species was achieved, but the accuracy depended on the dataset size and the number of individuals included. As this is still experimental, we can only speak of individual comparison and not individual distinction at this point. The prediction accuracy for the country of origin increases with the number of samples, and a random sample can be assigned to the correct country. When a complete disk is removed from the training dataset, its rings (samples) are correctly attributed to the country with an accuracy ranging from 43% to 100%. Relative abundances of ions appear to contribute more to differentiation than frequency differences do. Conclusions: DART TOFMS shows potential for geographical provenancing but is still experimental for individual distinction; more research is needed to make this an established method. Sampling campaigns should focus on sampling tree cores from pith to bark, paving the way towards a chemical fingerprint database for species provenance.
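The whole-disk hold-out described in the Results is a form of leave-one-group-out cross-validation: all rings from one disk are withheld together, so no tree contributes to both training and test data. A minimal sketch under stated assumptions: the abstract does not name the classifier, so the nearest-centroid rule, the toy two-feature data, and the names `centroid_predict` and `leave_one_group_out` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def centroid_predict(train, x):
    """Assign x to the class with the nearest centroid (hypothetical classifier)."""
    by_class = defaultdict(list)
    for features, label in train:
        by_class[label].append(features)
    centroids = {c: [mean(col) for col in zip(*rows)] for c, rows in by_class.items()}
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def leave_one_group_out(samples):
    """samples: list of (features, label, group). Returns accuracy per held-out group."""
    accuracy = {}
    for held_out in sorted({g for _, _, g in samples}):
        # Every ring of the held-out disk is excluded from training.
        train = [(f, lab) for f, lab, g in samples if g != held_out]
        test = [(f, lab) for f, lab, g in samples if g == held_out]
        correct = sum(centroid_predict(train, f) == lab for f, lab in test)
        accuracy[held_out] = correct / len(test)
    return accuracy


# Toy data: 4 disks (2 per country), 3 rings per disk, 2 features per ring.
samples = []
for disk, (country, base) in enumerate([("A", 0.0), ("A", 1.0), ("B", 10.0), ("B", 11.0)]):
    for ring in range(3):
        samples.append(([base + 0.1 * ring, base], country, f"disk{disk}"))
accuracy = leave_one_group_out(samples)
```

Splitting at the ring level instead would let near-identical rings from the same tree appear on both sides of the split, which tends to overstate accuracy.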

    Kernel methods in genomics and computational biology

    Support vector machines and kernel methods are increasingly popular in genomics and computational biology, due to their good performance in real-world applications and strong modularity that makes them suitable to a wide range of problems, from the classification of tumors to the automatic annotation of proteins. Their ability to work in high dimension, to process non-vectorial data, and the natural framework they provide to integrate heterogeneous data are particularly relevant to various problems arising in computational biology. In this chapter we survey some of the most prominent applications published so far, highlighting the particular developments in kernel methods triggered by problems in biology, and mention a few promising research directions likely to expand in the future
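One reason kernel methods suit computational biology is that they can process non-vectorial data such as sequences directly. As an illustration chosen by us (the chapter surveys many kernels, and this is not presented as its method), the k-mer spectrum kernel compares two strings by the dot product of their k-mer count vectors. Because it is an inner product in a feature space, it is a valid kernel. The function names are hypothetical.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k):
    """Dot product of the k-mer count vectors of s and t."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    # A Counter returns 0 for absent keys, so k-mers missing from t contribute nothing.
    return sum(count * ct[kmer] for kmer, count in cs.items())


def gram_matrix(seqs, k):
    """Pairwise kernel values, the input a kernel machine such as an SVM works from."""
    return [[spectrum_kernel(s, t, k) for t in seqs] for s in seqs]


seqs = ["ACGTACGT", "ACGTTTTT", "GGGGCCCC"]
K = gram_matrix(seqs, 2)
```

A precomputed Gram matrix like `K` can be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, so the learning algorithm never needs an explicit vector for each sequence.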

    Transfer Functions and Penetrations of Five Differential Mobility Analyzers for Sub-2 nm Particle Classification

    The transfer functions and penetrations of five differential mobility analyzers (DMAs) for sub-2 nm particle classification were evaluated in this study. These DMAs include the TSI nanoDMA, the Caltech radial DMA (RDMA) and nanoRDMA, the Grimm nanoDMA, and the Karlsruhe-Vienna DMA. Measurements were made using tetra-alkyl ammonium ion standards with mobility diameters of 1.16, 1.47, and 1.70 nm. These monomobile ions were generated by electrospray followed by high-resolution mobility classification. Measurements focused on an aerosol-to-sheath flow ratio of 0.1. A data inversion routine was developed to obtain the true transfer function for each test DMA, and these measured transfer functions were compared with theory. DMA penetration efficiencies were also measured. An approximate model for diffusional deposition, based on the modified Gormley and Kennedy equation using an effective length, is given for each test DMA. These results quantitatively characterize the performance of the test DMAs in classifying sub-2 nm particles and can be readily used for DMA data inversion.
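The diffusional-loss model mentioned above builds on the Gormley and Kennedy equation for penetration through a tube in laminar flow. A minimal sketch, assuming the commonly tabulated form with deposition parameter mu = D * L / Q (as given, for example, in Hinds' aerosol textbook); the modification described in the abstract replaces the physical length with a fitted effective length, whose per-DMA values are not given here. The function name and the numbers in the usage line are hypothetical placeholders.

```python
import math


def gormley_kennedy_penetration(diffusivity, length, flow_rate):
    """Laminar-flow tube penetration, Gormley-Kennedy form with mu = D * L / Q.

    diffusivity in m^2/s, length in m, flow_rate in m^3/s, so mu is dimensionless.
    """
    mu = diffusivity * length / flow_rate
    if mu < 0.009:
        # Small-mu branch of the approximation.
        return 1.0 - 5.50 * mu ** (2.0 / 3.0) + 3.77 * mu
    # Large-mu branch: sum of the two leading exponential terms.
    return 0.819 * math.exp(-11.5 * mu) + 0.0975 * math.exp(-70.1 * mu)


# Hypothetical inputs: particle diffusivity, a fitted effective length L_eff, and sheath-side flow.
L_eff = 0.5  # placeholder, not a value reported for any of the test DMAs
penetration = gormley_kennedy_penetration(2.0e-6, L_eff, 2.5e-5)
```

Using an effective length keeps the simple tube formula while letting a single fitted parameter absorb the more complicated flow geometry inside each DMA.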

    Mass spectrometry protein expression profiles in colorectal cancer tissue associated with clinico-pathological features of disease

    Background: Studies of several tumour types have shown that expression profiling of cellular protein extracted from surgical tissue specimens by direct mass spectrometry analysis can accurately discriminate tumour from normal tissue and in some cases can sub-classify disease. We have evaluated the potential value of this approach to classify various clinico-pathological features in colorectal cancer (CRC) by employing matrix-assisted laser desorption ionisation time-of-flight mass spectrometry (MALDI-TOF MS). Methods: Protein extracts from 31 tumour and 33 normal mucosa specimens were purified, subjected to MALDI-TOF MS and then analysed using the GenePattern suite of computational tools (Broad Institute, MIT, USA). Comparative Gene Marker Selection with either a t-test or a signal-to-noise ratio (SNR) test statistic was used to identify and rank differentially expressed marker peaks. The k-nearest neighbours algorithm was used to build classification models, using either separate training and test datasets or an iterative leave-one-out cross-validation method. Results: 73 protein peaks in the mass range 1800-16000 Da were differentially expressed in tumour versus adjacent normal mucosa tissue (P ≤ 0.01, false discovery rate ≤ 0.05). Unsupervised hierarchical cluster analysis classified most tumour and normal mucosa specimens into distinct cluster groups. Supervised prediction correctly classified the tumour/normal mucosa status of specimens in an independent test spectra dataset with 100% sensitivity and specificity (95% confidence interval: 67.9-99.2%).
    Supervised prediction using leave-one-out cross-validation for tumour spectra correctly classified 10/13 poorly differentiated and 16/18 well/moderately differentiated tumours (P < 0.001; receiver operating characteristic (ROC) error, 0.171); disease recurrence was correctly predicted in 5/6 cases and disease-free survival (median follow-up time, 25 months) was correctly predicted in 22/23 cases (P < 0.001; ROC error, 0.105). A similar analysis of normal mucosa spectra correctly predicted 11/14 patients with, and 15/19 patients without, lymph node involvement (P = 0.001; ROC error, 0.212). Conclusions: Protein expression profiling of surgically resected CRC tissue extracts by MALDI-TOF MS has potential value in studies aimed at improved molecular classification of this disease. Further studies with longer follow-up times and larger patient cohorts, permitting independent validation of supervised classification models, would be required to confirm the predictive value of tumour spectra for disease recurrence and patient survival.
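The Methods combine two steps: ranking peaks by a signal-to-noise ratio and classifying spectra with k-nearest neighbours under leave-one-out cross-validation. A minimal sketch of both, under our own assumptions: we use SNR = (mean_A - mean_B) / (sd_A + sd_B) with sample standard deviations, which is the usual marker-selection form, but GenePattern's exact variant (for example any minimum-variance adjustment) may differ. The toy peak intensities and all function names are hypothetical; this is not GenePattern code.

```python
import math
from collections import Counter
from statistics import mean, stdev


def snr(values_a, values_b):
    """Signal-to-noise ratio of one peak between two classes (assumed form)."""
    return (mean(values_a) - mean(values_b)) / (stdev(values_a) + stdev(values_b))


def rank_peaks(X, y, class_a, class_b):
    """Return (snr, peak_index) pairs, largest |SNR| first."""
    a_rows = [x for x, lab in zip(X, y) if lab == class_a]
    b_rows = [x for x, lab in zip(X, y) if lab == class_b]
    scores = [(snr([r[j] for r in a_rows], [r[j] for r in b_rows]), j)
              for j in range(len(X[0]))]
    return sorted(scores, key=lambda s: abs(s[0]), reverse=True)


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training spectra closest to x (Euclidean distance)."""
    dists = sorted((math.dist(x, t), lab) for t, lab in zip(train_X, train_y))
    return Counter(lab for _, lab in dists[:k]).most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Leave-one-out: each spectrum is predicted from all the others."""
    correct = 0
    for i in range(len(X)):
        correct += knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k) == y[i]
    return correct / len(X)


# Toy intensities for two peaks: peak 0 differs between classes, peak 1 is noise.
X = [[10.0, 1.0], [11.0, 2.0], [12.0, 1.5], [10.5, 2.5],
     [1.0, 1.2], [2.0, 2.2], [1.5, 1.8], [2.5, 1.1]]
y = ["tumour"] * 4 + ["normal"] * 4
ranking = rank_peaks(X, y, "tumour", "normal")
acc = loo_accuracy(X, y, k=3)
```

For simplicity this sketch ranks peaks on all spectra before cross-validation; to avoid an optimistic bias, any peak selection that feeds the classifier should be repeated inside each leave-one-out fold.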