
    A metaproteomic approach to study human-microbial ecosystems at the mucosal luminal interface

    Aberrant interactions between the host and the intestinal bacteria are thought to contribute to the pathogenesis of many digestive diseases. However, studying the complex ecosystem at the human mucosal-luminal interface (MLI) is challenging and requires an integrative systems biology approach. Therefore, we developed a novel method integrating lavage sampling of the human mucosal surface, high-throughput proteomics, and a unique suite of bioinformatic and statistical analyses. Shotgun proteomic analysis of secreted proteins recovered from the MLI confirmed the presence of both human and bacterial components. To profile the MLI metaproteome, we collected 205 mucosal lavage samples from 38 healthy subjects, and subjected them to high-throughput proteomics. The spectral data were subjected to a rigorous data processing pipeline to optimize suitability for quantitation and analysis, and then were evaluated using a set of biostatistical tools. Compared to the mucosal transcriptome, the MLI metaproteome was enriched for extracellular proteins involved in response to stimulus and immune system processes. Analysis of the metaproteome revealed significant individual-related as well as anatomic region-related (biogeographic) features. Quantitative shotgun proteomics established the identity and confirmed the biogeographic association of 49 proteins (including 3 functional protein networks) demarcating the proximal and distal colon. This robust and integrated proteomic approach is thus effective for identifying functional features of the human mucosal ecosystem, and offers a fresh understanding of the basic biology and disease processes at the MLI. © 2011 Li et al.

    New Statistical Algorithms for the Analysis of Mass Spectrometry Time-Of-Flight Mass Data with Applications in Clinical Diagnostics

    Mass spectrometry (MS) based techniques have emerged as a standard for large-scale protein analysis. The ongoing progress in terms of more sensitive machines and improved data analysis algorithms has led to a constant expansion of its fields of application. Recently, MS was introduced into clinical proteomics with the prospect of early disease detection using proteomic pattern matching. Analyzing biological samples (e.g. blood) by mass spectrometry generates mass spectra that represent the components (molecules) contained in a sample as masses and their respective relative concentrations. In this work, we are interested in those components that are constant within a group of individuals but differ markedly between individuals of two distinct groups. These distinguishing components that depend on a particular medical condition are generally called biomarkers. Since not all biomarkers found by the algorithms are of equal (discriminating) quality, we are only interested in a small biomarker subset that, in combination, can be used as a fingerprint for a disease. Once a fingerprint for a particular disease (or medical condition) is identified, it can be used in clinical diagnostics to classify unknown spectra. In this thesis we have developed new algorithms for automatic extraction of disease-specific fingerprints from mass spectrometry data. Special emphasis has been put on designing highly sensitive methods with respect to signal detection. Thanks to our statistically based approach, our methods are able to detect signals, such as those from hormones, even below the noise level inherent in data acquired by common MS machines. To provide access to these new classes of algorithms to collaborating groups, we have created a web-based analysis platform that provides all necessary interfaces for data transfer, data analysis and result inspection. To prove the platform's practical relevance, it has been utilized in several clinical studies, two of which are presented in this thesis.
These studies showed that our platform is superior to commercial systems with respect to fingerprint identification. As an outcome of these studies, several fingerprints for different cancer types (bladder, kidney, testicle, pancreas, colon and thyroid) have been detected and validated. The clinical partners in fact emphasize that these results would be impossible with a less sensitive analysis tool (such as the currently available systems). In addition to the issue of reliably finding and handling signals in noise, we faced the problem of handling very large amounts of data, since an average dataset of an individual is about 2.5 Gigabytes in size and we have data from hundreds to thousands of persons. To cope with these large datasets, we developed a new framework for a heterogeneous (quasi) ad-hoc Grid - an infrastructure that allows the integration of thousands of computing resources (e.g. Desktop Computers, Computing Clusters or specialized hardware, such as IBM's Cell Processor in a Playstation 3)
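
The notion of a biomarker described in this abstract (a component that is stable within each group but differs between two groups) can be illustrated with a minimal sketch. This is not the thesis's algorithm, which the abstract does not specify; it assumes a simple Fisher-like score, spectra already aligned on a shared mass grid, and hypothetical function names (`discriminating_score`, `rank_components`).

```python
from statistics import mean, pvariance


def discriminating_score(group_a, group_b):
    """Fisher-like score (an assumed stand-in, not the thesis's method):
    squared gap between group means divided by the summed within-group variances."""
    gap = (mean(group_a) - mean(group_b)) ** 2
    spread = pvariance(group_a) + pvariance(group_b)
    if spread == 0:
        # No within-group variation: perfectly separating if the means differ.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_components(spectra_a, spectra_b):
    """Rank mass-grid positions from most to least discriminating.

    spectra_a, spectra_b: lists of spectra, each a list of intensities
    sampled on the same mass grid (an assumption of this sketch).
    """
    n = len(spectra_a[0])
    scores = [
        discriminating_score([s[i] for s in spectra_a], [s[i] for s in spectra_b])
        for i in range(n)
    ]
    # Stable sort: ties keep ascending index order.
    return sorted(range(n), key=lambda i: -scores[i])


# Toy data: three spectra per group, three mass positions each.
healthy = [[1.0, 5.0, 2.0], [1.1, 3.0, 2.1], [0.9, 4.0, 1.9]]
disease = [[1.0, 4.5, 4.0], [1.0, 3.5, 4.1], [1.1, 4.0, 3.9]]
ranking = rank_components(healthy, disease)
print(ranking)
```

In this toy data only the third mass position is both tight within each group and shifted between groups, so it ranks first; a fingerprint in the abstract's sense would be a small subset of the top-ranked positions used jointly.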

    Visualization of proteomics data using R and bioconductor.

    Data visualization plays a key role in high-throughput biology. It is an essential tool for data exploration, shedding light on data structure and patterns of interest. Visualization is also of paramount importance as a form of communicating data to a broad audience. Here, we provide a short overview of the application of the R software to the visualization of proteomics data. We present a summary of R's plotting systems and how they are used to visualize and understand raw and processed MS-based proteomics data. LG was supported by the European Union 7th Framework Program (PRIME-XS project, grant agreement number 262067) and a BBSRC Strategic Longer and Larger grant (Award BB/L002817/1). LMB was supported by a BBSRC Tools and Resources Development Fund (Award BB/K00137X/1). TN was supported by an ERASMUS Placement scholarship. This is the final published version of the article. It was originally published in Proteomics (PROTEOMICS Special Issue: Proteomics Data Visualisation Volume 15, Issue 8, pages 1375–1389, April 2015. DOI: 10.1002/pmic.201400392). The final version is available at http://onlinelibrary.wiley.com/doi/10.1002/pmic.201400392/abstract

    Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data

    Image data are increasingly encountered and are of growing importance in many areas of science. Much of these data are quantitative image data, which are characterized by intensities that represent some measurement of interest in the scanned images. The data typically consist of multiple images on the same domain and the goal of the research is to combine the quantitative information across images to make inference about populations or interventions. In this paper we present a unified analysis framework for the analysis of quantitative image data using a Bayesian functional mixed model approach. This framework is flexible enough to handle complex, irregular images with many local features, and can model the simultaneous effects of multiple factors on the image intensities and account for the correlation between images induced by the design. We introduce a general isomorphic modeling approach to fitting the functional mixed model, of which the wavelet-based functional mixed model is one special case. With suitable modeling choices, this approach leads to efficient calculations and can result in flexible modeling and adaptive smoothing of the salient features in the data. The proposed method has the following advantages: it can be run automatically, it produces inferential plots indicating which regions of the image are associated with each factor, it simultaneously considers the practical and statistical significance of findings, and it controls the false discovery rate. Comment: Published at http://dx.doi.org/10.1214/10-AOAS407 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    A Review and Evaluation of Techniques for Improved Feature Detection in Mass Spectrometry Data

    Mass spectrometry (MS) is used in analysis of chemical samples to identify the molecules present and their quantities. This analytical technique has applications in many fields, from pharmacology to space exploration. Its impacts on medicine are particularly significant, since MS aids in the identification of molecules associated with disease; for instance, in proteomics, MS allows researchers to identify proteins that are associated with autoimmune disorders, cancers, and other conditions. Since the applications are so wide-ranging and the tool is ubiquitous across so many fields, it is critical that the analytical methods used to collect data are sound. Data analysis in MS is challenging. Experiments produce massive amounts of raw data that need to be processed algorithmically in order to generate interpretable results in a process known as feature detection, which is tasked with distinguishing signals associated with the chemical sample being analyzed from signals associated with background noise. These experimentally meaningful signals are also known as features or extracted ion chromatograms (XIC) and are the fundamental signal unit in mass spectrometry. There are many algorithms for analyzing raw mass spectrometry data tasked with distinguishing real isotopic signals from noise. While one or more of the available algorithms are typically chained together for end-to-end mass spectrometry analysis, analysis of each algorithm in isolation provides a specific measurement of the strengths and weaknesses of each algorithm without the confounding effects that can occur when multiple algorithmic tasks are chained together. Though qualitative opinions on extraction algorithm performance abound, quantitative performance has never been publicly ascertained. Quantitative evaluation has not occurred partly due to the lack of an available quantitative ground truth MS1 data set. 
Because XIC must be distinguished from noise, quality algorithms for this purpose are essential. Background noise is introduced through the mobile phase of the chemical matrix in which the sample of interest is introduced to the MS instrument, and as a result, MS data is full of signals representing low-abundance molecules (i.e. low-intensity signals). Noise generally presents in one of two ways: very low-intensity signals that comprise a majority of the data from an MS experiment, and noise features that are moderately low-intensity and can resemble signals from low-abundance molecules deriving from the actual sample of interest. Like XIC algorithms, noise reduction algorithms have yet to be quantitatively evaluated, to our knowledge; the performance of these algorithms is generally evaluated through consensus with other noise reduction algorithms. Using a recently published, manually-extracted XIC dataset as ground truth data, we evaluate the quality of popular XIC algorithms, including MaxQuant, MZMine2, and several methods from XCMS. XIC algorithms were applied to the manually extracted data using a grid search of possible parameters. Performance varied greatly between different parameter settings, though nearly all algorithms with parameter settings optimized with respect to the number of true positives recovered over 10,000 XIC. We also examine two popular algorithms for reducing background noise, the COmponent Detection Algorithm (CODA) and adaptive iteratively reweighted Penalized Least Squares (airPLS), and compare their performance to the results of feature detection alone using algorithms that achieved the best performance in a previous evaluation. Due to weaknesses inherent in the implementation of these algorithms, both noise reduction algorithms eliminate data identified by feature detection as significant
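
The evaluation procedure described in this abstract (running a detector over a grid of parameter settings and scoring each setting by the true positives recovered against a ground-truth XIC set) can be sketched as follows. This is only an illustration under stated assumptions: `toy_detector` is a hypothetical threshold filter standing in for tools such as MaxQuant, MZMine2 or XCMS, the matching tolerances are invented, and breaking ties by fewer false positives is an assumption not taken from the abstract.

```python
from itertools import product


def count_true_positives(detected, truth, mz_tol=0.01, rt_tol=5.0):
    """Count ground-truth features matched by a detected feature.

    Features are (m/z, retention time) pairs; each detected feature may
    match at most one ground-truth feature. Tolerances are assumed values.
    """
    unused = list(detected)
    hits = 0
    for t_mz, t_rt in truth:
        for d in unused:
            if abs(d[0] - t_mz) <= mz_tol and abs(d[1] - t_rt) <= rt_tol:
                unused.remove(d)
                hits += 1
                break
    return hits


def toy_detector(candidates, min_intensity, min_scans):
    """Hypothetical detector: keep candidates (mz, rt, intensity, n_scans)
    that pass an intensity threshold and a minimum scan count."""
    return [(mz, rt) for mz, rt, inten, scans in candidates
            if inten >= min_intensity and scans >= min_scans]


def grid_search(candidates, truth, intensities, scan_counts):
    """Try every parameter pair; prefer more true positives, then fewer
    false positives (the tie-break is an assumption of this sketch)."""
    best_key, best = None, None
    for min_int, min_scans in product(intensities, scan_counts):
        detected = toy_detector(candidates, min_int, min_scans)
        tp = count_true_positives(detected, truth)
        fp = len(detected) - tp
        if best_key is None or (tp, -fp) > best_key:
            best_key = (tp, -fp)
            best = ({"min_intensity": min_int, "min_scans": min_scans}, tp, fp)
    return best


# Toy data: two true features, plus two noise candidates.
truth = [(500.25, 120.0), (610.30, 300.0)]
candidates = [
    (500.251, 121.0, 900.0, 8),
    (610.302, 298.0, 150.0, 5),
    (450.10, 50.0, 120.0, 2),   # noise resembling a low-abundance signal
    (700.00, 10.0, 80.0, 1),    # very low-intensity noise
]
params, tp, fp = grid_search(candidates, truth, [100.0, 200.0], [1, 3])
print(params, tp, fp)
```

Even in this toy setting the outcome depends strongly on the parameters: the stricter intensity threshold loses a true low-abundance feature, while the looser scan-count setting admits a noise candidate, which mirrors the abstract's remark that performance varied greatly between parameter settings.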