A metaproteomic approach to study human-microbial ecosystems at the mucosal luminal interface
Aberrant interactions between the host and the intestinal bacteria are thought to contribute to the pathogenesis of many digestive diseases. However, studying the complex ecosystem at the human mucosal-luminal interface (MLI) is challenging and requires an integrative systems biology approach. Therefore, we developed a novel method integrating lavage sampling of the human mucosal surface, high-throughput proteomics, and a unique suite of bioinformatic and statistical analyses. Shotgun proteomic analysis of secreted proteins recovered from the MLI confirmed the presence of both human and bacterial components. To profile the MLI metaproteome, we collected 205 mucosal lavage samples from 38 healthy subjects and subjected them to high-throughput proteomics. The spectral data were then run through a rigorous data processing pipeline to optimize suitability for quantitation and analysis, and evaluated using a set of biostatistical tools. Compared to the mucosal transcriptome, the MLI metaproteome was enriched for extracellular proteins involved in response to stimulus and immune system processes. Analysis of the metaproteome revealed significant individual-related as well as anatomic region-related (biogeographic) features. Quantitative shotgun proteomics established the identity and confirmed the biogeographic association of 49 proteins (including 3 functional protein networks) demarcating the proximal and distal colon. This robust and integrated proteomic approach is thus effective for identifying functional features of the human mucosal ecosystem and for gaining a fresh understanding of the basic biology and disease processes at the MLI. © 2011 Li et al.
New Statistical Algorithms for the Analysis of Mass Spectrometry Time-Of-Flight Mass Data with Applications in Clinical Diagnostics
Mass spectrometry (MS) based techniques have emerged as a standard for large-scale protein analysis. The ongoing progress in terms of more sensitive
machines and improved data analysis algorithms led to a constant expansion of
its fields of applications. Recently, MS was introduced into clinical proteomics
with the prospect of early disease detection using proteomic pattern matching.
Analyzing biological samples (e.g. blood) by mass spectrometry generates
mass spectra that represent the components (molecules) contained in a
sample as masses and their respective relative concentrations.
In this work, we are interested in those components that are constant within a
group of individuals but differ markedly between individuals of two distinct groups.
These distinguishing components, which depend on a particular medical condition,
are generally called biomarkers. Since not all biomarkers found by the
algorithms are of equal (discriminating) quality we are only interested in a
small biomarker subset that - as a combination - can be used as a
fingerprint for a disease. Once a fingerprint for a particular disease
(or medical condition) is identified, it can be used in clinical diagnostics to
classify unknown spectra.
In this thesis we have developed new algorithms for automatic extraction of
disease specific fingerprints from mass spectrometry data. Special emphasis has
been put on designing highly sensitive methods with respect to signal detection.
Thanks to our statistically based approach, our methods are able to
detect signals, such as those from hormones, even below the noise level inherent
in data acquired by common MS machines.
To give collaborating groups access to these new classes of algorithms,
we have created a web-based analysis platform that offers all necessary
interfaces for data transfer, data analysis and result inspection.
To demonstrate the platform's practical relevance, it has been applied in several
clinical studies, two of which are presented in this thesis. These studies
showed that our platform is superior to commercial systems with respect
to fingerprint identification. As an outcome of these studies, several
fingerprints for different cancer types (bladder, kidney, testicle, pancreas,
colon and thyroid) have been detected and validated. The clinical partners in
fact emphasize that these results would be impossible with a less sensitive
analysis tool (such as the currently available systems).
In addition to the issue of reliably finding and handling signals in noise, we
faced the problem of handling very large amounts of data: an average dataset
for an individual is about 2.5 gigabytes in size, and we have data from hundreds to
thousands of persons. To cope with these large datasets, we developed a new
framework for a heterogeneous (quasi) ad-hoc Grid - an infrastructure that
allows the integration of thousands of computing resources (e.g. desktop computers,
computing clusters, or specialized hardware such as IBM's Cell processor in a
PlayStation 3).
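The fingerprint idea described above can be made concrete with a toy sketch. This is a minimal illustration of the general principle only, not the thesis's algorithms: the synthetic spectra, the biomarker bin positions, and the simple t-statistic ranking are all assumptions made for the example.

```python
# Toy sketch: select a small discriminating "fingerprint" of m/z bins
# from binned spectra of two groups (hypothetical synthetic data).
import math
import random

random.seed(0)

def spectrum(shift):
    # Synthetic binned spectrum: 50 m/z bins of Gaussian noise, with
    # bins 10 and 30 elevated in one group (hypothetical biomarkers).
    s = [random.gauss(10, 1) for _ in range(50)]
    s[10] += shift
    s[30] += shift
    return s

healthy = [spectrum(0.0) for _ in range(20)]
disease = [spectrum(3.0) for _ in range(20)]

def t_score(a, b):
    # Two-sample t-like statistic: components that are constant within a
    # group but differ between groups get a large absolute score.
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

bins = len(healthy[0])
scores = [abs(t_score([s[i] for s in healthy], [s[i] for s in disease]))
          for i in range(bins)]

# The fingerprint is the small subset of bins with the strongest separation;
# with this synthetic data it is expected to recover bins 10 and 30.
fingerprint = sorted(range(bins), key=lambda i: -scores[i])[:2]
print(sorted(fingerprint))
```

Once such a fingerprint is fixed, an unknown spectrum can be classified by looking only at those few bins, which is the sense in which a small biomarker subset acts as a disease fingerprint.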
Visualization of proteomics data using R and Bioconductor
Data visualization plays a key role in high-throughput biology. It is an essential tool for data exploration, shedding light on data structure and patterns of interest. Visualization is also of paramount importance as a form of communicating data to a broad audience. Here, we provide a short overview of the application of the R software to the visualization of proteomics data. We present a summary of R's plotting systems and how they are used to visualize and understand raw and processed MS-based proteomics data.

LG was supported by the European Union 7th Framework Program (PRIME-XS project, grant agreement number 262067) and a BBSRC Strategic Longer and Larger grant (Award BB/L002817/1). LMB was supported by a BBSRC Tools and Resources Development Fund (Award BB/K00137X/1). TN was supported by an ERASMUS Placement scholarship.

This is the final published version of the article. It was originally published in Proteomics (PROTEOMICS Special Issue: Proteomics Data Visualisation, Volume 15, Issue 8, pages 1375–1389, April 2015. DOI: 10.1002/pmic.201400392). The final version is available at http://onlinelibrary.wiley.com/doi/10.1002/pmic.201400392/abstract
Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data
Image data are increasingly encountered and are of growing importance in many
areas of science. Much of these data are quantitative image data, which are
characterized by intensities that represent some measurement of interest in the
scanned images. The data typically consist of multiple images on the same
domain and the goal of the research is to combine the quantitative information
across images to make inference about populations or interventions. In this
paper we present a unified analysis framework for the analysis of quantitative
image data using a Bayesian functional mixed model approach. This framework is
flexible enough to handle complex, irregular images with many local features,
and can model the simultaneous effects of multiple factors on the image
intensities and account for the correlation between images induced by the
design. We introduce a general isomorphic modeling approach to fitting the
functional mixed model, of which the wavelet-based functional mixed model is
one special case. With suitable modeling choices, this approach leads to
efficient calculations and can result in flexible modeling and adaptive
smoothing of the salient features in the data. The proposed method has the
following advantages: it can be run automatically, it produces inferential
plots indicating which regions of the image are associated with each factor, it
simultaneously considers the practical and statistical significance of
findings, and it controls the false discovery rate.

Comment: Published at http://dx.doi.org/10.1214/10-AOAS407 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
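The functional mixed model underlying this framework can be sketched as follows. The notation here is generic, following the standard functional mixed model literature rather than the paper's exact symbols: for image (or curve) $i$ observed over domain $t$,

```latex
Y_i(t) \;=\; \sum_{j=1}^{p} X_{ij}\, B_j(t) \;+\; \sum_{k=1}^{m} Z_{ik}\, U_k(t) \;+\; E_i(t)
```

where the $B_j(t)$ are fixed-effect functions modeling the simultaneous effects of the $p$ design factors on the image intensities, the $U_k(t)$ are random-effect functions accounting for correlation between images induced by the design, and $E_i(t)$ is residual error. In the wavelet-based special case mentioned in the abstract, each function is represented by its wavelet coefficients, the model is fitted in the transformed coefficient space, and results are mapped back to the image domain, which is what yields the adaptive smoothing of salient local features.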
A Review and Evaluation of Techniques for Improved Feature Detection in Mass Spectrometry Data
Mass spectrometry (MS) is used in analysis of chemical samples to identify the molecules present and their quantities. This analytical technique has applications in many fields, from pharmacology to space exploration. Its impacts on medicine are particularly significant, since MS aids in the identification of molecules associated with disease; for instance, in proteomics, MS allows researchers to identify proteins that are associated with autoimmune disorders, cancers, and other conditions. Since the applications are so wide-ranging and the tool is ubiquitous across so many fields, it is critical that the analytical methods used to collect data are sound.
Data analysis in MS is challenging. Experiments produce massive amounts of raw data that need to be processed algorithmically in order to generate interpretable results in a process known as feature detection, which is tasked with distinguishing signals associated with the chemical sample being analyzed from signals associated with background noise. These experimentally meaningful signals are also known as features or extracted ion chromatograms (XIC) and are the fundamental signal unit in mass spectrometry. There are many algorithms for analyzing raw mass spectrometry data tasked with distinguishing real isotopic signals from noise. While one or more of the available algorithms are typically chained together for end-to-end mass spectrometry analysis, analysis of each algorithm in isolation provides a specific measurement of the strengths and weaknesses of each algorithm without the confounding effects that can occur when multiple algorithmic tasks are chained together. Though qualitative opinions on extraction algorithm performance abound, quantitative performance has never been publicly ascertained. Quantitative evaluation has not occurred partly due to the lack of an available quantitative ground truth MS1 data set.
Because XIC must be distinguished from noise, quality algorithms for this purpose are essential. Background noise arises from the mobile phase of the chemical matrix in which the sample of interest is delivered to the MS instrument; as a result, MS data are full of signals representing low-abundance molecules (i.e. low-intensity signals). Noise generally presents in one of two ways: very low-intensity signals that comprise a majority of the data from an MS experiment, and noise features that are moderately low-intensity and can resemble signals from low-abundance molecules deriving from the actual sample of interest. Like XIC algorithms, noise reduction algorithms have yet to be quantitatively evaluated, to our knowledge; their performance is generally evaluated through consensus with other noise reduction algorithms.
Using a recently published, manually-extracted XIC dataset as ground truth data, we evaluate the quality of popular XIC algorithms, including MaxQuant, MZMine2, and several methods from XCMS. XIC algorithms were applied to the manually extracted data using a grid search of possible parameters. Performance varied greatly between different parameter settings, though nearly all algorithms with parameter settings optimized with respect to the number of true positives recovered over 10,000 XIC. We also examine two popular algorithms for reducing background noise, the COmponent Detection Algorithm (CODA) and adaptive iteratively reweighted Penalized Least Squares (airPLS), and compare their performance to the results of feature detection alone using algorithms that achieved the best performance in a previous evaluation. Due to weaknesses inherent in the implementation of these algorithms, both noise reduction algorithms eliminate data identified by feature detection as significant.
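The core operation being evaluated, extracting an ion chromatogram, can be illustrated with a minimal sketch. This is a generic illustration, not the implementation of MaxQuant, MZMine2, or XCMS: the function name, the ppm-tolerance windowing, and the toy centroid data are assumptions made for the example.

```python
# Minimal XIC sketch: for a target m/z, collect per-scan intensity
# within a +/- ppm tolerance window and order the trace by time.
from collections import defaultdict

def extract_xic(peaks, target_mz, tol_ppm=10.0):
    """peaks: iterable of (retention_time, mz, intensity) centroids."""
    tol = target_mz * tol_ppm * 1e-6  # absolute m/z window from ppm
    trace = defaultdict(float)
    for rt, mz, inten in peaks:
        if abs(mz - target_mz) <= tol:
            trace[rt] += inten
    # Return the chromatogram as (rt, intensity) pairs sorted by time.
    return sorted(trace.items())

# Hypothetical centroided data: a small elution profile at m/z 500.3
# plus one off-target point that should be excluded.
peaks = [
    (10.0, 500.3000, 1e4), (10.5, 500.3002, 5e4),
    (11.0, 500.2999, 9e4), (11.5, 500.3001, 4e4),
    (10.5, 612.8000, 2e4),  # different ion, outside the window
]
xic = extract_xic(peaks, 500.3)
print(xic)
```

A feature detection algorithm then has to decide which such traces correspond to real isotopic signals and which are noise, which is exactly the task whose quantitative performance the evaluation above measures.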