arXiv.org e-Print Archive

Liquid Chromatography Mass Spectrometry-Based Proteomics: Biological and Technological Aspects

Author: Alan R. Dabney
Ashoka D. Polpitiya
Gordon A. Anderson
Richard D. Smith
Yuliya V. Karpievitch
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010

Mass spectrometry-based proteomics has become the tool of choice for identifying and quantifying the proteome of an organism. Though recent years have seen a tremendous improvement in instrument performance and the computational tools used, significant challenges remain, and there are many opportunities for statisticians to make important contributions. In the most widely used "bottom-up" approach to proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage; the resulting peptide products are then separated based on chemical or physical properties and analyzed using a mass spectrometer. The two fundamental challenges in the analysis of bottom-up MS-based proteomics are (1) identifying the proteins that are present in a sample and (2) quantifying the abundance levels of the identified proteins. Both of these challenges require knowledge of the biological and technological context that gives rise to observed data, as well as the application of sound statistical principles for estimation and inference. We present an overview of bottom-up proteomics and outline the key statistical issues that arise in protein identification and quantification. Comment: Published at http://dx.doi.org/10.1214/10-AOAS341 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).

Normalization and missing value imputation for label-free LC-MS analysis

Author: Dabney Alan R
Karpievitch Yuliya V
Smith Richard D
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Shotgun proteomic data are affected by a variety of known and unknown systematic biases as well as high proportions of missing values. Typically, normalization is performed in an attempt to remove systematic biases from the data before statistical inference, sometimes followed by missing value imputation to obtain a complete matrix of intensities. Here we discuss several approaches to normalization and dealing with missing values, some initially developed for microarray data and some developed specifically for mass spectrometry-based data.

Springer - Publisher Connector

Public Library of Science (PLOS)

Optimality Driven Nearest Centroid Classification from Genomic Data

Author: A Alizadeh
Alan R. Dabney
AR Dabney
B Efron
C Ambroise
C Stein
D Ross
I Hedenfalk
J Khan
J Schäfer
Ji Zhu
John D. Storey
JW Lee
K Mardia
P Bickel
R Shen
R Tibshirani
RJ McKay
S Dudoit
T Golub
TH Bø
Y Guo
Publication venue: Public Library of Science
Publication date: 01/01/2007

Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest-centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest-centroid classifiers.

CiteSeerX

Diet Complexity and Estrogen Receptor β Status Affect the Composition of the Murine Intestinal Microbiota

Author: Alan Dabney
Clinton D. Allred
Joseph M. Sturino
Laura N. Thomas
M. Andrea Azcarate-Peril
Rani Menon
Sara E. Watson
Publication venue: 'American Society for Microbiology'
Publication date: 01/01/2013

Intestinal microbial dysbiosis contributes to the dysmetabolism of luminal factors, including steroid hormones (sterones) that affect the development of chronic gastrointestinal inflammation and the incidence of sterone-responsive cancers of the breast, prostate, and colon. Little is known, however, about the role of specific host sterone nucleoreceptors, including estrogen receptor β (ERβ), in microbiota maintenance. Herein, we test the hypothesis that ERβ status affects microbiota composition and determine if such compositionally distinct microbiota respond differently to changes in diet complexity that favor Proteobacteria enrichment. To this end, conventionally raised female ERβ+/+ and ERβ−/− C57BL/6J mice (mean age of 27 weeks) were initially reared on 8604, a complex diet containing estrogenic isoflavones, and then fed AIN-76, an isoflavone-free semisynthetic diet, for 2 weeks. 16S rRNA gene surveys revealed that the fecal microbiota of 8604-fed mice and AIN-76-fed mice differed, as expected. The relative diversity of Proteobacteria, especially the Alphaproteobacteria and Gammaproteobacteria, increased significantly following the transition to AIN-76. Distinct patterns for beneficial Lactobacillales were exclusive to and highly abundant among 8604-fed mice, whereas several Proteobacteria were exclusive to AIN-76-fed mice. Interestingly, representative orders of the phyla Proteobacteria, Bacteroidetes, and Firmicutes, including the Lactobacillales, also differed as a function of murine ERβ status. Overall, these interactions suggest that sterone nucleoreceptor status and diet complexity may play important roles in microbiota maintenance. Furthermore, we envision that this model for gastrointestinal dysbiosis may be used to identify novel probiotics, prebiotics, nutritional strategies, and pharmaceuticals for the prevention and resolution of Proteobacteria-rich dysbiosis.

Carolina Digital Repository

An Introspective Comparison of Random Forest-Based Classifiers for the Analysis of Cluster-Correlated Data by Way of RF++

Author: A Vlahou
Alan R. Dabney
Anthony P. Leclerc
AR Dabney
B Efron
B Rosner
B Wu
BL Adam
C Strobl
D Agranoff
DS Palmer
EF Petricoin
EJ Finehout
Elizabeth G. Hill
ET Fung
Fabio Rapallo
G Izmirlian
GA Churchill
H Zhang
JM Koomen
Jonas S. Almeida
JR Quinlan
JS Morris
L Breiman
L Li
LE Breiman
M Hilario
MR Segal
PJ Adam
RW Garden
S Schaub
SK Lee
TM Pawlik
TP Conrads
V Svetnik
Y Yasui
YD Chen
Yuliya V. Karpievitch
YV Karpievitch
Publication venue: Public Library of Science
Publication date: 01/01/2009

Many mass spectrometry-based studies, as well as other biological experiments, produce cluster-correlated data. Failure to account for correlation among observations may result in a classification algorithm overfitting the training data and producing overoptimistic estimated error rates, and may make subsequent classifications unreliable. Current common practice for dealing with replicated data is to average each subject replicate sample set, reducing the dataset size and incurring loss of information. In this manuscript we compare three approaches to dealing with cluster-correlated data: unmodified Breiman's Random Forest (URF), forest grown using subject-level averages (SLA), and RF++ with subject-level bootstrapping (SLB). RF++, a novel Random Forest-based algorithm implemented in C++, handles cluster-correlated data through a modification of the original resampling algorithm and accommodates subject-level classification. Subject-level bootstrapping is an alternative sampling method that obviates the need to average or otherwise reduce each set of replicates to a single independent sample. Our experiments show nearly identical median classification and variable selection accuracy for SLB forests and URF forests when applied to both simulated and real datasets. However, the run-time estimated error rate was severely underestimated for URF forests. Predictably, SLA forests were found to be more severely affected by the reduction in sample size, which led to poorer classification and variable selection accuracy. Perhaps most importantly, our results suggest that it is reasonable to utilize URF for the analysis of cluster-correlated data. Two caveats should be noted: first, correct classification error rates must be obtained using a separate test dataset, and second, an additional post-processing step is required to obtain subject-level classifications. RF++ is shown to be an effective alternative for classifying both clustered and non-clustered data.
Source code and stand-alone compiled versions of command-line and easy-to-use graphical user interface (GUI) versions of RF++ for Windows and Linux, as well as a user manual (Supplementary File S2), are available for download at http://sourceforge.org/projects/rfpp/ under the GNU public license.

Public Library of Science (PLOS)

Directory of Open Access Journals

Genome wide association mapping for arabinoxylan content in a collection of tetraploid wheats

Author: A Lovegrove
Agata Gadaleta
Antonio Blanco
CM Courtin
E Akhunov
E Sears
G Charmet
G Laido
Geoffrey B. Fincher
HM Collins
I Lempereur
Ilaria Marcotuli
J Crossa
J Xiao
JDS Alan Dabney
JP Martinant
K Tamura
Kelly Houston
KTC Zondervan
L Saulnier
MS Izydorczyk
P Colasuonno
Pilar Hernandez
PK Gupta
PM Coutinho
R Burton
R Ciccoritti
R Shewry P
RAC Mitchell
Rachel A. Burton
Robbie Waugh
SW Wang
TK Pellny
UM Quraishi
VL Nguyen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

BACKGROUND: Arabinoxylans (AXs) are major components of plant cell walls in bread wheat and are important in bread-making and starch extraction. Furthermore, arabinoxylans are components of soluble dietary fibre that has potential health-promoting effects in human nutrition. Despite their high value for human health, few studies have been carried out on the genetics of AX content in durum wheat. RESULTS: The genetic variability of AX content was investigated in a set of 104 tetraploid wheat genotypes and regions attributable to AX content were identified through a genome wide association study (GWAS). The amount of arabinoxylan, expressed as percentage (w/w) of the dry weight of the kernel, ranged from 1.8% to 5.5% with a mean value of 4.0%. The GWAS revealed a total of 37 significant marker-trait associations (MTA), identifying 19 quantitative trait loci (QTL) associated with AX content. The highest number of MTAs was identified on chromosome 5A (seven), where three QTL regions were associated with AX content, while the lowest number of MTAs was detected on chromosomes 2B and 4B, where only one MTA identified a single locus. Conservation of synteny between SNP marker sequences and the annotated genes and proteins in Brachypodium distachyon, Oryza sativa and Sorghum bicolor allowed the identification of nine QTL coincident with candidate genes. These included a glycosyl hydrolase GH35, which encodes Gal7 and a glucosyltransferase GT31 on chromosome 1A; a cluster of GT1 genes on chromosome 2B that includes TaUGT1 and cisZog1; a glycosyl hydrolase that encodes a CelC gene on chromosome 3A; Ugt12887 and TaUGT1 genes on chromosome 5A; a (1,3)-β-D-glucan synthase (Gsl12 gene) and a glucosyl hydrolase (Cel8 gene) on chromosome 7A. CONCLUSIONS: This study identifies significant MTAs for the AX content in the grain of tetraploid wheat genotypes. We propose that these may be used for molecular breeding of durum wheat varieties with higher soluble fibre content.

Adelaide Research & Scholarship

Directory of Open Access Journals

Archivio istituzionale della ricerca - Università di Bari