202 research outputs found

A Robust Regression Model for a First-Order Autoregressive Time Series with Unequal Spacing: Technical Report

Author: Houseman E. Andres
Publication venue: Collection of Biostatistics Research Archive
Publication date: 15/10/2004

Collection Of Biostatistics Research Archive

Cholesky Residuals for Assessing Normal Errors in a Linear Model with Correlated Outcomes: Technical Report

Author: Coull Brent
Houseman E. Andres
Ryan Louise
Publication venue: Collection of Biostatistics Research Archive
Publication date: 18/10/2004

Despite the widespread popularity of linear models for correlated outcomes (e.g., linear mixed models and time series models), distribution diagnostic methodology remains relatively underdeveloped in this context. In this paper we present an easy-to-implement approach that lends itself to graphical displays of model fit. Our approach involves multiplying the estimated marginal residual vector by the Cholesky decomposition of the inverse of the estimated marginal variance matrix. The resulting rotated residuals are used to construct an empirical cumulative distribution function and pointwise standard errors. The theoretical framework, including conditions and asymptotic properties, involves technical details that are motivated by Lange and Ryan (1989), Pierce (1982), and Randles (1982). Our method appears to work well in a variety of circumstances, including models having independent units of sampling (clustered data) and models for which all observations are correlated (e.g., a single time series). Our methods can produce satisfactory results even for models that do not satisfy all of the technical conditions stated in our theory.

Collection Of Biostatistics Research Archive
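
The rotation described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the AR(1) covariance, and the use of a known (rather than estimated) variance matrix are all assumptions made for illustration. The sketch factors the inverse variance matrix as L L^T and multiplies the residual vector by L^T, so that the rotated residuals have identity covariance when the variance matrix is correct.

```python
import numpy as np


def cholesky_residuals(resid, V):
    """Rotate residuals by the Cholesky factor of inv(V).

    If V is the covariance of `resid`, the result has identity covariance.
    """
    # Lower-triangular L with L @ L.T == inv(V).
    L = np.linalg.cholesky(np.linalg.inv(V))
    return L.T @ resid


def ecdf(values):
    """Return sorted values and the empirical CDF evaluated at them."""
    x = np.sort(values)
    return x, np.arange(1, len(x) + 1) / len(x)


# Hypothetical example: a single AR(1) series with known parameters.
n, rho, sigma2 = 200, 0.6, 2.0
idx = np.arange(n)
V = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Simulate correlated normal "residuals" with covariance V.
rng = np.random.default_rng(0)
resid = np.linalg.cholesky(V) @ rng.standard_normal(n)

rotated = cholesky_residuals(resid, V)
x, F = ecdf(rotated)  # compare F against the standard normal CDF in a plot
```

In practice the variance matrix would be replaced by its estimate from the fitted model, and the ECDF of the rotated residuals would be plotted against the standard normal distribution function.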

A Functional-Based Distribution Diagnostic for a Linear Model with Correlated Outcomes: Technical Report

Author: Coull Brent
Houseman E. Andres
Ryan Louise
Publication venue: Collection of Biostatistics Research Archive
Publication date: 18/10/2004

Despite the widespread popularity of linear models for correlated outcomes (e.g., linear mixed models and time series models), distribution diagnostic methodology remains relatively underdeveloped in this context. In this paper we present an easy-to-implement approach that lends itself to graphical displays of model fit. Our approach involves multiplying the estimated marginal residual vector by the Cholesky decomposition of the inverse of the estimated marginal variance matrix. Linear functions of the resulting rotated residuals are used to construct an empirical cumulative distribution function (ECDF), whose stochastic limit is characterized. We describe a resampling technique that serves as a computationally efficient parametric bootstrap for generating representatives of the stochastic limit of the ECDF. Through functionals, such representatives are used to construct global tests for the hypothesis of normal marginal errors. In addition, we demonstrate that the ECDF of the predicted random effects, as described by Lange and Ryan (1989), can be formulated as a special case of our approach. Thus, our method supports both omnibus and directed tests. Our method works well in a variety of circumstances, including models having independent units of sampling (clustered data) and models for which all observations are correlated (e.g., a single time series).

Collection Of Biostatistics Research Archive

A Nonstationary Negative Binomial Time Series with Time-Dependent Covariates: Enterococcus Counts in Boston Harbor

Author: Coull Brent
Houseman E. Andres
Shine James P.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 13/09/2005

Boston Harbor has had a history of poor water quality, including contamination by enteric pathogens. We conduct a statistical analysis of data collected by the Massachusetts Water Resources Authority (MWRA) between 1996 and 2002 to evaluate the effects of court-mandated improvements in sewage treatment. Motivated by the ineffectiveness of standard Poisson mixture models and their zero-inflated counterparts, we propose a new negative binomial model for time series of Enterococcus counts in Boston Harbor, where nonstationarity and autocorrelation are modeled using a nonparametric smooth function of time in the predictor. Without further restrictions, this function is not identifiable in the presence of time-dependent covariates; consequently, we use a basis orthogonal to the space spanned by the covariates and use penalized quasi-likelihood (PQL) for estimation. We conclude that Enterococcus counts were greatly reduced near the Nut Island Treatment Plant (NITP) outfalls following the transfer of wastewaters from NITP to the Deer Island Treatment Plant (DITP) and that the transfer of wastewaters from Boston Harbor to the offshore diffusers in Massachusetts Bay reduced the Enterococcus counts near the DITP outfalls.

Collection Of Biostatistics Research Archive

A Computationally Tractable Multivariate Random Effects Model for Clustered Binary Data

Author: Betensky Rebecca A.
Coull Brent A.
Houseman E. Andres
Publication venue: Collection of Biostatistics Research Archive
Publication date: 28/06/2006

Collection Of Biostatistics Research Archive

Comparative proteomic and transcriptomic profiling of the fission yeast Schizosaccharomyces pombe

Author: Houseman Andres
Ivanov Alexander R
Schmidt Michael W
Wolf Dieter A
Publication venue: Nature Publishing Group
Publication date: 01/01/2007

The fission yeast Schizosaccharomyces pombe is a widely used model organism to study basic mechanisms of eukaryotic biology, but unlike other model organisms, its proteome remains largely uncharacterized. Using a shotgun proteomics approach based on multidimensional prefractionation and tandem mass spectrometry, we have detected ∼30% of the theoretical fission yeast proteome. Applying statistical modelling to normalize spectral counts to the number of predicted tryptic peptides, we have performed label-free quantification of 1465 proteins. The fission yeast protein data showed considerable correlations with mRNA levels and with the abundance of orthologous proteins in budding yeast. Functional pathway analysis indicated that the mRNA–protein correlation is strong for proteins involved in signalling and metabolic processes, but increasingly discordant for components of protein complexes, which clustered in groups with similar mRNA–protein ratios. Self-organizing map clustering of large-scale protein and mRNA data from fission and budding yeast revealed coordinate but not always concordant expression of components of functional pathways and protein complexes. This finding reaffirms at the protein level the considerable divergence in gene expression patterns of the two model organisms that was noticed in previous transcriptomic studies.

CiteSeerX

Harvard University - DASH

PubMed Central

Ranking Cancer Risks of Organic Hazardous Air Pollutants in the United States

Author: Bennett Deborah H.
Houseman E. Andres
Levy Jonathan I.
Loh Miranda M.
Spengler John D.
Publication venue: National Institute of Environmental Health Sciences
Publication date: 01/01/2007

Julkari

PubMed Central

Reference-free cell mixture adjustments in analysis of DNA methylation data

Author: Houseman Eugene Andres
Marsit Carmen J.
Molitor John
Publication venue: Oxford University Press (OUP)

MOTIVATION: Recently there has been increasing interest in the effects of cell mixture on the measurement of DNA methylation, specifically the extent to which small perturbations in cell mixture proportions can register as changes in DNA methylation. A recently published set of statistical methods exploits this association to infer changes in cell mixture proportions, and these methods are presently being applied to adjust for cell mixture effect in the context of epigenome-wide association studies. However, these adjustments require the existence of reference datasets, which may be laborious or expensive to collect. For some tissues such as placenta, saliva, adipose or tumor tissue, the relevant underlying cell types may not be known. RESULTS: We propose a method for conducting epigenome-wide association studies analysis when a reference dataset is unavailable, including a bootstrap method for estimating standard errors. We demonstrate via simulation study and several real data analyses that our proposed method can perform as well as or better than methods that make explicit use of reference datasets. In particular, it may adjust for detailed cell type differences that may be unavailable even in existing reference datasets. AVAILABILITY AND IMPLEMENTATION: Software is available in the R package RefFreeEWAS. Data for three of four examples were obtained from Gene Expression Omnibus (GEO), accession numbers GSE37008, GSE42861 and GSE30601, while reference data were obtained from GEO accession number GSE39981. This is the publisher’s final PDF. The published article is copyrighted by the author(s) and published by Oxford University Press. The published article can be found at: http://bioinformatics.oxfordjournals.org/

ScholarsArchive@OSU