Cholesky Residuals for Assessing Normal Errors in a Linear Model with Correlated Outcomes: Technical Report
Despite the widespread popularity of linear models for correlated outcomes (e.g., linear mixed models and time series models), distribution diagnostic methodology remains relatively underdeveloped in this context. In this paper we present an easy-to-implement approach that lends itself to graphical displays of model fit. Our approach involves multiplying the estimated marginal residual vector by the Cholesky decomposition of the inverse of the estimated marginal variance matrix. The resulting rotated residuals are used to construct an empirical cumulative distribution function and pointwise standard errors. The theoretical framework, including conditions and asymptotic properties, involves technical details that are motivated by Lange and Ryan (1989), Pierce (1982), and Randles (1982). Our method appears to work well in a variety of circumstances, including models having independent units of sampling (clustered data) and models for which all observations are correlated (e.g., a single time series). Our methods can produce satisfactory results even for models that do not satisfy all of the technical conditions stated in our theory.
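The rotation described in this abstract can be illustrated with a minimal sketch. It is not the report's implementation: the function names `cholesky_residuals` and `ecdf`, the use of NumPy, and the choice of the lower-triangular Cholesky factor are assumptions made here for illustration, and the pointwise standard errors mentioned in the abstract are not reproduced.

```python
import numpy as np

def cholesky_residuals(y, X, beta_hat, V_hat):
    """Rotate estimated marginal residuals using a Cholesky factor of V_hat^{-1}.

    Hypothetical helper: if V_hat^{-1} = M M^T with M lower triangular, then
    M^T r has (approximately) identity covariance when V_hat is close to Var(y).
    """
    r = y - X @ beta_hat                         # estimated marginal residuals
    M = np.linalg.cholesky(np.linalg.inv(V_hat)) # lower factor of the inverse
    return M.T @ r                               # rotated residuals

def ecdf(z, grid):
    """Empirical CDF of z evaluated at each point of grid."""
    z_sorted = np.sort(z)
    # side="right" counts the observations that are <= each grid value
    return np.searchsorted(z_sorted, grid, side="right") / z_sorted.size
```

Under a correctly specified model with normal errors, the rotated residuals should behave roughly like independent standard normal draws, so their ECDF can be plotted against the standard normal CDF.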
A Functional-Based Distribution Diagnostic for a Linear Model with Correlated Outcomes: Technical Report
Despite the widespread popularity of linear models for correlated outcomes (e.g., linear mixed models and time series models), distribution diagnostic methodology remains relatively underdeveloped in this context. In this paper we present an easy-to-implement approach that lends itself to graphical displays of model fit. Our approach involves multiplying the estimated marginal residual vector by the Cholesky decomposition of the inverse of the estimated marginal variance matrix. Linear functions of the resulting rotated residuals are used to construct an empirical cumulative distribution function (ECDF), whose stochastic limit is characterized. We describe a resampling technique that serves as a computationally efficient parametric bootstrap for generating representatives of the stochastic limit of the ECDF. Through functionals, such representatives are used to construct global tests for the hypothesis of normal marginal errors. In addition, we demonstrate that the ECDF of the predicted random effects, as described by Lange and Ryan (1989), can be formulated as a special case of our approach. Thus, our method supports both omnibus and directed tests. Our method works well in a variety of circumstances, including models having independent units of sampling (clustered data) and models for which all observations are correlated (e.g., a single time series).
A Nonstationary Negative Binomial Time Series with Time-Dependent Covariates: Enterococcus Counts in Boston Harbor
Boston Harbor has had a history of poor water quality, including contamination by enteric pathogens. We conduct a statistical analysis of data collected by the Massachusetts Water Resources Authority (MWRA) between 1996 and 2002 to evaluate the effects of court-mandated improvements in sewage treatment. Motivated by the ineffectiveness of standard Poisson mixture models and their zero-inflated counterparts, we propose a new negative binomial model for time series of Enterococcus counts in Boston Harbor, where nonstationarity and autocorrelation are modeled using a nonparametric smooth function of time in the predictor. Without further restrictions, this function is not identifiable in the presence of time-dependent covariates; consequently, we use a basis orthogonal to the space spanned by the covariates and use penalized quasi-likelihood (PQL) for estimation. We conclude that Enterococcus counts were greatly reduced near the Nut Island Treatment Plant (NITP) outfalls following the transfer of wastewaters from NITP to the Deer Island Treatment Plant (DITP) and that the transfer of wastewaters from Boston Harbor to the offshore diffusers in Massachusetts Bay reduced the Enterococcus counts near the DITP outfalls.
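The identifiability device mentioned in this abstract, a basis orthogonal to the space spanned by the covariates, can be sketched as a projection. The abstract does not say which smooth basis or inner product the authors use, so the following assumes an ordinary Euclidean projection and a hypothetical helper named `orthogonalize_basis`; it illustrates the general idea only, not the paper's estimation procedure.

```python
import numpy as np

def orthogonalize_basis(B, X):
    """Project each column of the basis matrix B off the column space of X.

    Assumes X has full column rank. The returned matrix B_perp satisfies
    X^T B_perp = 0 (up to rounding), so a smooth term built from B_perp
    cannot absorb variation that the covariates in X already explain.
    """
    Q, _ = np.linalg.qr(X)        # reduced QR: columns of Q span col(X)
    return B - Q @ (Q.T @ B)      # remove the component lying in col(X)
```

For example, with `t = np.linspace(0, 1, 50)`, a covariate matrix `X = np.column_stack([np.ones_like(t), np.sin(2 * np.pi * t)])` and a polynomial basis `B = np.column_stack([t, t**2, t**3])`, the product `X.T @ orthogonalize_basis(B, X)` is numerically zero.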
Comparative proteomic and transcriptomic profiling of the fission yeast Schizosaccharomyces pombe
The fission yeast Schizosaccharomyces pombe is a widely used model organism to study basic mechanisms of eukaryotic biology, but unlike other model organisms, its proteome remains largely uncharacterized. Using a shotgun proteomics approach based on multidimensional prefractionation and tandem mass spectrometry, we have detected ~30% of the theoretical fission yeast proteome. Applying statistical modelling to normalize spectral counts to the number of predicted tryptic peptides, we have performed label-free quantification of 1465 proteins. The fission yeast protein data showed considerable correlations with mRNA levels and with the abundance of orthologous proteins in budding yeast. Functional pathway analysis indicated that the mRNA-protein correlation is strong for proteins involved in signalling and metabolic processes, but increasingly discordant for components of protein complexes, which clustered in groups with similar mRNA-protein ratios. Self-organizing map clustering of large-scale protein and mRNA data from fission and budding yeast revealed coordinate but not always concordant expression of components of functional pathways and protein complexes. This finding reaffirms at the protein level the considerable divergence in gene expression patterns of the two model organisms that was noticed in previous transcriptomic studies.
Reference-free cell mixture adjustments in analysis of DNA methylation data
MOTIVATION: Recently there has been increasing interest in the effects of cell mixture on the measurement of DNA methylation, specifically the extent to which small perturbations in cell mixture proportions can register as changes in DNA methylation. A recently published set of statistical methods exploits this association to infer changes in cell mixture proportions, and these methods are presently being applied to adjust for cell mixture effect in the context of epigenome-wide association studies. However, these adjustments require the existence of reference datasets, which may be laborious or expensive to collect. For some tissues such as placenta, saliva, adipose or tumor tissue, the relevant underlying cell types may not be known.
RESULTS: We propose a method for conducting epigenome-wide association studies analysis when a reference dataset is unavailable, including a bootstrap method for estimating standard errors. We demonstrate via simulation study and several real data analyses that our proposed method can perform as well as or better than methods that make explicit use of reference datasets. In particular, it may adjust for detailed cell type differences that may be unavailable even in existing reference datasets.
AVAILABILITY and IMPLEMENTATION: Software is available in the R package RefFreeEWAS. Data for three of four examples were obtained from Gene Expression Omnibus (GEO), accession numbers GSE37008, GSE42861 and GSE30601, while reference data were obtained from GEO accession number GSE39981.
This is the publisher's final pdf. The published article is copyrighted by the author(s) and published by Oxford University Press. The published article can be found at: http://bioinformatics.oxfordjournals.org/