FIVE THINGS I WISH MY MOTHER HAD TOLD ME, ABOUT STATISTICS THAT IS
I present five short stories, each describing something I wish I had known and appreciated earlier in my statistical life. The five are: Simpson's paradox is everywhere, numerical optimization algorithms can be deceived, you can't always trust the Satterthwaite approximation, BLUPs are wonderful things, and it's good to know Reverend Bayes.
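As a toy illustration of the first story (counts in the style of the familiar kidney-stone example, not data from the paper), the sketch below shows a treatment that wins within every stratum yet loses once the strata are pooled.

```python
# Illustrative counts only: treatment A has the higher success rate within
# each stratum, but the lower rate after pooling, i.e. Simpson's paradox.
success = {"A": {"mild": (81, 87),   "severe": (192, 263)},   # (successes, trials)
           "B": {"mild": (234, 270), "severe": (55, 80)}}

for trt, strata in success.items():
    for stratum, (s, n) in strata.items():
        print(f"{trt} {stratum:6s}: {s/n:.2f}")
    s_tot = sum(s for s, _ in strata.values())
    n_tot = sum(n for _, n in strata.values())
    print(f"{trt} pooled : {s_tot/n_tot:.2f}")
```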
Pooling of Variances: The Skeleton in the Mixed Model Closet?
I explore three related issues concerning pooling of error variances: when it is appropriate (or not) to pool, how best to evaluate equality of variances, and whether there is a cost to never pooling. I focus on pooling decisions in a combined analysis of a multi-site experiment. A priori, sites should have different error variances. My primary question is whether an analysis that ignores unequal variances is wrong.
I find that ignoring heteroscedasticity between sites maintains, or provides slightly conservative, tests of average treatment effects and treatment-by-site interactions. Models with site-specific variances do provide more powerful tests when variances are different. Never pooling, i.e., using site-specific variances when variances are equal, also reduces power. In contrast to the relatively benign effects of pooling across sites, incorrectly pooling across treatments is much more serious.
AIC-based evaluations of variances are very sensitive to non-normality, with a strong tendency to indicate unequal variances when the data are non-normal but the variances are in fact equal. While Levene's test is somewhat liberal when errors are skewed or heavy-tailed, it is much more robust than AIC.
I conclude that ignoring site-specific error variances is not wrong, but modeling that heterogeneity will increase power. If there is any possibility that errors are non-normal, I suggest that variance models be evaluated using Levene's test instead of AIC.
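A minimal sketch of the comparison described above, using simulated data rather than the paper's designs: AIC of equal- versus separate-variance normal models alongside the median-centred (Brown-Forsythe) form of Levene's test from scipy. All names and numbers here are illustrative assumptions.

```python
# Compare two ways of judging whether error variances differ across sites:
# AIC of pooled- vs separate-variance normal models, and Levene's test.
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)

# Simulated residuals from 4 sites with EQUAL variances but skewed errors
sites = [rng.gamma(shape=2.0, scale=1.0, size=30) - 2.0 for _ in range(4)]

def normal_aic(groups, pooled):
    """AIC of normal models with group means and one pooled or per-group variance."""
    n = sum(len(g) for g in groups)
    if pooled:
        s2 = sum(np.sum((g - g.mean()) ** 2) for g in groups) / n
        loglik = -0.5 * n * (np.log(2 * np.pi * s2) + 1)
        k = len(groups) + 1                      # group means + 1 variance
    else:
        loglik, k = 0.0, 2 * len(groups)         # group means + group variances
        for g in groups:
            s2 = np.mean((g - g.mean()) ** 2)
            loglik += -0.5 * len(g) * (np.log(2 * np.pi * s2) + 1)
    return 2 * k - 2 * loglik

print("AIC, equal variances   :", round(normal_aic(sites, pooled=True), 1))
print("AIC, separate variances:", round(normal_aic(sites, pooled=False), 1))

# Brown-Forsythe / median-centred Levene test, less sensitive to skewness
stat, p = levene(*sites, center='median')
print("Levene (median) p-value:", round(p, 3))
```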
Model Averaging in Agriculture and Natural Resources: What Is It? When Is It Useful? When Is It a Distraction?
I use two examples to illustrate three methods for model averaging: using AIC weights, using BIC weights, and fully Bayesian analyses. The first example is a capture-recapture study that estimates the population size by averaging over 4 models for capture probabilities. The second is an analysis of a study of logging impacts on Curculionid weevils using a before-after-control-impact (BACI) study design. The estimated impact is averaged over 4 ecologically relevant models.
Both examples demonstrate the sensitivity of model weights, or posterior model probabilities, to the choice of prior model probabilities and prior distributions for parameters. The model-averaged estimates and their confidence intervals are less influenced by those choices. The BACI-design example also demonstrates the need to carefully choose the model parameterization so that the parameter of interest, the interaction, has the same interpretation for all models in the model set. I also briefly discuss three other frequentist approaches to model averaging: bagging, stacking, and model-averaged tail area confidence intervals.
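For readers unfamiliar with the mechanics, here is a small sketch of AIC-weight model averaging with made-up numbers (not the capture-recapture or BACI results): Akaike weights proportional to exp(-Delta_i / 2), a model-averaged estimate, and the Buckland-style unconditional standard error. BIC weights are computed the same way from BIC differences.

```python
# Hypothetical AIC values, estimates, and SEs for 4 candidate models.
import numpy as np

aic = np.array([210.3, 211.1, 214.8, 216.0])   # AIC of each model
est = np.array([  3.2,   2.7,   4.1,   3.9])   # estimate of the focal parameter
se  = np.array([  0.9,   1.0,   1.1,   1.2])   # conditional standard errors

delta = aic - aic.min()
w = np.exp(-0.5 * delta)
w /= w.sum()                                   # Akaike weights

theta_bar = np.sum(w * est)                    # model-averaged estimate
# Buckland et al. (1997) unconditional SE: adds between-model spread
se_bar = np.sum(w * np.sqrt(se**2 + (est - theta_bar)**2))

print("weights:", np.round(w, 3))
print("model-averaged estimate:", round(theta_bar, 2), "SE:", round(se_bar, 2))
```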
Package ecespa
Documentation for the R package "ecespa".
COMBINING ECONOMIC AND BIOLOGICAL DATA TO ESTIMATE THE IMPACT OF POLLUTION ON CROP PRODUCTION
Duality methods utilizing a profit function framework are employed to estimate the output elasticity of ambient ozone levels on cash grain farms in Illinois. While duality methods have been recommended as a cure to many of the statistical problems of direct estimation of production functions, multicollinearity may still be a problem. A method for utilizing stochastic information on parameters of a seemingly unrelated system of equations, which is implied by profit function estimation, is developed and applied to measuring the impact of ozone. Such an approach may be necessary in measuring other environmental effects because of a lack of regressor variability.
Keywords: Crop Production/Industries, Environmental Economics and Policy
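The paper develops its own estimator for the seemingly unrelated system; as background only, the sketch below shows the simpler single-equation version of the idea (Theil-Goldberger mixed estimation), where stochastic prior information r = R beta + v with Var(v) = V is stacked onto the data to stabilize estimates under multicollinearity. All quantities are hypothetical, and this is not the paper's estimator.

```python
# Mixed (Theil-Goldberger) estimation sketch with nearly collinear regressors.
import numpy as np

rng = np.random.default_rng(0)
n, k = 60, 3
X = rng.normal(size=(n, k))
X[:, 2] = X[:, 1] + 0.05 * rng.normal(size=n)      # nearly collinear regressors
beta_true = np.array([1.0, 2.0, -1.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# Stochastic restriction: prior belief that beta_1 + beta_2 is about 1, variance 0.25
R = np.array([[0.0, 1.0, 1.0]])
r = np.array([1.0])
V = np.array([[0.25]])

sigma2 = 1.0                                       # error variance, assumed known here
# GLS on the stacked system [y; r] = [X; R] beta + [e; v]
XtX = X.T @ X / sigma2 + R.T @ np.linalg.solve(V, R)
Xty = X.T @ y / sigma2 + R.T @ np.linalg.solve(V, r)
beta_mixed = np.linalg.solve(XtX, Xty)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS  :", np.round(beta_ols, 2))
print("Mixed:", np.round(beta_mixed, 2))
```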
COMET: A Recipe for Learning and Using Large Ensembles on Massive Data
COMET is a single-pass MapReduce algorithm for learning on large-scale data. It builds multiple random forest ensembles on distributed blocks of data and merges them into a mega-ensemble. This approach is appropriate when learning from massive-scale data that is too large to fit on a single machine. To get the best accuracy, IVoting should be used instead of bagging to generate the training subset for each decision tree in the random forest. Experiments with two large datasets (5GB and 50GB compressed) show that COMET compares favorably (in both accuracy and training time) to learning on a subsample of data using a serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble evaluation which dynamically decides how many ensemble members to evaluate per data point; this can reduce evaluation cost by 100X or more.
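The lazy-evaluation idea can be sketched as follows (an illustrative reimplementation, not the COMET code): evaluate ensemble members one at a time for each data point and stop once a Gaussian confidence interval around the running mean vote clearly excludes the decision threshold. The member predictions below are a stand-in list, and the stopping rule is only one plausible Gaussian variant.

```python
import math

def lazy_vote(member_probs, threshold=0.5, z=2.576, min_members=10):
    """member_probs: iterable of per-member P(class=1) for one data point."""
    n, mean, m2 = 0, 0.0, 0.0
    for p in member_probs:
        n += 1
        delta = p - mean
        mean += delta / n                 # Welford running mean/variance
        m2 += delta * (p - mean)
        if n >= min_members:
            se = math.sqrt(m2 / (n - 1) / n)
            if abs(mean - threshold) > z * se:
                break                     # interval excludes the threshold: stop early
    return mean >= threshold, n           # predicted class and members actually used

# Example: a point most members agree on is decided after few evaluations
probs = [0.9, 0.85, 0.95, 0.8, 0.9, 0.88, 0.92, 0.87, 0.91, 0.9] + [0.9] * 990
label, used = lazy_vote(probs)
print(label, "after evaluating", used, "of", len(probs), "members")
```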
The Effects of Drought on Foraging Habitat Selection of Breeding Wood Storks in Coastal Georgia
Foraging habitat use by Wood Storks (Mycteria americana) during the breeding season was studied for three coastal colonies during a drought year and compared to habitat use during normal rainfall years. Information on the distribution of wetland habitat types was derived using U.S. Fish and Wildlife Service National Wetland Inventory (NWI) data within a Geographic Information System (GIS). Foraging locations were obtained by following storks from their colonies in a fixed-wing aircraft. Differences in hydrologic condition and the resulting prey availability in coastal zone freshwater wetlands greatly affected foraging habitat use and breeding success of the three stork colonies. In 1997 (dry), although the foraging range of each colony did not differ from wetter years, storks used estuarine foraging habitats much more extensively. Breeding success (fledged young/nest) in 1997 was less than half the success of the wetter years. Palustrine (freshwater) wetlands seem very important to storks breeding along the Georgia coast. During dry years, estuarine wetlands, by themselves, do not appear to be able to support the breeding population of storks in this region. Reasons why these productive wetlands do not provide sufficient resources for successful breeding are unclear, but could include limitations to only two foraging periods (low tides) in a 24-hr period.