A Generalized Estimating Equation Approach to Multivariate Adaptive Regression Splines
<p>Multivariate adaptive regression splines (MARS) is a popular nonparametric regression tool often used for prediction and for uncovering important data patterns between the response and predictor variables. The standard MARS algorithm assumes responses are normally distributed and independent, but in this article we relax both of these assumptions by extending MARS to generalized estimating equations. We refer to this MARS-for-GEEs algorithm as “MARGE.” Our algorithm makes use of fast forward selection techniques, such that in the univariate case, MARGE has similar computation speed to a standard MARS implementation. Through simulation we show that the proposed algorithm has better predictive performance than the original MARS algorithm when using correlated and/or nonnormal response data. MARGE is also competitive with alternatives in the literature, especially for problems with multiple interacting predictors. We apply MARGE to various ecological examples with different data types. Supplementary material for this article is available online.</p>
Appendix B. Tables showing P values and power-simulation results for each of the 19 data sets.
Tables showing P values and power-simulation results for each of the 19 data sets
Appendix A. Tables and graphs for the analysis of the “something fishy” example: proportion of fish on empty stomachs by geographic location and trophic group.
Tables and graphs for the analysis of the “something fishy” example: proportion of fish on empty stomachs by geographic location and trophic group
Supplement 1. R code demonstrating how to fit a logistic regression model, with a random intercept term, and how to use resampling-based hypothesis testing for inference.
<h2>File List</h2><blockquote>
<p><a href="glmmeg.R">glmmeg.R</a>: R code demonstrating how to fit a logistic regression model, with a random intercept term, to randomly generated overdispersed binomial data.</p>
<p><a href="boot.glmm.R">boot.glmm.R</a>: R code for estimating <i>P</i>-values by applying the bootstrap to a GLMM likelihood ratio statistic.</p>
</blockquote><h2>Description</h2><blockquote>
<p>glmm.R is example R code that shows how to fit a logistic regression model (with or without a random effects term) and use diagnostic plots to check the fit. The code is run on randomly generated data, which are generated in such a way that overdispersion is evident. This code could be applied directly to your own analyses if you read into R a data.frame called “dataset”, which has columns labelled “success” and “failure” (for number of binomial successes and failures) and “species” (a label for the different rows in the dataset), and where we want to test for the effect of some predictor variable called “location”. In other cases, just change the labels and formula as appropriate.</p>
<p>boot.glmm.R extends glmm.R by using bootstrapping to calculate P-values in a way that provides better control of Type I error in small samples. It accepts data in the same form as that generated in glmm.R.</p>
</blockquote>
Appendix D. Results of power simulations for binomial data, and binomial data with overdispersion.
Results of power simulations for binomial data, and binomial data with overdispersion
Appendix B. Tables and graphs related to the analysis of the “expanding leaves” example: percentage loss of leaf area (LLA) by median expansion time and site.
Tables and graphs related to the analysis of the “expanding leaves” example: percentage loss of leaf area (LLA) by median expansion time and site
Appendix C. Results of Type I error simulations for binomial data, and binomial data with overdispersion.
Results of Type I error simulations for binomial data, and binomial data with overdispersion
Appendix F. Results of simulations using resampling-based hypothesis testing to control Type I error in small samples.
Results of simulations using resampling-based hypothesis testing to control Type I error in small samples
Residual plots for a negative binomial regression model fitted to the copepod data of Table 1, using (a) Pearson residuals; (b) PIT-residuals.
<p>Different colours are used for different species. Notice that the predominant patterns in (a) are the line of points towards the left (corresponding to zeros) and asymmetry about the horizontal line <i>y</i> = 0 (marked in red). These trends, due to the discreteness of the data rather than lack of fit, have been removed in (b) so that the reader can focus on the question of goodness-of-fit.</p>
Appendix H. Data on seed mass and survivorship through predispersal seed predation from the global literature.
Data on seed mass and survivorship through predispersal seed predation from the global literature