Just Another Gibbs Additive Modeller: Interfacing JAGS and mgcv
The BUGS language offers a very flexible way of specifying complex
statistical models for the purposes of Gibbs sampling, while its JAGS variant
offers very convenient R integration via the rjags package. However, including
smoothers in JAGS models can involve some quite tedious coding, especially for
multivariate or adaptive smoothers. Further, if an additive smooth structure is
required then some care is needed, in order to centre smooths appropriately,
and to find appropriate starting values. R package mgcv implements a wide range
of smoothers, all in a manner appropriate for inclusion in JAGS code, and
automates centring and other smooth setup tasks. The purpose of this note is to
describe an interface between mgcv and JAGS, based around an R function,
`jagam', which takes a generalized additive model (GAM) as specified in mgcv
and automatically generates the JAGS model code and data required for inference
about the model via Gibbs sampling. Although the auto-generated JAGS code can
be run as is, the expectation is that the user would wish to modify it in order
to add complex stochastic model components readily specified in JAGS. A simple
interface is also provided for visualisation and further inference about the
estimated smooth components using standard mgcv functionality. The methods
described here will be unnecessarily inefficient if all that is required is
fully Bayesian inference about a standard GAM, rather than the full flexibility
of JAGS. In that case, the BayesX package would be more efficient.
Comment: Submitted to the Journal of Statistical Software.
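For readers who want to see the shape of the workflow, the following is a minimal sketch using jagam together with rjags; the formula, simulated data and file name are illustrative rather than taken from the note.

library(mgcv)
library(rjags)

## Illustrative data: gamSim() produces a standard mgcv test data set.
set.seed(1)
dat <- gamSim(1, n = 200)

## jagam writes JAGS code for the GAM to 'file' and returns matching data and inits.
jd <- jagam(y ~ s(x0) + s(x1), family = gaussian, data = dat, file = "gam.jags")

## Standard rjags steps; the auto-generated "gam.jags" can be edited first to add
## further stochastic model components.
jm  <- jags.model("gam.jags", data = jd$jags.data, inits = jd$jags.ini, n.chains = 1)
sam <- jags.samples(jm, c("b", "rho"), n.iter = 10000, thin = 10)

## Convert the simulation output to an mgcv-style object for plotting and prediction.
jam <- sim2jam(sam, jd$pregam)
plot(jam)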
Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models
An Extended Empirical Saddlepoint Approximation for Intractable Likelihoods
The challenges posed by complex stochastic models used in computational
ecology, biology and genetics have stimulated the development of approximate
approaches to statistical inference. Here we focus on Synthetic Likelihood
(SL), a procedure that reduces the observed and simulated data to a set of
summary statistics, and quantifies the discrepancy between them through a
synthetic likelihood function. SL requires little tuning, but it relies on the
approximate normality of the summary statistics. We relax this assumption by
proposing a novel, more flexible, density estimator: the Extended Empirical
Saddlepoint approximation. In addition to proving the consistency of SL, under
either the new or the Gaussian density estimator, we illustrate the method
using two examples. One of these is a complex individual-based forest model for
which SL offers one of the few practical possibilities for statistical
inference. The examples show that the new density estimator is able to capture
large departures from normality, while being scalable to high dimensions, and
this in turn leads to more accurate parameter estimates, relative to the
Gaussian alternative. The new density estimator is implemented by the esaddle R
package, which can be found on the Comprehensive R Archive Network (CRAN).
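To make the synthetic likelihood step concrete, here is a minimal base-R sketch of the standard Gaussian version for a toy simulator; the extended empirical saddlepoint estimator of the paper would replace the multivariate normal density evaluation below, and the simulator and summary statistics are made up for illustration.

## Toy simulator: log-normal data whose two parameters we wish to infer.
simulate_data <- function(theta, n = 100) rlnorm(n, meanlog = theta[1], sdlog = exp(theta[2]))

## Reduce a data set to a small vector of summary statistics (illustrative choice).
summ <- function(x) c(mean(log(x)), sd(log(x)))

## Gaussian synthetic log-likelihood: simulate N data sets at theta, compute their
## statistics, then evaluate a multivariate normal density, with the simulated mean
## and covariance, at the observed statistics s_obs.
synth_loglik <- function(theta, s_obs, N = 500) {
  S     <- t(replicate(N, summ(simulate_data(theta))))
  mu    <- colMeans(S)
  Sigma <- cov(S)
  -0.5 * (mahalanobis(s_obs, mu, Sigma) +
          as.numeric(determinant(Sigma, logarithm = TRUE)$modulus) +
          length(s_obs) * log(2 * pi))
}

set.seed(1)
s_obs <- summ(simulate_data(c(0.5, log(0.8))))
synth_loglik(c(0.5, log(0.8)), s_obs)   # higher than at poor parameter values...
synth_loglik(c(2.0, log(0.1)), s_obs)   # ...such as this one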
Shape constrained additive models
A framework is presented for generalized additive modelling under shape constraints on the component functions of the linear predictor of the GAM. We represent shape constrained model components by mildly non-linear extensions of P-splines. Models can contain multiple shape constrained and unconstrained terms as well as shape constrained multi-dimensional smooths. The constraints considered are on the sign of the first and/or the second derivatives of the smooth terms. A key advantage of the approach is that it facilitates efficient estimation of smoothing parameters as an integral part of model estimation, via GCV or AIC, and numerically robust algorithms for this are presented. We also derive simulation-free approximate Bayesian confidence intervals for the smooth components, which are shown to achieve close to nominal coverage probabilities. Applications are presented using real data examples, including the risk of disease in relation to proximity to municipal incinerators and the association between air pollution and health.
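A minimal sketch of how such a model might be fitted in practice, assuming the scam R package (which implements this class of models); the data and the choice of a monotone-increasing constraint are made up for the example.

library(scam)   # shape constrained additive models

## Illustrative data: a monotone signal in x1 plus an unconstrained smooth in x2.
set.seed(2)
n   <- 200
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- 3 * plogis(10 * (dat$x1 - 0.5)) + sin(2 * pi * dat$x2) + rnorm(n, sd = 0.3)

## bs = "mpi" requests a monotone-increasing P-spline for x1; x2 gets an ordinary
## smooth. Smoothing parameters are estimated as part of the fit, as described above.
fit <- scam(y ~ s(x1, bs = "mpi") + s(x2), data = dat)

summary(fit)
plot(fit, pages = 1)   # smooth estimates with approximate Bayesian intervals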
Scalable visualisation methods for modern Generalized Additive Models
In the last two decades the growth of computational resources has made it
possible to handle Generalized Additive Models (GAMs) that formerly were too
costly for serious applications. However, the growth in model complexity has
not been matched by improved visualisations for model development and results
presentation. Motivated by an industrial application in electricity load
forecasting, we identify the areas where the lack of modern visualisation tools
for GAMs is particularly severe, and we address the shortcomings of existing
methods by proposing a set of visual tools that a) are fast enough for
interactive use, b) exploit the additive structure of GAMs, c) scale to large
data sets and d) can be used in conjunction with a wide range of response
distributions. All the new visual methods proposed in this work are implemented
by the mgcViz R package, which can be found on the Comprehensive R Archive
Network.
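As a rough sketch of the layered workflow this enables (the model and data are illustrative, and the layer names below are those exported by mgcViz):

library(mgcv)
library(mgcViz)

## Fit an ordinary mgcv GAM to simulated data.
set.seed(3)
dat <- gamSim(1, n = 1000)
b   <- gam(y ~ s(x0) + s(x1) + s(x2), data = dat)

## Convert to a visualisation object, then build plots by adding layers, so that
## expensive components (e.g. plotting every residual) can be swapped for scalable
## alternatives.
v <- getViz(b)
plot(sm(v, 1)) + l_fitLine() + l_ciLine() + l_rug() + l_points()

## One page of standard model checking plots.
check(v)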
COVID-19 and the difficulty of inferring epidemiological parameters from clinical data
Knowing the infection fatality ratio (IFR) is of crucial importance for
evidence-based epidemic management: for immediate planning; for balancing the
life years saved against the life years lost due to the consequences of
management; and for evaluating the ethical issues associated with the tacit
willingness to pay substantially more for life years lost to the epidemic than
for those lost to other diseases. Against this background, Verity et al. (2020,
Lancet Infectious Diseases) have rapidly assembled case data and used
statistical modelling to infer the IFR for COVID-19. We have attempted an
in-depth statistical review of their approach, to identify to what extent the
data are sufficiently informative about the IFR to play a greater role than the
modelling assumptions, and have tried to identify those assumptions that appear
to play a key role. Given the difficulties with other data sources, we provide
a crude alternative analysis based on the Diamond Princess Cruise ship data and
case data from China, and argue that, given the data problems, modelling of
clinical data to obtain the IFR can only be a stop-gap measure. What is needed
is near-direct measurement of epidemic size by PCR and/or antibody testing of
random samples of the at-risk population.
Comment: Version accepted by the Lancet Infectious Diseases. See previous version for a less terse presentation.
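As a purely illustrative sketch of the kind of crude calculation involved once infections are measured more directly, the counts below are hypothetical placeholders, not the Diamond Princess or Chinese data:

## Hypothetical counts: deaths among people whose infection status was established
## directly (e.g. by PCR and/or antibody testing of a random sample).
deaths   <- 10    # placeholder, not real data
infected <- 700   # placeholder, not real data

## Crude infection fatality ratio with an exact binomial confidence interval.
ifr <- binom.test(deaths, infected)
ifr$estimate   # point estimate of the IFR
ifr$conf.int   # 95% confidence interval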
Was R < 1 before the English lockdowns? On modelling mechanistic detail, causality and inference about Covid-19
Detail is a double-edged sword in epidemiological modelling. The inclusion of mechanistic detail in models of highly complex systems has the potential to increase realism, but it also increases the number of modelling assumptions, which become harder to check as their possible interactions multiply. In a major study of the Covid-19 epidemic in England, Knock et al. (2020) fit an age-structured SEIR model with added health service compartments to data on deaths, hospitalisation and test results from Covid-19 in seven English regions for the period March to December 2020. The simplest version of the model has 684 states per region. One main conclusion is that only full lockdowns brought the pathogen reproduction number, R, below one, with R ≫ 1 in all regions on the eve of the March 2020 lockdown. We critically evaluate the Knock et al. epidemiological model, and the semi-causal conclusions made using it, based on an independent reimplementation of the model designed to allow relaxation of some of its strong assumptions. In particular, Knock et al. model the effect on transmission of both non-pharmaceutical interventions and other effects, such as weather, using a piecewise linear function, b(t), with 12 breakpoints at selected government announcement or intervention dates. We replace this representation by a smoothing spline with time-varying smoothness, thereby allowing the form of b(t) to be substantially more data-driven, and we check that the corresponding smoothness assumption is not driving our results. We also reset the mean incubation time and the time from first symptoms to hospitalisation, used in the model, to values implied by the papers cited by Knock et al. as the source of these quantities. We conclude that there is no sound basis for using the Knock et al. model and their analysis to make counterfactual statements about the number of deaths that would have occurred with different lockdown timings. However, if fits of this epidemiological model structure are viewed as a reasonable basis for inference about the time course of incidence and R, then, without very strong modelling assumptions, the pathogen reproduction number was probably below one, and incidence in substantial decline, some days before either of the first two English national lockdowns. This result coincides with that obtained by more direct attempts to reconstruct incidence. Of course it does not imply that lockdowns had no effect, but it does suggest that other non-pharmaceutical interventions (NPIs) may have been much more effective than Knock et al. imply, and that full lockdowns were probably not the cause of R dropping below one.
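To illustrate the kind of replacement described, not the authors' actual implementation and with made-up data, mgcv's adaptive smoother basis allows the smoothness of a fitted time effect to vary over time:

library(mgcv)

## Made-up daily series standing in for a transmission modifier b(t): roughly flat
## early on, then changing abruptly around day 150.
set.seed(4)
day    <- 1:300
b_true <- ifelse(day < 150, 1.2, 1.2 * exp(-0.03 * (day - 150)))
y      <- b_true * exp(rnorm(300, sd = 0.1))

## bs = "ad" gives a P-spline whose penalty itself varies with time, so the fit can
## be stiff where the series is quiet and flexible near abrupt changes, rather than
## fixing breakpoints at chosen dates.
fit <- gam(log(y) ~ s(day, bs = "ad", k = 40), method = "REML")
plot(fit, shade = TRUE)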