31 research outputs found
Prior Elicitation for Generalised Linear Models and Extensions
A statistical method for the elicitation of priors in Bayesian generalised
linear models (GLMs) and extensions is proposed. Probabilistic predictions are
elicited from the expert to parametrise a multivariate t prior distribution for
the unknown linear coefficients of the GLM and an inverse gamma prior for the
dispersion parameter, if unknown. The elicited predictions condition on defined
elicitation scenarios. Dependencies among scenarios are then elicited from the
expert by additionally conditioning on hypothetical experiments. Elicited
conditional medians efficiently parametrise a canonical vine copula model of
dependence that may be truncated for efficiency. The statistical elicitation
method permits prior parametrisation of GLMs with alternative choices of design
matrices or observation models from the same elicitation session. Extensions of
the method apply to multivariate data, data with bounded support,
semi-continuous data with point mass at zero, and count data with
overdispersion or zero-inflation. A case study elicits a prior for an extended
GLM embedded in a statistical model of overdispersed counts described by a
binomial-simplex mixture distribution. The elicited canonical vine model of
dependence is found to incorporate substantial information into the prior. The
procedures of the statistical elicitation method are implemented in the R
package eglm
Predicting the stability, equilibrium response, and nonequilibrium dynamics of ecological systems
In this dissertation, new theory and its applications are developed to predict three properties of complex ecological communities: stability, equilibrium response, and non-equilibrium dynamics. First, a graph-theoretic analysis identifies the interconnections in a complex ecosystem that promote or diminish stability (Chapter 2). The hierarchy of interactions that influences stability and feedback processes can guide resource allocation for environmental monitoring, investigate alternative management strategies, and help formulate novel research hypotheses. Second, a combined graph-theoretic and probabilistic approach evaluates the potential for long-term changes in equilibrium (Chapter 3). Conditional probabilities of long-term increase and decrease in variables are transferred from the graph-theoretic models into a Bayesian network. The Bayesian network allows researchers both to predict how an ecosystem might change given a perturbation and to diagnose which model structure best matches empirical observations. Third, a threshold index predicts whether or not large-magnitude short-term transitory changes in disease prevalence can occur (Chapter 4). The concept of reactivity is used to derive a threshold index for epidemicity, E0, which gives the maximum number of new infections produced by an infective individual at a disease-free equilibrium. This index provides a threshold that determines whether or not major epidemics are possible. The relative importance of parameters differs between control strategies that seek to reduce endemicity and those that seek to reduce epidemicity. The index E0 therefore is an important measure of epidemic potential that may assist efforts to control epidemics. Together these approaches provide new theory that helps bridge the gap between our need to understand complex ecological systems and the empirical data available for their characterization
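As a rough illustration of the reactivity concept mentioned in the abstract above, the sketch below computes reactivity in its usual sense, the largest eigenvalue of the symmetric part of a Jacobian matrix, for a 2x2 system. It does not reproduce the dissertation's index E0; the matrix values and function names are hypothetical and chosen only to show that an asymptotically stable equilibrium can still amplify small perturbations in the short term.

```python
import math


def reactivity_2x2(J):
    """Largest eigenvalue of the symmetric part H = (J + J^T) / 2 of a 2x2 matrix."""
    h11 = J[0][0]
    h22 = J[1][1]
    h12 = 0.5 * (J[0][1] + J[1][0])
    mean = 0.5 * (h11 + h22)
    radius = math.sqrt((0.5 * (h11 - h22)) ** 2 + h12 ** 2)
    return mean + radius


def dominant_real_part_2x2(J):
    """Largest real part among the eigenvalues of a 2x2 matrix (asymptotic stability)."""
    tr = J[0][0] + J[1][1]
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    disc = tr * tr - 4.0 * det
    if disc >= 0.0:
        return 0.5 * (tr + math.sqrt(disc))
    return 0.5 * tr  # complex pair: both eigenvalues share the real part tr / 2


# Hypothetical Jacobian at an equilibrium (illustrative values only).
J = [[-1.0, 10.0],
     [0.0, -2.0]]

stability = dominant_real_part_2x2(J)   # negative: perturbations decay eventually
reactivity = reactivity_2x2(J)          # positive: perturbations can grow at first
print(f"dominant real part = {stability:.3f}, reactivity = {reactivity:.3f}")
```

A negative dominant real part together with a positive reactivity is the situation in which transient growth is possible even though the equilibrium is stable in the long run.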
Association of Juvenile Salmon and Estuarine Fish with Intertidal Seagrass and Oyster Aquaculture Habitats in a Northeast Pacific Estuary
Structured estuarine habitats, such as salt marshes, seagrass beds, and oyster reefs, are recognized as critical nurseries for juvenile fish and crustaceans. Estuarine habitat usage by fish, including juvenile Pacific salmon Oncorhynchus spp., was characterized by sampling with a modified tow net in Willapa Bay, Washington, where 20% of the intertidal area is utilized for shellfish aquaculture and thus is difficult to sample with conventional gear. Our goal was to compare fish use of relatively undisturbed habitats (open mudflat, seagrass, and channel habitats) with the use of nearby oyster culture habitat. Although many species showed significant temporal and spatial trends within the estuary, only Shiner Perch Cymatogaster aggregata exhibited a significant association with habitat. Juveniles of three salmonid species exhibited few associations with the low intertidal habitats over which they were captured or in the prey types they consumed there. Chinook Salmon O. tshawytscha, likely hatchery-released ocean-type fish, were the most common salmonid captured, and they utilized low intertidal areas throughout the summer as their mean size increased from 85 to 100 mm FL. Diets consumed by these larger juvenile Chinook Salmon were not associated with benthic habitat but instead consisted primarily of (1) insects from nearby marsh or terrestrial habitats and (2) planktonic prey, like decapod larvae and tunicate larvaceans. Juvenile Coho Salmon O. kisutch and Chum Salmon O. keta were captured earlier (April and May) and fed on a slightly different suite of prey taxa, which were also primarily pelagic rather than associated with the intertidal benthos. Our findings suggest that in this relatively shallow coastal estuary, the role of benthic habitat is not closely linked to its value as a source of food for large juvenile salmon out-migrants utilizing the low intertidal areas where aquaculture occurs
Comparison of Pathway Analysis Approaches Using Lung Cancer GWAS Data Sets
Pathway analysis has been proposed as a complement to single SNP analyses in GWAS. This study compared pathway analysis methods using two lung cancer GWAS data sets based on four studies: one a combined data set from Central Europe and Toronto (CETO); the other a combined data set from Germany and MD Anderson (GRMD). We searched the literature for pathway analysis methods that were widely used, representative of other methods, and had available software for performing analysis. We selected the programs EASE, which uses a modified Fisher's exact calculation to test for pathway associations, GenGen (a version of Gene Set Enrichment Analysis (GSEA)), which uses a Kolmogorov-Smirnov-like running sum statistic as the test statistic, and SLAT, which uses a p-value combination approach. We also included a modified version of the SUMSTAT method (mSUMSTAT), which tests for association by averaging χ² statistics from genotype association tests. There were nearly 18,000 genes available for analysis, following mapping of more than 300,000 SNPs from each data set. These were mapped to 421 GO level 4 gene sets for pathway analysis. Among the methods designed to be robust to biases related to gene size and pathway SNP correlation (GenGen, mSUMSTAT and SLAT), the mSUMSTAT approach identified the most significant pathways (8 in CETO and 1 in GRMD). This included a highly plausible association for the acetylcholine receptor activity pathway in both CETO (FDR≤0.001) and GRMD (FDR = 0.009), although two strong association signals at a single gene cluster (CHRNA3-CHRNA5-CHRNB4) drive this result, complicating its interpretation. Few other replicated associations were found using any of these methods. Difficulty in replicating associations hindered our comparison, but results suggest mSUMSTAT has advantages over the other approaches, and may be a useful pathway analysis tool to use alongside other methods such as the commonly used GSEA (GenGen) approach
Spatiotemporal clustering using Gaussian processes embedded in a mixture model
The categorization of multidimensional data into clusters is a common task in statistics. Many applications of clustering, including the majority of tasks in ecology, use data that is inherently spatial and is often also temporal. However, spatiotemporal dependence is typically ignored when clustering multivariate data. We present a finite mixture model for spatial and spatiotemporal clustering that incorporates spatial and spatiotemporal autocorrelation by including appropriate Gaussian processes (GP) into a model for the mixing proportions. We also allow for flexible and semiparametric dependence on environmental covariates, once again using GPs. We propose to use Bayesian inference through three tiers of approximate methods: a Laplace approximation that allows efficient analysis of large datasets, and both partial and full Markov chain Monte Carlo (MCMC) approaches that improve accuracy at the cost of increased computational time. Comparison of the methods shows that the Laplace approximation is a useful alternative to the MCMC methods. A decadal analysis of 253 species of teleost fish from 854 samples collected along the biodiverse northwestern continental shelf of Australia between 1986 and 1997 shows the added clarity provided by accounting for spatial autocorrelation. For these data, the temporal dependence is comparatively small, which is an important finding given the changing human pressures over this time.
Functional group based marine ecosystem assessment for the Bay of Biscay via elasticity analysis
The transitory and long-term elasticities of the Bay of Biscay ecosystem to density-independent and density-dependent influences were estimated within a state space model that accounted for both process and observation uncertainties. A functional group based model for the Bay of Biscay fish ecosystem was fit to time series obtained from scientific survey and commercial catch and effort data. The observation model parameters correspond to the unknown catchabilities and observation error variances that vary across the commercial fisheries and fishery-independent scientific surveys. The process model used a Gompertz form of density dependence, which is commonly used for the analysis of multivariate ecological time series, with unknown time-varying fishing mortalities. Elasticity analysis showed that the process model parameters are directly interpretable in terms of one-year look-ahead prediction elasticities, which measure the proportional response of a functional group in the next year given a proportional change to a variable or parameter in the current year. The density-dependent parameters were also shown to define the elasticities of the long-term means or quantiles of the functional groups to changes in fishing pressure. Evidence for the importance of indirect effects, mediated by density dependence, in determining the ecosystem response of the Bay of Biscay to changes in fishing pressure is presented. The state space model performed favourably in an assessment of model adequacy that compared observations of catch per unit effort against cross-validation predictive densities blocked by year
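The link between Gompertz parameters and one-year look-ahead elasticities can be checked numerically. The sketch below assumes a simplified, deterministic multivariate Gompertz process, log n_i(t+1) = a_i + Σ_j B_ij log n_j(t), with no fishing mortality, process noise, or observation model; the names `a`, `B`, and `n` and all values are hypothetical and are not taken from the paper.

```python
import math


def gompertz_step(n, a, B):
    """One deterministic step: log n_i(t+1) = a_i + sum_j B[i][j] * log n_j(t)."""
    log_n = [math.log(x) for x in n]
    size = len(n)
    return [
        math.exp(a[i] + sum(B[i][j] * log_n[j] for j in range(size)))
        for i in range(size)
    ]


def one_step_elasticity(n, a, B, i, j, eps=1e-6):
    """Proportional change in group i next year per proportional change in group j now."""
    base = gompertz_step(n, a, B)[i]
    bumped = list(n)
    bumped[j] *= 1.0 + eps  # small proportional perturbation of group j
    perturbed = gompertz_step(bumped, a, B)[i]
    return (math.log(perturbed) - math.log(base)) / math.log1p(eps)


# Hypothetical two-group system (illustrative values only).
a = [0.5, 0.2]
B = [[0.7, -0.1],
     [0.2, 0.6]]
n = [120.0, 45.0]

elasticity_01 = one_step_elasticity(n, a, B, i=0, j=1)
print(f"numerical elasticity = {elasticity_01:.6f}, B[0][1] = {B[0][1]}")
```

Under these assumptions the finite-difference elasticity matches the corresponding entry of `B` regardless of the current abundances, which is the sense in which the density-dependence parameters can be read directly as elasticities.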
Gaussian process framework for temporal dependence and discrepancy functions in Ricker-type population growth models
Density dependent population growth functions are of central importance to population dynamics modelling because they describe the theoretical rate of recruitment of new individuals to a natural population. Traditionally these functions are described with a fixed functional form with temporally constant parameters and without species interactions. The Ricker stock-recruitment model is one such function that is commonly used in fisheries stock assessment. In recent years, there has been increasing interest in semi-parametric and temporally varying population growth models. The former are related to the general statistical approach of using semi-parametric discrepancy functions, such as Gaussian processes (GP), to model deviations of data around the expected parametric function. In the latter, the reproductive rate, which is a key parameter describing the population growth rate, is assumed to vary in time. In this work, we introduce how these existing Ricker population growth models can be formulated under the same statistical approach of hierarchical GP models. We also show how the time invariant semi-parametric approach can be extended and combined with the time varying reproductive rate using a GP model. Then we extend these models to the multispecies setting by incorporating cross-covariances among species with a continuous time covariance structure using the linear model of coregionalization. As a case study, we examine the productivity of three Pacific salmon populations. We compare the alternative Ricker population growth functions using model posterior probabilities and leave-one-out cross validation predictive densities. Our results show substantial temporal variation in maximum reproductive rates and reveal temporal dependence among the species, which have direct management implications. 
However, our results do not support inclusion of a semi-parametric discrepancy function, and they suggest that semi-parametric discrepancy functions may lead to challenges in parameter identifiability more generally.
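For readers unfamiliar with the Ricker stock-recruitment function named in the abstract above, a minimal sketch follows. It uses one common parametrisation, R = S exp(log_alpha - beta S), where exp(log_alpha) is the maximum reproductive rate (recruits per spawner at low abundance). The time-varying `log_alpha_by_year` values are hypothetical stand-ins for a temporally varying reproductive rate; the sketch contains no Gaussian process component and is not the paper's model.

```python
import math


def ricker_recruits(spawners, log_alpha, beta):
    """Ricker recruitment R = S * exp(log_alpha - beta * S)."""
    return spawners * math.exp(log_alpha - beta * spawners)


# Hypothetical parameter values (illustrative only).
beta = 0.001                           # strength of density dependence
log_alpha_by_year = [1.5, 1.2, 1.8]    # assumed time-varying log reproductive rate

# Recruitment peaks at S = 1 / beta for any value of log_alpha.
peak_spawners = 1.0 / beta

for year, log_alpha in enumerate(log_alpha_by_year):
    peak = ricker_recruits(peak_spawners, log_alpha, beta)
    print(f"year {year}: max reproductive rate = {math.exp(log_alpha):.2f}, "
          f"peak recruitment = {peak:.1f}")
```

In this parametrisation a change in `log_alpha` rescales the whole curve without moving the spawner abundance at which recruitment peaks, which is one reason temporal variation in the reproductive rate is modelled separately from the density-dependence term.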
Making inference from wildlife collision data: inferring predator absence from prey strikes
Wildlife collision data are ubiquitous, though challenging for making ecological inference due to typically irreducible uncertainty relating to the sampling process. We illustrate a new approach that is useful for generating inference from predator data arising from wildlife collisions. By simply conditioning on a second prey species sampled via the same collision process, and by using a biologically realistic numerical response function, we can produce a coherent numerical response relationship between predator and prey. This relationship can then be used to make inference on the population size of the predator species, including the probability of extinction. The statistical conditioning enables us to account for unmeasured variation in factors influencing the runway strike incidence for individual airports and to enable valid comparisons. A practical application of the approach for testing hypotheses about the distribution and abundance of a predator species is illustrated using the hypothesized red fox incursion into Tasmania, Australia. We estimate that conditional on the numerical response between fox and lagomorph runway strikes on mainland Australia, the predictive probability of observing no runway strikes of foxes in Tasmania after observing 15 lagomorph strikes is 0.001. We conclude there is enough evidence to safely reject the null hypothesis that there is a widespread red fox population in Tasmania at a population density consistent with prey availability. The method is novel and has potential wider application
The relative importance of environmental stochasticity, interspecific interactions, and observation error: Insights from sardine and anchovy landings
Long-term time series of sardine and anchovy landings often suggest negative dependence between these species, and an array of mechanisms have been proposed as explanations. We reduce these propositions to four basic hypotheses of (1) independence, (2) correlated process noise, (3) interspecific interactions, and (4) correlated observational error. We use a Bayesian approach to develop priors for parsimonious state space models with both process noise and observation error that represent each of these hypotheses, and apply this approach to five long-term time series of landings collected from the Pacific and Atlantic Oceans. Model comparison criteria suggest that the hypothesis of correlated process noise has the broadest support, where the temporal dependence of anchovy and sardines may be caused in part by either direct environmental influence on their physiology, or indirect bottom-up effects on their prey. However, all hypotheses find some degree of support within the five time series, and in general, the sardine and anchovy landings suggest weak intraspecific density dependence and susceptibility to both environmental and anthropogenic perturbation. Results additionally suggest that the best fitting hypothesis depends on the choice of geographic scale, temporal scale, and stock definition of the recorded landings