30 research outputs found

    Prior Elicitation for Generalised Linear Models and Extensions

    A statistical method for the elicitation of priors in Bayesian generalised linear models (GLMs) and extensions is proposed. Probabilistic predictions are elicited from the expert to parametrise a multivariate t prior distribution for the unknown linear coefficients of the GLM and an inverse gamma prior for the dispersion parameter, if unknown. The elicited predictions condition on defined elicitation scenarios. Dependencies among scenarios are then elicited from the expert by additionally conditioning on hypothetical experiments. Elicited conditional medians efficiently parametrise a canonical vine copula model of dependence that may be truncated for efficiency. The statistical elicitation method permits prior parametrisation of GLMs with alternative choices of design matrices or observation models from the same elicitation session. Extensions of the method apply to multivariate data, data with bounded support, semi-continuous data with point mass at zero, and count data with overdispersion or zero-inflation. A case study elicits a prior for an extended GLM embedded in a statistical model of overdispersed counts described by a binomial-simplex mixture distribution. The elicited canonical vine model of dependence is found to incorporate substantial information into the prior. The procedures of the statistical elicitation method are implemented in the R package eglm.

    Comparison of Pathway Analysis Approaches Using Lung Cancer GWAS Data Sets

    Pathway analysis has been proposed as a complement to single SNP analyses in GWAS. This study compared pathway analysis methods using two lung cancer GWAS data sets based on four studies: one a combined data set from Central Europe and Toronto (CETO); the other a combined data set from Germany and MD Anderson (GRMD). We searched the literature for pathway analysis methods that were widely used, representative of other methods, and had available software for performing analysis. We selected the programs EASE, which uses a modified Fisher's exact calculation to test for pathway associations, GenGen (a version of Gene Set Enrichment Analysis (GSEA)), which uses a Kolmogorov-Smirnov-like running sum statistic as the test statistic, and SLAT, which uses a p-value combination approach. We also included a modified version of the SUMSTAT method (mSUMSTAT), which tests for association by averaging χ² statistics from genotype association tests. There were nearly 18,000 genes available for analysis, following mapping of more than 300,000 SNPs from each data set. These were mapped to 421 GO level 4 gene sets for pathway analysis. Among the methods designed to be robust to biases related to gene size and pathway SNP correlation (GenGen, mSUMSTAT and SLAT), the mSUMSTAT approach identified the most significant pathways (8 in CETO and 1 in GRMD). This included a highly plausible association for the acetylcholine receptor activity pathway in both CETO (FDR≤0.001) and GRMD (FDR = 0.009), although two strong association signals at a single gene cluster (CHRNA3-CHRNA5-CHRNB4) drive this result, complicating its interpretation. Few other replicated associations were found using any of these methods. Difficulty in replicating associations hindered our comparison, but results suggest mSUMSTAT has advantages over the other approaches, and may be a useful pathway analysis tool to use alongside other methods such as the commonly used GSEA (GenGen) approach.
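
The idea of scoring a pathway by averaging χ² statistics can be illustrated with a minimal Python sketch. It is not the mSUMSTAT implementation: the function names, the made-up per-gene statistics, and the null distribution (random gene sets of the same size) are all assumptions chosen for illustration, and the published method may calibrate significance differently.

```python
import numpy as np

def pathway_mean_stat(chi2_by_gene, gene_set):
    """Average the per-gene chi-square statistics over the genes in a set."""
    return float(np.mean([chi2_by_gene[g] for g in gene_set]))

def random_set_p_value(chi2_by_gene, gene_set, n_draws=2000, seed=0):
    """Empirical p-value against random gene sets of equal size (an assumed null)."""
    rng = np.random.default_rng(seed)
    values = np.array(list(chi2_by_gene.values()))
    observed = pathway_mean_stat(chi2_by_gene, gene_set)
    k = len(gene_set)
    null = np.array([
        rng.choice(values, size=k, replace=False).mean()
        for _ in range(n_draws)
    ])
    # The add-one correction keeps the estimate away from exactly zero.
    return (1 + np.sum(null >= observed)) / (n_draws + 1)

# Hypothetical statistics: 50 background genes plus three strongly associated genes.
chi2_by_gene = {f"gene{i}": 1.0 for i in range(50)}
chi2_by_gene.update({"geneA": 12.0, "geneB": 9.0, "geneC": 10.0})
gene_set = ["geneA", "geneB", "geneC"]

observed = pathway_mean_stat(chi2_by_gene, gene_set)
p_value = random_set_p_value(chi2_by_gene, gene_set)
```

Averaging rather than summing keeps the statistic comparable across pathways of different sizes, which is one plausible reason such a score is less sensitive to gene-set size.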

    Spatiotemporal clustering using Gaussian processes embedded in a mixture model

    The categorization of multidimensional data into clusters is a common task in statistics. Many applications of clustering, including the majority of tasks in ecology, use data that is inherently spatial and is often also temporal. However, spatiotemporal dependence is typically ignored when clustering multivariate data. We present a finite mixture model for spatial and spatiotemporal clustering that incorporates spatial and spatiotemporal autocorrelation by including appropriate Gaussian processes (GP) into a model for the mixing proportions. We also allow for flexible and semiparametric dependence on environmental covariates, once again using GPs. We propose to use Bayesian inference through three tiers of approximate methods: a Laplace approximation that allows efficient analysis of large datasets, and both partial and full Markov chain Monte Carlo (MCMC) approaches that improve accuracy at the cost of increased computational time. Comparison of the methods shows that the Laplace approximation is a useful alternative to the MCMC methods. A decadal analysis of 253 species of teleost fish from 854 samples collected along the biodiverse northwestern continental shelf of Australia between 1986 and 1997 shows the added clarity provided by accounting for spatial autocorrelation. For these data, the temporal dependence is comparatively small, which is an important finding given the changing human pressures over this time.

    Functional group based marine ecosystem assessment for the Bay of Biscay via elasticity analysis

    The transitory and long-term elasticities of the Bay of Biscay ecosystem to density-independent and density-dependent influences were estimated within a state space model that accounted for both process and observation uncertainties. A functional group based model for the Bay of Biscay fish ecosystem was fit to time series obtained from scientific survey and commercial catch and effort data. The observation model parameters correspond to the unknown catchabilities and observation error variances that vary across the commercial fisheries and fishery-independent scientific surveys. The process model used a Gompertz form of density dependence, which is commonly used for the analysis of multivariate ecological time series, with unknown time-varying fishing mortalities. Elasticity analysis showed that the process model parameters are directly interpretable in terms of one-year look-ahead prediction elasticities, which measure the proportional response of a functional group in the next year given a proportional change to a variable or parameter in the current year. The density-dependent parameters were also shown to define the elasticities of the long-term means or quantiles of the functional groups to changes in fishing pressure. Evidence for the importance of indirect effects, mediated by density dependence, in determining the ecosystem response of the Bay of Biscay to changes in fishing pressure is presented. The state space model performed favourably in an assessment of model adequacy that compared observations of catch per unit effort against cross-validation predictive densities blocked by year.
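
The link between Gompertz parameters and one-year look-ahead elasticities can be checked numerically. The sketch below assumes a generic multivariate Gompertz process on the log scale, log n(t+1) = a + B log n(t), with made-up values for `a` and `B` and no fishing mortality or noise terms; it is not the fitted Bay of Biscay model.

```python
import numpy as np

# Hypothetical two-group example; a and B are illustrative values only.
a = np.array([0.5, 0.3])
B = np.array([[0.7, -0.1],
              [0.2,  0.6]])

def predict_next(n_now):
    """Deterministic one-year-ahead prediction on the abundance scale."""
    return np.exp(a + B @ np.log(n_now))

def numerical_elasticity(n_now, i, j, rel_change=1e-6):
    """Proportional response of group i next year to a proportional change in group j now."""
    bumped = n_now.copy()
    bumped[j] *= 1.0 + rel_change
    base = predict_next(n_now)[i]
    return (predict_next(bumped)[i] / base - 1.0) / rel_change

n_now = np.array([100.0, 40.0])
elasticities = np.array([
    [numerical_elasticity(n_now, i, j) for j in range(2)]
    for i in range(2)
])

# Long-run mean of log abundance, valid when all eigenvalues of B
# lie strictly inside the unit circle.
long_run_log_mean = np.linalg.solve(np.eye(2) - B, a)
```

Under this assumed form the numerical elasticities reproduce the entries of `B`, so each coefficient can be read as the percentage response next year to a one percent change this year.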

    Gaussian process framework for temporal dependence and discrepancy functions in Ricker-type population growth models

    Density dependent population growth functions are of central importance to population dynamics modelling because they describe the theoretical rate of recruitment of new individuals to a natural population. Traditionally these functions are described with a fixed functional form with temporally constant parameters and without species interactions. The Ricker stock-recruitment model is one such function that is commonly used in fisheries stock assessment. In recent years, there has been increasing interest in semi-parametric and temporally varying population growth models. The former are related to the general statistical approach of using semi-parametric discrepancy functions, such as Gaussian processes (GP), to model deviations of data around the expected parametric function. In the latter, the reproductive rate, which is a key parameter describing the population growth rate, is assumed to vary in time. In this work, we introduce how these existing Ricker population growth models can be formulated under the same statistical approach of hierarchical GP models. We also show how the time invariant semi-parametric approach can be extended and combined with the time varying reproductive rate using a GP model. Then we extend these models to the multispecies setting by incorporating cross-covariances among species with a continuous time covariance structure using the linear model of coregionalization. As a case study, we examine the productivity of three Pacific salmon populations. We compare the alternative Ricker population growth functions using model posterior probabilities and leave-one-out cross validation predictive densities. Our results show substantial temporal variation in maximum reproductive rates and reveal temporal dependence among the species, which have direct management implications. However, our results do not support inclusion of a semi-parametric discrepancy function, and they suggest that semi-parametric discrepancy functions may lead to challenges in parameter identifiability more generally.
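
A Ricker model with a time-varying reproductive rate can be simulated in a few lines. The sketch below uses the standard Ricker form, log(R/S) = alpha_t - beta * S + noise, and draws alpha_t from a GP with a squared-exponential kernel. The kernel choice, every parameter value, and the simulated spawner counts are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def squared_exponential(times, variance, length_scale):
    """Squared-exponential covariance matrix over a vector of time points."""
    d = times[:, None] - times[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

rng = np.random.default_rng(1)
years = np.arange(30, dtype=float)
n_years = len(years)

# GP draw for the log reproductive rate; the small diagonal jitter keeps the
# Cholesky factorisation numerically stable.
K = squared_exponential(years, variance=0.2, length_scale=5.0)
chol = np.linalg.cholesky(K + 1e-6 * np.eye(n_years))
alpha_t = 1.5 + chol @ rng.standard_normal(n_years)

beta = 0.001                                   # strength of density dependence
spawners = rng.uniform(200.0, 1500.0, size=n_years)
noise = rng.normal(0.0, 0.1, size=n_years)

# Ricker recruitment with a reproductive rate that varies smoothly in time.
log_recruits_per_spawner = alpha_t - beta * spawners + noise
recruits = spawners * np.exp(log_recruits_per_spawner)
```

Fixing `alpha_t` to a constant recovers the classical Ricker model, which makes the two formulations easy to compare on the same simulated data.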

    Making inference from wildlife collision data: inferring predator absence from prey strikes

    Wildlife collision data are ubiquitous, though challenging for making ecological inference due to typically irreducible uncertainty relating to the sampling process. We illustrate a new approach that is useful for generating inference from predator data arising from wildlife collisions. By simply conditioning on a second prey species sampled via the same collision process, and by using a biologically realistic numerical response function, we can produce a coherent numerical response relationship between predator and prey. This relationship can then be used to make inference on the population size of the predator species, including the probability of extinction. The statistical conditioning enables us to account for unmeasured variation in factors influencing the runway strike incidence for individual airports and to enable valid comparisons. A practical application of the approach for testing hypotheses about the distribution and abundance of a predator species is illustrated using the hypothesized red fox incursion into Tasmania, Australia. We estimate that conditional on the numerical response between fox and lagomorph runway strikes on mainland Australia, the predictive probability of observing no runway strikes of foxes in Tasmania after observing 15 lagomorph strikes is 0.001. We conclude there is enough evidence to safely reject the null hypothesis that there is a widespread red fox population in Tasmania at a population density consistent with prey availability. The method is novel and has potential wider application.

    The relative importance of environmental stochasticity, interspecific interactions, and observation error: Insights from sardine and anchovy landings

    Long-term time series of sardine and anchovy landings often suggest negative dependence between these species, and an array of mechanisms have been proposed as explanations. We reduce these propositions to four basic hypotheses of (1) independence, (2) correlated process noise, (3) interspecific interactions, and (4) correlated observational error. We use a Bayesian approach to develop priors for parsimonious state space models with both process noise and observation error that represent each of these hypotheses, and apply this approach to five long-term time series of landings collected from the Pacific and Atlantic Oceans. Model comparison criteria suggest that the hypothesis of correlated process noise has the broadest support, where the temporal dependence of anchovy and sardine may be caused in part by either direct environmental influence on their physiology, or indirect bottom-up effects on their prey. However, all hypotheses find some degree of support within the five time series, and in general, the sardine and anchovy landings suggest weak intraspecific density dependence and susceptibility to both environmental and anthropogenic perturbation. Results additionally suggest that the best fitting hypothesis depends on the choice of geographic scale, temporal scale, and stock definition of the recorded landings.