1,156 research outputs found

    Selection of proposal distributions for generalized importance sampling estimators

    The standard importance sampling (IS) estimator generally does not work well in examples involving simultaneous inference on several targets, because the importance weights can take arbitrarily large values, making the estimator highly unstable. In such situations, alternative generalized IS estimators involving samples from multiple proposal distributions are preferred. Just as with standard IS, the success of these multiple IS estimators crucially depends on the choice of the proposal distributions. The selection of these proposal distributions is the focus of this article. We propose three methods based on (i) a geometric space-filling coverage criterion, (ii) a minimax variance approach, and (iii) a maximum entropy approach. The first two methods are applicable to any multi-proposal IS estimator, whereas the third approach is described in the context of Doss's (2010) two-stage IS estimator. For the first method we propose a suitable measure of coverage based on the symmetric Kullback-Leibler divergence, while the second and third approaches use estimates of asymptotic variances of Doss's (2010) IS estimator and Geyer's (1994) reverse logistic estimator, respectively. Thus, we provide consistent spectral variance estimators for these asymptotic variances. The proposed methods for selecting proposal densities are illustrated using various detailed examples.
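
    The idea of pooling draws from several proposals can be sketched with a small toy example. The code below is only an illustration under stated assumptions, not the estimator of Doss (2010) or Geyer (1994): it uses a self-normalized IS estimate with an equal-weight mixture of the proposals in the denominator, the targets and proposals are unit-variance normals chosen for convenience, and the names `normal_pdf` and `mixture_is_mean` are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_is_mean(target_mean, proposal_means, n_per_proposal, rng):
    """Self-normalized IS estimate of E[X] under N(target_mean, 1).

    Draws are pooled from several N(m, 1) proposals, and each draw is
    weighted by target density / equal-weight mixture of the proposals.
    (Illustrative assumption: this is a generic mixture-weighted scheme.)
    """
    draws = [
        rng.gauss(m, 1.0)
        for m in proposal_means
        for _ in range(n_per_proposal)
    ]
    k = len(proposal_means)
    weights = []
    for x in draws:
        mixture = sum(normal_pdf(x, m) for m in proposal_means) / k
        weights.append(normal_pdf(x, target_mean) / mixture)
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, draws)) / total


# Several targets handled with the same set of proposals.
rng = random.Random(0)
proposal_means = [-2.0, 0.0, 2.0]
estimates = {
    t: mixture_is_mean(t, proposal_means, 2000, rng)
    for t in (-1.5, 0.0, 1.5)
}
for target, estimate in estimates.items():
    print(f"target mean {target:+.1f}: estimate {estimate:+.3f}")
```

    Because the mixture in the denominator covers every target considered here, the weights stay bounded; with a single proposal far from a target they would not, which is the instability the abstract describes.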

    Sequential Empirical Bayes method for filtering dynamic spatiotemporal processes

    We consider online prediction of a latent dynamic spatiotemporal process and estimation of the associated model parameters based on noisy data. The problem is motivated by the analysis of spatial data arriving in real time, where the current parameter estimates and predictions are updated using the new data at a fixed computational cost. Estimation and prediction are performed within an empirical Bayes framework with the aid of Markov chain Monte Carlo samples. Samples for the latent spatial field are generated using a sampling importance resampling algorithm with a skewed-normal proposal, and samples for the temporal parameters are generated using Gibbs sampling with their full conditionals written in terms of sufficient quantities which are updated online. The spatial range parameter is estimated by a novel online implementation of an empirical Bayes method, called herein the sequential empirical Bayes method. A simulation study shows that our method gives results similar to those of an offline Bayesian method. We also find that the skewed-normal proposal improves over the traditional Gaussian proposal. The application of our method is demonstrated for online monitoring of radiation after the Fukushima nuclear accident.

    Estimation and prediction for spatial generalized linear mixed models with parametric links via reparameterized importance sampling

    Spatial generalized linear mixed models (SGLMMs) are popular for analyzing non-Gaussian spatial data. These models assume a prescribed link function that relates the underlying spatial field to the mean response. There are circumstances, such as when the data contain outlying observations, where the use of a prescribed link function can result in poor fit, which can be improved by using a parametric link function. Some popular link functions, such as the Box-Cox, are unsuitable because they are inconsistent with the Gaussian assumption of the spatial field. We present sensible choices of parametric link functions which possess desirable properties. It is important to estimate the parameters of the link function, rather than assume a known value. To that end, we present a generalized importance sampling (GIS) estimator based on multiple Markov chains for empirical Bayes analysis of SGLMMs. The GIS estimator, although more efficient than simple importance sampling, can be highly variable when used to estimate the parameters of certain link functions. Using suitable reparameterizations of the Monte Carlo samples, we propose modified GIS estimators that do not suffer from high variability. We use the Laplace approximation for choosing the multiple importance densities in the GIS estimator. Finally, we develop a methodology for selecting models with an appropriate link function family, which extends to choosing a spatial correlation function as well. We present an ensemble prediction of the mean response by appropriately weighting the estimates from different models. The proposed methodology is illustrated using simulated and real data examples.

    Approximate Bayesian Inference for Geostatistical Generalised Linear Models

    Bayesian and Frequentist Methods for Approximate Inference in Generalized Linear Mixed Models

    Closed-form expressions for the likelihood and the predictive density under the Generalized Linear Mixed Model setting are often nonexistent because they involve integration of a nonlinear function over a high-dimensional space. We derive approximations to those quantities useful for obtaining results connected with estimation and prediction from a Bayesian as well as from a frequentist point of view. Our asymptotic approximations work under the assumption that the sample size grows at a higher rate than the number of random effects. The first part of the thesis presents results related to frequentist methodology. We derive an approximation to the log-likelihood of the parameters which, if maximized, gives estimates with low mean square error compared to other methods. Similar techniques are used for the prediction of the random effects, where we propose an approximate predictive density from the Gaussian family of densities. Our simulations show that the predictions obtained using our method are comparable to those of other, computationally intensive methods. Focus is given to the analysis of spatial data where, as an example, the analysis of the rhizoctonia root rot data is presented. The second part of the thesis is concerned with the Bayesian prediction of the random effects. First, an approximation to the Bayesian predictive distribution function is derived which can be used to obtain prediction intervals for the random effects without the use of Monte Carlo methods. In addition, given a prior for the covariance parameters of the random effects, we derive approximations to the coverage probability bias and the Kullback-Leibler divergence of the predictive distribution constructed using that prior. A simulation study is performed where we compute these quantities for different priors to select the prior with the smallest coverage probability bias and Kullback-Leibler divergence.

    Meta-Analysis in Genome-Wide Association Datasets: Strategies and Application in Parkinson Disease

    BACKGROUND: Genome-wide association studies hold substantial promise for identifying common genetic variants that regulate susceptibility to complex diseases. However, for the detection of small genetic effects, single studies may be underpowered. Power may be improved by combining genome-wide datasets with meta-analytic techniques. METHODOLOGY/PRINCIPAL FINDINGS: Both single and two-stage genome-wide data may be combined and there are several possible strategies. In the two-stage framework, we considered the options of (1) enhancement of replication data and (2) enhancement of first-stage data, and then, we also considered (3) joint meta-analyses including all first-stage and second-stage data. These strategies were examined empirically using data from two genome-wide association studies (three datasets) on Parkinson disease. In the three strategies, we derived 12, 5, and 49 single nucleotide polymorphisms that show significant associations at conventional levels of statistical significance. None of these remained significant after conservative adjustment for the number of performed analyses in each strategy. However, some may warrant further consideration: 6 SNPs were identified with at least 2 of the 3 strategies and 3 SNPs [rs1000291 on chromosome 3, rs2241743 on chromosome 4 and rs3018626 on chromosome 11] were identified with all 3 strategies and had no or minimal between-dataset heterogeneity (I² = 0, 0 and 15%, respectively). Analyses were primarily limited by the suboptimal overlap of tested polymorphisms across different datasets (e.g., only 31,192 shared polymorphisms between the two tier 1 datasets). CONCLUSIONS/SIGNIFICANCE: Meta-analysis may be used to improve the power and examine the between-dataset heterogeneity of genome-wide association studies. Prospective designs may be most efficient, if they try to maximize the overlap of genotyping platforms and anticipate the combination of data across many genome-wide association studies.

    The value of information for correlated GLMs

    We examine the situation where a decision maker is considering investing in a number of projects with uncertain revenues. Before making a decision, the investor has the option to purchase data which carry information about the outcomes from pertinent projects. When these projects are correlated, the data are informative about all the projects. The value of information is the maximum amount the investor would pay to acquire these data. The problem can be seen from a sampling design perspective where the sampling criterion is the maximisation of the value of information minus the sampling cost. The examples we have in mind are in the spatial setting where the sampling is performed at spatial coordinates or spatial regions. In this paper we discuss the case where the outcome of each project is modelled by a generalised linear mixed model. When the distribution is non-Gaussian, the value of information does not have a closed-form expression. We use the Laplace approximation and matrix approximations to derive an analytical expression for the value of information, and examine its sensitivity under different parameter settings and distributions. In the Gaussian case the proposed technique is exact. Our analytical method is compared against the alternative Monte Carlo method, and we show similarity of results for various sample sizes of the data. The closed-form results are much faster to compute. Model weighting and bootstrap are used to measure the sensitivity of our analysis to model and parameter uncertainty. General guidance on making decisions using our results is offered. Application of the method is presented in a spatial decision problem for treating bovine tuberculosis in the United Kingdom, and for rock fall avoidance decisions in a Norwegian mine.

    Optimal predictive design augmentation for spatial generalised linear mixed models

    A typical model for geostatistical data when the observations are counts is the spatial generalised linear mixed model. We present a criterion for optimal sampling design under this framework which aims to minimise the error in the prediction of the underlying spatial random effects. The proposed criterion is derived by performing an asymptotic expansion of the conditional prediction variance. We argue that the mean of the spatial process needs to be taken into account in the construction of the predictive design, which we demonstrate through a simulation study where we compare the proposed criterion against the widely used space-filling design. Furthermore, our results are applied to the Norway precipitation data and the rhizoctonia disease data.

    Family-Based versus Unrelated Case-Control Designs for Genetic Associations

    The simplest and most commonly used approach for genetic associations is the case-control study design of unrelated people. This design is susceptible to population stratification. This problem is obviated in family-based studies, but it is usually difficult to accumulate large enough samples of well-characterized families. We addressed empirically whether the two designs give similar estimates of association in 93 investigations where both unrelated case-control and family-based designs had been employed. Estimated odds ratios differed beyond chance between the two designs in only four instances (4%). The summary relative odds ratio (ROR) (the ratio of odds ratios obtained from unrelated case-control and family-based studies) was close to unity (0.96 [95% confidence interval, 0.91–1.01]). There was no heterogeneity in the ROR across studies (amount of heterogeneity beyond chance I² = 0%). Differences in whether results were nominally statistically significant (p < 0.05) or not with the two designs were common (opposite classification rates 14% and 17%); this largely reflected differences in power. Conclusions were largely similar in diverse subgroup analyses. Unrelated case-control and family-based designs give overall similar estimates of association. We cannot rule out rare large biases or common small biases.