Selection of proposal distributions for generalized importance sampling estimators
The standard importance sampling (IS) estimator generally does not work well in examples involving simultaneous inference on several targets, as the importance weights can take arbitrarily large values, making the estimator highly unstable. In such situations, alternative generalized IS estimators involving samples from multiple proposal distributions are preferred. Just like the standard IS estimator, the success of these multiple IS estimators crucially depends on the choice of the proposal distributions. The selection of these proposal distributions is the focus of this article. We propose three methods based on (i) a geometric space-filling coverage criterion, (ii) a minimax variance approach, and (iii) a maximum entropy approach. The first two methods are applicable to any multi-proposal IS estimator, whereas the third approach is described in the context of Doss's (2010) two-stage IS estimator. For the first method we propose a suitable measure of coverage based on the symmetric Kullback-Leibler divergence, while the second and third approaches use estimates of the asymptotic variances of Doss's (2010) IS estimator and Geyer's (1994) reverse logistic estimator, respectively. To this end, we provide consistent spectral variance estimators for these asymptotic variances. The proposed methods for selecting proposal densities are illustrated using various detailed examples.
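The abstract does not give the form of the coverage measure. Purely as an illustration, assuming univariate normal targets and proposals, one could score a set of proposals by the largest symmetric Kullback-Leibler divergence between any target and its nearest proposal, and add proposals greedily from a candidate list. This is a hypothetical sketch, not the article's criterion; `sym_kl_normal`, `coverage_radius`, and `greedy_proposals` are illustrative names.

```python
import math


def sym_kl_normal(p, q):
    """Symmetric KL divergence KL(p || q) + KL(q || p) for univariate normals.

    p and q are (mean, sd) pairs. The log-ratio terms of the two directed
    divergences cancel, which leaves this closed form.
    """
    (m1, s1), (m2, s2) = p, q
    d2 = (m1 - m2) ** 2
    return (s1**2 + d2) / (2 * s2**2) + (s2**2 + d2) / (2 * s1**2) - 1.0


def coverage_radius(targets, proposals):
    """Worst case, over targets, of the divergence to the nearest proposal."""
    return max(min(sym_kl_normal(t, p) for p in proposals) for t in targets)


def greedy_proposals(targets, candidates, k):
    """Add k candidates one at a time, each time minimising the coverage radius."""
    chosen = []
    for _ in range(k):
        best = min(
            (c for c in candidates if c not in chosen),
            key=lambda c: coverage_radius(targets, chosen + [c]),
        )
        chosen.append(best)
    return chosen
```

For example, with targets N(m, 1) for m = 0, 1, ..., 10 and the same list used as candidates, the first pick is N(5, 1), the minimax centre. Greedy selection is only a convenient heuristic; it does not guarantee an optimal covering.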
Sequential Empirical Bayes method for filtering dynamic spatiotemporal processes
We consider online prediction of a latent dynamic spatiotemporal process and estimation of the associated model parameters based on noisy data. The problem is motivated by the analysis of spatial data arriving in real time, where the current parameter estimates and predictions are updated using the new data at a fixed computational cost. Estimation and prediction are performed within an empirical Bayes framework with the aid of Markov chain Monte Carlo samples. Samples for the latent spatial field are generated using a sampling importance resampling algorithm with a skew-normal proposal, and samples for the temporal parameters are generated by Gibbs sampling, with their full conditionals written in terms of sufficient quantities that are updated online. The spatial range parameter is estimated by a novel online implementation of an empirical Bayes method, called herein the sequential empirical Bayes method. A simulation study shows that our method gives results similar to those of an offline Bayesian method. We also find that the skew-normal proposal improves over the traditional Gaussian proposal. The application of our method is demonstrated for online monitoring of radiation after the Fukushima nuclear accident.
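The abstract names sampling importance resampling (SIR) with a skew-normal proposal. The one-dimensional sketch below shows only that generic step, assuming a known unnormalized log target and fixed proposal parameters; in the paper the target is the latent spatial field, and the way its proposal is fitted is not reproduced here. All function names and parameter values are hypothetical.

```python
import math
import random


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def norm_cdf(z):
    # erfc keeps precision in the far left tail.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def skew_normal_pdf(x, loc, scale, alpha):
    """Azzalini skew-normal density with location, scale and shape alpha."""
    z = (x - loc) / scale
    return 2.0 / scale * norm_pdf(z) * norm_cdf(alpha * z)


def skew_normal_draw(rng, loc, scale, alpha):
    """Draw via the standard construction from two independent N(0, 1) variables."""
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    u0, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    u1 = delta * u0 + math.sqrt(1.0 - delta * delta) * v
    return loc + scale * (u1 if u0 >= 0 else -u1)


def sir(log_target, n, m, loc, scale, alpha, seed=0):
    """Draw n skew-normal proposals, then resample m of them by importance weight."""
    rng = random.Random(seed)
    xs = [skew_normal_draw(rng, loc, scale, alpha) for _ in range(n)]
    # Unnormalized importance weights target(x) / proposal(x).
    ws = [math.exp(log_target(x)) / skew_normal_pdf(x, loc, scale, alpha) for x in xs]
    return rng.choices(xs, weights=ws, k=m)
```

As a usage example, `sir(lambda x: -2.0 * (x - 1.0) ** 2, 20000, 5000, 0.0, 2.0, 2.0)` targets N(1, 0.25); because this proposal is wider than the target, the weights stay bounded and the resampled values should have mean near 1 and standard deviation near 0.5.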
Estimation and prediction for spatial generalized linear mixed models with parametric links via reparameterized importance sampling
Spatial generalized linear mixed models (SGLMMs) are popular for analyzing non-Gaussian spatial data. These models assume a prescribed link function that relates the underlying spatial field to the mean response. In some circumstances, such as when the data contain outlying observations, the use of a prescribed link function can result in poor fit, which can be improved by using a parametric link function. Some popular link functions, such as the Box-Cox, are unsuitable because they are inconsistent with the Gaussian assumption on the spatial field. We present sensible choices of parametric link functions that possess desirable properties. It is important to estimate the parameters of the link function rather than assume known values. To that end, we present a generalized importance sampling (GIS) estimator based on multiple Markov chains for empirical Bayes analysis of SGLMMs. The GIS estimator, although more efficient than simple importance sampling, can be highly variable when used to estimate the parameters of certain link functions. Using suitable reparameterizations of the Monte Carlo samples, we propose modified GIS estimators that do not suffer from high variability. We use the Laplace approximation to choose the multiple importance densities in the GIS estimator. Finally, we develop a methodology for selecting models with an appropriate link function family, which extends to choosing a spatial correlation function as well. We present an ensemble prediction of the mean response by appropriately weighting the estimates from different models. The proposed methodology is illustrated using simulated and real data examples.
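The GIS estimator pools samples drawn under several importance densities. The sketch below shows the basic pooled form, in which every draw is weighted against the whole mixture of proposals, for estimating a ratio of normalizing constants. It assumes iid draws in place of Markov chains and assumes the ratios m_j / m_1 among the proposals are already known; in practice those ratios would themselves be estimated, and the paper's reparameterized estimators are not reproduced. Function names are hypothetical.

```python
import math
import random


def gis_ratio(log_target, log_props, ratios, samples):
    """Estimate m_target / m_1 from pooled draws of several proposals.

    log_props[j] is the log of the unnormalized density nu_j, ratios[j] = m_j / m_1
    (assumed known here), and samples[j] holds draws from the normalized nu_j.
    """
    n_total = sum(len(s) for s in samples)
    fracs = [len(s) / n_total for s in samples]
    acc = 0.0
    for draws in samples:
        for x in draws:
            # Mixture denominator sum_j a_j * nu_j(x) / r_j keeps the weights bounded
            # wherever at least one proposal covers the target.
            denom = sum(a * math.exp(lp(x)) / r for a, lp, r in zip(fracs, log_props, ratios))
            acc += math.exp(log_target(x)) / denom
    return acc / n_total


def gaussian_log_kernel(sd):
    """Unnormalized N(0, sd^2) log density; its normalizer is sd * sqrt(2 * pi)."""
    return lambda x: -x * x / (2.0 * sd * sd)


if __name__ == "__main__":
    rng = random.Random(0)
    sds = [1.0, 2.0]
    samples = [[rng.gauss(0.0, s) for _ in range(20000)] for s in sds]
    est = gis_ratio(gaussian_log_kernel(1.5), [gaussian_log_kernel(s) for s in sds],
                    [1.0, 2.0], samples)
    print(est)  # should be close to the true ratio 1.5
```

In this toy the target kernel with standard deviation 1.5 sits between the two proposals, so no single draw receives an extreme weight, which is the situation a multi-proposal design aims for.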
Bayesian and Frequentist Methods for Approximate Inference in Generalized Linear Mixed Models
Closed-form expressions for the likelihood and the predictive density under the generalized linear mixed model setting often do not exist, because they involve integrating a nonlinear function over a high-dimensional space. We derive approximations to these quantities that are useful for estimation and prediction from both a Bayesian and a frequentist point of view. Our asymptotic approximations assume that the sample size grows at a faster rate than the number of random effects. The first part of the thesis presents results related to frequentist methodology. We derive an approximation to the log-likelihood of the parameters which, when maximized, gives estimates with low mean squared error compared to other methods. Similar techniques are used for the prediction of the random effects, where we propose an approximate predictive density from the Gaussian family of densities. Our simulations show that the predictions obtained using our method are comparable to those of other, computationally intensive methods. Particular attention is given to the analysis of spatial data; as an example, the analysis of the rhizoctonia root rot data is presented. The second part of the thesis is concerned with Bayesian prediction of the random effects. First, an approximation to the Bayesian predictive distribution function is derived, which can be used to obtain prediction intervals for the random effects without the use of Monte Carlo methods. In addition, given a prior for the covariance parameters of the random effects, we derive approximations to the coverage probability bias and the Kullback-Leibler divergence of the predictive distribution constructed using that prior. A simulation study is performed in which we compute these quantities for different priors in order to select the prior with the smallest coverage probability bias and Kullback-Leibler divergence.
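To make the idea of an analytic approximation to a GLMM likelihood concrete, here is the textbook Laplace approximation in the simplest possible case: one Poisson count with a log link and one Gaussian random effect, checked against numerical quadrature. It is a generic illustration, not the thesis's approximation (which targets the regime where the sample size grows faster than the number of random effects); the function names and parameter values are hypothetical.

```python
import math


def log_integrand(u, y, beta, sigma):
    """log[ Poisson(y | exp(beta + u)) * Normal(u | 0, sigma^2) ]."""
    eta = beta + u
    return (y * eta - math.exp(eta) - math.lgamma(y + 1)
            - 0.5 * u * u / sigma**2 - 0.5 * math.log(2 * math.pi * sigma**2))


def laplace_marginal(y, beta, sigma, iters=50):
    """Laplace approximation to the marginal likelihood of one count y."""
    u = 0.0
    for _ in range(iters):
        # Newton step towards the mode of the (concave) log integrand.
        grad = y - math.exp(beta + u) - u / sigma**2
        hess = -math.exp(beta + u) - 1.0 / sigma**2
        u -= grad / hess
    hess = -math.exp(beta + u) - 1.0 / sigma**2
    # exp(h(mode)) * sqrt(2 * pi / -h''(mode))
    return math.exp(log_integrand(u, y, beta, sigma)) * math.sqrt(2 * math.pi / -hess)


def quadrature_marginal(y, beta, sigma, lo=-12.0, hi=12.0, n=20000):
    """Trapezoidal-rule reference value for the same one-dimensional integral."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(log_integrand(lo + i * h, y, beta, sigma))
    return total * h
```

For y = 3, beta = 0.5 and sigma = 1 the two values agree closely (well within 5%). In a real GLMM the integral is high-dimensional, which is exactly where quadrature becomes infeasible and analytic approximations pay off.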
Meta-Analysis in Genome-Wide Association Datasets: Strategies and Application in Parkinson Disease
BACKGROUND: Genome-wide association studies hold substantial promise for identifying common genetic variants that regulate susceptibility to complex diseases. However, single studies may be underpowered to detect small genetic effects. Power may be improved by combining genome-wide datasets with meta-analytic techniques. METHODOLOGY/PRINCIPAL FINDINGS: Both single-stage and two-stage genome-wide data may be combined, and there are several possible strategies. In the two-stage framework, we considered the options of (1) enhancement of replication data and (2) enhancement of first-stage data, and we also considered (3) joint meta-analyses including all first-stage and second-stage data. These strategies were examined empirically using data from two genome-wide association studies (three datasets) on Parkinson disease. With the three strategies, we identified 12, 5, and 49 single nucleotide polymorphisms (SNPs), respectively, that showed significant associations at conventional levels of statistical significance. None of these remained significant after conservative adjustment for the number of analyses performed in each strategy. However, some may warrant further consideration: 6 SNPs were identified with at least 2 of the 3 strategies, and 3 SNPs [rs1000291 on chromosome 3, rs2241743 on chromosome 4, and rs3018626 on chromosome 11] were identified with all 3 strategies and had no or minimal between-dataset heterogeneity (I² = 0%, 0%, and 15%, respectively). Analyses were primarily limited by the suboptimal overlap of tested polymorphisms across datasets (e.g., only 31,192 shared polymorphisms between the two tier 1 datasets). CONCLUSIONS/SIGNIFICANCE: Meta-analysis may be used to improve power and to examine between-dataset heterogeneity in genome-wide association studies. Prospective designs may be most efficient if they aim to maximize the overlap of genotyping platforms and anticipate the combination of data across many genome-wide association studies.
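The abstract reports between-dataset heterogeneity as I². As a sketch of the standard calculation for a single SNP, the function below pools per-dataset effect estimates (e.g., log odds ratios) with inverse-variance fixed-effect weights and computes Cochran's Q and I². The study may have used other models as well (for example random effects); this is the generic textbook form, and the function name is hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance fixed-effect pooling with Cochran's Q and I^2.

    estimates are per-dataset effect sizes (e.g., log odds ratios) and
    std_errors their standard errors.
    """
    weights = [1.0 / se**2 for se in std_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    df = len(estimates) - 1
    # I^2: share of total variation attributed to between-dataset heterogeneity,
    # truncated at zero when Q does not exceed its degrees of freedom.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2
```

For instance, two datasets with log odds ratios 0.1 and 0.3, each with standard error 0.1, give a pooled estimate of 0.2, Q = 2 on 1 degree of freedom, and I² = 50%.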
The value of information for correlated GLMs
We examine the situation where a decision maker is considering investing in a number of projects with uncertain revenues. Before making a decision, the investor has the option to purchase data which carry information about the outcomes from pertinent projects. When these projects are correlated, the data are informative about all the projects. The value of information is the maximum amount the investor would pay to acquire these data.
The problem can be seen from a sampling design perspective, where the sampling criterion is the maximisation of the value of information minus the sampling cost. The examples we have in mind are in the spatial setting, where sampling is performed at spatial coordinates or over spatial regions.
In this paper we discuss the case where the outcome of each project is modelled by a generalised linear mixed model. When the distribution is non-Gaussian, the value of information does not have a closed-form expression. We use the Laplace approximation and matrix approximations to derive an analytical expression for the value of information, and examine its sensitivity under different parameter settings and distributions. In the Gaussian case the proposed technique is exact. Our analytical method is compared against the alternative Monte Carlo method, and the results are similar for various sample sizes of the data, while the closed-form results are much faster to compute. Model weighting and the bootstrap are used to measure the sensitivity of our analysis to model and parameter uncertainty. General guidance on making decisions using our results is offered.
Application of the method is presented in a spatial decision problem for treating bovine tuberculosis in the United Kingdom, and in rock fall avoidance decisions in a Norwegian mine.
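In the Gaussian case, where the abstract notes the technique is exact, the value of information has a simple closed form. The sketch below uses simplifying assumptions chosen for illustration, not taken from the paper: each project is funded if and only if its expected revenue is positive, payoffs add across projects, the data are one noisy observation of a single project, and sampling cost is ignored. It shows how correlation lets data on one project add value to the decision on another; the GLMM and Laplace machinery is not reproduced, and the function names are hypothetical.

```python
import math


def expected_positive_part(m, s):
    """E[max(0, X)] for X ~ N(m, s^2); reduces to max(0, m) when s == 0."""
    if s == 0:
        return max(0.0, m)
    z = m / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2))
    return m * cdf + s * pdf


def value_of_information(mu, cov, k, noise_var):
    """VOI of one noisy observation y = X_k + e of project k's revenue.

    Revenues X ~ N(mu, cov). Before y is seen, the posterior mean of project j
    is distributed as N(mu[j], cov[j][k]**2 / (cov[k][k] + noise_var)).
    """
    denom = cov[k][k] + noise_var
    with_data = sum(
        expected_positive_part(mu[j], abs(cov[j][k]) / math.sqrt(denom))
        for j in range(len(mu))
    )
    without_data = sum(max(0.0, m) for m in mu)
    return with_data - without_data
```

With two zero-mean projects of unit variance and correlation 0.5, a noise-free observation of the first project is worth 1.5 / sqrt(2 * pi), about 0.598, of which one third comes from the second, unobserved project. By Jensen's inequality the value is never negative.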
Optimal predictive design augmentation for spatial generalised linear mixed models
A typical model for geostatistical data when the observations are counts is the spatial generalised linear mixed model. We present a criterion for optimal sampling design under this framework that aims to minimise the error in the prediction of the underlying spatial random effects. The proposed criterion is derived by performing an asymptotic expansion of the conditional prediction variance. We argue that the mean of the spatial process needs to be taken into account in the construction of the predictive design, which we demonstrate through a simulation study comparing the proposed criterion against the widely used space-filling design. Furthermore, our results are applied to the Norway precipitation data and the rhizoctonia disease data.
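The criterion in the abstract comes from an asymptotic expansion of the conditional prediction variance, which is not reproduced here. As a loose stand-in, the sketch below scores a one-dimensional design by the average prediction variance of the spatial effect under a working-Gaussian linearization of Poisson counts: a count at site s is treated as a Gaussian observation with noise variance 1 / exp(mean(s)), so sites with a larger mean are more informative. That crude approximation is enough to show why the mean enters the design. The exponential covariance, its range 0.3, the linearization, and all function names are assumptions for illustration.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def cov(s, t, sill=1.0, phi=0.3):
    """Exponential covariance between 1-D locations s and t (assumed model)."""
    return sill * math.exp(-abs(s - t) / phi)


def avg_pred_var(design, grid, mean):
    """Average approximate prediction variance of the spatial effect over grid."""
    # Working noise variance exp(-mean(s)) on the diagonal for each sampled site.
    k = [[cov(s, t) + (math.exp(-mean(s)) if i == j else 0.0)
          for j, t in enumerate(design)] for i, s in enumerate(design)]
    total = 0.0
    for s0 in grid:
        c = [cov(s0, s) for s in design]
        total += cov(s0, s0) - sum(ci * wi for ci, wi in zip(c, solve(k, c)))
    return total / len(grid)


def augment(existing, candidates, grid, mean, n_new):
    """Greedily add the candidate site that most reduces the average variance."""
    design = list(existing)
    for _ in range(n_new):
        best = min((c for c in candidates if c not in design),
                   key=lambda c: avg_pred_var(design + [c], grid, mean))
        design.append(best)
    return design
```

A typical call is `augment([0.0, 1.0], [i / 10 for i in range(11)], [i / 50 for i in range(51)], lambda s: 3.0 * s, 3)`. Unlike a purely space-filling rule, the scores here change with the mean surface, because sites with a higher mean carry a smaller working noise variance.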
Family-Based versus Unrelated Case-Control Designs for Genetic Associations
The simplest and most commonly used approach for genetic associations is the case-control study design of unrelated people. This design is susceptible to population stratification. This problem is obviated in family-based studies, but it is usually difficult to accumulate large enough samples of well-characterized families. We addressed empirically whether the two designs give similar estimates of association in 93 investigations where both unrelated case-control and family-based designs had been employed. Estimated odds ratios differed beyond chance between the two designs in only four instances (4%). The summary relative odds ratio (ROR), the ratio of the odds ratios obtained from unrelated case-control and family-based studies, was close to unity (0.96 [95% confidence interval, 0.91–1.01]). There was no heterogeneity in the ROR across studies (amount of heterogeneity beyond chance I² = 0%). Differences in whether results were nominally statistically significant (p < 0.05) under the two designs were common (opposite classification rates 14% and 17%); this largely reflected differences in power. Conclusions were largely similar in diverse subgroup analyses. Unrelated case-control and family-based designs give overall similar estimates of association. We cannot rule out rare large biases or common small biases.
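The relative odds ratio in the abstract is a ratio of two odds ratios. A minimal sketch for a single investigation, assuming each design reports an odds ratio with a 95% confidence interval that is symmetric on the log scale, and assuming the two estimates are independent: the standard errors are backed out of the intervals and combined on the log scale. The function name is hypothetical, and this is not necessarily the exact procedure used in the study.

```python
import math


def relative_odds_ratio(or_cc, ci_cc, or_fam, ci_fam, z=1.959964):
    """ROR = OR(unrelated case-control) / OR(family-based), with a 95% CI.

    ci_cc and ci_fam are (lower, upper) 95% limits; the standard error of each
    log odds ratio is recovered from the width of its interval.
    """
    se_cc = (math.log(ci_cc[1]) - math.log(ci_cc[0])) / (2 * z)
    se_fam = (math.log(ci_fam[1]) - math.log(ci_fam[0])) / (2 * z)
    log_ror = math.log(or_cc) - math.log(or_fam)
    se = math.sqrt(se_cc**2 + se_fam**2)
    return (math.exp(log_ror),
            math.exp(log_ror - z * se),
            math.exp(log_ror + z * se))
```

A summary ROR across investigations could then be obtained by pooling the per-investigation log RORs with inverse-variance weights.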