    A comparison of analytic approaches for individual patient data meta-analyses with binary outcomes

    Background: Individual patient data meta-analyses (IPD-MA) are often performed using a one-stage approach, a form of generalized linear mixed model (GLMM) for binary outcomes. We compare (i) one-stage with two-stage approaches, (ii) the performance of two estimation procedures for GLMMs with binary outcomes within the one-stage approach, penalized quasi-likelihood (PQL) and adaptive Gauss-Hermite quadrature (AGHQ), and (iii) the use of stratified study effects versus random study effects. Methods: We compare the approaches via a simulation study in terms of bias, mean-squared error (MSE), coverage and numerical convergence of the pooled treatment effect (β₁) and of the between-study heterogeneity of the treatment effect (τ₁²). We varied the outcome prevalence, the sample size, the number of studies, and the variances and correlation of the random effects. Results: The two-stage and one-stage methods produced approximately unbiased estimates of β₁. PQL performed better than AGHQ for estimating τ₁² with respect to MSE, but the two procedures performed comparably in terms of the bias of β₁ and of τ₁². The random study-effects model outperformed the stratified study-effects model in small meta-analyses. Conclusion: The one-stage approach is recommended over the two-stage method for small meta-analyses. There was no meaningful difference between the PQL and AGHQ procedures. Although both the random-intercept and stratified-intercept approaches can suffer from their underlying assumptions, a GLMM with a random intercept is less prone to misfit and has a good convergence rate.
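As a minimal sketch of the two-stage approach compared above, assume each study's IPD has been reduced to a 2x2 table. With a single binary treatment covariate, the per-study logistic-regression estimate equals the sample log odds ratio, so stage 1 computes that estimate and its variance per study, and stage 2 pools them. The DerSimonian-Laird estimator, the 0.5 continuity correction, and all function names are illustrative assumptions, not necessarily the procedure used in the study.

```python
import math


def log_odds_ratio(a, b, c, d, cc=0.5):
    """Stage 1: log odds ratio and its variance from one study's 2x2 table.

    a, b: events / non-events in the treatment arm
    c, d: events / non-events in the control arm
    If any cell is zero, `cc` is added to every cell (an assumed convention).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = a + cc, b + cc, c + cc, d + cc
    y = math.log((a * d) / (b * c))
    v = 1 / a + 1 / b + 1 / c + 1 / d  # Woolf variance of the log odds ratio
    return y, v


def dersimonian_laird(ys, vs):
    """Stage 2: random-effects pooling. Returns (beta1, se_beta1, tau2)."""
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    k = len(ys)
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # moment estimator, truncated at zero
    w_re = [1 / (v + tau2) for v in vs]
    beta1 = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return beta1, math.sqrt(1 / sum(w_re)), tau2


# Hypothetical 2x2 tables (a, b, c, d), one per study
tables = [(12, 88, 20, 80), (5, 45, 9, 41), (30, 170, 42, 158)]
ys, vs = zip(*(log_odds_ratio(*t) for t in tables))
beta1, se, tau2 = dersimonian_laird(ys, vs)
```

Here `beta1` plays the role of the pooled treatment effect β₁ and `tau2` of τ₁²; a one-stage GLMM would instead fit all individual observations jointly.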

    Meta-analysis of binary outcomes via generalized linear mixed models: a simulation study

    Background: Systematic reviews and meta-analyses of binary outcomes are widespread in all areas of application. The odds ratio, in particular, is by far the most popular effect measure. However, the standard meta-analysis of odds ratios using a random-effects model has a number of potential problems. An attractive alternative approach for the meta-analysis of binary outcomes uses a class of generalized linear mixed models (GLMMs). GLMMs are believed to overcome the problems of the standard random-effects model because they use a correct binomial-normal likelihood. However, this belief rests on theoretical considerations, and no sufficiently extensive simulation studies have assessed the performance of GLMMs in meta-analysis. This gap may be due to the computational complexity of these models and the resulting considerable time requirements. Methods: The present study is the first to provide extensive simulations of the performance of four GLMM methods (models with fixed and random study effects and two conditional methods) for meta-analysis of odds ratios, in comparison with the standard random-effects model. Results: In our simulations, the hypergeometric-normal model provided less biased estimation of the heterogeneity variance than the standard random-effects meta-analysis using restricted maximum likelihood (REML) estimation when the data were sparse, but the REML method performed similarly for point estimation of the odds ratio and better for interval estimation. Conclusions: It is difficult to recommend the use of GLMMs in the practice of meta-analysis. The problem of finding uniformly good meta-analysis methods for binary outcomes is still open.
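The comparator in this study is the standard random-effects model with REML estimation of the heterogeneity variance. A minimal sketch of that estimator, assuming study log odds ratios `ys` with known within-study variances `vs`, uses a common fixed-point iteration; Fisher scoring is an equally valid alternative, and the GLMMs themselves are not shown. The function name is hypothetical.

```python
import math


def reml_random_effects(ys, vs, tol=1e-10, max_iter=1000):
    """REML fit of the standard random-effects model y_i ~ N(mu, v_i + tau2).

    Fixed-point update (truncated at zero):
        tau2 <- sum(w_i^2 * ((y_i - mu)^2 - v_i)) / sum(w_i^2) + 1 / sum(w_i),
    with w_i = 1 / (v_i + tau2) and mu the w-weighted mean of the y_i.
    Returns (mu, se_mu, tau2).
    """
    def weights_and_mean(tau2):
        w = [1 / (v + tau2) for v in vs]
        return w, sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

    tau2 = 0.0
    for _ in range(max_iter):
        w, mu = weights_and_mean(tau2)
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, ys, vs))
        new = max(0.0, num / sum(wi ** 2 for wi in w) + 1 / sum(w))
        converged = abs(new - tau2) < tol
        tau2 = new
        if converged:
            break
    w, mu = weights_and_mean(tau2)
    return mu, math.sqrt(1 / sum(w)), tau2
```

With equal within-study variances the iteration converges to the sample variance of the `ys` minus the common variance, which is a convenient check. With sparse data, the `ys` and `vs` themselves depend on continuity corrections, which is one source of the problems the GLMMs are meant to avoid.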

    A Hybrid Bayesian Laplacian Approach for Generalized Linear Mixed Models

    The analytical intractability of generalized linear mixed models (GLMMs) has generated a large body of research over the past two decades. Applied statisticians routinely face the frustrating prospect of widely disparate results produced by the methods currently implemented in commercially available software. This article is motivated by this frustration and develops guidance as well as new methods that are computationally efficient and statistically reliable. Two main classes of approximations have been developed: likelihood-based methods and Bayesian methods. Likelihood-based methods such as the penalized quasi-likelihood approach of Breslow and Clayton (1993) have been shown to produce biased estimates, especially for binary clustered data with small cluster sizes. More recent methods such as the adaptive Gaussian quadrature approach perform well but can be overwhelmed by problems with large numbers of random effects, and efficient algorithms to better handle these situations have not yet been integrated into standard statistical packages. Similarly, Bayesian methods, though they have good frequentist properties when the model is correct, are known to be computationally intensive and also require specialized code, limiting their use in practice. In this article we build on our previous method (Capanu and Begg 2010) and propose a hybrid approach that bridges the likelihood-based and Bayesian approaches by employing Bayesian estimation for the variance components followed by Laplacian estimation for the regression coefficients, with the goal of obtaining good statistical properties and relatively good computing speed using widely available software. The hybrid approach is shown to perform well against the other competitors considered. Another important finding of this research is the surprisingly good performance of the Laplacian approximation in the difficult case of binary clustered data with small cluster sizes.
We apply the methods to a real study of head and neck squamous cell carcinoma and illustrate their properties using simulations based on a widely analyzed salamander mating dataset and on another important dataset from the Guatemalan Child Health survey.
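Both the Laplace approximation and adaptive Gauss-Hermite quadrature (AGHQ) target the same per-cluster integral in a random-intercept logistic GLMM: the Bernoulli likelihood integrated against a normal random-intercept density. The sketch below is a hypothetical illustration, not the hybrid method of the article (which adds Bayesian estimation of the variance components). It computes that log marginal likelihood for one cluster both ways; the names `y`, `eta` (fixed-effect linear predictor) and `sigma` (random-intercept standard deviation) are assumptions.

```python
import numpy as np


def log_joint(b, y, eta, sigma):
    """log[ p(y | b) * N(b; 0, sigma^2) ] for one cluster with predictor eta + b."""
    lin = eta + b
    loglik = np.sum(y * lin - np.logaddexp(0.0, lin))  # Bernoulli log-likelihood
    logprior = -0.5 * np.log(2 * np.pi * sigma ** 2) - 0.5 * b ** 2 / sigma ** 2
    return loglik + logprior


def mode_and_curvature(y, eta, sigma, iters=100):
    """Newton-Raphson for the mode b_hat of log_joint and its second derivative there."""
    b = 0.0
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(eta + b)))
        grad = np.sum(y - p) - b / sigma ** 2
        hess = -np.sum(p * (1 - p)) - 1 / sigma ** 2
        step = grad / hess
        b -= step
        if abs(step) < 1e-12:
            break
    p = 1.0 / (1.0 + np.exp(-(eta + b)))
    return b, -np.sum(p * (1 - p)) - 1 / sigma ** 2


def laplace_log_marginal(y, eta, sigma):
    """Laplace: log f(b_hat) + 0.5 log(2 pi) - 0.5 log(-f''(b_hat))."""
    b_hat, hess = mode_and_curvature(y, eta, sigma)
    return log_joint(b_hat, y, eta, sigma) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(-hess)


def aghq_log_marginal(y, eta, sigma, n_nodes=20):
    """Adaptive Gauss-Hermite: nodes centred at b_hat and scaled by the curvature."""
    b_hat, hess = mode_and_curvature(y, eta, sigma)
    s = 1.0 / np.sqrt(-hess)
    z, w = np.polynomial.hermite.hermgauss(n_nodes)  # weight function exp(-z^2)
    nodes = b_hat + np.sqrt(2.0) * s * z
    terms = np.array([log_joint(bk, y, eta, sigma) for bk in nodes]) + np.log(w) + z ** 2
    m = terms.max()  # log-sum-exp for numerical stability
    return np.log(np.sqrt(2.0) * s) + m + np.log(np.sum(np.exp(terms - m)))


# Hypothetical small cluster: three binary responses
y = np.array([1.0, 0.0, 0.0])
eta = np.array([0.2, -0.1, 0.5])
laplace_value = laplace_log_marginal(y, eta, 1.5)
aghq_value = aghq_log_marginal(y, eta, 1.5)
```

With a single quadrature node, AGHQ reduces exactly to the Laplace approximation; adding nodes refines it.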

    Computational Techniques for Spatial Logistic Regression with Large Datasets

    In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation. A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being more efficient computationally than other Bayesian approaches. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar albeit less conclusive with respect to comparing the approaches. The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models
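A minimal sketch of a spectral (Fourier) basis for a binary spatial outcome, assuming locations rescaled to the unit square: the risk surface is expanded in cosine and sine terms, and coefficients at higher frequencies are shrunk more strongly. For brevity this computes only the posterior mode under an assumed Gaussian prior with precision `lam * |k|^2`, using Newton-Raphson, rather than the full Bayesian fit described above. `max_freq`, `lam`, and the function names are hypothetical choices.

```python
import numpy as np


def fourier_basis(coords, max_freq):
    """Intercept plus cos/sin terms for integer frequencies (k1, k2) on the unit square.

    Only one of each (k, -k) pair is kept, since both give the same columns up to sign.
    Returns the design matrix and each column's frequency magnitude |k|.
    """
    cols, mags = [np.ones(len(coords))], [0.0]
    for k1 in range(max_freq + 1):
        for k2 in range(-max_freq, max_freq + 1):
            if (k1, k2) <= (0, 0):  # skip the constant and the mirrored half-plane
                continue
            phase = 2 * np.pi * (k1 * coords[:, 0] + k2 * coords[:, 1])
            cols += [np.cos(phase), np.sin(phase)]
            mags += [np.hypot(k1, k2)] * 2
    return np.column_stack(cols), np.array(mags)


def fit_spectral_logistic(coords, y, max_freq=3, lam=1.0, iters=100, tol=1e-10):
    """Posterior mode under an assumed N(0, 1 / (lam * |k|^2)) prior per coefficient.

    The intercept (|k| = 0) is unpenalized. Newton-Raphson on the penalized
    log-likelihood, which is strictly concave.
    """
    X, mags = fourier_basis(coords, max_freq)
    P = np.diag(lam * mags ** 2)
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p) - P @ beta
        neg_hess = X.T @ (X * (p * (1 - p))[:, None]) + P
        step = np.linalg.solve(neg_hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta, X
```

The number of basis columns grows only with `max_freq`, not with the number of observations, which is what keeps such fits cheap for large datasets.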

    Development of spatial statistical methods for modelling point-referenced spatial data in malaria epidemiology

    Plasmodium falciparum malaria is the world’s most important parasitic disease and a major cause of morbidity and mortality in Africa. However, figures for the burden of malaria morbidity and mortality are very uncertain, since reliable maps of the distribution of malaria transmission and of the numbers of affected individuals are not available for most of the African continent. Accurate statistics on the geographical distribution of different endemicities of malaria, on the populations at risk, and on the implications of given levels of endemicity for morbidity and mortality are important for effective malaria control programs. These estimates can be obtained using appropriate statistical models that relate infection, morbidity and mortality rates to risk factors measured at the individual level, but also to factors that vary gradually over geographical locations. Statistical models that incorporate geographical or individual heterogeneity are complex and highly parameterized. Until recently, limitations in statistical computation made the implementation of these models impractical for non-normal response data sampled at large numbers of geographical locations. Modern developments in Markov chain Monte Carlo (MCMC) inference have greatly advanced spatial modelling; however, many methodological and theoretical problems remain. For data collected over a fixed number of locations (point-referenced or geostatistical data), such as the malaria morbidity and mortality data used in this study, spatial correlation is best specified by parameterizing the variance-covariance matrix of the outcome of interest in relation to the spatial configuration of the locations (variogram modelling). This has been considered infeasible for a large number of locations because of the repeated inversion of the variance-covariance matrix involved in the likelihood.
In addition, the spatial correlation in malariological data could depend not only on the distance between locations but also on the locations themselves. Variogram models need to be further developed to take this property, known as non-stationarity, into account. This thesis reports research with the following objectives: a) developing Bayesian hierarchical models for the analysis of point-referenced malaria prevalence, malaria transmission and mortality data via variogram modelling for a large number of locations, taking into account non-stationarity and misalignment where present in the data; b) producing country-specific and continent-wide maps of malaria transmission and malaria prevalence in Africa, augmented by the use of climatic and environmental data; c) assessing the magnitude of the effects of malaria endemicity on infant and child mortality after adjusting for socio-economic factors and geographical patterns. A comparison of MCMC and the Sampling-Importance-Resampling approach for Bayesian fitting of variogram models showed that the latter was no easier to implement, did not improve estimation accuracy and did not lead to computationally more efficient estimation. Different approaches were proposed to overcome the inversion of large covariance matrices. Numerical algorithms especially suited to the MCMC framework were implemented to convert large covariance matrices to sparse ones and to accelerate inversion. A tessellation-based model was developed which partitions the space into random Voronoi tiles. The model assumes a separate spatial process in each tile and independence between tiles. Model fit was implemented via reversible jump MCMC, which takes into account the varying number of parameters arising from the random number of tiles. This approach facilitates inversion by converting the covariance matrix to block-diagonal form. In addition, this model is well suited to non-stationary data.
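The tessellation idea can be illustrated with a small sketch. Locations are assigned to the nearest of a set of tile centres (their Voronoi tiles), each tile gets its own exponential covariance, and because tiles are independent the covariance matrix is block diagonal, so it can be factorized tile by tile. The centres and per-tile parameters are fixed here for illustration. In the thesis the number and positions of tiles are random and sampled by reversible jump MCMC, which is not shown. The exponential covariance form and the function names are assumptions.

```python
import numpy as np


def exp_cov(coords, sill, range_):
    """Exponential covariance sill * exp(-d / range_) between all pairs of locations."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)


def assign_tiles(coords, centers):
    """Index of the nearest centre for each location, i.e. its Voronoi tile."""
    d = np.linalg.norm(coords[:, None, :] - centers[None, :, :], axis=-1)
    return np.argmin(d, axis=1)


def tiled_inverse_and_logdet(coords, centers, params):
    """Inverse and log-determinant of a block-diagonal (tile-wise) covariance.

    params[t] = (sill, range) for tile t. Tiles are independent, so each
    block is factorized on its own: cost ~ sum(n_t^3) instead of n^3.
    """
    tiles = assign_tiles(coords, centers)
    n = len(coords)
    inv = np.zeros((n, n))
    logdet = 0.0
    for t in np.unique(tiles):
        idx = np.flatnonzero(tiles == t)
        block = exp_cov(coords[idx], *params[int(t)])
        chol = np.linalg.cholesky(block)  # block = L L^T
        chol_inv = np.linalg.inv(chol)
        inv[np.ix_(idx, idx)] = chol_inv.T @ chol_inv  # block^{-1} = L^{-T} L^{-1}
        logdet += 2.0 * np.sum(np.log(np.diag(chol)))
    return inv, logdet, tiles
```

Because each tile carries its own sill and range, the same construction also gives a simple form of non-stationarity.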
An accelerated failure time model was developed for spatially misaligned data to assess malaria endemicity in relation to child mortality. The misalignment arose because the data were extracted from databases that were collected at different sets of locations. The newly developed statistical methodology was implemented to produce smooth maps of malaria transmission in Mali and in West and Central Africa, using malaria survey data from the Mapping Malaria Risk in Africa (MARA) database. The surveys were carried out at arbitrary locations and include non-standardized and overlapping age groups. To achieve comparability between different surveys, the Garki transmission model was applied to convert the heterogeneous age-prevalence data to a common scale of a transmission intensity measure. A Bayesian variogram model was fitted to the transmission intensity estimates. The model adjusted for environmental predictors extracted from remote sensing. Bayesian kriging was used to obtain smooth maps of the transmission intensity, which were converted to age-specific maps of malaria risk. The West and Central African map was based on a seasonality model we developed for the whole of Africa. Expert opinion suggests that the resulting maps improve on previous mapping efforts. Additional surveys are needed to increase the precision of the predictions in zones where there is large disagreement with previous maps and data are sparse. The survival model for misaligned data was implemented to produce a smooth mortality map in Mali and to assess the relation between malaria endemicity and child and infant mortality by linking the MARA database with the Demographic and Health Survey (DHS) database. The model was adjusted for socio-economic factors and spatial dependence.
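Kriging predicts the surface at unsampled locations from a fitted covariance model. A minimal plug-in sketch, assuming a known constant mean and a known exponential covariance (simple kriging), is below. Bayesian kriging as used in the thesis would additionally average such predictions over the posterior distribution of the mean and covariance parameters. All names and parameters are hypothetical.

```python
import numpy as np


def exp_cov_cross(a, b, sill, range_):
    """Exponential covariance between location sets a (n x 2) and b (m x 2)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sill * np.exp(-d / range_)


def simple_krige(obs_xy, obs_z, new_xy, mean, sill, range_, nugget=0.0):
    """Simple-kriging predictions and variances of the latent surface at new_xy.

    Assumes a known constant mean and known covariance parameters (plug-in).
    """
    K = exp_cov_cross(obs_xy, obs_xy, sill, range_) + nugget * np.eye(len(obs_xy))
    k = exp_cov_cross(obs_xy, new_xy, sill, range_)  # n_obs x n_new
    pred = mean + k.T @ np.linalg.solve(K, obs_z - mean)
    var = sill - np.sum(k * np.linalg.solve(K, k), axis=0)
    return pred, var
```

Without a nugget the predictor interpolates the observations exactly, and far from all data it reverts to the mean with variance equal to the sill; a positive `nugget` smooths instead of interpolating.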
The analysis confirmed that mother's education, birth order and preceding birth interval, sex of the infant, residence, and mother's age at birth have a strong impact on infant and child mortality risk, but no statistically significant effect of P. falciparum prevalence could be demonstrated. This may reflect unmeasured local factors, for instance variations in health provision or in the availability of water supply in the dry Sahel region, which could have a stronger influence than malaria risk on mortality patterns.

    Modelling Spatio-Temporal Elephant Movement Data: a Generalized Additive Mixed Models Framework

    This thesis focuses on understanding how environmental factors influence elephant movement and on investigating its spatio-temporal patterns. The thesis analyses movement data of some African elephants (Loxodonta africana) living in the Kruger National Park and its associated private reserves in South Africa. Because of heterogeneity among elephants and nonlinear relationships between elephant movement and environmental variables, Generalized Additive Mixed Models (GAMMs) were employed. Results showed delayed effects of rainfall and temperature, and particular trends in time and space.
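Each smooth term in a GAMM can be written as a penalized spline, which is in turn a mixed model. A minimal sketch, assuming a single covariate and a truncated-linear basis with knots at quantiles, penalizes the knot coefficients with a ridge penalty `lam`. This is equivalent to treating them as random effects with variance `sigma_eps^2 / lam`. In a full GAMM, `lam` would be estimated (for example by REML) and animal-level random effects would be added; both are omitted here, and the function names are hypothetical.

```python
import numpy as np


def truncated_line_basis(x, knots):
    """Columns [1, x, (x - kappa)_+ for each knot kappa]."""
    return np.column_stack([np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots])


def fit_penalized_spline(x, y, n_knots=20, lam=1.0):
    """Ridge-penalized regression spline with knots at interior quantiles of x.

    Only the knot coefficients are penalized; this gives the same fit as a linear
    mixed model with random knot effects u_k ~ N(0, sigma_eps^2 / lam).
    """
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    X = truncated_line_basis(x, knots)
    D = np.diag([0.0, 0.0] + [1.0] * n_knots)  # no penalty on intercept and slope
    coef = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
    return knots, coef


def predict_spline(x_new, knots, coef):
    return truncated_line_basis(x_new, knots) @ coef
```

As `lam` grows the fit collapses to a straight line, and as it shrinks the curve follows the data more closely; a delayed covariate effect could be modelled by passing a lagged covariate as `x`.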

    Variable Selection in Accelerated Failure Time (AFT) Frailty Models: An Application of Penalized Quasi-Likelihood

    Variable selection is one of the standard ways of selecting models in large-scale datasets. It has applications in many fields of research, especially in large multi-center clinical trials. One of the prominent methods in variable selection is penalized likelihood, which is both consistent and efficient. However, penalized selection is considerably more challenging in the presence of random (frailty) effects. It is even more complicated when censoring is involved, as the marginal log-likelihood may not have a closed-form solution. Therefore, we applied the penalized quasi-likelihood (PQL) approach, which approximates the solution for such a likelihood. In addition, we introduce an adaptive penalty function that performs selection on both fixed and frailty effects in a left-censored dataset for a parametric AFT frailty model. We also compared our penalty function with other established procedures via their performance in accurately choosing the significant coefficients and shrinking the non-significant coefficients to zero.
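The adaptive-penalty idea can be illustrated on a much simpler problem. The sketch below is an adaptive lasso fitted by coordinate descent on a linear model for log survival time, which is the form of an AFT model. It assumes no censoring and no frailty terms, so it does not implement the PQL step of the article. The weights `1 / |b_ols|^gamma` from an initial unpenalized fit are the usual adaptive-lasso choice and an assumption here, not the penalty proposed by the authors; the function names are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)


def adaptive_lasso(X, y, lam, gamma=1.0, iters=1000, tol=1e-10):
    """Coordinate descent for
        (1 / 2n) * ||y - X b||^2 + lam * sum_j w_j * |b_j|,  w_j = 1 / |b_ols_j|^gamma.

    Assumes centered X and y (no intercept), uncensored responses, no frailty.
    """
    n, p = X.shape
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    w = 1.0 / np.abs(b_ols) ** gamma  # large weights for small initial estimates
    col_sq = np.sum(X ** 2, axis=0) / n
    b = np.zeros(p)
    r = y.copy()  # residual y - X b for b = 0
    for _ in range(iters):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            rho = X[:, j] @ r / n + col_sq[j] * old  # fit of x_j to the partial residual
            b[j] = soft_threshold(rho, lam * w[j]) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b
```

Coefficients with small initial estimates receive large weights and are set exactly to zero, while large ones are only mildly shrunk.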