    Performance of the Beta-Binomial Model for Clustered Binary Responses: Comparison with Generalized Estimating Equations

    This study examined the performance of the beta-binomial model in comparison with GEE for clustered binary responses, which yield non-normal outcomes. Monte Carlo simulations were performed under varying intracluster correlations and sample sizes. The results showed that the beta-binomial model performed better for small samples, while GEE performed well for large samples.
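A minimal sketch of how clustered binary data with a chosen intracluster correlation (ICC) can be generated for this kind of Monte Carlo comparison, assuming the beta-binomial data-generating model itself. The function name and parameters (`num_clusters`, `cluster_size`, `p`, `icc`) are hypothetical and are not taken from the study; fitting the beta-binomial model and GEE to the simulated data is not shown.

```python
import random


def simulate_beta_binomial_clusters(num_clusters, cluster_size, p, icc, seed=None):
    """Simulate clustered binary responses from a beta-binomial model.

    Each cluster draws its own success probability from Beta(a, b) with
    mean p. Two responses in the same cluster then have correlation
    1 / (a + b + 1), so a + b = (1 - icc) / icc gives the requested ICC.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 < icc < 1:
        raise ValueError("icc must lie strictly between 0 and 1")
    rng = random.Random(seed)
    total = (1 - icc) / icc  # a + b
    a, b = p * total, (1 - p) * total
    clusters = []
    for _ in range(num_clusters):
        pi = rng.betavariate(a, b)  # cluster-specific success probability
        clusters.append([1 if rng.random() < pi else 0 for _ in range(cluster_size)])
    return clusters
```

Varying `icc` and `num_clusters` over a grid, and analysing each simulated data set with both methods, reproduces the general shape of such a simulation study.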

    Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models

    Background: Growing interest in biological pathways has called for new statistical methods for modeling and testing the effect of a genetic pathway on a health outcome. Because genes within a pathway tend to interact with each other and relate to the outcome in complicated ways, nonparametric methods are more desirable. The kernel machine method provides a convenient, powerful and unified framework for multi-dimensional parametric and nonparametric modeling of the pathway effect. Results: In this paper we propose a logistic kernel machine regression model for binary outcomes. This model relates disease risk to covariates parametrically, and to genes within a genetic pathway either parametrically or nonparametrically using kernel machines. The nonparametric genetic pathway effect allows for possible interactions among genes within the same pathway and for a complicated relationship between the pathway and the outcome. We show that kernel machine estimation of the model components can be formulated using a logistic mixed model, so estimation can proceed within a mixed model framework using standard statistical software. A score test based on a Gaussian process approximation is developed to test for the genetic pathway effect. The methods are illustrated using a prostate cancer data set and evaluated using simulations. An extension to continuous and discrete outcomes using generalized kernel machine models, and its connection with generalized linear mixed models, is discussed. Conclusion: Logistic kernel machine regression and its extension, generalized kernel machine regression, provide a novel and flexible statistical tool for modeling pathway effects on discrete and continuous outcomes. Their close connection to mixed models and their attractive performance make them promising for wide application in bioinformatics and other biomedical areas.
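A rough sketch of the two ingredients a kernel machine score test needs: a kernel matrix over the genes in the pathway and a quadratic form in the null-model residuals. It assumes a Gaussian kernel with a scale parameter `rho` and a statistic of the form Q = (y - mu0)' K (y - mu0) / 2; both are illustrative assumptions, the function names are hypothetical, and the Gaussian-process approximation to the null distribution (and the covariate-adjusted null logistic fit) from the paper is not reproduced.

```python
import math


def gaussian_kernel_matrix(Z, rho):
    """K[i][j] = exp(-||z_i - z_j||^2 / rho), where z_i is subject i's gene profile."""
    n = len(Z)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq = sum((a - b) ** 2 for a, b in zip(Z[i], Z[j]))
            K[i][j] = math.exp(-sq / rho)
    return K


def score_statistic(y, mu0, K):
    """Q = (y - mu0)' K (y - mu0) / 2, with mu0 the fitted means under H0 (no pathway effect)."""
    r = [yi - mi for yi, mi in zip(y, mu0)]
    n = len(r)
    return 0.5 * sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))
```

With no covariates, the null fitted means reduce to the overall proportion of cases, `mu0 = [sum(y) / len(y)] * len(y)`; large values of `Q` indicate that subjects with similar gene profiles also have similar residuals.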

    mpower: An R Package for Power Analysis via Simulation for Correlated Data

    Estimating sample size and statistical power is an essential part of good study design. This R package allows users to conduct power analysis based on Monte Carlo simulations in settings where the correlations between predictors must be taken into account. It runs power analyses given a data generative model and an inference model. It can set up a data generative model that preserves the dependence structures among variables, given either existing data (continuous, binary, or ordinal) or high-level descriptions of the associations. Users can generate power curves to assess the trade-offs between sample size, effect size, and power of a design. This paper presents tutorials and examples focusing on applications to environmental mixture studies, where predictors tend to be moderately to highly correlated. The package interfaces easily with several existing and newly developed analysis strategies for assessing associations between exposures and health outcomes. However, it is sufficiently general to facilitate power simulations in a wide variety of settings.
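The simulation-based power idea can be sketched in a few lines, assuming a simple linear model with two correlated standard-normal predictors and a large-sample Wald test. This is not the mpower API; `simulate_power` and all of its parameters are hypothetical, chosen only to show how correlation between predictors erodes power.

```python
import math
import random
from statistics import NormalDist


def simulate_power(n, beta1, rho, beta2=0.2, n_sims=500, alpha=0.05, seed=0):
    """Monte Carlo power for H0: beta1 = 0 in y = beta1*x1 + beta2*x2 + e.

    x1 and x2 are standard normal with correlation rho (|rho| < 1), e ~ N(0, 1).
    Each replicate fits OLS (with intercept) and applies a large-sample
    Wald z-test to the coefficient of x1.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        x1, x2, y = [], [], []
        for _ in range(n):
            u = rng.gauss(0, 1)
            v = rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
            x1.append(u)
            x2.append(v)
            y.append(beta1 * u + beta2 * v + rng.gauss(0, 1))
        # Center everything so the intercept drops out of the normal equations.
        m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
        c1 = [a - m1 for a in x1]
        c2 = [a - m2 for a in x2]
        cy = [a - my for a in y]
        s11 = sum(a * a for a in c1)
        s22 = sum(a * a for a in c2)
        s12 = sum(a * b for a, b in zip(c1, c2))
        s1y = sum(a * b for a, b in zip(c1, cy))
        s2y = sum(a * b for a, b in zip(c2, cy))
        det = s11 * s22 - s12 ** 2
        b1 = (s22 * s1y - s12 * s2y) / det
        b2 = (s11 * s2y - s12 * s1y) / det
        resid = [r - b1 * a - b2 * b for r, a, b in zip(cy, c1, c2)]
        sigma2 = sum(r * r for r in resid) / (n - 3)
        se_b1 = math.sqrt(sigma2 * s22 / det)  # inflated as |rho| approaches 1
        if abs(b1 / se_b1) > z_crit:
            rejections += 1
    return rejections / n_sims
```

Calling `simulate_power` over a grid of `n` values yields a power curve; comparing `rho=0.0` with `rho=0.9` at the same `n` shows the loss of power caused by correlated predictors.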

    Modelling correlated data : multilevel models and generalized estimating equations and their use with data from research in developmental disabilities

    Background: The use of Multilevel Models (MLM) and Generalized Estimating Equations (GEE) for analysing clustered data in the field of intellectual and developmental disability (IDD) research is still limited. Method: We present some important features of MLMs and GEEs: main function, assumptions, model specification and estimators, sample size and power. We provide an overview of the ways MLMs and GEEs have been used in IDD research. Results: While MLMs and GEEs are both appropriate for longitudinal and/or clustered data, they differ in the assumptions they impose on the data and in the inferences made. Estimators in MLMs require appropriate model specification, while GEEs are more resilient to misspecification at the expense of model complexity. Studies on sample size seem to suggest that Level 1 coefficients are robust to small samples/clusters, while higher-level coefficients are less so. MLMs have been used more frequently than GEEs in IDD research, especially for fitting developmental trajectories. Conclusions: Clustered data from research in the IDD field can be analysed flexibly using MLMs and GEEs. These models would be more widely used if journals required the inclusion of technical specification detail, simulation studies examined power for IDD study characteristics, and researchers developed core skills during basic studies.

    The batched stepped wedge design: A design robust to delays in cluster recruitment

    Stepped wedge designs are an increasingly popular variant of longitudinal cluster randomized trial designs, and roll out interventions across clusters in a randomized, but step‐wise fashion. In the standard stepped wedge design, assumptions regarding the effect of time on outcomes may require that all clusters start and end trial participation at the same time. This would require ethics approvals and data collection procedures to be in place in all clusters before a stepped wedge trial can start in any cluster. Hence, although stepped wedge designs are useful for testing the impacts of many cluster‐based interventions on outcomes, there can be lengthy delays before a trial can commence. In this article, we introduce “batched” stepped wedge designs. Batched stepped wedge designs allow clusters to commence the study in batches, instead of all at once, allowing for staggered cluster recruitment. Like the stepped wedge, the batched stepped wedge rolls out the intervention to all clusters in a randomized and step‐wise fashion, as a series of self‐contained stepped wedge designs. Provided that separate period effects are included for each batch, software for standard stepped wedge sample size calculations can be used. With this time parameterization, in many situations, including when linear models are assumed, sample size calculations reduce to the setting of a single stepped wedge design with multiple clusters per sequence. In these situations, sample size calculations will not depend on the delays between the commencement of batches. Hence, the power of batched stepped wedge designs is robust to unexpected delays between batches.
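The structure described above can be sketched as a treatment-indicator layout, assuming one cluster per sequence, one extra period beyond the number of sequences, and arbitrary calendar start times per batch. The function names and the `start_periods` argument are hypothetical; randomization of clusters to sequences and the sample size calculation itself are not shown.

```python
def stepped_wedge(num_sequences):
    """Standard stepped wedge: rows are sequences, columns are periods.

    Sequence s (1-based) crosses to the intervention at period s, so every
    sequence starts in control and all are treated by the final period.
    """
    return [[1 if t >= s else 0 for t in range(num_sequences + 1)]
            for s in range(1, num_sequences + 1)]


def batched_stepped_wedge(num_batches, num_sequences, start_periods):
    """Rows of (batch, sequence, calendar_period, batch_period, treated).

    Each batch is a self-contained stepped wedge starting at its own calendar
    period. An analysis model would use (batch, batch_period) as the period
    factor, i.e. separate period effects per batch, so the calendar delay
    between batches does not enter the treatment contrast.
    """
    if len(start_periods) != num_batches:
        raise ValueError("need one start period per batch")
    design = stepped_wedge(num_sequences)
    rows = []
    for batch, start in enumerate(start_periods):
        for seq, pattern in enumerate(design):
            for t, treated in enumerate(pattern):
                rows.append((batch, seq, start + t, t, treated))
    return rows
```

Because the within-batch treatment pattern is identical whatever values `start_periods` takes, this layout mirrors why power need not depend on the delays between batches once period effects are batch-specific.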

    Statistical Analysis of Correlated Ordinal Data: Application to Cluster Randomization Trials

    Cluster randomization trials have become increasingly popular when theoretical, ethical or practical considerations preclude the use of traditional trials that randomize individual subjects. Although some methods for analyzing clustered ordinal data have been brought to wide attention, these are less developed than methods for analyzing clustered continuous or binary outcome data. The aim of this thesis is to refine existing strategies that may be applicable to clustered ordinal data, as well as extensions that have previously been considered only for clustered binary responses. The approaches include adjusted Cochran-Armitage tests using an ICC estimator, and correction and modification strategies to improve the small-sample performance of the Wald test and score test in GEE for clustered ordinal data. The type I error and power of these test statistics are investigated in a simulation study. Simulation results show that kappa-type estimators had less bias than ICC estimators when cluster sizes were fixed and small, for ρ = 0.005 or ρ = 0.01. Conversely, ANOVA ICCs had relatively smaller bias in the case of variable cluster sizes. In addition, the small-sample performance of GEE robust Wald tests is improved by using adjustments and corrections. The adjusted test WBC1 is recommended in terms of type I error and power. The discussion is illustrated using data from a school-based cluster randomization trial.
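The ANOVA ICC mentioned above can be sketched with the usual one-way analysis-of-variance estimator, which handles unequal cluster sizes through the adjusted average cluster size n0. This is a generic textbook form applied to 0/1 responses, not necessarily the exact variant used in the thesis, and `anova_icc` is a hypothetical name. It is undefined (division by zero) when every response in the data set is identical.

```python
def anova_icc(clusters):
    """One-way ANOVA estimator of the intracluster correlation.

    clusters: list of lists of 0/1 responses, one inner list per cluster
    (sizes may differ). Returns (MSB - MSW) / (MSB + (n0 - 1) * MSW).
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    N = sum(sizes)
    grand = sum(sum(c) for c in clusters) / N
    means = [sum(c) / n for c, n in zip(clusters, sizes)]
    ssb = sum(n * (m - grand) ** 2 for n, m in zip(sizes, means))
    ssw = sum(sum((y - m) ** 2 for y in c) for c, m in zip(clusters, means))
    msb = ssb / (k - 1)   # between-cluster mean square
    msw = ssw / (N - k)   # within-cluster mean square
    n0 = (N - sum(n * n for n in sizes) / N) / (k - 1)  # adjusted mean cluster size
    return (msb - msw) / (msb + (n0 - 1) * msw)
```

Perfectly homogeneous clusters (all 1s in one cluster, all 0s in another) give an estimate of 1, while clusters with identical proportions give a negative estimate, which the ANOVA estimator permits.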

    Techniques for handling clustered binary data

    Bibliography: leaves 143-153. Over the past few decades there has been increasing interest in clustered studies, and hence much research has gone into the analysis of data arising from them. It is erroneous to treat clustered data, where observations within a cluster are correlated with each other, as one would treat independent data. Point estimates have been found to be less affected by clustering than the standard errors of those estimates; as a consequence, confidence intervals and hypothesis tests can be severely affected. Therefore one has to approach the analysis of clustered data with caution. Methods that deal specifically with correlated data have been developed. Analysis may be further complicated when the outcome variable of interest is binary rather than continuous. Methods for estimating proportions and their variances, calculating confidence intervals, and testing the homogeneity of proportions have been developed over the years (Donner and Klar, 1993; Donner, 1989; Rao and Scott, 1992). The methods developed within the context of experimental design generally incorporate the effect of clustering into the analysis. This cluster effect is quantified by the intracluster correlation, which needs to be taken into account when estimating proportions, comparing proportions and calculating sample sizes. In the context of observational studies, the effect of clustering is expressed by the design effect, which is the inflation in the variance of an estimate due to selecting a cluster sample rather than an independent sample. Another important but often neglected aspect of the analysis of complex sample data is sampling weights: not every individual has the same probability of being selected, and the weights adjust for this (Little et al., 1997). Methods for modelling correlated binary data have also been discussed quite extensively.
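The link between the intracluster correlation and the design effect can be made concrete for the simplest case, assuming equal cluster sizes m, where the design effect is 1 + (m - 1)ρ. The sketch below inflates the naive binomial variance of a proportion by that factor; the function names are hypothetical, and unequal cluster sizes and sampling weights are ignored.

```python
import math


def design_effect(cluster_size, icc):
    """Variance inflation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(num_clusters, cluster_size, icc):
    """Number of independent observations carrying the same information."""
    return num_clusters * cluster_size / design_effect(cluster_size, icc)


def clustered_proportion_ci(successes, num_clusters, cluster_size, icc, z=1.96):
    """Wald interval for a proportion, with the binomial variance scaled by the design effect."""
    n = num_clusters * cluster_size
    p = successes / n
    se = math.sqrt(p * (1 - p) / n * design_effect(cluster_size, icc))
    return p - z * se, p + z * se
```

For example, 10 clusters of 20 subjects with ρ = 0.05 give a design effect of 1.95, so the 200 observations carry roughly the information of 103 independent ones and the interval is about 40% wider than the naive one.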
Among the many models that have been proposed for analysing clustered binary data, two approaches have been studied and compared: the population-averaged approach and the cluster-specific approach. The population-averaged model focuses on estimating the effect of a set of covariates on the marginal expectation of the response. One example of the population-averaged approach to parameter estimation is generalized estimating equations, proposed by Liang and Zeger (1986). It involves specifying a model for the marginal mean, as one would for independent responses, and then imposing a working correlation structure on the set of responses within a cluster. This is useful in longitudinal studies, where a subject is regarded as a cluster; the parameters then describe how the population-averaged response, rather than a specific subject's response, depends on the covariates of interest. Cluster-specific models, on the other hand, introduce cluster-to-cluster variability by including cluster-specific random effects in the linear predictor of the regression model (Neuhaus et al, 1991). Unlike in the special case of correlated Gaussian responses, for binary data the parameters of the cluster-specific model describe different effects on the response from those of the population-averaged model. For longitudinal data, the parameters of a cluster-specific model describe how a specific individual's probability of a response depends on the covariates. The choice between these modelling methods depends on the questions of interest. Cluster-specific models are useful for studying the effects of covariates that vary within clusters, and when an individual's response, rather than the average population response, is the focus. The population-averaged model is useful when interest lies in how the average response across clusters changes with covariates. 
A criticism of this approach is that there may be no individual with the characteristics described by the population-averaged model.
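The difference between cluster-specific and population-averaged parameters for binary data can be checked numerically, assuming a random-intercept logistic model logit P(y = 1 | x, b) = β0 + β1·x + b with b ~ N(0, σ²) and a binary covariate x. Averaging the cluster-specific probabilities over b gives the marginal probabilities, whose log-odds ratio is attenuated towards zero relative to β1. The function names are hypothetical, and the closed-form comparison uses the widely cited approximation β1 / sqrt(1 + c²σ²) with c = 16√3 / (15π).

```python
import math


def expit(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, steps=4000, width=8.0):
    """E[expit(eta + b)] for b ~ N(0, sigma^2), by midpoint-rule integration."""
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        b = lo + (i + 0.5) * h
        dens = math.exp(-0.5 * (b / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += expit(eta + b) * dens * h
    return total


def population_averaged_slope(beta0, beta1, sigma):
    """Marginal log-odds ratio for x = 1 vs x = 0 implied by the cluster-specific model."""
    return logit(marginal_prob(beta0 + beta1, sigma)) - logit(marginal_prob(beta0, sigma))


def approx_attenuated_slope(beta1, sigma):
    """Closed-form approximation beta1 / sqrt(1 + c^2 sigma^2)."""
    c = 16 * math.sqrt(3) / (15 * math.pi)
    return beta1 / math.sqrt(1 + c ** 2 * sigma ** 2)
```

With β1 = 1 and σ = 2, the population-averaged log-odds ratio comes out well below 1; as σ shrinks towards 0 the two parameters coincide, matching the Gaussian-free intuition that the gap is driven by between-cluster heterogeneity.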