
    Formal and Informal Model Selection with Incomplete Data

    Model selection and assessment with incomplete data pose challenges in addition to the ones encountered with complete data. There are two main reasons for this. First, many models describe characteristics of the complete data, in spite of the fact that only an incomplete subset is observed. Direct comparison between model and data is then less than straightforward. Second, many commonly used models are more sensitive to assumptions than in the complete-data situation and some of their properties vanish when they are fitted to incomplete, unbalanced data. These and other issues are brought forward using two key examples, one of a continuous and one of a categorical nature. We argue that model assessment ought to consist of two parts: (i) assessment of a model's fit to the observed data and (ii) assessment of the sensitivity of inferences to unverifiable assumptions, that is, to how a model describes the unobserved data given the observed ones. (Published in Statistical Science, http://dx.doi.org/10.1214/07-STS253, by the Institute of Mathematical Statistics, http://www.imstat.org/sts/.)

    Discussion of Likelihood Inference for Models with Unobservables: Another View

    Discussion of "Likelihood Inference for Models with Unobservables: Another View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303]. (Published in Statistical Science, http://dx.doi.org/10.1214/09-STS277A, by the Institute of Mathematical Statistics, http://www.imstat.org/sts/.)

    A goodness-of-fit test for the random-effects distribution in mixed models

    In this paper, we develop a simple diagnostic test for the random-effects distribution in mixed models. The test is based on the gradient function, a graphical tool proposed by Verbeke and Molenberghs to check the impact of assumptions about the random-effects distribution on inferences in mixed models. Inference is conducted through the bootstrap. The proposed test is easy to implement and applicable to a general class of mixed models. The operating characteristics of the test are evaluated in a simulation study, and the method is further illustrated using two real data analyses.
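
    For reference, a minimal sketch of the gradient function on which such a diagnostic can be based (this is the form commonly attributed to Verbeke and Molenberghs; the notation is ours and not taken from the abstract). For N independent units with marginal densities f_i(y_i) computed under the assumed random-effects distribution,

        \Delta(u) = \frac{1}{N} \sum_{i=1}^{N} \frac{f_i(y_i \mid u)}{f_i(y_i)},

    where f_i(y_i | u) is the conditional density of unit i evaluated at the random-effect value u. Under a correctly specified random-effects distribution, \Delta(u) is expected to stay close to 1 for all u, so systematic departures from 1 signal misspecification; a bootstrap can then be used to judge whether the observed departure exceeds what sampling variability alone would produce.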

    A combined beta and normal random-effects model for repeated, overdispersed binary and binomial data

    Non-Gaussian outcomes are often modeled using members of the so-called exponential family. Well-known members are the Bernoulli model for binary data, leading to logistic regression, and the Poisson model for count data, leading to Poisson regression. Two of the main reasons for extending this family are (1) the occurrence of overdispersion, meaning that the variability in the data is not adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of hierarchical structure in the data, stemming from clustering, which in turn may result from repeatedly measuring the outcome, from sampling various members of the same family, and so on. The first issue is dealt with through a variety of overdispersion models, such as the beta-binomial model for grouped binary data and the negative-binomial model for counts. Clustering is often accommodated through the inclusion of random subject-specific effects. Though not always, such random effects are conventionally assumed to be normally distributed. While both of these phenomena may occur simultaneously, models combining them are uncommon. This paper starts from the broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. We place particular emphasis on so-called conjugate random effects at the level of the mean for the first aspect and normal random effects embedded within the linear predictor for the second aspect, even though our family is more general. The binary and binomial cases are our focus. Apart from model formulation, we present an overview of estimation methods, and then settle for maximum likelihood estimation with analytic-numerical integration. The methodology is applied to two datasets whose outcomes are binary and binomial, respectively.
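
    For concreteness, a hedged sketch of one member of such a combined family for repeated binary data, in the spirit of the formulation described above (notation is ours; the paper may use different symbols or parameterisations). For outcome j on cluster i,

        Y_{ij} \mid \theta_{ij}, b_i \sim \mathrm{Bernoulli}(\pi_{ij}), \qquad
        \pi_{ij} = \theta_{ij}\,\kappa_{ij}, \qquad
        \kappa_{ij} = \frac{\exp(x_{ij}'\xi + z_{ij}'b_i)}{1 + \exp(x_{ij}'\xi + z_{ij}'b_i)},

    with conjugate (beta) random effects \theta_{ij} \sim \mathrm{Beta}(\alpha, \beta) capturing overdispersion at the level of the mean, and normal random effects b_i \sim N(0, D) in the linear predictor capturing the hierarchical structure. Setting \theta_{ij} \equiv 1 recovers the usual logistic-normal mixed model, while dropping b_i recovers a beta-binomial-type overdispersion model.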

    beadarrayFilter : an R package to filter beads

    Microarrays enable the expression levels of thousands of genes to be measured simultaneously. However, only a small fraction of these genes are expected to be expressed under different experimental conditions. Filtering has therefore been introduced as a step in the microarray preprocessing pipeline: gene filtering aims to reduce the dimensionality of the data by removing redundant features prior to the actual statistical analysis. Previous filtering methods focus on the Affymetrix platform and cannot easily be ported to the Illumina platform. We therefore developed a filtering method for Illumina bead arrays, implemented in an R package, beadarrayFilter. In this paper, the main functions in the package are highlighted and, using several examples, we illustrate how beadarrayFilter can be used to filter bead arrays.
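
    To make the filtering idea concrete, the sketch below shows a deliberately simple, generic filter that drops low-variability probes before analysis. It only illustrates dimensionality reduction by filtering; it is not the criterion implemented in beadarrayFilter (which is tailored to Illumina bead-level data), and all names in the snippet are hypothetical.

        import numpy as np

        def filter_low_variance(expr, threshold=0.1):
            """Keep rows (probes) whose variance across samples exceeds a threshold.

            expr: 2-D array with probes in rows and samples in columns.
            Returns the filtered matrix and the boolean mask of retained probes.
            """
            variances = expr.var(axis=1)          # per-probe variance across samples
            keep = variances > threshold          # probes showing some signal
            return expr[keep, :], keep

        # Hypothetical usage: 1000 probes measured on 12 samples, most carrying little
        # beyond measurement noise, 50 with genuine variability.
        rng = np.random.default_rng(0)
        expr = rng.normal(scale=0.05, size=(1000, 12))
        expr[:50] += rng.normal(scale=1.0, size=(50, 12))
        filtered, keep = filter_low_variance(expr, threshold=0.01)
        print(filtered.shape, int(keep.sum()))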

    Hierarchical models with normal and conjugate random effects : a review

    Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs et al. (2010) proposed a general framework to model hierarchical data subject to within-unit correlation and/or overdispersion. The framework extends classical overdispersion models as well as generalized linear mixed models. Subsequent work has examined various aspects that led to the formulation of several extensions. A unified treatment of the model framework and key extensions is provided. Particular extensions discussed are: explicit calculation of correlation and other moment-based functions, joint modelling of several hierarchical sequences, versions with direct marginally interpretable parameters, zero-inflation in the count case, and influence diagnostics. The basic models and several extensions are illustrated using a set of key examples, one per data type (count, binary, multinomial, ordinal, and time-to-event).
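
    As an illustration of the count case in this framework, a hedged sketch of a Poisson model with both gamma (conjugate) and normal random effects, in the spirit of the combined model of Molenberghs, Verbeke, and Demétrio (2007) (notation ours):

        Y_{ij} \mid \theta_{ij}, b_i \sim \mathrm{Poisson}(\lambda_{ij}), \qquad
        \lambda_{ij} = \theta_{ij} \exp(x_{ij}'\xi + z_{ij}'b_i),

    with \theta_{ij} \sim \mathrm{Gamma}(\alpha, \beta) absorbing overdispersion and b_i \sim N(0, D) accommodating within-unit correlation. Special cases follow by switching either set of random effects off: \theta_{ij} \equiv 1 gives the Poisson-normal generalized linear mixed model, and b_i \equiv 0 gives a negative-binomial-type overdispersion model.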

    Longitudinal Joint Modelling of Binary and Continuous Outcomes: A Comparison of Bridge and Normal Distributions

    Background: Longitudinal joint models consider the variation caused by repeated measurements over time as well as the association among the response variables. When binary and continuous response variables are combined using generalized linear mixed models, integrating over a normally distributed random intercept in the binary logistic regression submodel does not yield a closed form. In this paper, we assessed the impact of assuming a bridge distribution for the random intercept in the binary logistic regression submodel and compared the results with those obtained under a normal distribution. Method: The response variables are combined through correlated random intercepts. The random intercept in the continuous-outcome submodel follows a normal distribution; the random intercept in the binary-outcome submodel follows either a normal or a bridge distribution. Estimation was carried out using a likelihood-based approach, under both direct and conditional joint-modelling formulations. To assess the performance of the models, a simulation study was conducted. Results: Based on the simulation results, and regardless of the joint-modelling approach, the models with a bridge distribution for the random intercept of the binary outcome yielded slightly more accurate estimates and better overall performance. Conclusion: In addition to the fact that assuming a bridge distribution for the random intercept in binary logistic regression yields the same interpretation of parameter estimates in marginal and conditional forms, our study showed that even when the random intercept of the binary logistic regression is truly normally distributed, assuming a bridge distribution still leads to more accurate estimates.
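
    The appeal of the bridge distribution alluded to here can be summarised in one hedged identity, in the spirit of Wang and Louis's bridge construction (symbols ours, not taken from the abstract): if the random intercept b_i in the logistic submodel follows a bridge distribution with attenuation parameter \phi \in (0, 1), then marginalising over b_i again gives a logistic model,

        \mathrm{logit}\, P(Y_{ij} = 1 \mid x_{ij}, b_i) = x_{ij}'\beta + b_i
        \quad \Longrightarrow \quad
        \mathrm{logit}\, P(Y_{ij} = 1 \mid x_{ij}) = \phi\, x_{ij}'\beta,

    so the conditional (subject-specific) coefficients \beta and the marginal (population-averaged) coefficients \phi\beta differ only by a known scale factor, which is what preserves the interpretation of parameter estimates in both forms. With a normal random intercept, no such closed-form relation exists.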

    Dysbiosis of the faecal microbiota in patients with Crohn's disease and their unaffected relatives

    Background and aims: A general dysbiosis of the intestinal microbiota has been established in patients with Crohn's disease (CD), but a systematic characterisation of this dysbiosis is lacking. Therefore the composition of the predominant faecal microbiota of patients with CD was studied in comparison with the predominant composition in unaffected controls. Whether dysbiosis is present in relatives of patients with CD was also examined. Methods: Focusing on families with at least three members affected with CD, faecal samples of 68 patients with CD, 84 of their unaffected relatives and 55 matched controls were subjected to community fingerprinting of the predominant microbiota using denaturing gradient gel electrophoresis (DGGE). To analyse the DGGE profiles, BioNumerics software and non-parametric statistical analyses (SPSS V. 17.0) were used. Observed differences in the predominant microbiota were subsequently confirmed and quantified with real-time PCR. Results: Five bacterial species characterised dysbiosis in CD, namely a decrease in Dialister invisus (p = 0.04), an uncharacterised species of Clostridium cluster XIVa (p = 0.03), Faecalibacterium prausnitzii (p < 1.3 × 10⁻⁵) and Bifidobacterium adolescentis (p = 5.4 × 10⁻⁶), and an increase in Ruminococcus gnavus (p = 2.1 × 10⁻⁷). Unaffected relatives of patients with CD had less Collinsella aerofaciens (p = 0.004) and a member of the Escherichia coli-Shigella group (p = 0.01), and more Ruminococcus torques (p = 0.02), in their predominant microbiota as compared with healthy subjects. Conclusion: Unaffected relatives of patients with CD have a different composition of their microbiota compared with healthy controls. This dysbiosis is not characterised by a lack of butyrate-producing bacteria, as observed in CD, but suggests a role for microorganisms with mucin-degradation capacity.

    Clusters with random size: maximum likelihood versus weighted estimation

    There are many contemporary designs that do not use a random sample of a fixed, a priori determined size. In the case of informative cluster sizes, the cluster size is influenced by the cluster's data, but here we address issues that occur even when the cluster size and the data are unrelated. First, fitting models to clusters of varying sizes is often more complicated than when all clusters have the same size. Second, in such cases there usually is no so-called complete sufficient statistic.
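
    One common weighted scheme of the kind contrasted with maximum likelihood in this setting (given only as a hedged sketch; the paper itself may weight differently) gives every cluster, rather than every observation, the same influence by weighting each cluster's estimating-equation contribution by the inverse of its size n_i:

        \sum_{i=1}^{K} \frac{1}{n_i} \sum_{j=1}^{n_i} U(y_{ij}; \beta) = 0,

    where U(y_{ij}; \beta) is the score or estimating-function contribution of observation j in cluster i. Maximum likelihood, by contrast, lets larger clusters contribute proportionally more information, which is one place where the two approaches in the title can part ways.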