414 research outputs found

    Formal and Informal Model Selection with Incomplete Data

    Model selection and assessment with incomplete data pose challenges in addition to the ones encountered with complete data. There are two main reasons for this. First, many models describe characteristics of the complete data, in spite of the fact that only an incomplete subset is observed. Direct comparison between model and data is then less than straightforward. Second, many commonly used models are more sensitive to assumptions than in the complete-data situation and some of their properties vanish when they are fitted to incomplete, unbalanced data. These and other issues are brought forward using two key examples, one of a continuous and one of a categorical nature. We argue that model assessment ought to consist of two parts: (i) assessment of a model's fit to the observed data and (ii) assessment of the sensitivity of inferences to unverifiable assumptions, that is, to how a model describes the unobserved data given the observed ones. Comment: Published at http://dx.doi.org/10.1214/07-STS253 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Discussion of Likelihood Inference for Models with Unobservables: Another View

    Discussion of "Likelihood Inference for Models with Unobservables: Another View" by Youngjo Lee and John A. Nelder [arXiv:1010.0303]. Comment: Published at http://dx.doi.org/10.1214/09-STS277A in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    A goodness-of-fit test for the random-effects distribution in mixed models

    In this paper, we develop a simple diagnostic test for the random-effects distribution in mixed models. The test is based on the gradient function, a graphical tool proposed by Verbeke and Molenberghs to check the impact of assumptions about the random-effects distribution in mixed models on inferences. Inference is conducted through the bootstrap. The proposed test is easy to implement and applicable in a general class of mixed models. The operating characteristics of the test are evaluated in a simulation study, and the method is further illustrated using two real data analyses.

    A combined beta and normal random-effects model for repeated, overdispersed binary and binomial data

    Non-Gaussian outcomes are often modeled using members of the so-called exponential family. Notorious members are the Bernoulli model for binary data, leading to logistic regression, and the Poisson model for count data, leading to Poisson regression. Two of the main reasons for extending this family are (1) the occurrence of overdispersion, meaning that the variability in the data is not adequately described by the models, which often exhibit a prescribed mean-variance link, and (2) the accommodation of hierarchical structure in the data, stemming from clustering in the data which, in turn, may result from repeatedly measuring the outcome, for various members of the same family, etc. The first issue is dealt with through a variety of overdispersion models, such as, for example, the beta-binomial model for grouped binary data and the negative-binomial model for counts. Clustering is often accommodated through the inclusion of random subject-specific effects. Though not always, one conventionally assumes such random effects to be normally distributed. While both of these phenomena may occur simultaneously, models combining them are uncommon. This paper starts from the broad class of generalized linear models accommodating overdispersion and clustering through two separate sets of random effects. We place particular emphasis on so-called conjugate random effects at the level of the mean for the first aspect and normal random effects embedded within the linear predictor for the second aspect, even though our family is more general. The binary and binomial cases are our focus. Apart from model formulation, we present an overview of estimation methods, and then settle for maximum likelihood estimation with analytic-numerical integration. The methodology is applied to two datasets of which the outcomes are binary and binomial, respectively.

    beadarrayFilter : an R package to filter beads

    Microarrays enable the expression levels of thousands of genes to be measured simultaneously. However, only a small fraction of these genes are expected to be expressed under different experimental conditions. Nowadays, filtering has been introduced as a step in the microarray preprocessing pipeline. Gene filtering aims at reducing the dimensionality of data by filtering redundant features prior to the actual statistical analysis. Previous filtering methods focus on the Affymetrix platform and cannot be easily ported to the Illumina platform. As such, we developed a filtering method for Illumina bead arrays and an R package, beadarrayFilter, that implements it. In this paper, the main functions in the package are highlighted and, using many examples, we illustrate how beadarrayFilter can be used to filter bead arrays.

    Hierarchical models with normal and conjugate random effects : a review

    Molenberghs, Verbeke, and Demétrio (2007) and Molenberghs et al. (2010) proposed a general framework to model hierarchical data subject to within-unit correlation and/or overdispersion. The framework extends classical overdispersion models as well as generalized linear mixed models. Subsequent work has examined various aspects that lead to the formulation of several extensions. A unified treatment of the model framework and key extensions is provided. Particular extensions discussed are: explicit calculation of correlation and other moment-based functions, joint modelling of several hierarchical sequences, versions with direct marginally interpretable parameters, zero-inflation in the count case, and influence diagnostics. The basic models and several extensions are illustrated using a set of key examples, one per data type (count, binary, multinomial, ordinal, and time-to-event).

    Longitudinal Joint Modelling of Binary and Continuous Outcomes: A Comparison of Bridge and Normal Distributions

    Background: Longitudinal joint models consider the variation caused by repeated measurements over time as well as the association among the response variables. When binary and continuous response variables are combined using generalized linear mixed models, integrating over a normally distributed random intercept in the binary logistic regression submodel does not yield a closed form. In this paper, we assessed the impact of assuming a Bridge distribution for the random intercept in the binary logistic regression submodel and compared the results with those obtained under a normal distribution. Method: The response variables are combined through correlated random intercepts. The random intercept in the continuous outcome submodel follows a normal distribution. The random intercept in the binary outcome submodel follows a normal or Bridge distribution. Estimation was carried out using a likelihood-based approach in direct and conditional joint modeling approaches. To illustrate the performance of the models, a simulation study was conducted. Results: Based on the simulation results and regardless of the joint modeling approach, the models with a Bridge distribution for the random intercept of the binary outcome resulted in slightly more accurate estimates and better performance. Conclusion: In addition to the fact that assuming a Bridge distribution for the random intercept in binary logistic regression yields the same interpretation of parameter estimates in marginal and conditional forms, our study revealed that even if the random intercept of binary logistic regression is normally distributed, assuming a Bridge distribution in the model will result in more accurate results.

    Dysbiosis of the faecal microbiota in patients with Crohn's disease and their unaffected relatives

    Background and aims: A general dysbiosis of the intestinal microbiota has been established in patients with Crohn's disease (CD), but a systematic characterisation of this dysbiosis is lacking. Therefore the composition of the predominant faecal microbiota of patients with CD was studied in comparison with the predominant composition in unaffected controls. Whether dysbiosis is present in relatives of patients with CD was also examined. Methods: Focusing on families with at least three members affected with CD, faecal samples of 68 patients with CD, 84 of their unaffected relatives and 55 matched controls were subjected to community fingerprinting of the predominant microbiota using denaturing gradient gel electrophoresis (DGGE). To analyse the DGGE profiles, BioNumerics software and non-parametric statistical analyses (SPSS V. 17.0) were used. Observed differences in the predominant microbiota were subsequently confirmed and quantified with real-time PCR. Results: Five bacterial species characterised dysbiosis in CD, namely a decrease in Dialister invisus (p = 0.04), an uncharacterised species of Clostridium cluster XIVa (p = 0.03), Faecalibacterium prausnitzii (p < 1.3 × 10⁻⁵) and Bifidobacterium adolescentis (p = 5.4 × 10⁻⁶), and an increase in Ruminococcus gnavus (p = 2.1 × 10⁻⁷). Unaffected relatives of patients with CD had less Collinsella aerofaciens (p = 0.004) and a member of the Escherichia coli-Shigella group (p = 0.01) and more Ruminococcus torques (p = 0.02) in their predominant microbiota as compared with healthy subjects. Conclusion: Unaffected relatives of patients with CD have a different composition of their microbiota compared with healthy controls. This dysbiosis is not characterised by lack of butyrate-producing bacteria as observed in CD but suggests a role for microorganisms with mucin degradation capacity.

    Incidence, prevalence, and co-occurrence of autoimmune disorders over time and by age, sex, and socioeconomic status: a population-based cohort study of 22 million individuals in the UK

    Background: A rise in the incidence of some autoimmune disorders has been described. However, contemporary estimates of the overall incidence of autoimmune diseases and trends over time are scarce and inconsistent. We aimed to investigate the incidence and prevalence of 19 of the most common autoimmune diseases in the UK, to assess trends over time and by sex, age, socioeconomic status, season, and region, and to examine rates of co-occurrence among autoimmune diseases. Methods: In this UK population-based study, we used linked primary and secondary electronic health records from the Clinical Practice Research Datalink (CPRD), a cohort that is representative of the UK population in terms of age, sex, and ethnicity. Eligible participants were men and women (no age restriction) with acceptable records, approved for Hospital Episodes Statistics and Office of National Statistics linkage, and registered with their general practice for at least 12 months during the study period. We calculated age and sex standardised incidence and prevalence of 19 autoimmune disorders from 2000 to 2019 and used negative binomial regression models to investigate temporal trends and variation by age, sex, socioeconomic status, season of onset, and geographical region in England. To characterise co-occurrence of autoimmune diseases, we calculated incidence rate ratios (IRRs), comparing incidence rates of comorbid autoimmune disease among individuals with a first (index) autoimmune disease with incidence rates in the general population, using negative binomial regression models, adjusted for age and sex. Findings: Among the 22 009 375 individuals included in the study, 978 872 had a new diagnosis of at least one autoimmune disease between Jan 1, 2000, and June 30, 2019 (mean age 54·0 years [SD 21·4]). 625 879 (63·9%) of these diagnosed individuals were female and 352 993 (36·1%) were male. Over the study period, age and sex standardised incidence rates of any autoimmune diseases increased (IRR 2017–19 vs 2000–02 1·04 [95% CI 1·00–1·09]). The largest increases were seen in coeliac disease (2·19 [2·05–2·35]), Sjögren's syndrome (2·09 [1·84–2·37]), and Graves' disease (2·07 [1·92–2·22]); pernicious anaemia (0·79 [0·72–0·86]) and Hashimoto's thyroiditis (0·81 [0·75–0·86]) significantly decreased in incidence. Together, the 19 autoimmune disorders examined affected 10·2% of the population over the study period (1 912 200 [13·1%] women and 668 264 [7·4%] men). A socioeconomic gradient was evident across several diseases, including pernicious anaemia (most vs least deprived area IRR 1·72 [1·64–1·81]), rheumatoid arthritis (1·52 [1·45–1·59]), Graves' disease (1·36 [1·30–1·43]), and systemic lupus erythematosus (1·35 [1·25–1·46]). Seasonal variations were observed for childhood-onset type 1 diabetes (more commonly diagnosed in winter) and vitiligo (more commonly diagnosed in summer), and regional variations were observed for a range of conditions. Autoimmune disorders were commonly associated with each other, particularly Sjögren's syndrome, systemic lupus erythematosus, and systemic sclerosis. Individuals with childhood-onset type 1 diabetes also had significantly higher rates of Addison's disease (IRR 26·5 [95% CI 17·3–40·7]), coeliac disease (28·4 [25·2–32·0]), and thyroid disease (Hashimoto's thyroiditis 13·3 [11·8–14·9] and Graves' disease 6·7 [5·1–8·5]), and multiple sclerosis had a particularly low rate of co-occurrence with other autoimmune diseases. Interpretation: Autoimmune diseases affect approximately one in ten individuals, and their burden continues to increase over time at varying rates across individual diseases. The socioeconomic, seasonal, and regional disparities observed among several autoimmune disorders in our study suggest environmental factors in disease pathogenesis. The inter-relations between autoimmune diseases are commensurate with shared pathogenetic mechanisms or predisposing factors, particularly among connective tissue diseases and among endocrine diseases.