45 research outputs found

    Generalised Linear Mixed Model Specification, Analysis, Fitting, and Optimal Design in R with the glmmr Packages

    We describe the R package glmmrBase and an extension, glmmrOptim. glmmrBase provides a flexible approach to specifying, fitting, and analysing generalised linear mixed models. We use an object-orientated class system within R to provide methods for a wide range of covariance and mean functions, including specification of non-linear functions of data and parameters, relevant to multiple applications including cluster randomised trials, cohort studies, spatial and spatio-temporal modelling, and split-plot designs. The class generates relevant matrices and statistics and provides a wide range of methods, including full likelihood estimation of generalised linear mixed models using Markov Chain Monte Carlo Maximum Likelihood, Laplace approximation, power calculation, and access to relevant calculations. The class also includes Hamiltonian Monte Carlo simulation of random effects, sparse matrix methods, and other functionality to support efficient estimation. The glmmrOptim package implements a set of algorithms to identify c-optimal experimental designs where observations are correlated and can be specified using the generalised linear mixed model classes. Several examples and comparisons to existing packages are provided to illustrate use of the packages.
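    A c-optimal design minimises the variance of one linear combination c'β of the fixed effects. The Python sketch below only illustrates that criterion. It is not the glmmrOptim API, which is an R package. It assumes a Gaussian linear mixed model with a random cluster intercept, where the GLS information matrix X'V^(-1)X is exact. Brute-force search over a tiny candidate set stands in for the package's own algorithms. The function names (exchangeable_cov, c_criterion, best_subset) and the variance values are hypothetical.

```python
import itertools

import numpy as np


def exchangeable_cov(cluster_ids, sigma2_u, sigma2_e):
    """Random-intercept covariance: sigma2_e on the diagonal plus sigma2_u for same-cluster pairs."""
    ids = np.asarray(cluster_ids)
    same_cluster = (ids[:, None] == ids[None, :]).astype(float)
    return sigma2_e * np.eye(len(ids)) + sigma2_u * same_cluster


def c_criterion(X, V, c):
    """Variance of the GLS estimate of c'beta: c' (X' V^{-1} X)^{-1} c (smaller is better)."""
    M = X.T @ np.linalg.solve(V, X)  # information matrix X' V^{-1} X
    return float(c @ np.linalg.solve(M, c))


def best_subset(X, cluster_ids, c, size, sigma2_u=0.05, sigma2_e=1.0):
    """Exhaustive search over all candidate subsets of a given size (only feasible for tiny problems)."""
    ids = np.asarray(cluster_ids)
    best = None
    for idx in itertools.combinations(range(X.shape[0]), size):
        idx = list(idx)
        Xs = X[idx]
        if np.linalg.matrix_rank(Xs) < X.shape[1]:
            continue  # skip designs whose information matrix is singular
        V = exchangeable_cov(ids[idx], sigma2_u, sigma2_e)
        value = c_criterion(Xs, V, c)
        if best is None or value < best[0]:
            best = (value, idx)
    return best


if __name__ == "__main__":
    # Hypothetical candidates: 4 clusters, each offering one control (0) and one treated (1) observation.
    X = np.array([[1.0, t] for _ in range(4) for t in (0, 1)])
    clusters = [k for k in range(4) for _ in (0, 1)]
    c = np.array([0.0, 1.0])  # target: the treatment effect
    print(best_subset(X, clusters, c, size=4))
```

    With these assumed variances (sigma2_u = 0.05, sigma2_e = 1), two clusters that each contribute both arms give a criterion of 1.0. Four clusters that each contribute a single observation give 1.05. The within-cluster contrast lets the cluster effect cancel. That is why correlation matters when choosing a design.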

    Decision Tree and Random Forest Methodology for Clustered and Longitudinal Binary Outcomes

    Clustered binary outcomes are frequently encountered in medical research (e.g. longitudinal studies). Generalized linear mixed models (GLMMs), typically employed for clustered endpoints, face challenges in some scenarios (e.g. high-dimensional data). In the first dissertation aim, we develop an alternative, data-driven method called the Binary Mixed Model (BiMM) tree, which combines a decision tree and a GLMM. We propose a procedure akin to the expectation maximization algorithm, which iterates between developing a classification and regression tree using all predictors and developing a GLMM that includes indicator variables for the terminal nodes of the tree as predictors, along with a random effect for the clustering variable. Since prediction accuracy may be increased through ensemble methods, we extend BiMM tree methodology to the random forest setting in the second dissertation aim. BiMM forest combines random forest and GLMM within a unified framework using an algorithmic procedure that iterates between developing a random forest and using the predicted probabilities of observations from the random forest within a GLMM that contains a random effect for the clustering variable. Simulation studies show that BiMM tree and BiMM forest methodology offer similar or superior prediction accuracy compared to standard classification and regression trees, random forests and GLMMs for clustered binary outcomes. In the third dissertation aim, the new BiMM methods are used to develop prediction models for acute liver failure using the first seven days of hospital data. Acute liver failure is a rare and devastating condition characterized by rapid onset of severe liver damage. Most prediction models developed for acute liver failure patients use admission data only, even though many clinical and laboratory variables are collected daily.
The novel BiMM tree and forest methodology developed in this dissertation can be used in diverse research settings to provide highly accurate and efficient prediction models for clustered and longitudinal binary outcomes.

    Uncertainty Quantification in Engineering Optimization Applications

    Ph.D., Doctor of Philosophy

    Descriptive discriminant analysis for repeated measures data

    Background: Linear discriminant analysis (DA) encompasses procedures for classifying observations into groups (predictive discriminant analysis, PDA) and describing the relative importance of variables for distinguishing between groups (descriptive discriminant analysis, DDA) in multivariate data. In recent years, there has been increased interest in DA procedures for repeated measures data. PDA procedures that assume parsimonious repeated measures mean and covariance structures have been developed, but corresponding DDA procedures have not been proposed. Most DA procedures for repeated measures data rest on the assumption of multivariate normality, which may not be satisfied in biostatistical applications. For example, health-related quality of life (HRQOL) measures, which are increasingly being used as outcomes in clinical trials and cohort studies, are likely to exhibit skewed or heavy-tailed distributions. As well, measures of relative importance based on discriminant function coefficients (DFCs) for DDA procedures have not been proposed for repeated measures data. Purpose: The purpose of this research is to develop repeated measures discriminant analysis (RMDA) procedures based on parsimonious covariance structures, including compound symmetric and first order autoregressive structures, and that are robust (i.e., insensitive) to multivariate non-normal distributions. It also extends these methods to evaluate the relative importance of variables in multivariate repeated measures (i.e., doubly multivariate) data. Method: Monte Carlo studies were conducted to investigate the performance of the proposed RMDA procedures under various degrees of group mean separation, repeated measures correlation structures, departure from multivariate normality, and magnitude of covariance mis-specification. 
Data from the Manitoba Inflammatory Bowel Disease Cohort Study, a prospective longitudinal cohort study of the psychosocial determinants of health and well-being, are used to illustrate their applications. Results: The conventional maximum likelihood (ML) estimates of DFCs for RMDA procedures based on parsimonious covariance structures exhibited substantial bias and error when the covariance structure was mis-specified or when the data followed a multivariate skewed or heavy-tailed distribution. The DFCs of RMDA procedures based on robust estimators obtained from coordinatewise trimmed means and Winsorized variances were less biased and more efficient when the data followed a multivariate non-normal distribution, but were sensitive to the effects of covariance mis-specification. Measures of relative importance for doubly multivariate data based on linear combinations of the within-variable DFCs resulted in the highest proportion of correctly ranked variables. Conclusions: DA procedures based on parsimonious covariance structures and robust estimators will produce unbiased and efficient estimates of the relative importance of variables in repeated measures data and can be used to test for change in relative importance over time. The choice among these RMDA procedures should be guided by preliminary descriptive assessments of the data.
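    This abstract names two parsimonious covariance structures, compound symmetric (CS) and first-order autoregressive (AR(1)). It also names robust estimators built from coordinatewise trimmed means and Winsorized variances. All of these can be sketched in plain Python. The sketch makes three assumptions: 20% symmetric trimming, Winsorizing that pulls the most extreme values in each tail in to the nearest retained value, and a simple plug-in CS estimate (average Winsorized variance and average Winsorized covariance). These choices and all function names are illustrative and are not the dissertation's exact procedures.

```python
import math


def _tail_count(n, prop):
    """Number of observations trimmed or Winsorized in each tail: floor(prop * n)."""
    return int(math.floor(prop * n))


def trimmed_mean(xs, prop=0.2):
    """Mean after dropping the g smallest and g largest values."""
    s = sorted(xs)
    g = _tail_count(len(s), prop)
    kept = s[g:len(s) - g]
    return sum(kept) / len(kept)


def winsorize(xs, prop=0.2):
    """Replace the g most extreme values in each tail by the nearest retained value (order preserved)."""
    s = sorted(xs)
    n = len(s)
    g = _tail_count(n, prop)
    lo, hi = s[g], s[n - g - 1]
    return [min(max(x, lo), hi) for x in xs]


def winsorized_cov(xs, ys, prop=0.2):
    """Sample covariance of coordinatewise Winsorized values (a Winsorized variance when xs is ys)."""
    wx, wy = winsorize(xs, prop), winsorize(ys, prop)
    n = len(wx)
    mx, my = sum(wx) / n, sum(wy) / n
    return sum((a - mx) * (b - my) for a, b in zip(wx, wy)) / (n - 1)


def compound_symmetric(t, sigma2, rho):
    """CS structure: common variance sigma2, common correlation rho between any two occasions."""
    return [[sigma2 if i == j else sigma2 * rho for j in range(t)] for i in range(t)]


def ar1(t, sigma2, rho):
    """AR(1) structure: correlation rho**|i - j| decays with the lag between occasions."""
    return [[sigma2 * rho ** abs(i - j) for j in range(t)] for i in range(t)]


def robust_cs_fit(data, prop=0.2):
    """Hypothetical plug-in estimate: trimmed means per occasion and a CS matrix from Winsorized moments.

    data is a list of subjects, each a list of t >= 2 repeated measurements.
    """
    t = len(data[0])
    cols = [[row[k] for row in data] for k in range(t)]
    means = [trimmed_mean(col, prop) for col in cols]
    sigma2 = sum(winsorized_cov(col, col, prop) for col in cols) / t
    pairs = [(i, j) for i in range(t) for j in range(i + 1, t)]
    cov = sum(winsorized_cov(cols[i], cols[j], prop) for i, j in pairs) / len(pairs)
    return means, compound_symmetric(t, sigma2, cov / sigma2)
```

    Trimming and Winsorizing each coordinate separately is what "coordinatewise" refers to. A single extreme value, such as 100 in [1, 2, 3, 4, 100], shifts the ordinary mean and variance heavily. It leaves the 20% trimmed mean at 3 and the Winsorized variance at 1.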

    Proceedings of the 36th International Workshop Statistical Modelling July 18-22, 2022 - Trieste, Italy

    The 36th International Workshop on Statistical Modelling (IWSM) is the first to be held in person after a two-year hiatus due to the COVID-19 pandemic. This edition was quite lively, with 60 oral presentations and 53 posters covering a wide variety of topics. As usual, the extended abstracts of the papers are collected in the IWSM proceedings, but unlike previous workshops, this year the proceedings are published only online rather than printed on paper. The workshop proudly maintains its almost unique feature of scheduling one plenary session for the whole week. This choice, combined with the workshop's informal character, has always contributed to the stimulating atmosphere of the conference, encouraging the exchange of ideas and cross-fertilization among different areas. As a distinguished tradition of the workshop, student participation has been strongly encouraged. This IWSM edition is particularly successful in this respect, as testified by the large number of students included in the program.

    Species divergence and maintenance of species cohesion of three closely related Primula species in the Qinghai-Tibet Plateau

    Understanding the relative roles of geography and ecology in driving speciation, population divergence and the maintenance of species cohesion is of great interest in molecular ecology. Closely related species with parapatric distributions in mountainous areas provide an ideal model for evaluating these key issues, especially when genomic data are analyzed within a spatially and ecologically explicit context. Here we used three closely related Primula species that occur in the Himalayas, the Hengduan Mountains and the Northeast Qinghai-Tibet Plateau (QTP) to examine the effects of geography and ecology on interspecific divergence and the maintenance of species cohesion. We obtained genomic data for 770 samples of the three species using restriction site-associated DNA (RAD) sequencing and combined approximate Bayesian computation (ABC) modeling, Bayesian generalized linear mixed modeling (GLMM) and species distribution modeling (SDM). The three species are clearly delimited by the RADseq data. Further ABC modeling indicates that the three species originated in the Himalayas and diverged from each other following the uplifts of the Hengduan Mountains and the Northern QTP during the Pliocene. After a long period of divergence, the three species came into secondary contact, triggered by past climatic changes, but with no significant introgression. The three species display complex and differing drivers of genomic variation, which provides further insight into the effects of geographical and ecological factors on maintaining species cohesion. Our findings highlight the importance of combining population genomics with environmental data when evaluating the effects of geography and ecology on interspecific divergence and the maintenance of cohesion among closely related species.
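    Approximate Bayesian computation (ABC) replaces an intractable likelihood with repeated simulation. It keeps parameter draws whose simulated summary statistics fall close to the observed ones. The minimal rejection-ABC sketch below rests on several assumptions: a toy binomial simulator, a Uniform(0, 1) prior, and the observed proportion as the only summary statistic. The study itself compared divergence scenarios, and such analyses typically rely on population-genetic simulators and many summary statistics. The function names and tolerance are hypothetical.

```python
import random


def simulate_successes(p, n, rng):
    """Toy simulator (assumption): number of successes in n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))


def abc_rejection(observed, n, draws, eps, seed=0):
    """Rejection ABC with a Uniform(0, 1) prior on p; summary statistic = proportion of successes."""
    rng = random.Random(seed)
    target = observed / n
    accepted = []
    for _ in range(draws):
        p = rng.random()                      # 1. draw a parameter from the prior
        sim = simulate_successes(p, n, rng)   # 2. simulate a data set under that parameter
        if abs(sim / n - target) <= eps:      # 3. keep p if the summaries are close enough
            accepted.append(p)
    return accepted                           # approximate posterior sample


if __name__ == "__main__":
    sample = abc_rejection(observed=30, n=100, draws=5000, eps=0.02, seed=1)
    print(len(sample), sum(sample) / len(sample))
```

    Shrinking eps makes the approximation closer to the true posterior but accepts fewer draws. That trade-off is why practical ABC analyses use far more simulations than this toy.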