15,728 research outputs found

    A longitudinal study of student performance in English using repeated measures, multilevel and logistic regression models

    This paper presents three statistical models that analyze longitudinal data on student performance in English. A random sample, comprising male and female students who attend either a state or a private school, was selected to investigate gender and school bias in this subject. The annual English marks attained by each student were recorded during the last three years of primary school. In the first approach, we present a repeated measures analysis of variance that captures the correlation between the repeated measures. Several tests are carried out to check for within-subjects and between-subjects effects, equality of covariance matrices, and sphericity. In the second approach, we fit a two-level random coefficient model to examine the effect of time on student performance in English. This model allows the student-specific coefficients describing individual trajectories to vary randomly. In the third approach, we fit a logistic regression model to estimate the probability of passing the Eleven-Plus examination that students sit when they complete primary education.

    A generalized linear mixed model for longitudinal binary data with a marginal logit link function

    Longitudinal studies of a binary outcome are common in the health, social, and behavioral sciences. In general, a feature of random effects logistic regression models for longitudinal binary data is that the marginal functional form, when integrated over the distribution of the random effects, is no longer of logistic form. Recently, Wang and Louis [Biometrika 90 (2003) 765--775] proposed a random intercept model in the clustered binary data setting where the marginal model has a logistic form. An acknowledged limitation of their model is that it allows only a single random effect that varies from cluster to cluster. In this paper we propose a modification of their model to handle longitudinal data, allowing separate, but correlated, random intercepts at each measurement occasion. The proposed model allows for a flexible correlation structure among the random intercepts, where the correlations can be interpreted in terms of Kendall's $\tau$. For example, the marginal correlations among the repeated binary outcomes can decline with increasing time separation, while the model retains the property of having matching conditional and marginal logit link functions. Finally, the proposed method is used to analyze data from a longitudinal study designed to monitor cardiac abnormalities in children born to HIV-infected women. Comment: Published at http://dx.doi.org/10.1214/10-AOAS390 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
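    The remark above, that integrating a random intercept out of a logistic model leaves a marginal form that is no longer logistic, can be checked numerically. The sketch below is not the model proposed in the paper: it assumes an ordinary Gaussian random intercept, and the values of `beta0`, `beta1` and `sigma` as well as the helper `marginal_prob` are illustrative assumptions. It integrates with a simple trapezoidal rule and compares the marginal logit across several covariate values.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n=2001, width=8.0):
    """P(Y = 1) after integrating a N(0, sigma^2) random intercept u out of
    expit(eta + u), using the trapezoidal rule on [-width*sigma, width*sigma]."""
    lo, hi = -width * sigma, width * sigma
    h = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        u = lo + i * h
        density = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if i in (0, n - 1) else 1.0  # trapezoid end-point weights
        total += weight * expit(eta + u) * density
    return total * h


# Illustrative (assumed) conditional model: logit P(Y=1 | u, x) = beta0 + beta1*x + u
beta0, beta1, sigma = 0.0, 1.0, 2.0
xs = [0.0, 1.0, 2.0, 3.0]

marginal_logits = [logit(marginal_prob(beta0 + beta1 * x, sigma)) for x in xs]

# Change in the marginal logit per unit increase in x
slopes = [b - a for a, b in zip(marginal_logits, marginal_logits[1:])]
print("marginal logits:", marginal_logits)
print("per-unit slopes:", slopes)
```

    If the marginal model were exactly logistic, the per-unit slopes would be identical. Under these assumed values they are all smaller than the conditional slope `beta1` and they differ from one another, which is the attenuation and loss of logistic form that the abstract refers to.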

    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, and the observed data consist of a sample of functions taken from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article will focus on functional regression, the area of FDA that has received the most attention in applications and methodological development. First will be an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods, split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization will be discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. At the end is a brief discussion describing potential areas of future development in this field.

    Nonlinear quantile mixed models

    In regression applications, the presence of nonlinearity and correlation among observations offers computational challenges not only in traditional settings such as least squares regression, but also (and especially) when the objective function is non-smooth, as in the case of quantile regression. In this paper, we develop methods for the modeling and estimation of nonlinear conditional quantile functions when data are clustered within two-level nested designs. This work represents an extension of the linear quantile mixed models of Geraci and Bottai (2014, Statistics and Computing). We develop a novel algorithm which is a blend of a smoothing algorithm for quantile regression and a second order Laplacian approximation for nonlinear mixed models. To assess the proposed methods, we present a simulation study and two applications, one in pharmacokinetics and one related to growth curve modeling in agriculture. Comment: 26 pages, 8 figures, 8 tables

    General Design Bayesian Generalized Linear Mixed Models

    Linear mixed models are able to handle an extraordinary range of complications in regression-type analyses. Their most common use is to account for within-subject correlation in longitudinal data analysis. They are also the standard vehicle for smoothing spatial count data. However, when treated in full generality, mixed models can also handle spline-type smoothing and closely approximate kriging. This allows for nonparametric regression models (e.g., additive models and varying coefficient models) to be handled within the mixed model framework. The key is to allow the random effects design matrix to have general structure; hence our label general design. For continuous response data, particularly when Gaussianity of the response is reasonably assumed, computation is now quite mature and supported by the R, SAS and S-PLUS packages. Such is not the case for binary and count responses, where generalized linear mixed models (GLMMs) are required, but are hindered by the presence of intractable multivariate integrals. Software known to us supports special cases of the GLMM (e.g., PROC NLMIXED in SAS or glmmML in R) or relies on the sometimes crude Laplace-type approximation of integrals (e.g., the SAS macro glimmix or glmmPQL in R). This paper describes the fitting of general design generalized linear mixed models. A Bayesian approach is taken and Markov chain Monte Carlo (MCMC) is used for estimation and inference. In this generalized setting, MCMC requires sampling from nonstandard distributions. In this article, we demonstrate that the MCMC package WinBUGS facilitates sound fitting of general design Bayesian generalized linear mixed models in practice. Comment: Published at http://dx.doi.org/10.1214/088342306000000015 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    A semiparametric regression model for paired longitudinal outcomes with application in childhood blood pressure development

    This research examines the simultaneous influences of height and weight on longitudinally measured systolic and diastolic blood pressure in children. Previous studies have shown that both height and weight are positively associated with blood pressure. In children, however, the concurrent increases of height and weight have made it all but impossible to discern the effect of height from that of weight. To better understand these influences, we propose to examine the joint effect of height and weight on blood pressure. Bivariate thin plate spline surfaces are used to accommodate the potentially nonlinear effects as well as the interaction between height and weight. Moreover, we consider a joint model for paired blood pressure measures, that is, systolic and diastolic blood pressure, to account for the underlying correlation between the two measures within the same individual. The bivariate spline surfaces are allowed to vary across different groups of interest. We have developed related model fitting and inference procedures. The proposed method is used to analyze data from a real clinical investigation. Comment: Published at http://dx.doi.org/10.1214/12-AOAS567 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Linear mixed models with endogenous covariates: modeling sequential treatment effects with application to a mobile health study

    Mobile health is a rapidly developing field in which behavioral treatments are delivered to individuals via wearables or smartphones to facilitate health-related behavior change. Micro-randomized trials (MRT) are an experimental design for developing mobile health interventions. In an MRT the treatments are randomized numerous times for each individual over the course of the trial. Along with assessing treatment effects, behavioral scientists aim to understand between-person heterogeneity in the treatment effect. A natural approach is the familiar linear mixed model. However, directly applying linear mixed models is problematic because potential moderators of the treatment effect are frequently endogenous, that is, they may depend on prior treatment. We discuss model interpretation and biases that arise in the absence of additional assumptions when endogenous covariates are included in a linear mixed model. In particular, when there are endogenous covariates, the coefficients no longer have the customary marginal interpretation. However, these coefficients still have a conditional-on-the-random-effect interpretation. We provide an additional assumption that, if true, allows scientists to use standard software to fit linear mixed models with endogenous covariates, and person-specific predictions of effects can be provided. As an illustration, we assess the effect of activity suggestion in the HeartSteps MRT and analyze the between-person treatment effect heterogeneity.

    Variable Selection for Generalized Linear Mixed Models by L1-Penalized Estimation

    Generalized linear mixed models are a widely used tool for modeling longitudinal data. However, their use is typically restricted to few covariates, because the presence of many predictors yields unstable estimates. The presented approach to the fitting of generalized linear mixed models includes an L1-penalty term that enforces variable selection and shrinkage simultaneously. A gradient ascent algorithm is proposed that maximizes the penalized log-likelihood, yielding models with reduced complexity. In contrast to common procedures, it can be used in high-dimensional settings where a large number of potentially influential explanatory variables is available. The method is investigated in simulation studies and illustrated by use of real data sets.
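    To illustrate how an L1 penalty can enforce selection and shrinkage at the same time, here is a minimal sketch. It is not the algorithm of the paper: it assumes a plain logistic regression with no random effects and no intercept, and it swaps in proximal gradient ascent (a gradient step followed by soft-thresholding) as the optimizer. The function names, the toy data, the penalty `lam` and the step size are all illustrative assumptions.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def l1_logistic(X, y, lam, step=1.0, iters=300):
    """Maximize mean log-likelihood minus lam * sum(|beta_j|) by proximal
    gradient ascent (fixed effects only; an assumed simplification)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        for row, yi in zip(X, y):
            resid = yi - expit(sum(b * x for b, x in zip(beta, row)))
            for j in range(p):
                grad[j] += row[j] * resid / n
        # Gradient ascent step, then the proximal map of the L1 penalty
        beta = [soft_threshold(b + step * g, step * lam) for b, g in zip(beta, grad)]
    return beta


# Toy data: the outcome depends on x1 only; x2 is balanced noise.
X, y = [], []
for x1 in (1.0, -1.0):
    for x2 in (1.0, -1.0):
        outcomes = [1, 1, 1, 0] if x1 > 0 else [0, 0, 0, 1]
        for yi in outcomes:
            X.append([x1, x2])
            y.append(yi)

beta = l1_logistic(X, y, lam=0.05)
print("penalized coefficients:", beta)
```

    In this toy setup the coefficient of the noise covariate is set exactly to zero (selection), while the coefficient of the informative covariate is pulled below its unpenalized maximum likelihood value of log(3) (shrinkage).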
