
    Analysis of Models for Longitudinal and Clustered Binary Data

    This dissertation deals with the modeling and statistical analysis of longitudinal and clustered binary data. Such data consist of observations on a dichotomous response variable recorded at multiple time points or within clusters, and they exhibit either decaying correlation or equicorrelated dependence. The current literature models the dependence using an appropriate correlation structure but ignores the feasible bounds that the marginal means impose on the correlation parameter. The first part of this dissertation deals with two multivariate probability models, the first-order Markov chain model and the multivariate probit model, both of which adhere to the feasible bounds on the correlation. For both models we obtain maximum likelihood estimates of the regression and correlation parameters and study the asymptotic and small-sample properties of the estimates. Through simulations we compare the efficiency of the two methods and demonstrate that neither is uniformly superior to the other. The second part of this dissertation deals with marginal models, an alternative to multivariate probability models. We discuss the generalized estimating equations and quadratic inference function methods for estimating the regression parameter in marginal models. Relative efficiency calculations show that, compared with the likelihood estimates, these methods can incur a significant loss in efficiency for highly correlated data. We also propose a modified quadratic inference function method and demonstrate through efficiency calculations that it improves on the original quadratic inference function approach. The final part of this dissertation deals with methods for constructing higher-order Markov chain models using copulas.
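    To make the feasible bounds concrete: for two binary responses with marginal means p1 and p2, the joint probability P(Y1 = 1, Y2 = 1) must lie between max(0, p1 + p2 - 1) and min(p1, p2) (the Fréchet bounds), and this restricts the correlation. The minimal Python sketch below computes that correlation range and converts a feasible lag-1 correlation into the transition probabilities of a first-order Markov chain that preserves the marginal means. It is an illustration under these assumptions, not the dissertation's estimation code; the function names are hypothetical.

```python
import math
import random


def correlation_bounds(p1, p2):
    """Feasible range of Corr(Y1, Y2) for Bernoulli(p1) and Bernoulli(p2).

    The joint probability p11 = P(Y1 = 1, Y2 = 1) must satisfy the Frechet
    bounds max(0, p1 + p2 - 1) <= p11 <= min(p1, p2); dividing the implied
    covariance p11 - p1 * p2 by the standard deviations gives the bounds.
    """
    if not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("marginal means must lie strictly between 0 and 1")
    scale = math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))
    lower = (max(0.0, p1 + p2 - 1) - p1 * p2) / scale
    upper = (min(p1, p2) - p1 * p2) / scale
    return lower, upper


def transition_probs(p_prev, p_curr, rho):
    """Return (P(Y_t = 1 | Y_{t-1} = 1), P(Y_t = 1 | Y_{t-1} = 0)).

    Assumes a first-order Markov chain with marginal means p_prev and p_curr
    and lag-1 correlation rho, which must lie inside the feasible bounds.
    """
    lower, upper = correlation_bounds(p_prev, p_curr)
    if not lower <= rho <= upper:
        raise ValueError(f"rho={rho} outside feasible range [{lower:.4f}, {upper:.4f}]")
    sd_product = math.sqrt(p_prev * (1 - p_prev) * p_curr * (1 - p_curr))
    p11 = p_prev * p_curr + rho * sd_product
    return p11 / p_prev, (p_curr - p11) / (1 - p_prev)


def simulate_chain(means, rho, rng):
    """Simulate one binary sequence whose marginal means are `means`."""
    y = [1 if rng.random() < means[0] else 0]
    for p_prev, p_curr in zip(means, means[1:]):
        p_given_1, p_given_0 = transition_probs(p_prev, p_curr, rho)
        p = p_given_1 if y[-1] == 1 else p_given_0
        y.append(1 if rng.random() < p else 0)
    return y


print(correlation_bounds(0.2, 0.8))  # upper bound is about 0.25, far below 1
print(transition_probs(0.5, 0.5, 0.4))
print(simulate_chain([0.3, 0.4, 0.5, 0.6], 0.3, random.Random(1)))
```

    With p1 = 0.2 and p2 = 0.8, no pair of binary variables can have a correlation above about 0.25, which is why a correlation structure specified without regard to the marginal means can be infeasible.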

    Exploratory multivariate longitudinal data analysis and models for multivariate longitudinal binary data

    Longitudinal data occur when repeated measurements are observed on the same subject over time. In this thesis, exploratory data analysis and models are used jointly to analyze longitudinal data, which leads to stronger and better-justified conclusions. The complex structure of longitudinal data with covariates requires new visual methods that enable interactive exploration. Here we catalog the general principles of exploratory data analysis for multivariate longitudinal data and illustrate the use of the linked brushing approach for studying the mean structure over time. These methods make it possible to reveal the unexpected, explore the interaction between responses and covariates, observe individual variation, understand structure in multiple dimensions, and diagnose and fix models. We also propose models for multivariate longitudinal binary data that directly model marginal covariate effects while accounting for the dependence across time via a transition structure and across responses within a subject at a given time via random effects. Markov chain Monte Carlo methods, specifically Gibbs sampling with hybrid steps, are used to sample from the posterior distribution of the parameters. Graphical and quantitative checks are used to assess model fit. The methods are illustrated on several real datasets, primarily the Iowa Youth and Families Project. This dissertation is a compound document (it contains both a paper copy and a CD as part of the dissertation).

    Statistical Analysis of Longitudinal and Multivariate Discrete Data

    Correlated multivariate Poisson and binary variables occur naturally in medical, biological and epidemiological longitudinal studies. Modeling and simulating such variables is difficult because the marginal means restrict the correlations, via the Fréchet bounds, in a complicated way. In this dissertation we first discuss partially specified models and methods for estimating the regression and correlation parameters, and we derive the asymptotic distributions of these parameter estimates. Using simulations based on extensions of the algorithm due to Sim (1993, Journal of Statistical Computation and Simulation, 47, pp. 1–10), we study the performance of these estimates using infeasibility, the coverage probabilities of the confidence ellipsoids, and asymptotic relative efficiencies as criteria. The second part of this dissertation is devoted to fully specified models constructed using copulas, with special emphasis on the normal copula. Finding the maximum likelihood estimates and the Fisher information matrix for these models requires computing multivariate normal probabilities, and we discuss several efficient algorithms for calculating multivariate normal integrals. For the multivariate probit and multivariate Poisson log-normal models, we derive the necessary equations, implement maximum likelihood estimation, and illustrate it on two real-life data sets. Next we study over- and under-dispersed models, including the quasi-multinomial and Lagrange families of distributions. We implement the maximum likelihood method for the quasi-multinomial model and illustrate its application to a market analysis of household preferences for saltine crackers.
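    In a normal-copula (multivariate probit) model, each binary response is obtained by thresholding a latent standard normal variable, so every cell probability is a multivariate normal integral. The Python sketch below, assuming NumPy-free use of SciPy, computes the four cell probabilities of a bivariate probit model with marginal means p1, p2 and latent correlation r, and the binary correlation they imply. The names `bivariate_probit_cells` and `binary_correlation` are hypothetical, and SciPy's general-purpose `multivariate_normal.cdf` stands in for the specialized integration algorithms the dissertation discusses.

```python
import math

from scipy.stats import multivariate_normal, norm


def bivariate_probit_cells(p1, p2, r):
    """Cell probabilities for Y_j = 1{Z_j <= Phi^{-1}(p_j)}, j = 1, 2.

    (Z1, Z2) is standard bivariate normal with correlation r, so each Y_j
    has marginal mean p_j by construction.
    """
    if not -1 < r < 1:
        raise ValueError("latent correlation must lie strictly between -1 and 1")
    thresholds = [norm.ppf(p1), norm.ppf(p2)]
    latent = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, r], [r, 1.0]])
    # P(Z1 <= t1, Z2 <= t2), evaluated by numerical integration
    p11 = float(latent.cdf(thresholds))
    return {
        (1, 1): p11,
        (1, 0): p1 - p11,
        (0, 1): p2 - p11,
        (0, 0): 1 - p1 - p2 + p11,
    }


def binary_correlation(p1, p2, p11):
    """Pearson correlation of two binary variables from their joint cell p11."""
    return (p11 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))


cells = bivariate_probit_cells(0.3, 0.6, 0.5)
print(cells)
print(binary_correlation(0.3, 0.6, cells[(1, 1)]))
```

    A design point worth noting: as r approaches 1 or -1, p11 approaches min(p1, p2) or max(0, p1 + p2 - 1), so the normal copula maps every latent correlation in (-1, 1) to a binary correlation that respects the Fréchet bounds automatically.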

    Estimating correlation between multivariate longitudinal data in the presence of heterogeneity

    Background: Estimating correlation coefficients among outcomes is one of the most important analytical tasks in epidemiological and clinical research. The availability of multivariate longitudinal data presents a unique opportunity to assess the joint evolution of outcomes over time. The bivariate linear mixed model (BLMM) provides a versatile tool for assessing correlation. However, BLMMs often assume that all individuals are drawn from a single homogeneous population in which the individual trajectories are distributed smoothly around the population average. Methods: Using longitudinal mean deviation (MD) and visual acuity (VA) from the Ocular Hypertension Treatment Study (OHTS), we demonstrated strategies to better understand the correlation between multivariate longitudinal data in the presence of potential heterogeneity. Conditional correlation (i.e., marginal correlation given random effects) was calculated to describe how the association between longitudinal outcomes evolved over time within specific subpopulations. The impact of heterogeneity on correlation was also assessed using simulated data. Results: There was a significant positive correlation in both the random intercepts (ρ = 0.278, 95% CI: 0.121–0.420) and the random slopes (ρ = 0.579, 95% CI: 0.349–0.810) between longitudinal MD and VA, and the strength of the correlation increased steadily over time. However, the conditional correlation and simulation studies revealed that the correlation was induced primarily by participants with rapidly deteriorating MD, who accounted for only a small fraction of the total sample. Conclusion: Conditional correlation given random effects provides a robust estimate of the correlation between multivariate longitudinal data in the presence of unobserved heterogeneity (NCT00000125).
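    To see why a BLMM correlation can change over time, consider a bivariate model with a random intercept and slope for each outcome, Y_k(t) = b_k0 + b_k1 t + fixed effects + e_k(t) for k = 1, 2, where (b_10, b_11, b_20, b_21) has covariance matrix D and the residuals are independent with variances s1^2 and s2^2. With z = (1, t), the marginal correlation at time t is z'D_12 z / sqrt((z'D_11 z + s1^2)(z'D_22 z + s2^2)). The Python sketch below evaluates this standard formula for a single homogeneous population; the model form, the function names and the covariance values are illustrative assumptions, not OHTS estimates, and this is the unconditional marginal correlation rather than the study's conditional correlation.

```python
import numpy as np


def random_effect_correlation(D, i, j):
    """Correlation between random effects i and j from their covariance matrix D."""
    D = np.asarray(D, dtype=float)
    return D[i, j] / np.sqrt(D[i, i] * D[j, j])


def blmm_marginal_correlation(D, sigma2, t):
    """Marginal Corr(Y1(t), Y2(t)) under a bivariate random intercept and slope model.

    D      : 4x4 covariance of (b10, b11, b20, b21)
    sigma2 : residual variances (s1^2, s2^2); residuals assumed independent
    t      : time point
    """
    D = np.asarray(D, dtype=float)
    z = np.array([1.0, t])
    var1 = z @ D[:2, :2] @ z + sigma2[0]
    var2 = z @ D[2:, 2:] @ z + sigma2[1]
    cov12 = z @ D[:2, 2:] @ z  # only the cross-outcome block contributes
    return cov12 / np.sqrt(var1 * var2)


# Hypothetical covariance: intercepts correlated 0.3, slopes correlated 0.4.
D = np.array([
    [1.0, 0.0, 0.3, 0.0],
    [0.0, 0.25, 0.0, 0.1],
    [0.3, 0.0, 1.0, 0.0],
    [0.0, 0.1, 0.0, 0.25],
])
print("intercept correlation:", random_effect_correlation(D, 0, 2))
print("slope correlation:", random_effect_correlation(D, 1, 3))
for t in [0, 1, 2, 5]:
    print(t, round(float(blmm_marginal_correlation(D, (1.0, 1.0), t)), 3))
```

    With these values the marginal correlation rises from 0.15 at t = 0 toward the slope correlation of 0.4 as t grows, because the slope variance increasingly dominates both outcomes' variances.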

    Quantile regression for mixed models with an application to examine blood pressure trends in China

    Cardiometabolic diseases have increased substantially in China over the past 20 years, and blood pressure is a primary modifiable risk factor. Using data from the China Health and Nutrition Survey, we examine blood pressure trends in China from 1991 to 2009, concentrating on age cohorts and urbanicity. Very large values of blood pressure are of interest, so we model the conditional quantile functions of systolic and diastolic blood pressure. This allows the covariate effects in the middle of the distribution to differ from those in the upper tail, the focal point of our analysis. We join the distributions of systolic and diastolic blood pressure using a copula, which permits the relationships between the covariates and the two responses to share information and enables joint probabilistic statements about systolic and diastolic blood pressure. Our copula maintains the marginal distributions of the group quantile effects while accounting for within-subject dependence, enabling inference at both the population and subject levels. Our population-level regression effects change across quantile level, year and blood pressure type, providing a rich environment for inference. To our knowledge, this is the first quantile function model to explicitly model within-subject autocorrelation and the first quantile function approach that simultaneously models a multivariate conditional response. We find that the association between high blood pressure and living in an urban area has evolved from positive to negative, with the strongest changes occurring in the upper tail. The increase in urbanization over the last twenty years, coupled with the transition from a positive association between urbanization and blood pressure in earlier years to a more uniform association, suggests that blood pressure is increasing over time throughout China, even in less urbanized areas. Our methods are available in the R package BSquare. Published at http://dx.doi.org/10.1214/15-AOAS841 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org).
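    The idea of modeling a conditional quantile rather than the mean can be illustrated with ordinary linear quantile regression, which minimizes the check loss rho_tau(u) = u(tau - 1{u < 0}) and can be written as a linear program. The sketch below, assuming NumPy and SciPy's `linprog` with the HiGHS solver, fits one quantile level tau at a time using that standard formulation. It is not the Bayesian copula model implemented in BSquare, and `fit_quantile_regression` and the simulated data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def fit_quantile_regression(X, y, tau):
    """Linear quantile regression at level tau via the standard linear program.

    Minimizes sum(tau * u_plus + (1 - tau) * u_minus) subject to
    X @ beta + u_plus - u_minus = y, with u_plus, u_minus >= 0 and beta free,
    so u_plus and u_minus are the positive and negative parts of the residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    if not 0 < tau < 1:
        raise ValueError("tau must lie strictly between 0 and 1")
    c = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    result = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x[:p]


rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
# Hypothetical data whose spread grows with x, so upper quantiles get steeper slopes.
y = 1.0 + 0.5 * x + rng.normal(scale=0.2 + 0.3 * x)
X = np.column_stack([np.ones_like(x), x])
for tau in (0.5, 0.9):
    print(tau, fit_quantile_regression(X, y, tau))
```

    Because each tau is fitted separately here, the covariate effect is free to differ between the median and the upper tail, but nothing ties the levels together, and fitted quantile lines can cross in finite samples.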

    Estimation of extended mixed models using latent classes and latent processes: the R package lcmm

    Get PDF
    The R package lcmm provides a series of functions to estimate statistical models based on linear mixed model theory. It includes the estimation of mixed models and latent class mixed models for Gaussian longitudinal outcomes (hlme), curvilinear and ordinal univariate longitudinal outcomes (lcmm) and curvilinear multivariate outcomes (multlcmm), as well as joint latent class mixed models (Jointlcmm) for a (Gaussian or curvilinear) longitudinal outcome and a time to event that can be left-truncated, right-censored, and defined in a competing risks setting. Maximum likelihood estimators are obtained using a modified Marquardt algorithm with strict convergence criteria based on the stability of the parameters and the likelihood, and on the negativity of the second derivatives. The package also provides various post-fit functions, including goodness-of-fit analyses, classification, plots, predicted trajectories, individual dynamic prediction of the event, and predictive accuracy assessment. This paper is a companion to the package: it introduces each family of models, the estimation technique and some implementation details, and gives examples using a dataset on cognitive aging.
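    As a rough illustration of a Marquardt-type maximizer with convergence criteria of the kind described above, the Python sketch below damps each Newton step by adding lambda to the diagonal of the negative Hessian and stops only when the parameter change is small, the log-likelihood has stabilized, and the Hessian is negative definite. It is a simplified, hypothetical sketch (`marquardt_maximize`, the tolerances and the damping schedule are assumptions), not the algorithm implemented in lcmm, whose interface is the R functions named above.

```python
import numpy as np


def marquardt_maximize(loglik, grad, hess, theta0, tol=1e-8, max_iter=200):
    """Maximize loglik with Marquardt-damped Newton steps.

    Convergence requires a small parameter change, a stable log-likelihood
    and a negative definite Hessian at the final estimate.
    """
    theta = np.asarray(theta0, dtype=float)
    ll = loglik(theta)
    lam = 1e-3
    for _ in range(max_iter):
        g, H = grad(theta), hess(theta)
        while True:
            # Damping: adding lam to the diagonal of -H keeps the system well
            # conditioned; for large lam the step approaches g / lam.
            step = np.linalg.solve(-H + lam * np.eye(theta.size), g)
            candidate = theta + step
            ll_new = loglik(candidate)
            if ll_new >= ll - 1e-12 * (1 + abs(ll)):  # accept, tolerating rounding noise
                lam = max(lam / 10, 1e-12)
                break
            lam *= 10  # reject: damp more strongly and retry
            if lam > 1e12:
                raise RuntimeError("could not find an improving step")
        small_step = np.max(np.abs(step)) < tol
        stable_ll = abs(ll_new - ll) < tol
        theta, ll = candidate, ll_new
        negative_definite = np.all(np.linalg.eigvalsh(hess(theta)) < 0)
        if small_step and stable_ll and negative_definite:
            return theta, ll
    raise RuntimeError("maximum number of iterations reached")


# Example: Poisson log-likelihood in the log rate, maximized at log(mean(y)).
y = np.array([2, 3, 4, 3])


def loglik(theta):
    return y.sum() * theta[0] - y.size * np.exp(theta[0])


def grad(theta):
    return np.array([y.sum() - y.size * np.exp(theta[0])])


def hess(theta):
    return np.array([[-y.size * np.exp(theta[0])]])


theta_hat, ll_hat = marquardt_maximize(loglik, grad, hess, [0.0])
print(theta_hat, np.log(y.mean()))
```

    Starting from 0, the full Newton step overshoots and is rejected, so the damping increases until an improving step is found; near the optimum lambda shrinks again and the iterations behave like Newton's method.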