
    A Cautionary Note on Generalized Linear Models for Covariance of Unbalanced Longitudinal Data

    Missing data in longitudinal studies can create enormous challenges in data analysis when coupled with the positive-definiteness constraint on a covariance matrix. For complete balanced data, the Cholesky decomposition of a covariance matrix makes it possible to remove the positive-definiteness constraint and use a generalized linear model setup to jointly model the mean and covariance using covariates (Pourahmadi, 2000). However, this approach may not be directly applicable when the longitudinal data are unbalanced, as coherent regression models for the dependence across all times and subjects may not exist. Within the existing generalized linear model framework, we show how to overcome this and other challenges by embedding the covariance matrix of the observed data for each subject in a larger covariance matrix and employing the familiar EM algorithm to compute the maximum likelihood estimates of the parameters and their standard errors. We illustrate and assess the methodology using real data sets and simulations.
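
    As a point of reference for the Pourahmadi (2000) device mentioned above, the following minimal numpy sketch shows how the modified Cholesky decomposition T Sigma T' = D turns a positive-definite covariance matrix into unconstrained quantities (the below-diagonal entries of T and the logs of the diagonal of D) that a generalized linear model can then relate to covariates. The helper name and the AR(1) toy covariance are illustrative, not taken from the paper.

```python
import numpy as np

def modified_cholesky(sigma):
    """Modified Cholesky decomposition: T @ sigma @ T.T = diag(d).

    T is unit lower triangular and d is positive, so the strictly
    lower-triangular entries of T and log(d) are unconstrained --
    the quantities a GLM-type covariance model can parameterize.
    """
    L = np.linalg.cholesky(sigma)                 # sigma = L @ L.T
    T = np.diag(np.diag(L)) @ np.linalg.inv(L)    # unit lower triangular
    d = np.diag(L) ** 2                           # innovation variances
    return T, d

# Toy example: AR(1)-like correlation over 4 equally spaced measurement times.
rho, k = 0.6, 4
sigma = rho ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))
T, d = modified_cholesky(sigma)
assert np.allclose(T @ sigma @ T.T, np.diag(d))
```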

    The analysis of very small samples of repeated measurements II: a modified box correction

    There is a need for appropriate methods for the analysis of very small samples of continuous repeated measurements. A key feature of such analyses is the role played by the covariance matrix of the repeated observations. When subjects are few it can be difficult to assess the fit of parsimonious structures for this matrix, while the use of an unstructured form may lead to a serious lack of power. The Kenward-Roger adjustment is now widely adopted as a means of providing appropriate inferences in small samples, but does not perform adequately in very small samples. Adjusted tests based on the empirical sandwich estimator can be constructed that have good nominal properties, but are seriously underpowered. Further, when such data are incomplete, or unbalanced, or non-saturated mean models are used, exact distributional results do not exist that justify analyses with any sample size. In this paper, a modification of Box's correction applied to a linear model based F-statistic is developed for such small sample settings and is shown to have both the required nominal properties and acceptable power across a range of settings for repeated measurements.
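
    For orientation, the sketch below computes the classical Box (Greenhouse-Geisser) epsilon on which such corrections build: the F-test degrees of freedom are multiplied by epsilon, which equals 1 under sphericity and falls toward 1/(k-1) as the covariance structure departs from it. This is not the modified correction developed in the paper, and the orthonormal-contrast construction is just one standard choice.

```python
import numpy as np

def box_epsilon(sigma):
    """Classical Box / Greenhouse-Geisser epsilon for a k x k covariance matrix.

    epsilon = tr(S)^2 / ((k - 1) * tr(S @ S)), with S = C @ sigma @ C.T and
    C holding k-1 orthonormal contrasts orthogonal to the constant vector.
    A corrected F test uses epsilon*(k-1) and epsilon*(k-1)*(n-1) df.
    """
    k = sigma.shape[0]
    # Orthonormal contrasts: columns 2..k of the Q factor of [1, e_1, ..., e_{k-1}].
    q, _ = np.linalg.qr(np.column_stack([np.ones(k), np.eye(k)[:, : k - 1]]))
    C = q[:, 1:].T
    S = C @ sigma @ C.T
    return np.trace(S) ** 2 / ((k - 1) * np.trace(S @ S))

# epsilon is 1 for a compound-symmetric (sphericity-satisfying) covariance ...
assert np.isclose(box_epsilon(0.5 * np.eye(4) + 0.5), 1.0)
# ... and drops below 1 for an AR(1)-type structure.
ar1 = 0.7 ** np.abs(np.subtract.outer(np.arange(4), np.arange(4)))
print(box_epsilon(ar1))
```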

    Formal and Informal Model Selection with Incomplete Data

    Model selection and assessment with incomplete data pose challenges in addition to the ones encountered with complete data. There are two main reasons for this. First, many models describe characteristics of the complete data, in spite of the fact that only an incomplete subset is observed. Direct comparison between model and data is then less than straightforward. Second, many commonly used models are more sensitive to assumptions than in the complete-data situation and some of their properties vanish when they are fitted to incomplete, unbalanced data. These and other issues are brought forward using two key examples, one of a continuous and one of a categorical nature. We argue that model assessment ought to consist of two parts: (i) assessment of a model's fit to the observed data and (ii) assessment of the sensitivity of inferences to unverifiable assumptions, that is, to how a model describes the unobserved data given the observed ones.
    Comment: Published at http://dx.doi.org/10.1214/07-STS253 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Constrained Optimization for a Subset of the Gaussian Parsimonious Clustering Models

    The expectation-maximization (EM) algorithm is an iterative method for finding maximum likelihood estimates when data are incomplete or are treated as being incomplete. The EM algorithm and its variants are commonly used for parameter estimation in applications of mixture models for clustering and classification. This is despite the fact that even the Gaussian mixture model likelihood surface contains many local maxima and is riddled with singularities. Previous work has focused on circumventing this problem by constraining the smallest eigenvalue of the component covariance matrices. In this paper, we consider constraining the smallest eigenvalue, the largest eigenvalue, and both the smallest and largest eigenvalues within the family setting. Specifically, a subset of the GPCM family is considered for model-based clustering, where we use a re-parameterized version of the famous eigenvalue decomposition of the component covariance matrices. Our approach is illustrated using various experiments with simulated and real data.
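
    As a schematic illustration of the eigenvalue-constraint idea (not the paper's re-parameterized GPCM approach), the sketch below clips the spectrum of a component covariance matrix so that all eigenvalues lie between chosen lower and upper bounds; the bound values and helper name are hypothetical.

```python
import numpy as np

def constrain_eigenvalues(cov, lam_min, lam_max):
    """Return the symmetric matrix obtained by clipping cov's eigenvalues
    to the interval [lam_min, lam_max].

    Bounding the smallest eigenvalue keeps EM away from the likelihood
    singularities (degenerate components); bounding the largest limits
    how diffuse a component is allowed to become.
    """
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, lam_min, lam_max)
    return (vecs * vals) @ vecs.T

# Hypothetical use inside an EM M-step over G mixture components:
# for g in range(G):
#     covs[g] = constrain_eigenvalues(covs[g], lam_min=1e-3, lam_max=1e3)
```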

    The effects of estimation of censoring, truncation, transformation and partial data vectors

    The purpose of this research was to attack statistical problems concerning the estimation of distributions for purposes of predicting and measuring assembly performance as it appears in biological and physical situations. Various statistical procedures were proposed to attack problems of this sort, that is, to produce the statistical distributions of the outcomes of biological and physical situations which employ characteristics measured on constituent parts. The techniques are described.

    How do we understand and visualize uncertainty?

    Geophysicists are often concerned with reconstructing subsurface properties using observations collected at or near the surface. For example, in seismic migration, we attempt to reconstruct subsurface geometry from surface seismic recordings, and in potential field inversion, observations are used to map electrical conductivity or density variations in geologic layers. The procedure of inferring information from indirect observations is called an inverse problem by mathematicians, and such problems are common in many areas of the physical sciences. The inverse problem of inferring the subsurface using surface observations has a corresponding forward problem, which consists of determining the data that would be recorded for a given subsurface configuration. In the seismic case, forward modeling involves a method for calculating a synthetic seismogram; for gravity data, it consists of a computer code to compute gravity fields from an assumed subsurface density model. Note that forward modeling often involves assumptions about the appropriate physical relationship between unknowns (at depth) and observations on the surface, and all attempts to solve the problem at hand are limited by the accuracy of those assumptions. In the broadest sense, then, exploration geophysicists have been engaged in inversion since the dawn of the profession, and indeed algorithms often applied in processing centers can all be viewed as procedures to invert geophysical data.
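
    To make the forward-problem idea concrete, here is a minimal sketch of a textbook forward gravity calculation: given an assumed subsurface density anomaly (a buried sphere), it predicts the vertical gravity anomaly a gravimeter would record along a surface profile. The function name and parameter values are illustrative and not taken from the text.

```python
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def sphere_gravity_anomaly(x, depth, radius, delta_rho):
    """Vertical gravity anomaly (m/s^2) on a surface profile above a buried sphere.

    x         : horizontal offsets from the point directly above the sphere (m)
    depth     : depth to the sphere's centre (m)
    radius    : sphere radius (m)
    delta_rho : density contrast with the host rock (kg/m^3)
    """
    mass = (4.0 / 3.0) * np.pi * radius**3 * delta_rho  # anomalous mass
    return G * mass * depth / (x**2 + depth**2) ** 1.5

# Forward model: a 50 m radius body, 300 kg/m^3 denser than the host rock, 200 m deep.
profile = np.linspace(-500.0, 500.0, 101)
gz = sphere_gravity_anomaly(profile, depth=200.0, radius=50.0, delta_rho=300.0)
```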