376 research outputs found

    Bayesian structured antedependence model proposals for longitudinal data

    An important problem in statistics is the study of longitudinal data that accounts for the effect of explanatory variables, such as treatment and time, while simultaneously incorporating into the model the time dependence between observations on the same individual. The latter is especially relevant when correlations are nonstationary and variances are nonconstant across the time points at which measurements are taken. Antedependence models constitute a well-known, commonly used set of models that can accommodate this behaviour. These covariance models can include too many parameters, and estimation can be a complicated optimization problem requiring the use of complex algorithms and programming. In this paper, a new Bayesian approach to analysing longitudinal data within the context of antedependence models is proposed. This innovative approach takes into account the possibility of having nonstationary correlations and variances, and proposes a robust and computationally efficient estimation method for this type of data. We consider the joint modelling of the mean and covariance structures for the general antedependence model, estimating their parameters in a longitudinal data context. Our Bayesian approach is based on a generalization of the Gibbs sampling and Metropolis-Hastings by blocks algorithm, adapted to the longitudinal data setting of antedependence models. Finally, we illustrate the proposed methodology by analysing several examples where antedependence models have been shown to be useful: the small mice, the speech recognition and the race data sets.
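
As a small illustration of the covariance structure this abstract refers to, the sketch below builds a first-order antedependence, AD(1), covariance matrix with time-varying variances and lag-one correlations. It is not the paper's Bayesian estimation method. The function name `ad1_covariance` and the numeric values are hypothetical, and the sketch assumes Gaussian data, for which the correlation between two occasions is the product of the intervening lag-one correlations.

```python
import numpy as np

def ad1_covariance(sd, rho):
    """Covariance of a first-order antedependence model (illustrative sketch).

    sd  : standard deviations at each of the n time points (may differ).
    rho : the n-1 lag-one correlations between adjacent time points.
    """
    sd = np.asarray(sd, dtype=float)
    rho = np.asarray(rho, dtype=float)
    n = sd.size
    corr = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            # Under AD(1), corr(i, j) is the product of the lag-one
            # correlations linking time point i to time point j.
            corr[i, j] = corr[j, i] = np.prod(rho[i:j])
    return corr * np.outer(sd, sd)

# Hypothetical nonconstant variances and nonstationary correlations.
sigma_ad1 = ad1_covariance(sd=[1.0, 2.0, 1.5, 3.0], rho=[0.9, 0.5, 0.7])
precision_ad1 = np.linalg.inv(sigma_ad1)
```

In this sketch the precision matrix is tridiagonal, which reflects the defining property that each observation, given its immediate predecessor, is independent of all earlier ones.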

    Covariance Estimation: The GLM and Regularization Perspectives

    Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM) or parsimony and use of covariates in low dimensions, and (2) regularization or sparsity for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing a covariance estimation problem to that of estimating a sequence of regression problems. We point out several instances of the regression-based formulation. A notable case is in sparse estimation of a precision matrix or a Gaussian graphical model leading to the fast graphical LASSO algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The Cholesky decomposition provides an unconstrained and statistically interpretable reparameterization, and guarantees the positive-definiteness of the estimated covariance matrix. It reduces the unintuitive task of covariance estimation to that of modeling a sequence of regressions at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix such as banding, tapering and thresholding has desirable asymptotic properties and the sparse estimated covariance matrix is positive definite with probability tending to one for large samples and dimensions. Comment: Published at http://dx.doi.org/10.1214/11-STS358 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org)
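
The regression-based Cholesky idea mentioned in this abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not code from the survey: the function names `modified_cholesky` and `covariance_from_cholesky` are hypothetical, and the variables are assumed to be already ordered. Each variable is regressed on its predecessors; the negated coefficients fill a unit lower-triangular matrix `T` and the residual variances form `d`, so that `T @ sigma @ T.T` equals `diag(d)`.

```python
import numpy as np

def modified_cholesky(sigma):
    """Return (T, d) with T unit lower triangular and T @ sigma @ T.T = diag(d)."""
    sigma = np.asarray(sigma, dtype=float)
    p = sigma.shape[0]
    T = np.eye(p)
    d = np.empty(p)
    d[0] = sigma[0, 0]
    for t in range(1, p):
        # Coefficients of the regression of variable t on variables 0..t-1.
        phi = np.linalg.solve(sigma[:t, :t], sigma[:t, t])
        T[t, :t] = -phi
        # Innovation (residual) variance of that regression.
        d[t] = sigma[t, t] - sigma[:t, t] @ phi
    return T, d

def covariance_from_cholesky(T, d):
    """Rebuild the covariance matrix from the regression parameters."""
    T_inv = np.linalg.inv(T)
    return T_inv @ np.diag(d) @ T_inv.T

# Check on an AR(1)-type correlation matrix with parameter 0.5.
idx = np.arange(4)
sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
T, d = modified_cholesky(sigma)

# Any real sub-diagonal entries and any real log-variances give a valid matrix.
rng = np.random.default_rng(0)
T_free = np.eye(4) + np.tril(rng.normal(size=(4, 4)), k=-1)
d_free = np.exp(rng.normal(size=4))
sigma_free = covariance_from_cholesky(T_free, d_free)
```

The second half of the sketch illustrates why the parameterization is called unconstrained: the sub-diagonal entries of `T` and the logarithms of `d` can take any real values, and the rebuilt matrix is still symmetric positive definite.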

    Functional Clustering of Periodic Transcriptional Profiles through ARMA(p,q)

    Background: Gene clustering of periodic transcriptional profiles provides an opportunity to shed light on a variety of biological processes, but this technique relies critically upon the robust modeling of longitudinal covariance structure over time. Methodology: We propose a statistical method for functional clustering of periodic gene expression by modeling the covariance matrix of serial measurements through a general autoregressive moving-average process of order (p,q), the so-called ARMA(p,q). We derive a sophisticated EM algorithm to estimate the proportions of each gene cluster, the Fourier series parameters that define gene-specific differences in periodic expression trajectories, and the ARMA parameters that model the covariance structure within a mixture model framework. The orders p and q of the ARMA process that provide the best fit are identified by model selection criteria. Conclusions: Through simulated data we show that, whenever it is necessary, the use of sophisticated covariance structures such as ARMA is crucial in order to obtain unbiased estimates of the mean structure parameters and increased precision of estimation. The methods were applied to recently published time-course gene expression data in yeast, and the procedure was shown to effectively identify interesting periodic clusters in the dataset. The new approach wil

    Modeling covariance matrices via partial autocorrelations

    We study the role of partial autocorrelations in the reparameterization and parsimonious modeling of a covariance matrix. The work is motivated by and tries to mimic the phenomenal success of the partial autocorrelation function (PACF) in model formulation, removing the positive-definiteness constraint on the autocorrelation function of a stationary time series and in reparameterizing the stationarity-invertibility domain of ARMA models. It turns out that once an order is fixed among the variables of a general random vector, then the above properties continue to hold and follow from establishing a one-to-one correspondence between a correlation matrix and its associated matrix of partial autocorrelations. Connections between the latter and the parameters of the modified Cholesky decomposition of a covariance matrix are discussed. Graphical tools similar to partial correlograms for model formulation and various priors based on the partial autocorrelations are proposed. We develop frequentist/Bayesian procedures for modelling correlation matrices, illustrate them using a real dataset, and explore their properties via simulations.
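
One direction of the correspondence described in this abstract, from a correlation matrix to its matrix of partial autocorrelations, can be sketched as below. This is an assumption-laden illustration rather than the paper's procedure: the function name `partial_autocorrelations` is hypothetical, and the partial autocorrelation of variables i and j is taken to be their correlation after adjusting for the variables ordered between them, computed from the inverse of the corresponding sub-block.

```python
import numpy as np

def partial_autocorrelations(corr):
    """Matrix of correlations between i and j given the variables between them."""
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    pac = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            # Inverse of the block covering variables i, i+1, ..., j.
            block_inv = np.linalg.inv(corr[i:j + 1, i:j + 1])
            # Standard partial-correlation formula from the inverse block:
            # its corner entries describe i and j given everything in between.
            value = -block_inv[0, -1] / np.sqrt(block_inv[0, 0] * block_inv[-1, -1])
            pac[i, j] = pac[j, i] = value
    return pac

# AR(1)-type correlation matrix with a hypothetical parameter of 0.6.
rho = 0.6
idx = np.arange(4)
ar1_corr = rho ** np.abs(idx[:, None] - idx[None, :])
pac = partial_autocorrelations(ar1_corr)
```

For this AR(1)-type example the lag-one entries of `pac` equal the ordinary correlations and all higher-lag entries vanish, which is the matrix analogue of a PACF that cuts off after lag one.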

    Impact of Serial Correlation Misspecification with the Linear Mixed Model

    Linear mixed models are popular models for use with clustered and longitudinal data due to their ability to model variation at different levels of clustering. A Monte Carlo study was used to explore the impact of assumption violations on the bias of parameter estimates and the empirical Type I error rates. The simulated conditions were: simulated serial correlation structure, fitted serial correlation structure, random effect distribution, cluster sample size, and number of measurement occasions. Results showed that the fixed effects are unbiased, but the random components tend to be overestimated and the empirical Type I error rates tend to be inflated. Implications for applied researchers are discussed.

    An Empirical Investigation of Labor Income Processes

    In this paper we reassess the evidence on labor income risk. There are two leading views on the nature of the income process in the current literature. The first view, which we call the "Restricted Income Profiles" (RIP) process, holds that individuals are subject to large and very persistent shocks, while facing similar life-cycle income profiles. The alternative view, which we call the "Heterogeneous Income Profiles" (HIP) process, holds that individuals are subject to income shocks with modest persistence, while facing individual-specific income profiles. We first show that ignoring profile heterogeneity, when in fact it is present, introduces an upward bias into the estimates of persistence. Second, we estimate a parsimonious parameterization of the HIP process that is suitable for calibrating economic models. The estimated persistence is about 0.8 in the HIP process compared to about 0.99 in the RIP process. Moreover, the heterogeneity in income profiles is estimated to be substantial, explaining between 56 and 75 percent of income inequality at age 55. We also find that profile heterogeneity is substantially larger among higher educated individuals. Third, we discuss the source of identification -- in other words, the aspects of labor income data that allow one to distinguish between the HIP and RIP processes. Finally, we show that the main evidence against profile heterogeneity in the existing literature -- that the autocorrelations of income changes are small and negative -- is also replicated by the HIP process, suggesting that this evidence may have been misinterpreted.
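
The claim that ignoring profile heterogeneity biases persistence upward can be illustrated with a small simulation. This is a hedged sketch, not the paper's specification or estimator: the functions `simulate_income` and `naive_persistence` are hypothetical, the intercept and slope are drawn independently, shocks follow a simple AR(1), and every parameter value other than the persistence of 0.8 quoted in the abstract is made up for illustration.

```python
import numpy as np

def simulate_income(n, T, rho, sigma_alpha, sigma_beta, sigma_eta, seed=0):
    """Simulate y[i, t] = alpha_i + beta_i * t + z[i, t], with z an AR(1) process."""
    rng = np.random.default_rng(seed)
    alpha = rng.normal(0.0, sigma_alpha, size=n)   # individual intercepts
    beta = rng.normal(0.0, sigma_beta, size=n)     # individual income growth rates
    eta = rng.normal(0.0, sigma_eta, size=(n, T))  # persistent-shock innovations
    z = np.zeros((n, T))
    z[:, 0] = eta[:, 0]
    for t in range(1, T):
        z[:, t] = rho * z[:, t - 1] + eta[:, t]
    years = np.arange(T)
    return alpha[:, None] + beta[:, None] * years[None, :] + z

def naive_persistence(y):
    """Pooled regression of y_t on y_{t-1} that ignores individual profiles."""
    lagged = y[:, :-1].ravel()
    current = y[:, 1:].ravel()
    return float(lagged @ current / (lagged @ lagged))

# Same true persistence in both runs; only the heterogeneity differs.
with_heterogeneity = simulate_income(5000, 30, 0.8, 0.15, 0.02, 0.17)
without_heterogeneity = simulate_income(5000, 30, 0.8, 0.0, 0.0, 0.17)
rho_hat_with = naive_persistence(with_heterogeneity)
rho_hat_without = naive_persistence(without_heterogeneity)
```

In this sketch the naive estimate stays close to the true value when intercepts and slopes are identical across individuals, and is pushed toward one when they differ, because permanent differences in profiles are attributed to persistent shocks.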
