
    Modeling covariance matrices via partial autocorrelations

    Abstract: We study the role of partial autocorrelations in the reparameterization and parsimonious modeling of a covariance matrix. The work is motivated by, and tries to mimic, the phenomenal success of the partial autocorrelation function (PACF) in model formulation, in removing the positive-definiteness constraint on the autocorrelation function of a stationary time series, and in reparameterizing the stationarity-invertibility domain of ARMA models. It turns out that once an order is fixed among the variables of a general random vector, the above properties continue to hold, following from a one-to-one correspondence between a correlation matrix and its associated matrix of partial autocorrelations. Connections between the latter and the parameters of the modified Cholesky decomposition of a covariance matrix are discussed. Graphical tools similar to partial correlograms are proposed for model formulation, along with various priors based on the partial autocorrelations. We develop frequentist/Bayesian procedures for modeling correlation matrices, illustrate them on a real dataset, and explore their properties via simulations.
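    The one-to-one map the abstract describes can be sketched with the standard recursion that builds a correlation matrix from partial autocorrelations taken under a fixed variable order. This is an illustrative sketch, not the paper's code; the function name is ours.

    ```python
    import numpy as np

    def pacf_to_corr(pi):
        """Map a matrix of partial autocorrelations (entries in (-1, 1);
        pi[i, j] is the partial correlation of variables i and j given the
        intermediate variables i+1, ..., j-1) to a full correlation matrix."""
        d = pi.shape[0]
        R = np.eye(d)
        # lag-1 partial autocorrelations are ordinary correlations
        for i in range(d - 1):
            R[i, i + 1] = R[i + 1, i] = pi[i, i + 1]
        for lag in range(2, d):
            for i in range(d - lag):
                j = i + lag
                R1 = R[i + 1:j, i + 1:j]   # correlations among intermediates
                r1 = R[i, i + 1:j]         # variable i vs intermediates
                r3 = R[i + 1:j, j]         # variable j vs intermediates
                a = np.linalg.solve(R1, r1)
                b = np.linalg.solve(R1, r3)
                scale = np.sqrt((1 - r1 @ a) * (1 - r3 @ b))
                R[i, j] = R[j, i] = r1 @ b + pi[i, j] * scale
        return R
    ```

    Because each entry of `pi` varies freely in (−1, 1), the output is always a valid positive-definite correlation matrix, which is exactly the unconstrained-reparameterization property the abstract highlights.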

    Covariance Estimation: The GLM and Regularization Perspectives

    Finding an unconstrained and statistically interpretable reparameterization of a covariance matrix is still an open problem in statistics. Its solution is of central importance in covariance estimation, particularly in the recent high-dimensional data environment where enforcing the positive-definiteness constraint could be computationally expensive. We provide a survey of the progress made in modeling covariance matrices from two relatively complementary perspectives: (1) generalized linear models (GLM) or parsimony and use of covariates in low dimensions, and (2) regularization or sparsity for high-dimensional data. An emerging, unifying and powerful trend in both perspectives is that of reducing a covariance estimation problem to that of estimating a sequence of regression problems. We point out several instances of the regression-based formulation. A notable case is in sparse estimation of a precision matrix or a Gaussian graphical model leading to the fast graphical LASSO algorithm. Some advantages and limitations of the regression-based Cholesky decomposition relative to the classical spectral (eigenvalue) and variance-correlation decompositions are highlighted. The former provides an unconstrained and statistically interpretable reparameterization, and guarantees the positive-definiteness of the estimated covariance matrix. It reduces the unintuitive task of covariance estimation to that of modeling a sequence of regressions at the cost of imposing an a priori order among the variables. Elementwise regularization of the sample covariance matrix such as banding, tapering and thresholding has desirable asymptotic properties, and the sparse estimated covariance matrix is positive definite with probability tending to one for large samples and dimensions. Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/11-STS358.
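    The regression-based (modified) Cholesky decomposition mentioned above can be sketched in a few lines: under a fixed order, each variable is regressed on its predecessors, and the regression coefficients and residual variances reparameterize the covariance matrix. A minimal sketch with names of our choosing:

    ```python
    import numpy as np

    def modified_cholesky(Sigma):
        """Return (T, D) with T unit lower triangular and D a vector of
        positive diagonals such that T @ Sigma @ T.T = diag(D).
        Row j of T holds the negated coefficients from regressing
        variable j on variables 0..j-1; D[j] is the residual
        (innovation) variance of that regression."""
        d = Sigma.shape[0]
        T = np.eye(d)
        D = np.zeros(d)
        D[0] = Sigma[0, 0]
        for j in range(1, d):
            # regression coefficients of variable j on its predecessors
            phi = np.linalg.solve(Sigma[:j, :j], Sigma[:j, j])
            T[j, :j] = -phi
            D[j] = Sigma[j, j] - Sigma[:j, j] @ phi  # residual variance
        return T, D
    ```

    The entries of T are unconstrained and D only needs positive entries, so any such pair maps back to a valid covariance matrix — the positive-definiteness guarantee the survey emphasizes, at the cost of the a priori variable order.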

    Generating random AR(p) and MA(q) Toeplitz correlation matrices

    Abstract: Methods are proposed for generating random (p+1)×(p+1) Toeplitz correlation matrices that are consistent with a causal AR(p) Gaussian time series model. The main idea is to first specify distributions for the partial autocorrelations that are algebraically independent and take values in (−1,1), and then map to the Toeplitz matrix. Similarly, starting with pseudo-partial autocorrelations, methods are proposed for generating (q+1)×(q+1) Toeplitz correlation matrices that are consistent with an invertible MA(q) Gaussian time series model. The density can be uniform or non-uniform over the space of autocorrelations up to lag p or q, or over the space of autoregressive or moving average coefficients, by making appropriate choices for the densities of the (pseudo-)partial autocorrelations. Important intermediate steps are the derivations of the Jacobians of the mappings between the (pseudo-)partial autocorrelations, autocorrelations and autoregressive/moving average coefficients. The random generating methods are useful for models with a structured Toeplitz matrix as a parameter.
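    The first step of the AR(p) construction — mapping freely chosen partial autocorrelations in (−1, 1) to autocorrelations — is the Durbin–Levinson recursion run forward. A sketch of that step and of the resulting random Toeplitz generator (function names and the uniform choice of density are ours):

    ```python
    import numpy as np

    def pacf_to_acf(kappa):
        """Durbin-Levinson run forward: partial autocorrelations
        kappa[0..p-1] in (-1, 1) -> autocorrelations rho[0..p]."""
        p = len(kappa)
        phi_prev = np.zeros(0)
        rho = np.zeros(p + 1)
        rho[0] = 1.0
        for k in range(1, p + 1):
            # update AR(k) coefficients from the AR(k-1) coefficients
            phi = np.empty(k)
            phi[k - 1] = kappa[k - 1]
            phi[:k - 1] = phi_prev - kappa[k - 1] * phi_prev[::-1]
            # Yule-Walker at lag k gives the next autocorrelation
            rho[k] = phi @ rho[k - 1::-1]
            phi_prev = phi
        return rho

    def random_ar_toeplitz(p, rng):
        """Draw a random (p+1)x(p+1) Toeplitz correlation matrix consistent
        with a causal AR(p), sampling partial autocorrelations uniformly."""
        kappa = rng.uniform(-1, 1, size=p)
        rho = pacf_to_acf(kappa)
        idx = np.abs(np.subtract.outer(np.arange(p + 1), np.arange(p + 1)))
        return rho[idx]  # Toeplitz: entry (i, j) is rho[|i - j|]
    ```

    Non-uniform densities over the autocorrelations or AR coefficients, as in the paper, would reweight the `kappa` draws by the corresponding Jacobians rather than sampling them uniformly.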

    Computational Bayesian Methods Applied to Complex Problems in Bio and Astro Statistics

    In this dissertation we apply computational Bayesian methods to three distinct problems. In the first chapter, we address the issue of unrealistic covariance matrices used to estimate collision probabilities. We model covariance matrices with a Bayesian Normal-Inverse-Wishart model, which we fit with Gibbs sampling. In the second chapter, we are interested in determining the sample sizes necessary to achieve a particular interval width and establish non-inferiority in the analysis of prevalences using two fallible tests. To this end, we use a third-order asymptotic approximation. In the third chapter, we synthesize evidence across multiple domains from measurements taken longitudinally over time, featuring a substantial amount of structurally missing data, and fit the model with Hamiltonian Monte Carlo in a simulation study to analyze how estimates of a parameter of interest change with sample size.
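    For the first chapter's model, the Normal-Inverse-Wishart prior is conjugate for Gaussian data, so the posterior update at the heart of the Gibbs sampler has closed form. A sketch of one exact posterior draw under standard conjugate formulas (hyperparameter names are ours, not the dissertation's):

    ```python
    import numpy as np
    from scipy.stats import invwishart

    def niw_posterior_draw(X, mu0, kappa0, nu0, Psi0, rng):
        """One draw of (mu, Sigma) from the Normal-Inverse-Wishart posterior
        given i.i.d. Gaussian rows X, via the standard conjugate update."""
        n, d = X.shape
        xbar = X.mean(axis=0)
        S = (X - xbar).T @ (X - xbar)  # scatter matrix about the sample mean
        kappa_n = kappa0 + n
        nu_n = nu0 + n
        mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
        Psi_n = (Psi0 + S
                 + (kappa0 * n / kappa_n) * np.outer(xbar - mu0, xbar - mu0))
        # Sigma | X ~ IW(nu_n, Psi_n), then mu | Sigma, X ~ N(mu_n, Sigma/kappa_n)
        Sigma = invwishart.rvs(df=nu_n, scale=Psi_n, random_state=rng)
        mu = rng.multivariate_normal(mu_n, Sigma / kappa_n)
        return mu, Sigma
    ```

    In the dissertation's setting the draw would sit inside a larger Gibbs cycle over the remaining model components; on its own, repeated calls already give exact posterior samples of the covariance matrix.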

    Bayesian Spatial Binary Regression for Label Fusion in Structural Neuroimaging

    Many analyses of neuroimaging data involve studying one or more regions of interest (ROIs) in a brain image. In order to do so, each ROI must first be identified. Since every brain is unique, the location, size, and shape of each ROI varies across subjects. Thus, each ROI in a brain image must either be manually identified or (semi-)automatically delineated, a task referred to as segmentation. Automatic segmentation often involves mapping a previously manually segmented image to a new brain image and propagating the labels to obtain an estimate of where each ROI is located in the new image. A more recent approach to this problem is to propagate labels from multiple manually segmented atlases and combine the results using a process known as label fusion. To date, most label fusion algorithms either employ voting procedures or impose prior structure and subsequently find the maximum a posteriori estimator (i.e., the posterior mode) through optimization. We propose using a fully Bayesian spatial regression model for label fusion that facilitates direct incorporation of covariate information while making accessible the entire posterior distribution. We discuss the implementation of our model via Markov chain Monte Carlo and illustrate the procedure through both simulation and application to segmentation of the hippocampus, an anatomical structure known to be associated with Alzheimer's disease. Comment: 24 pages, 10 figures.
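    The voting procedures that the paper contrasts with fit in a few lines; a minimal (optionally weighted) majority-vote fusion over propagated atlas labels, with names of our choosing:

    ```python
    import numpy as np

    def majority_vote_fusion(labels, weights=None):
        """Fuse binary ROI labels propagated from several atlases.

        labels  : (n_atlases, n_voxels) array of 0/1 votes per voxel
        weights : optional per-atlas weights (e.g. registration quality);
                  defaults to equal weights, i.e. plain majority voting
        """
        labels = np.asarray(labels, dtype=float)
        if weights is None:
            weights = np.ones(labels.shape[0])
        weights = np.asarray(weights, dtype=float) / np.sum(weights)
        votes = weights @ labels  # weighted vote share per voxel
        return (votes > 0.5).astype(int)
    ```

    Unlike the fully Bayesian spatial model the paper proposes, this returns only a point estimate per voxel, with no posterior uncertainty and no way to incorporate covariates.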

    Contributions to the study of longitudinal data in item response theory (Contribuições ao estudo de dados longitudinais na teoria de resposta ao item)

    Advisor: Caio Lucidius Naberezny Azevedo. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Matemática Estatística e Computação Científica.
    Abstract: In this thesis we develop families of longitudinal Item Response Theory (IRT) models under two approaches. The first is based on the Cholesky decomposition of the covariance matrices of interest, related to the latent traits. This modeling accommodates a wide range of dependence structures in a relatively simple way, facilitates the choice of prior distributions for the parameters of the dependence structure, facilitates the implementation of estimation algorithms (particularly under the Bayesian paradigm), allows different (multivariate) distributions for the latent traits, and eases the inclusion of regression and multilevel structures for the latent traits, among other advantages. Additionally, we develop growth-curve models for the latent traits. The second approach uses a Gaussian copula to describe the dependence structure of the latent traits. Unlike the Cholesky approach, the copula approach gives full control of the respective marginal latent-trait distributions while still accommodating a large number of dependence structures.
    We focus on dichotomous item responses and explore normal and skew-normal distributions for the latent traits. We consider subjects followed over several evaluation conditions (time points), measured at each by instruments that share some common items. Subjects may belong to a single group or to multiple independent groups, and both balanced and unbalanced data are considered, in the sense that inclusion and dropout of subjects over time are allowed. Estimation algorithms, model-fit assessment and model-comparison tools were developed under the Bayesian paradigm through hybrid MCMC algorithms, in which the SVE (Single Variable Exchange) and Metropolis-Hastings algorithms are used when the full conditional distributions are not known. Simulation studies indicate that the parameters are well recovered. Furthermore, two longitudinal psychometric data sets are analyzed to illustrate the methodologies: the first comes from a large-scale educational assessment study conducted by the Brazilian federal government; the second was extracted from the Amsterdam Growth and Health Longitudinal Study (AGHLS), which monitors the health and lifestyle of Dutch teenagers. Doutorado em Estatística. Grants 162562/2014-4 and 142486/2015-9, CNPq and CAPES.
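    The copula approach's key property — full control of the marginals while a correlation matrix carries the dependence — can be illustrated with a generic Gaussian-copula sampler for skew-normal latent traits. This is our own sketch of the idea, not the thesis's estimation algorithm:

    ```python
    import numpy as np
    from scipy.stats import norm, skewnorm

    def gaussian_copula_sample(R, marginals, n, rng):
        """Draw n vectors whose dependence is a Gaussian copula with
        correlation matrix R and whose k-th marginal is marginals[k]
        (any scipy.stats frozen distribution)."""
        L = np.linalg.cholesky(R)
        Z = rng.standard_normal((n, R.shape[0])) @ L.T  # correlated normals
        U = norm.cdf(Z)                                 # uniform marginals
        return np.column_stack([m.ppf(U[:, k]) for k, m in enumerate(marginals)])

    # e.g. latent traits over three time points with skew-normal marginals
    R = np.array([[1.0, 0.6, 0.4],
                  [0.6, 1.0, 0.6],
                  [0.4, 0.6, 1.0]])
    traits = gaussian_copula_sample(R, [skewnorm(4)] * 3, 2000,
                                    np.random.default_rng(0))
    ```

    Changing `marginals` reshapes each trait's distribution without touching the dependence structure, which is precisely the separation the copula formulation buys over the Cholesky-based approach.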

    Bayesian modeling of the covariance structure for irregular longitudinal data using the partial autocorrelation function

    In long-term follow-up studies, irregular longitudinal data are observed when individuals are assessed repeatedly over time but at uncommon and irregularly spaced time points. Modeling the covariance structure for this type of data is challenging, as it requires specification of a covariance function that is positive definite. Moreover, in certain settings, careful modeling of the covariance structure for irregular longitudinal data can be crucial in order to ensure no bias arises in the mean structure. Two common settings where this occurs are studies with 'outcome-dependent follow-up' and studies with 'ignorable missing data'. 'Outcome-dependent follow-up' occurs when individuals with a history of poor health outcomes have more follow-up measurements, with shorter intervals between the repeated measurements. When the follow-up time process depends only on previous outcomes, likelihood-based methods can still provide consistent estimates of the regression parameters, given that both the mean and covariance structures of the irregular longitudinal data are correctly specified; no model for the follow-up time process is required. For 'ignorable missing data', the missing data mechanism does not need to be specified, but valid likelihood-based inference requires correct specification of the covariance structure. In both cases, flexible modeling approaches for the covariance structure are essential. In this paper, we develop a flexible approach to modeling the covariance structure for irregular continuous longitudinal data using the partial autocorrelation function and the variance function. In particular, we propose semiparametric non-stationary partial autocorrelation function models, which do not suffer from the complex positive-definiteness restrictions of the autocorrelation function. We describe a Bayesian approach, discuss computational issues, and apply the proposed methods to CD4 count data from a pediatric AIDS clinical trial. © 2015 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.

    Handling Attrition in Longitudinal Studies: The Case for Refreshment Samples

    Panel studies typically suffer from attrition, which reduces sample size and can result in biased inferences. It is impossible to know whether or not the attrition causes bias from the observed panel data alone. Refreshment samples - new, randomly sampled respondents given the questionnaire at the same time as a subsequent wave of the panel - offer information that can be used to diagnose and adjust for bias due to attrition. We review and bolster the case for the use of refreshment samples in panel studies. We include examples of both a fully Bayesian approach for analyzing the concatenated panel and refreshment data, and a multiple imputation approach for analyzing only the original panel. For the latter, we document a positive bias in the usual multiple imputation variance estimator. We present models appropriate for three waves and two refreshment samples, including nonterminal attrition. We illustrate the three-wave analysis using the 2007-2008 Associated Press-Yahoo! News Election Poll. Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics; DOI: http://dx.doi.org/10.1214/13-STS414.