Modeling covariance matrices via partial autocorrelations
Abstract: We study the role of partial autocorrelations in the reparameterization and parsimonious modeling of a covariance matrix. The work is motivated by, and tries to mimic, the phenomenal success of the partial autocorrelation function (PACF) in model formulation, in removing the positive-definiteness constraint on the autocorrelation function of a stationary time series, and in reparameterizing the stationarity-invertibility domain of ARMA models. It turns out that once an order is fixed among the variables of a general random vector, these properties continue to hold; they follow from establishing a one-to-one correspondence between a correlation matrix and its associated matrix of partial autocorrelations. Connections between the latter and the parameters of the modified Cholesky decomposition of a covariance matrix are discussed. Graphical tools similar to partial correlograms for model formulation, as well as various priors based on the partial autocorrelations, are proposed. We develop frequentist and Bayesian procedures for modeling correlation matrices, illustrate them using a real dataset, and explore their properties via simulations.
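The one-to-one correspondence the abstract refers to can be sketched in a few lines: given any matrix of partial autocorrelations with entries in (−1, 1), a standard recursive construction fills in the correlations lag by lag, conditioning on the intermediate variables. The function below is a minimal NumPy sketch of that map (the name and looping order are our own, not the paper's):

```python
import numpy as np

def pacs_to_corr(P):
    """Map a matrix of partial autocorrelations (strict upper triangle of P,
    entries in (-1, 1)) to a full correlation matrix; any such P yields a
    positive-definite correlation matrix R."""
    d = P.shape[0]
    R = np.eye(d)
    # lag-1 partial autocorrelations are ordinary correlations
    for i in range(d - 1):
        R[i, i + 1] = R[i + 1, i] = P[i, i + 1]
    # fill in longer lags recursively, conditioning on intermediate variables
    for lag in range(2, d):
        for i in range(d - lag):
            j = i + lag
            idx = np.arange(i + 1, j)          # intermediate variables
            R2inv = np.linalg.inv(R[np.ix_(idx, idx)])
            r1, r3 = R[i, idx], R[j, idx]
            D = np.sqrt((1 - r1 @ R2inv @ r1) * (1 - r3 @ R2inv @ r3))
            R[i, j] = R[j, i] = r1 @ R2inv @ r3 + P[i, j] * D
    return R
```

This makes concrete why the reparameterization removes the positive-definiteness constraint: the partials are algebraically independent on (−1, 1), so each can be modeled or given a prior separately.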
Covariance Estimation: The GLM and Regularization Perspectives
Finding an unconstrained and statistically interpretable reparameterization
of a covariance matrix is still an open problem in statistics. Its solution is
of central importance in covariance estimation, particularly in the recent
high-dimensional data environment where enforcing the positive-definiteness
constraint could be computationally expensive. We provide a survey of the
progress made in modeling covariance matrices from two relatively complementary
perspectives: (1) generalized linear models (GLM) or parsimony and use of
covariates in low dimensions, and (2) regularization or sparsity for
high-dimensional data. An emerging, unifying, and powerful trend in both
perspectives is that of reducing a covariance estimation problem to one of
fitting a sequence of regressions. We point out several instances of
the regression-based formulation. A notable case is in sparse estimation of a
precision matrix or a Gaussian graphical model leading to the fast graphical
LASSO algorithm. Some advantages and limitations of the regression-based
Cholesky decomposition relative to the classical spectral (eigenvalue) and
variance-correlation decompositions are highlighted. The former provides an
unconstrained and statistically interpretable reparameterization, and
guarantees the positive-definiteness of the estimated covariance matrix. It
reduces the unintuitive task of covariance estimation to that of modeling a
sequence of regressions at the cost of imposing an a priori order among the
variables. Elementwise regularization of the sample covariance matrix such as
banding, tapering and thresholding has desirable asymptotic properties and the
sparse estimated covariance matrix is positive definite with probability
tending to one for large samples and dimensions.
Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: 10.1214/11-STS358.
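The regression-based Cholesky decomposition described above can be sketched directly: under a fixed ordering of the variables, regress each variable on its predecessors and collect the negated coefficients and residual variances. A minimal NumPy sketch (assuming the centered data matrix has full column rank):

```python
import numpy as np

def modified_cholesky(X):
    """Regression-based (modified) Cholesky factorization of the sample
    covariance: regress each variable on its predecessors under a given
    ordering. Returns (T, D) with T unit lower triangular and D the vector
    of residual variances, so that T S T' = diag(D) for the sample
    covariance S; the implied estimate inv(T) diag(D) inv(T)' is
    positive definite by construction."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    T = np.eye(p)
    D = np.empty(p)
    D[0] = Xc[:, 0] @ Xc[:, 0] / n
    for j in range(1, p):
        phi = np.linalg.lstsq(Xc[:, :j], Xc[:, j], rcond=None)[0]
        T[j, :j] = -phi                       # negated regression coefficients
        eps = Xc[:, j] - Xc[:, :j] @ phi      # regression residuals
        D[j] = eps @ eps / n
    return T, D
```

The regression coefficients and log residual variances are unconstrained, which is exactly the "unconstrained and statistically interpretable reparameterization" the survey highlights; the price, as noted, is the a priori ordering of the variables.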
Generating random AR(p) and MA(q) Toeplitz correlation matrices
Abstract: Methods are proposed for generating random (p+1)×(p+1) Toeplitz correlation matrices that are consistent with a causal AR(p) Gaussian time series model. The main idea is to first specify distributions for the partial autocorrelations that are algebraically independent and take values in (−1,1), and then map to the Toeplitz matrix. Similarly, starting with pseudo-partial autocorrelations, methods are proposed for generating (q+1)×(q+1) Toeplitz correlation matrices that are consistent with an invertible MA(q) Gaussian time series model. The density can be uniform or non-uniform over the space of autocorrelations up to lag p or q, or over the space of autoregressive or moving average coefficients, by making appropriate choices for the densities of the (pseudo-)partial autocorrelations. Important intermediate steps are the derivations of the Jacobians of the mappings between the (pseudo-)partial autocorrelations, autocorrelations, and autoregressive/moving average coefficients. The random generating methods are useful for models with a structured Toeplitz matrix as a parameter.
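In the AR(p) case, the map from partial autocorrelations to autocorrelations is the inverse Levinson-Durbin recursion. A minimal sketch of the generating idea, drawing the partials uniformly (one of the density choices mentioned above; details of the paper's Jacobian-based non-uniform choices are omitted):

```python
import numpy as np

def pacf_to_acf(kappa):
    """Inverse Levinson-Durbin recursion: map partial autocorrelations
    kappa_1..kappa_p (each free to vary in (-1, 1)) to the autocorrelations
    rho_0..rho_p of a causal AR(p) process."""
    p = len(kappa)
    phi = np.zeros((p + 1, p + 1))   # phi[k, j]: AR(k) coefficients
    rho = np.zeros(p + 1)
    rho[0] = 1.0
    for k in range(1, p + 1):
        phi[k, k] = kappa[k - 1]
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - kappa[k - 1] * phi[k - 1, k - j]
        num = sum(phi[k - 1, j] * rho[k - j] for j in range(1, k))
        den = 1 - sum(phi[k - 1, j] * rho[j] for j in range(1, k))
        rho[k] = num + kappa[k - 1] * den
    return rho

# draw free partials, map to autocorrelations, assemble the Toeplitz matrix
rng = np.random.default_rng(1)
p = 4
kappa = rng.uniform(-0.95, 0.95, size=p)
rho = pacf_to_acf(kappa)
lags = np.abs(np.arange(p + 1)[:, None] - np.arange(p + 1)[None, :])
R = rho[lags]   # (p+1) x (p+1) Toeplitz correlation matrix
```

Because each kappa lives freely in (−1, 1), every draw produces a valid positive-definite Toeplitz correlation matrix with no rejection step.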
Computational Bayesian Methods Applied to Complex Problems in Bio and Astro Statistics
In this dissertation we apply computational Bayesian methods to three distinct problems. In the first chapter, we address the issue of unrealistic covariance matrices used to estimate collision probabilities. We model covariance matrices with a Bayesian Normal-Inverse-Wishart model, which we fit with Gibbs sampling. In the second chapter, we are interested in determining the sample sizes necessary to achieve a particular interval width and establish non-inferiority in the analysis of prevalences using two fallible tests. To this end, we use a third-order asymptotic approximation. In the third chapter, we synthesize evidence across multiple domains from measurements taken longitudinally over time, featuring a substantial amount of structurally missing data, and fit the model with Hamiltonian Monte Carlo in a simulation study to analyze how estimates of a parameter of interest change across sample sizes.
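A Gibbs sampler for a Normal-Inverse-Wishart model alternates between the two conjugate full conditionals for the mean and covariance. The sketch below shows the general pattern; the hyperparameter values are illustrative defaults, not the dissertation's settings:

```python
import numpy as np
from scipy.stats import invwishart

def gibbs_niw(X, n_iter=500, seed=0):
    """Gibbs sampler for a multivariate normal model with a
    Normal-Inverse-Wishart prior: Sigma ~ IW(nu0, Psi0) and
    mu | Sigma ~ N(mu0, Sigma / kappa0). Hyperparameters below are
    illustrative, not the dissertation's settings."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    xbar = X.mean(axis=0)
    mu0, kappa0, nu0, Psi0 = np.zeros(d), 1.0, d + 2.0, np.eye(d)
    Sigma = np.cov(X, rowvar=False)
    draws = []
    for _ in range(n_iter):
        # mu | Sigma, X is normal with a precision-weighted mean
        kn = kappa0 + n
        mun = (kappa0 * mu0 + n * xbar) / kn
        mu = rng.multivariate_normal(mun, Sigma / kn)
        # Sigma | mu, X is inverse Wishart with updated df and scale
        R = X - mu
        S = R.T @ R + kappa0 * np.outer(mu - mu0, mu - mu0)
        Sigma = invwishart.rvs(df=nu0 + n + 1, scale=Psi0 + S, random_state=rng)
        draws.append((mu, Sigma))
    return draws
```

With full conjugacy the posterior is actually available in closed form; Gibbs sampling as above generalizes more easily once the model is extended.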
Bayesian Spatial Binary Regression for Label Fusion in Structural Neuroimaging
Many analyses of neuroimaging data involve studying one or more regions of
interest (ROIs) in a brain image. In order to do so, each ROI must first be
identified. Since every brain is unique, the location, size, and shape of each
ROI varies across subjects. Thus, each ROI in a brain image must either be
manually identified or (semi-) automatically delineated, a task referred to as
segmentation. Automatic segmentation often involves mapping a previously
manually segmented image to a new brain image and propagating the labels to
obtain an estimate of where each ROI is located in the new image. A more recent
approach to this problem is to propagate labels from multiple manually
segmented atlases and combine the results using a process known as label
fusion. To date, most label fusion algorithms either employ voting procedures
or impose prior structure and subsequently find the maximum a posteriori
estimator (i.e., the posterior mode) through optimization. We propose using a
fully Bayesian spatial regression model for label fusion that facilitates
direct incorporation of covariate information while making accessible the
entire posterior distribution. We discuss the implementation of our model via
Markov chain Monte Carlo and illustrate the procedure through both simulation
and application to segmentation of the hippocampus, an anatomical structure
known to be associated with Alzheimer's disease. (24 pages, 10 figures)
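For contrast with the fully Bayesian model proposed above, the simplest voting procedure the abstract mentions can be sketched in a few lines (array shapes are illustrative):

```python
import numpy as np

def majority_vote_fusion(label_maps):
    """Baseline label fusion: per-voxel majority vote over binary ROI label
    maps propagated from multiple manually segmented atlases (the kind of
    voting procedure the Bayesian spatial model is an alternative to)."""
    stacked = np.stack([np.asarray(m) for m in label_maps])
    votes = stacked.sum(axis=0)                 # count of atlases labeling each voxel
    return (votes > stacked.shape[0] / 2).astype(int)
```

Unlike such a vote, which yields only a hard label per voxel, the Bayesian regression model gives the entire posterior distribution over labels and can incorporate covariates.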
Contribuições ao estudo de dados longitudinais na teoria de resposta ao item (Contributions to the study of longitudinal data in item response theory)
Advisor: Caio Lucidius Naberezny Azevedo. Doctoral thesis (doutorado), Universidade Estadual de Campinas, Instituto de Matemática, Estatística e Computação Científica.
Abstract: In this thesis we develop families of longitudinal Item Response Theory (IRT) models under two approaches. The first is based on the Cholesky decomposition of the covariance matrices of interest, related to the latent traits. This modeling can accommodate several dependence structures in an easy way; it facilitates the choice of prior distributions for the parameters of the dependence structure and the implementation of estimation algorithms (particularly under the Bayesian paradigm), allows different (multivariate) distributions to be considered for the latent traits, and makes the inclusion of regression and multilevel structures for the latent traits straightforward, among other advantages. Additionally, we develop growth curve models for the latent traits. The second approach uses a Gaussian copula function to describe the latent trait dependence structure. Unlike the first, the copula approach allows full control of the respective marginal latent trait distributions while, like the first, accommodating several dependence structures.
We focus on dichotomous responses and explore the use of the normal and skew-normal distributions for the latent traits. We consider subjects followed over several evaluation conditions (time points) and assessed with measurement instruments that share some common items. Such subjects can belong to a single group or to multiple independent groups, and we consider both balanced and unbalanced data, in the sense that inclusion and dropout of subjects over time are allowed. Estimation algorithms, model fit assessment, and model comparison tools were developed under the Bayesian paradigm through hybrid MCMC algorithms, in which the SVE (Single Variable Exchange) and Metropolis-Hastings algorithms are used when the full conditionals are not known. Simulation studies indicate that the parameters are well recovered. Furthermore, two longitudinal psychometric data sets were analyzed to illustrate our methodologies. The first comes from a large-scale longitudinal educational study conducted by the Brazilian federal government; the second was extracted from the Amsterdam Growth and Health Longitudinal Study (AGHLS), which monitors the health and lifestyle of Dutch teenagers. (Doutorado em Estatística; grants 162562/2014-4 and 142486/2015-9, CNPq and CAPES.)
Bayesian modeling of the covariance structure for irregular longitudinal data using the partial autocorrelation function
In long-term follow-up studies, irregular longitudinal data arise when individuals are assessed repeatedly over time but at uncommon and irregularly spaced time points. Modeling the covariance structure for this type of data is challenging, as it requires specification of a covariance function that is positive definite. Moreover, in certain settings, careful modeling of the covariance structure for irregular longitudinal data can be crucial to ensure that no bias arises in the mean structure. Two common settings where this occurs are studies with ‘outcome-dependent follow-up’ and studies with ‘ignorable missing data’. ‘Outcome-dependent follow-up’ occurs when individuals with a history of poor health outcomes have more follow-up measurements, with shorter intervals between the repeated measurements. When the follow-up time process depends only on previous outcomes, likelihood-based methods can still provide consistent estimates of the regression parameters, provided that both the mean and covariance structures of the irregular longitudinal data are correctly specified; no model for the follow-up time process is then required. For ‘ignorable missing data’, the missing data mechanism does not need to be specified, but valid likelihood-based inference requires correct specification of the covariance structure. In both cases, flexible modeling approaches for the covariance structure are essential. In this paper, we develop a flexible approach to modeling the covariance structure for irregular continuous longitudinal data using the partial autocorrelation function and the variance function. In particular, we propose semiparametric non-stationary partial autocorrelation function models, which do not suffer from the complex positive-definiteness restrictions of the autocorrelation function. We describe a Bayesian approach, discuss computational issues, and apply the proposed methods to CD4 count data from a pediatric AIDS clinical trial.
© 2015 The Authors. Statistics in Medicine, published by John Wiley & Sons Ltd.
Handling Attrition in Longitudinal Studies: The Case for Refreshment Samples
Panel studies typically suffer from attrition, which reduces sample size and
can result in biased inferences. It is impossible to know whether or not the
attrition causes bias from the observed panel data alone. Refreshment samples -
new, randomly sampled respondents given the questionnaire at the same time as a
subsequent wave of the panel - offer information that can be used to diagnose
and adjust for bias due to attrition. We review and bolster the case for the
use of refreshment samples in panel studies. We include examples of both a
fully Bayesian approach for analyzing the concatenated panel and refreshment
data, and a multiple imputation approach for analyzing only the original panel.
For the latter, we document a positive bias in the usual multiple imputation
variance estimator. We present models appropriate for three waves and two
refreshment samples, including nonterminal attrition. We illustrate the
three-wave analysis using the 2007-2008 Associated Press-Yahoo! News Election
Poll.
Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org). DOI: 10.1214/13-STS414.
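The "usual" multiple imputation variance estimator referred to above is given by Rubin's combining rules; a minimal sketch for a scalar parameter (variable names are ours):

```python
import numpy as np

def rubin_combine(estimates, variances):
    """Rubin's combining rules for a scalar parameter across m completed
    (imputed) data sets: the standard multiple imputation variance
    estimator whose positive bias under attrition the paper documents."""
    q = np.asarray(estimates, dtype=float)   # per-imputation point estimates
    u = np.asarray(variances, dtype=float)   # per-imputation variances
    m = q.size
    qbar = q.mean()                          # combined point estimate
    wbar = u.mean()                          # within-imputation variance
    b = q.var(ddof=1)                        # between-imputation variance
    total = wbar + (1 + 1 / m) * b           # Rubin's total variance
    return qbar, total
```

The positive bias documented in the paper concerns this total-variance formula when the imputation and analysis models diverge under attrition, not the point estimate itself.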