
    Multivariate Covariance Generalized Linear Models

    We propose a general framework for non-normal multivariate data analysis called multivariate covariance generalized linear models (McGLMs), designed to handle multivariate response variables, along with a wide range of temporal and spatial correlation structures defined in terms of a covariance link function combined with a matrix linear predictor involving known matrices. The method is motivated by three data examples that are not easily handled by existing methods. The first example concerns multivariate count data, the second involves response variables of mixed types, combined with repeated measures and longitudinal structures, and the third involves a spatio-temporal analysis of rainfall data. The models take non-normality into account in the conventional way by means of a variance function, and the mean structure is modelled by means of a link function and a linear predictor. The models are fitted using an efficient Newton scoring algorithm based on quasi-likelihood and Pearson estimating functions, using only second-moment assumptions. This provides a unified approach to a wide variety of different types of response variables and covariance structures, including multivariate extensions of repeated measures, time series, longitudinal, spatial and spatio-temporal structures. Comment: 21 pages, 5 figures.
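As a rough illustration of the "matrix linear predictor involving known matrices" mentioned in this abstract, the sketch below builds a covariance matrix for a single count response. It is not the authors' implementation: the identity covariance link, the Poisson-type variance function V(mu) = mu, the log mean link, and all names (`matrix_linear_predictor`, `covariance`, `Z0`, `Z1`, `taus`, `beta`) are assumptions chosen for the example.

```python
import numpy as np

def matrix_linear_predictor(taus, Zs):
    """Weighted sum of known matrices: tau_0 * Z_0 + ... + tau_D * Z_D."""
    return sum(t * Z for t, Z in zip(taus, Zs))

def covariance(mu, taus, Zs):
    """Assumed form: V(mu)^(1/2) @ Omega @ V(mu)^(1/2).

    Omega comes from the matrix linear predictor under an identity
    covariance link (an assumption), and V(mu) = diag(mu) is a
    Poisson-type variance function (also an assumption).
    """
    omega = matrix_linear_predictor(taus, Zs)
    v_half = np.diag(np.sqrt(mu))
    return v_half @ omega @ v_half

n = 4
Z0 = np.eye(n)            # independent component
Z1 = np.ones((n, n))      # component shared by all observations

# Mean structure: log link with a linear predictor X @ beta (illustrative values).
X = np.column_stack([np.ones(n), np.arange(n)])
beta = np.array([0.5, 0.1])
mu = np.exp(X @ beta)

taus = [1.0, 0.3]         # dispersion parameters (illustrative values)
Sigma = covariance(mu, taus, [Z0, Z1])
print(np.round(Sigma, 3))
```

With these choices the diagonal of `Sigma` equals `(taus[0] + taus[1]) * mu`, so the variance scales with the mean while the off-diagonal entries carry the dependence encoded by `Z1`.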

    Bayesian nonparametric models for spatially indexed data of mixed type

    We develop Bayesian nonparametric models for spatially indexed data of mixed type. Our work is motivated by challenges that occur in environmental epidemiology, where the usual presence of several confounding variables that exhibit complex interactions and high correlations makes it difficult to estimate and understand the effects of risk factors on health outcomes of interest. The modeling approach we adopt assumes that responses and confounding variables are manifestations of continuous latent variables, and uses multivariate Gaussians to jointly model these. Responses and confounding variables are not treated equally: only the relevant parameters of the distributions of the responses are modeled in terms of explanatory variables or risk factors. Spatial dependence is introduced by allowing the weights of the nonparametric process priors to be location specific, obtained as probit transformations of Gaussian Markov random fields. Confounding variables and spatial configuration have a similar role in the model, in that they only influence, along with the responses, the allocation probabilities of the areas into the mixture components. This allows for flexible adjustment of the effects of observed confounders, while also allowing for residual spatial structure, possibly due to unmeasured or undiscovered spatially varying factors. Aspects of the model are illustrated in simulation studies and an application to a real data set.

    Foundational principles for large scale inference: Illustrations through correlation mining

    When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting where the sample size n is fixed and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the last regime applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where the matrices of pairwise and partial correlations among the variables are of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.
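The third regime in this abstract (sample size n fixed, dimension p growing) can be illustrated with a small simulation. The sketch below is not taken from the paper: it draws independent Gaussian variables, so every true correlation is zero, and records the largest absolute sample correlation for increasing p. The function name `max_abs_offdiag_correlation`, the seed, and the values of n and p are assumptions made for the example.

```python
import numpy as np

def max_abs_offdiag_correlation(n, p, rng):
    """Largest |sample correlation| among p independent variables, n samples."""
    X = rng.standard_normal((n, p))      # rows are samples, columns are variables
    R = np.corrcoef(X, rowvar=False)     # p x p sample correlation matrix
    np.fill_diagonal(R, 0.0)             # ignore the trivial self-correlations
    return float(np.max(np.abs(R)))

rng = np.random.default_rng(0)
n = 10                                   # sample size held fixed
results = {p: max_abs_offdiag_correlation(n, p, rng) for p in (10, 100, 1000)}

for p, r in results.items():
    print(f"p = {p:5d}  largest spurious |correlation| = {r:.3f}")
```

Because the number of variable pairs grows like p squared while n stays fixed, the largest spurious correlation tends to drift toward 1, which is why any screening threshold in this regime has to depend on both n and p.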

    Convolved Gaussian process regression models for multivariate non-Gaussian data

    PhD Thesis. Multivariate regression analysis for dependent data has developed rapidly in the last decade. The most difficult part in multivariate cases is how to construct a cross-correlation between response variables. We need to make sure that the covariance matrix is positive definite, which is not an easy task. Several approaches have been developed to overcome the issue. However, most of them have some limitations: they are hard to extend to cases involving high dimensional variables, or they fail to capture individual characteristics. It should also be pointed out that the meaning of the cross-correlation structure for some methods is unclear. To address these issues, we propose to use convolved Gaussian process (CGP) priors (Boyle & Frean, 2005). In this dissertation, we propose a novel approach for multivariate regression using CGP priors. The approach provides a semiparametric model with multi-dimensional covariates and offers a natural framework for modelling common mean structures and covariance structures simultaneously for multivariate dependent data. Information about observations is provided by the common mean structure while individual characteristics can also be captured by the covariance structure. At the same time, the covariance function is able to accommodate a large-dimensional covariate as well. We start by broadening the problem from the general CGP framework proposed by Andriluka et al. (2006). We investigate some of the stationary covariance functions and the mixed forms for constructing multiple dependent Gaussian processes to solve a more complex issue. Then, we extend the idea to a multivariate non-linear regression model by using convolved Gaussian processes as priors. We then focus on applying the idea to multivariate non-Gaussian data, i.e. multivariate Poisson, and other multivariate non-Gaussian distributions from the exponential family.
    We first focus on multivariate Poisson data, which are found in many problems relating to public health issues. Finally, we provide a general framework for multivariate binomial data and other multivariate non-Gaussian data. The definition of the model, the inference, and the implementation, as well as its asymptotic properties, are discussed. Comprehensive numerical examples with both simulation studies and real data are presented. Ministry of Education, Indonesia.