Multivariate Covariance Generalized Linear Models
We propose a general framework for non-normal multivariate data analysis
called multivariate covariance generalized linear models (McGLMs), designed to
handle multivariate response variables, along with a wide range of temporal and
spatial correlation structures defined in terms of a covariance link function
combined with a matrix linear predictor involving known matrices. The method is
motivated by three data examples that are not easily handled by existing
methods. The first example concerns multivariate count data, the second
involves response variables of mixed types, combined with repeated measures and
longitudinal structures, and the third involves a spatio-temporal analysis of
rainfall data. The models take non-normality into account in the conventional
way by means of a variance function, and the mean structure is modelled by
means of a link function and a linear predictor. The models are fitted using an
efficient Newton scoring algorithm based on quasi-likelihood and Pearson
estimating functions, using only second-moment assumptions. This provides a
unified approach to a wide variety of different types of response variables and
covariance structures, including multivariate extensions of repeated measures,
time series, longitudinal, spatial and spatio-temporal structures.
Comment: 21 pages, 5 figures
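As a rough illustration of the "matrix linear predictor involving known matrices", the following minimal sketch builds a covariance matrix from a linear combination of known matrices and a variance function. The decomposition Sigma = V(mu)^(1/2) Omega V(mu)^(1/2), the identity covariance link, the Poisson-type variance function and all names (`matrix_linear_predictor`, `covariance`, `Z0`, `Z1`) are assumptions made for this example; the abstract does not spell them out.

```python
import numpy as np

def matrix_linear_predictor(taus, Zs):
    """Return tau_0*Z_0 + ... + tau_D*Z_D for known matrices Z_d."""
    return sum(t * Z for t, Z in zip(taus, Zs))

def covariance(mu, taus, Zs, variance=lambda m: m, inverse_link=lambda U: U):
    """Assumed form: Sigma = V(mu)^(1/2) @ Omega(tau) @ V(mu)^(1/2).

    variance     -- variance function (default: Poisson-type, V(mu) = mu)
    inverse_link -- inverse covariance link (default: identity)
    """
    omega = inverse_link(matrix_linear_predictor(taus, Zs))
    root_v = np.diag(np.sqrt(variance(mu)))
    return root_v @ omega @ root_v

n = 4
Z0 = np.eye(n)        # independent component
Z1 = np.ones((n, n))  # shared component, e.g. repeated measures on one unit
mu = np.array([1.0, 2.0, 4.0, 8.0])

sigma = covariance(mu, taus=[1.0, 0.5], Zs=[Z0, Z1])
print(np.round(sigma, 3))
```

Other dependence structures (temporal, spatial) would enter only through the choice of the known matrices, which is what makes the linear-predictor form convenient.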
Bayesian nonparametric models for spatially indexed data of mixed type
We develop Bayesian nonparametric models for spatially indexed data of mixed
type. Our work is motivated by challenges that occur in environmental
epidemiology, where the usual presence of several confounding variables that
exhibit complex interactions and high correlations makes it difficult to
estimate and understand the effects of risk factors on health outcomes of
interest. The modeling approach we adopt assumes that responses and confounding
variables are manifestations of continuous latent variables, and uses
multivariate Gaussians to jointly model these. Responses and confounding
variables are not treated equally, as only the relevant parameters of the
distributions of the responses are modeled in terms of explanatory variables
or risk factors. Spatial dependence is introduced by allowing the weights of the
nonparametric process priors to be location specific, obtained as probit
transformations of Gaussian Markov random fields. Confounding variables and
spatial configuration have a similar role in the model, in that they only
influence, along with the responses, the allocation probabilities of the areas
into the mixture components, thereby allowing for flexible adjustment of the
effects of observed confounders, while allowing for the possibility of residual
spatial structure, possibly occurring due to unmeasured or undiscovered
spatially varying factors. Aspects of the model are illustrated in simulation
studies and an application to a real data set.
Foundational principles for large scale inference: Illustrations through correlation mining
When can reliable inference be drawn in the "Big Data" context? This paper
presents a framework for answering this fundamental question in the context of
correlation mining, with implications for general large scale inference. In
large scale data applications like genomics, connectomics, and eco-informatics
the dataset is often variable-rich but sample-starved: a regime where the
number of acquired samples (statistical replicates) is far fewer than the
number of observed variables (genes, neurons, voxels, or chemical
constituents). Much recent work has focused on understanding the
computational complexity of proposed methods for "Big Data." Sample complexity,
however, has received relatively less attention, especially in the setting where
the sample size is fixed and the dimension grows without bound. To
address this gap, we develop a unified statistical framework that explicitly
quantifies the sample complexity of various inferential tasks. Sampling regimes
can be divided into several categories: 1) the classical asymptotic regime
where the variable dimension is fixed and the sample size goes to infinity; 2)
the mixed asymptotic regime where both variable dimension and sample size go to
infinity at comparable rates; 3) the purely high dimensional asymptotic regime
where the variable dimension goes to infinity and the sample size is fixed.
Each regime has its niche, but only the last regime applies to exa-scale data
dimension. We illustrate this high dimensional framework for the problem of
correlation mining, where it is the matrix of pairwise and partial correlations
among the variables that is of interest. We demonstrate various regimes of
correlation mining based on the unifying perspective of high dimensional
learning rates and sample complexity for different structured covariance models
and different inference tasks.
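The difficulty of the purely high dimensional regime can be seen in a small simulation. The sketch below is not taken from the paper: it fixes the sample size at n = 10, draws mutually independent Gaussian variables, and reports the largest absolute sample correlation as the number of variables p grows. The function name, the seed and the chosen values of n and p are illustrative assumptions.

```python
import numpy as np

def max_abs_offdiag_corr(X):
    """Largest absolute off-diagonal sample correlation among the columns of X."""
    R = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(R, 0.0)  # ignore each variable's correlation with itself
    return float(np.abs(R).max())

rng = np.random.default_rng(0)
n, dims = 10, (10, 100, 1000)

# Independent variables: every population correlation is exactly zero.
X = rng.standard_normal((n, max(dims)))

# Nested subsets of columns, so the maximum can only grow with p.
spurious = {p: max_abs_offdiag_corr(X[:, :p]) for p in dims}
for p, r in spurious.items():
    print(f"n={n}, p={p}: max |sample correlation| = {r:.3f}")
```

Because every true correlation here is zero, any large value in the output is spurious, which is why a fixed sample size limits what can be reliably discovered as the dimension increases.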
Convolved Gaussian process regression models for multivariate non-Gaussian data
PhD Thesis
Multivariate regression analysis for dependent data has developed rapidly in the last decade.
The most difficult part in multivariate cases is how to construct a cross-correlation
between response variables. We need to make sure that the covariance matrix
is positive definite, which is not an easy task. Several approaches have been developed to
overcome the issue. However, most of them have limitations: it is hard to
extend them to cases involving high-dimensional variables or to capture individual characteristics.
It should also be pointed out that the meaning of the cross-correlation structure for
some methods is unclear. To address these issues, we propose to use convolved Gaussian
process (CGP) priors (Boyle & Frean, 2005).
In this dissertation, we propose a novel approach for multivariate regression using CGP
priors. The approach provides a semiparametric model with multi-dimensional covariates
and offers a natural framework for modelling common mean structures and covariance
structures simultaneously for multivariate dependent data. Information about observations
is provided by the common mean structure while individual characteristics also can
be captured by the covariance structure. At the same time, the covariance function is able
to accommodate a large-dimensional covariate as well.
We begin by broadening the problem, starting from the general CGP framework proposed by
Andriluka et al. (2006). We investigate some of the stationary covariance functions and
their mixed forms for constructing multiple dependent Gaussian processes to solve a more
complex issue. Then, we extend the idea to a multivariate non-linear regression model by
using convolved Gaussian processes as priors.
We then focus on applying the idea to multivariate non-Gaussian data, i.e. multivariate
Poisson and other multivariate non-Gaussian distributions from the exponential
family. We begin with multivariate Poisson data, which arise in many problems
relating to public health issues. Finally, we provide a general framework for
multivariate binomial data and other multivariate non-Gaussian data.
The definition of the model, the inference, and the implementation, as well as its
asymptotic properties, are discussed. Comprehensive numerical examples with both simulation
studies and real data are presented.
Ministry of Education, Indonesia