51 research outputs found
Regression with I-priors
The problem of estimating a parametric or nonparametric regression function in a model with normal errors is considered. For this purpose, a novel objective prior for the regression function is proposed, defined as the distribution maximizing entropy subject to a suitable constraint based on the Fisher information on the regression function. The prior is named I-prior. For the present model, it is Gaussian with covariance kernel proportional to the Fisher information, and mean chosen a priori (e.g., 0). The I-prior has the intuitively appealing property that the more information is available about a linear functional of the regression function, the larger its prior variance, and, broadly speaking, the less influential the prior is on the posterior. Unlike the Jeffreys prior, it can be used in high dimensional settings. The I-prior methodology can be used as a principled alternative to Tikhonov regularization, which suffers from well-known theoretical problems which are briefly reviewed. The regression function is assumed to lie in a reproducing kernel Hilbert space (RKHS) over a low or high dimensional covariate space, giving a high degree of generality. Analysis of some real data sets and a small-scale simulation study show competitive performance of the I-prior methodology, which is implemented in the R-package iprior
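As a rough illustration of the phrase "Gaussian with covariance kernel proportional to the Fisher information", the display below sketches the form the I-prior takes when the errors are i.i.d. normal and the regression function lies in an RKHS. The symbols h (reproducing kernel), ψ (error precision), f_0 (prior mean) and w_i are notation assumed here for illustration; none of them appears in the abstract.

```latex
% Sketch under assumed notation: i.i.d. normal errors with precision \psi,
% f in an RKHS with reproducing kernel h, prior mean f_0 chosen a priori.
\begin{align*}
  y_i &= f(x_i) + \varepsilon_i, \qquad
        \varepsilon_i \overset{\text{iid}}{\sim} N(0, \psi^{-1}),
        \quad i = 1, \dots, n, \\
  \mathcal{I}(x, x') &= \psi \sum_{i=1}^{n} h(x, x_i)\, h(x', x_i)
        \qquad \text{(Fisher information on } f\text{)}, \\
  f(x) &= f_0(x) + \sum_{i=1}^{n} h(x, x_i)\, w_i, \qquad
        w_i \overset{\text{iid}}{\sim} N(0, \psi)
        \quad\Longrightarrow\quad
        \operatorname{Cov}\bigl(f(x), f(x')\bigr) = \mathcal{I}(x, x').
\end{align*}
```

Under this sketch, any scale factor on the kernel plays the role of the proportionality constant mentioned in the abstract.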
A Kernel Test for Three-Variable Interactions
We introduce kernel nonparametric tests for Lancaster three-variable
interaction and for total independence, using embeddings of signed measures
into a reproducing kernel Hilbert space. The resulting test statistics are
straightforward to compute, and are used in powerful interaction tests, which
are consistent against all alternatives for a large family of reproducing
kernels. We show the Lancaster test to be sensitive to cases where two
independent causes individually have weak influence on a third dependent
variable, but their combined effect has a strong influence. This makes the
Lancaster test especially suited to finding structure in directed graphical
models, where it outperforms competing nonparametric tests in detecting such
V-structures
Regression modelling with I-priors
We introduce the I-prior methodology as a unifying framework for estimating a
variety of regression models, including varying coefficient, multilevel,
longitudinal models, and models with functional covariates and responses. It
can also be used for multi-class classification, with low or high dimensional
covariates.
The I-prior is generally defined as a maximum entropy prior. For a regression
function, the I-prior is Gaussian with covariance kernel proportional to the
Fisher information on the regression function, which is estimated by its
posterior distribution under the I-prior. The I-prior has the intuitively
appealing property that the more information is available on a linear
functional of the regression function, the larger the prior variance, and the
smaller the influence of the prior mean on the posterior distribution.
Advantages compared to competing methods, such as Gaussian process regression
or Tikhonov regularization, are ease of estimation and model comparison. In
particular, we develop an EM algorithm with a simple E and M step for
estimating hyperparameters, facilitating estimation for complex models. We also
propose a novel parsimonious model formulation, requiring a single scale
parameter for each (possibly multidimensional) covariate and no further
parameters for interaction effects. This simplifies estimation because fewer
hyperparameters need to be estimated, and also simplifies model comparison of
models with the same covariates but different interaction effects; in this
case, the model with the highest estimated likelihood can be selected.
Using a number of widely analyzed real data sets we show that predictive
performance of our methodology is competitive. An R-package implementing the
methodology is available (Jamil, 2019)
Categorical marginal models: quite extensive package for the estimation of marginal models for categorical data
A package accompanying the book Marginal Models for Dependent, Clustered, and Longitudinal Categorical Data by Bergsma, Croon, & Hagenaars, 2009. Its purpose is the fitting and testing of marginal models
Testing conditional independence for continuous random variables
Abstract: A common statistical problem is the testing of independence of two (response) variables conditionally on a third (control) variable. In the first part of this paper, we extend Hoeffding's concept of estimability of degree r to testability of degree r, and show that independence is testable of degree two, while conditional independence is not testable of any degree if the control variable is continuous. Hence, in a well-defined sense, conditional independence is much harder to test than independence. In the second part of the paper, a new method is introduced for the nonparametric testing of conditional independence of continuous responses given an arbitrary, not necessarily continuous, control variable. The method allows the automatic conversion of any test of independence to a test of conditional independence. Hence, robust tests and tests with power against broad ranges of alternatives can be used, which are favorable properties not shared by the most commonly used test, namely the one based on the partial correlation coefficient. The method is based on a new concept, the partial copula, which is an average of the conditional copulas. The feasibility of the approach is demonstrated by an example with medical data
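The conversion described above (transform each response by its conditional distribution given the control, then apply any independence test) can be sketched for the simplest case of a discrete control variable. This is a minimal sketch and not the paper's estimator: the function names are hypothetical, the conditional distribution functions are estimated by within-group empirical ranks, and ties within a group are assumed absent.

```python
from collections import defaultdict


def conditional_ranks(values, groups):
    """Within-group empirical CDF values, scaled to lie strictly in (0, 1).

    Hypothetical helper: estimates F(value | group) by rank / (group size + 1).
    """
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    out = []
    for v, g in zip(values, groups):
        members = by_group[g]
        rank = sum(1 for m in members if m <= v)  # rank of v within its group
        out.append(rank / (len(members) + 1))
    return out


def partial_copula_sample(x, y, z):
    """Return (u, v): x and y transformed by their estimated conditional CDFs given z."""
    return conditional_ranks(x, z), conditional_ranks(y, z)


# Toy data: the level of x and y depends strongly on the control z.
z = [0, 0, 0, 1, 1, 1]
x = [1, 2, 3, 10, 20, 30]
y = [3, 1, 2, 300, 100, 200]

u, v = partial_copula_sample(x, y, z)
print(u)  # [0.25, 0.5, 0.75, 0.25, 0.5, 0.75]
print(v)  # [0.75, 0.25, 0.5, 0.75, 0.25, 0.5]
```

The pooled pairs (u, v) no longer carry the level shift induced by z, so any ordinary test of independence can be applied to them. For a continuous control variable the conditional distribution functions would need a smoothing estimator instead of group-wise ranks.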
A study of the power and robustness of a new test for independence against contiguous alternatives
Various association measures have been proposed in the literature that equal zero when the associated random variables are independent. However, many measures (e.g., Kendall's tau) may equal zero even in the presence of an association between the random variables. In order to overcome this drawback, Bergsma and Dassios (2014) proposed a modification of Kendall's tau (denoted τ*), which is non-negative and zero if and only if independence holds. In this article, we investigate the robustness properties and the asymptotic distributions of τ* and some other well-known measures of association under null and contiguous alternatives. Based on these asymptotic distributions, we study the asymptotic power of the test based on τ* under contiguous alternatives and compare its performance with that of other well-known tests available in the literature
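The drawback mentioned above can be seen on a toy data set in which y is a deterministic, non-monotone function of x, yet Kendall's tau is exactly zero. The sketch below also includes a brute-force sample version of the Bergsma and Dassios coefficient. Its definition, as the average over ordered 4-tuples of a product of sign kernels, is recalled from Bergsma and Dassios (2014) and is not stated in the abstract, so treat it as an assumption. The function names are hypothetical and the O(n^4) loop is only suitable for very small samples.

```python
from itertools import combinations, permutations


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_tau_a(x, y):
    """Kendall's tau-a: mean of sign(x_i - x_j) * sign(y_i - y_j) over all pairs."""
    pairs = list(combinations(range(len(x)), 2))
    s = sum(sign(x[i] - x[j]) * sign(y[i] - y[j]) for i, j in pairs)
    return s / len(pairs)


def a(z1, z2, z3, z4):
    """Sign kernel assumed from Bergsma and Dassios (2014)."""
    return sign(abs(z1 - z2) + abs(z3 - z4) - abs(z1 - z3) - abs(z2 - z4))


def tau_star(x, y):
    """Brute-force U-statistic: average of a(x) * a(y) over ordered 4-tuples of distinct indices."""
    total = 0
    count = 0
    for i, j, k, l in permutations(range(len(x)), 4):
        total += a(x[i], x[j], x[k], x[l]) * a(y[i], y[j], y[k], y[l])
        count += 1
    return total / count


# y is fully determined by x, but the relationship is not monotone.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
print(kendall_tau_a(x, y))  # 0.0

# Sanity check on perfectly monotone data without ties.
print(tau_star(x, x))
```

On monotone data without ties the sample coefficient should come out at 2/3, since one third of the orderings of four distinct values give a kernel value of zero. The non-negativity stated in the abstract concerns the population quantity; a sample value computed on a handful of tied observations need not be positive.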