Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information
Conditional independence testing is a fundamental problem underlying causal
discovery and a particularly challenging task in the presence of nonlinear and
high-dimensional dependencies. Here a fully non-parametric test for continuous
data based on conditional mutual information combined with a local permutation
scheme is presented. Through a nearest-neighbor approach, the test also adapts
efficiently to non-smooth distributions arising from strongly nonlinear dependencies.
Numerical experiments demonstrate that the test reliably simulates the null
distribution even for small sample sizes and with high-dimensional conditioning
sets. The test is better calibrated than kernel-based tests utilizing an
analytical approximation of the null distribution, especially for non-smooth
densities, and reaches the same or higher power levels. Combining the local
permutation scheme with the kernel tests improves their calibration but reduces
their power. For smaller sample sizes and lower dimensions, the test is
faster than random Fourier feature-based kernel tests if the permutation scheme
is (embarrassingly) parallelized, but the runtime increases more sharply with
sample size and dimensionality. Thus, more theoretical research to analytically
approximate the null distribution and speed up the estimation for larger sample
sizes is desirable. Comment: 17 pages, 12 figures, 1 table
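The test above combines a nearest-neighbor estimate of conditional mutual information with a permutation scheme that shuffles X only among samples whose Z values are close, so that the dependence of X on Z is preserved under the null. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: it uses a Frenzel-Pompe/KSG-style estimator with the max-norm, a simplified local permutation (each sample draws its replacement from its `k_perm` nearest Z-neighbors, avoiding reuse of an index where possible), and hypothetical names (`cmi_knn`, `local_perm_test`, `k`, `k_perm`, `n_perm`). The brute-force O(n²) neighbor search is only meant for small samples.

```python
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma for positive integers: psi(n) = -gamma + sum_{i=1}^{n-1} 1/i."""
    return -EULER_GAMMA + sum(1.0 / i for i in range(1, n))


def max_dist(a, b):
    """Chebyshev (max-norm) distance between two equal-length tuples."""
    return max(abs(u - v) for u, v in zip(a, b))


def cmi_knn(x, y, z, k=5):
    """k-NN estimate of I(X;Y|Z) in nats; x, y, z are lists of tuples (one per sample)."""
    n = len(x)
    xyz = [a + b + c for a, b, c in zip(x, y, z)]
    xz = [a + c for a, c in zip(x, z)]
    yz = [b + c for b, c in zip(y, z)]
    total = 0.0
    for i in range(n):
        # eps_i: distance to the k-th nearest neighbor in the joint (X, Y, Z) space.
        eps = sorted(max_dist(xyz[i], xyz[j]) for j in range(n) if j != i)[k - 1]
        # Count the other samples strictly closer than eps_i in each subspace.
        n_xz = sum(max_dist(xz[i], xz[j]) < eps for j in range(n) if j != i)
        n_yz = sum(max_dist(yz[i], yz[j]) < eps for j in range(n) if j != i)
        n_z = sum(max_dist(z[i], z[j]) < eps for j in range(n) if j != i)
        total += digamma_int(n_z + 1) - digamma_int(n_xz + 1) - digamma_int(n_yz + 1)
    return digamma_int(k) + total / n


def z_neighbors(z, k_perm):
    """Indices of the k_perm nearest samples in Z for each sample (itself included)."""
    n = len(z)
    return [sorted(range(n), key=lambda j: max_dist(z[i], z[j]))[:k_perm] for i in range(n)]


def local_permutation(neighbors, rng):
    """Map each sample to one of its Z-neighbors, avoiding reused indices when possible."""
    n = len(neighbors)
    used, perm = set(), [0] * n
    for i in rng.sample(range(n), n):  # visit samples in random order
        candidates = list(neighbors[i])
        rng.shuffle(candidates)
        # Prefer an unused neighbor; fall back to a random one if all are taken.
        choice = next((j for j in candidates if j not in used), candidates[0])
        used.add(choice)
        perm[i] = choice
    return perm


def local_perm_test(x, y, z, k=5, k_perm=5, n_perm=19, seed=0):
    """Return (observed CMI, permutation p-value) for H0: X independent of Y given Z."""
    rng = random.Random(seed)
    observed = cmi_knn(x, y, z, k)
    neighbors = z_neighbors(z, k_perm)
    null = []
    for _ in range(n_perm):
        perm = local_permutation(neighbors, rng)
        null.append(cmi_knn([x[p] for p in perm], y, z, k))
    p_value = (1 + sum(v >= observed for v in null)) / (1 + n_perm)
    return observed, p_value


# Usage: X depends on Y even given an independent Z, so p is expected to be small.
data_rng = random.Random(1)
z = [(data_rng.gauss(0, 1),) for _ in range(100)]
y = [(data_rng.gauss(0, 1),) for _ in range(100)]
x = [(b[0] + 0.01 * data_rng.gauss(0, 1),) for b in y]
obs, p = local_perm_test(x, y, z)
```

Drawing replacements from Z-neighborhoods rather than shuffling X globally keeps the X-Z relationship intact in each surrogate sample, which is what makes the null distribution conditional on Z.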
Optimal model-free prediction from multivariate time series
Forecasting a time series from multivariate predictors constitutes a
challenging problem, especially using model-free approaches. Most techniques,
such as nearest-neighbor prediction, quickly suffer from the curse of
dimensionality and overfitting for more than a few predictors, which has limited
their application mostly to the univariate case. Therefore, selection
strategies are needed that harness the available information as efficiently as
possible. Since the right combination of predictors often matters, ideally all
subsets of possible predictors should be tested for their predictive power, but
the exponentially growing number of combinations makes such an approach
computationally prohibitive. Here a prediction scheme is introduced that overcomes
this strong limitation through a causal pre-selection step, which drastically
reduces the candidate predictors to the most predictive set of causal drivers and
thereby makes a globally optimal search scheme tractable. The
information-theoretic optimality is derived and practical selection criteria
are discussed. As demonstrated for multivariate nonlinear stochastic delay
processes, the optimal scheme can even be less computationally expensive than
commonly used sub-optimal schemes like forward selection. The method suggests a
general framework to apply the optimal model-free approach to select variables
and subsequently fit a model to further improve a prediction or learn
statistical dependencies. The performance of this framework is illustrated on a
climatological index of El Niño Southern Oscillation. Comment: 14 pages, 9 figures
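The scheme above first reduces the candidates to causal drivers and only then searches all of their subsets. A minimal sketch of that second stage follows, assuming the causal pre-selection has already been done (for example with a conditional independence test) and that each predictor series is already lagged and aligned with the target. As a stand-in for the paper's information-theoretic criterion, it scores each subset by the leave-one-out mean squared error of k-nearest-neighbor regression; the names `knn_loo_mse` and `best_subset` and the toy data are hypothetical.

```python
import itertools
import math
import random


def knn_loo_mse(features, target, k=3):
    """Leave-one-out mean squared error of k-NN regression (Euclidean distance)."""
    n = len(target)
    err = 0.0
    for i in range(n):
        # Sort all other samples by their distance to sample i.
        dists = sorted((math.dist(features[i], features[j]), j) for j in range(n) if j != i)
        pred = sum(target[j] for _, j in dists[:k]) / k  # mean of the k nearest targets
        err += (pred - target[i]) ** 2
    return err / n


def best_subset(predictors, target, k=3):
    """Score every non-empty subset of the pre-selected predictors; return (subset, score)."""
    names = sorted(predictors)
    best = None
    for size in range(1, len(names) + 1):
        for subset in itertools.combinations(names, size):
            features = [tuple(predictors[name][t] for name in subset) for t in range(len(target))]
            score = knn_loo_mse(features, target, k)
            if best is None or score < best[1]:
                best = (subset, score)
    return best


# Hypothetical example: the target is the product of two drivers, so neither
# driver alone helps predict it; x3 is irrelevant.
rng = random.Random(0)
n = 200
preds = {name: [rng.uniform(-1, 1) for _ in range(n)] for name in ("x1", "x2", "x3")}
target = [preds["x1"][t] * preds["x2"][t] + 0.01 * rng.gauss(0, 1) for t in range(n)]
subset, score = best_subset(preds, target)
```

The product target illustrates why the right combination matters: a greedy forward selection that starts from the best single predictor has no signal to follow and may begin with an uninformative one, whereas the exhaustive search, affordable only because the candidate set is small, evaluates the pair directly.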
Causal conditioning and instantaneous coupling in causality graphs
The paper investigates the link between Granger causality graphs recently
formalized by Eichler and directed information theory developed by Massey and
Kramer. We particularly emphasize the implications of two notions of causality
that may occur in physical systems. It is well accepted that dynamical
causality is assessed by the conditional transfer entropy, a measure appearing
naturally as a part of directed information. Surprisingly, the notion of
instantaneous causality is often overlooked, even though it was clearly understood
in early works. In the bivariate case, instantaneous coupling is measured
adequately by the instantaneous information exchange, a measure that
supplements the transfer entropy in the decomposition of directed information.
In this paper, the focus is put on the multivariate case and conditional graph
modeling issues. In this framework, we show that the decomposition of directed
information into the sum of transfer entropy and information exchange does not
hold anymore. Nevertheless, the discussion allows us to put forward the two
measures as pillars for the inference of causality graphs. We illustrate this
on two synthetic examples which allow us to discuss not only the theoretical
concepts but also the practical estimation issues. Comment: submitted
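For the bivariate case, the decomposition referred to above can be sketched from Massey's definition of directed information and the chain rule for conditional mutual information. The notation is an assumption of this sketch: X^t = (X_1, ..., X_t) denotes a process up to time t and X^0 is empty. The first sum collects transfer-entropy terms computed with full histories and the second the instantaneous information exchange; per the abstract, this additive split no longer holds once the measures are conditioned on further processes in the multivariate case.

```latex
\begin{aligned}
I(X^N \to Y^N)
  &= \sum_{t=1}^{N} I\left(X^{t} ; Y_t \mid Y^{t-1}\right) \\
  &= \sum_{t=1}^{N} \Big[ I\left(X^{t-1} ; Y_t \mid Y^{t-1}\right)
     + I\left(X_t ; Y_t \mid X^{t-1}, Y^{t-1}\right) \Big] \\
  &= \underbrace{\sum_{t=1}^{N} I\left(X^{t-1} ; Y_t \mid Y^{t-1}\right)}_{\text{transfer entropy terms}}
   + \underbrace{\sum_{t=1}^{N} I\left(X_t ; Y_t \mid X^{t-1}, Y^{t-1}\right)}_{\text{instantaneous information exchange}}
\end{aligned}
```

The second line uses the chain rule with X^t = (X^{t-1}, X_t): I(X^t; Y_t | Y^{t-1}) = I(X^{t-1}; Y_t | Y^{t-1}) + I(X_t; Y_t | X^{t-1}, Y^{t-1}).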
Optimal model-free prediction from multivariate time series
© 2015 American Physical Society. Forecasting a time series from multivariate predictors constitutes a challenging problem, especially using model-free approaches. Most techniques, such as nearest-neighbor prediction, quickly suffer from the curse of dimensionality and overfitting for more than a few predictors, which has limited their application mostly to the univariate case. Therefore, selection strategies are needed that harness the available information as efficiently as possible. Since the right combination of predictors often matters, ideally all subsets of possible predictors should be tested for their predictive power, but the exponentially growing number of combinations makes such an approach computationally prohibitive. Here a prediction scheme is introduced that overcomes this strong limitation through a causal preselection step, which drastically reduces the candidate predictors to the most predictive set of causal drivers and thereby makes a globally optimal search scheme tractable. The information-theoretic optimality is derived and practical selection criteria are discussed. As demonstrated for multivariate nonlinear stochastic delay processes, the optimal scheme can even be less computationally expensive than commonly used suboptimal schemes like forward selection. The method suggests a general framework to apply the optimal model-free approach to select variables and subsequently fit a model to further improve a prediction or learn statistical dependencies. The performance of this framework is illustrated on a climatological index of El Niño Southern Oscillation.
- …