Search CORE

Mutual Information and Minimum Mean-square Error in Gaussian Channels

Authors: Guo Dongning, Shamai Shlomo, Verdu Sergio
Publication date: 01/01/2004

This paper deals with arbitrarily distributed finite-power input signals observed through an additive Gaussian noise channel. It presents a new formula that connects the input-output mutual information and the minimum mean-square error (MMSE) achievable by optimal estimation of the input given the output: the derivative of the mutual information (in nats) with respect to the signal-to-noise ratio (SNR) equals half the MMSE, regardless of the input statistics, i.e. $\frac{d}{d\,\mathrm{snr}} I(\mathrm{snr}) = \frac{1}{2}\,\mathrm{mmse}(\mathrm{snr})$. This relationship holds for both scalar and vector signals, as well as for discrete-time and continuous-time noncausal MMSE estimation. This fundamental information-theoretic result has an unexpected consequence in continuous-time nonlinear estimation: for any input signal with finite power, the causal filtering MMSE achieved at SNR is equal to the average value of the noncausal smoothing MMSE achieved with a channel whose signal-to-noise ratio is chosen uniformly at random between 0 and SNR.

arXiv.org e-Print Archive

CiteSeerX
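The identity $\frac{d}{d\,\mathrm{snr}} I(\mathrm{snr}) = \frac{1}{2}\,\mathrm{mmse}(\mathrm{snr})$ stated above can be checked numerically. The sketch below assumes a scalar channel $Y = \sqrt{\mathrm{snr}}\,X + N$ with $N \sim \mathcal{N}(0,1)$ and compares a finite-difference derivative of $I$ with half the MMSE for two inputs: a standard Gaussian input, where $I = \tfrac12\ln(1+\mathrm{snr})$ and $\mathrm{mmse} = 1/(1+\mathrm{snr})$ in closed form, and an equiprobable binary input $X \in \{-1,+1\}$, where $I = \mathrm{snr} - E[\ln\cosh(\mathrm{snr} + \sqrt{\mathrm{snr}}\,Z)]$ and $\mathrm{mmse} = 1 - E[\tanh(\mathrm{snr} + \sqrt{\mathrm{snr}}\,Z)]$ with $Z \sim \mathcal{N}(0,1)$. The expectations are evaluated by Gauss-Hermite quadrature; the quadrature order, step size `h`, and function names are illustrative choices, not taken from the paper.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

# Probabilists' Gauss-Hermite rule: integrates f(z) * exp(-z^2 / 2).
Z_NODES, Z_WEIGHTS = hermegauss(100)
Z_WEIGHTS = Z_WEIGHTS / np.sqrt(2.0 * np.pi)  # now sum(w * f(z)) ~= E[f(Z)], Z ~ N(0, 1)

def bpsk_mutual_info(snr):
    """I(snr) in nats for equiprobable X in {-1, +1} and Y = sqrt(snr) X + N."""
    a = snr + np.sqrt(snr) * Z_NODES
    log_cosh = np.logaddexp(a, -a) - np.log(2.0)  # numerically stable log(cosh(a))
    return snr - np.sum(Z_WEIGHTS * log_cosh)

def bpsk_mmse(snr):
    """MMSE for the binary input; the conditional mean estimator is tanh(sqrt(snr) Y)."""
    a = snr + np.sqrt(snr) * Z_NODES
    return 1.0 - np.sum(Z_WEIGHTS * np.tanh(a))

def gaussian_mutual_info(snr):
    """I(snr) in nats for X ~ N(0, 1)."""
    return 0.5 * np.log1p(snr)

def gaussian_mmse(snr):
    """MMSE of the linear (here optimal) estimator for X ~ N(0, 1)."""
    return 1.0 / (1.0 + snr)

def check_i_mmse(mutual_info, mmse, snr, h=1e-4):
    """Return (central finite difference of I at snr, half the MMSE at snr)."""
    d_info = (mutual_info(snr + h) - mutual_info(snr - h)) / (2.0 * h)
    return d_info, 0.5 * mmse(snr)

for snr in (0.5, 1.0, 2.0):
    print(snr,
          check_i_mmse(bpsk_mutual_info, bpsk_mmse, snr),
          check_i_mmse(gaussian_mutual_info, gaussian_mmse, snr))
```

Both pairs of numbers should agree to several decimal places, which is the input-independence the abstract refers to: the binary and Gaussian inputs have very different $I$ curves, yet each slope matches its own half-MMSE.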

Distinguishing cause from effect using observational data: methods and benchmarks

Authors: Janzing Dominik, Mooij Joris M., Peters Jonas, Schölkopf Bernhard, Zscheischler Jakob
Publication date: 01/01/2015

The discovery of causal relationships from purely observational data is a fundamental problem in science. The most elementary form of such a causal discovery problem is to decide whether X causes Y or, alternatively, Y causes X, given joint observations of two variables X and Y. An example is to decide whether altitude causes temperature, or vice versa, given only joint measurements of both variables. Even under the simplifying assumptions of no confounding, no feedback loops, and no selection bias, such bivariate causal discovery problems are challenging. Nevertheless, several approaches for addressing those problems have been proposed in recent years. We review two families of such methods: Additive Noise Methods (ANM) and Information Geometric Causal Inference (IGCI). We present the benchmark CauseEffectPairs, which consists of data for 100 different cause-effect pairs selected from 37 datasets from various domains (e.g., meteorology, biology, medicine, engineering, economics), and motivate our decisions regarding the "ground truth" causal directions of all pairs. We evaluate the performance of several bivariate causal discovery methods on these real-world benchmark data and, in addition, on artificially simulated data. Our empirical results on real-world data indicate that certain methods are indeed able to distinguish cause from effect using purely observational data, although more benchmark data would be needed to obtain statistically significant conclusions. One of the best-performing methods overall is the additive-noise method originally proposed by Hoyer et al. (2009), which obtains an accuracy of 63 ± 10% and an AUC of 0.74 ± 0.05 on the real-world benchmark. As the main theoretical contribution of this work, we prove the consistency of that method.
Comment: 101 pages, second revision submitted to Journal of Machine Learning Research

arXiv.org e-Print Archive

UvA-DARE
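The additive-noise idea reviewed above can be sketched as follows: regress each variable on the other and prefer the direction whose residuals look more independent of the regressor. This is a minimal sketch, not the benchmarked implementation: it swaps in polynomial regression (hypothetical degree `deg=3`) for a nonparametric regressor and uses a biased HSIC statistic with Gaussian kernels and a median-distance bandwidth as the dependence measure. The function names and the toy data are illustrative assumptions.

```python
import numpy as np

def rbf_gram(v):
    """Gaussian-kernel Gram matrix; bandwidth from the median squared pairwise distance."""
    d2 = (v[:, None] - v[None, :]) ** 2
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / (2.0 * sigma2))

def hsic(a, b):
    """Biased empirical HSIC, trace(K H L H) / n^2; larger means more dependent."""
    n = len(a)
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix
    K, L = rbf_gram(a), rbf_gram(b)
    return np.trace(K @ H @ L @ H) / n**2

def anm_score(cause, effect, deg=3):
    """Fit effect = f(cause) + noise with a degree-`deg` polynomial, then
    measure how dependent the residuals are on the putative cause."""
    coefs = np.polyfit(cause, effect, deg)
    residuals = effect - np.polyval(coefs, cause)
    return hsic(cause, residuals)

def anm_direction(x, y, deg=3):
    """Prefer the direction whose residuals are less dependent on the regressor."""
    return "X->Y" if anm_score(x, y, deg) < anm_score(y, x, deg) else "Y->X"

# Hypothetical toy pair with known direction X -> Y and uniform (non-Gaussian) noise.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, 400)
y = x**3 + rng.uniform(-1.0, 1.0, 400)
print(anm_direction(x, y))
```

In the backward fit the residuals of X given Y are wide near Y = 0 and narrow for large |Y|, so their spread depends on the regressor; that dependence is what the HSIC comparison picks up.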

Causal Inference on Discrete Data using Additive Noise Models

Authors: Janzing Dominik, Peters Jonas, Schölkopf Bernhard
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 02/11/2009

Inferring the causal structure of a set of random variables from a finite sample of the joint distribution is an important problem in science. Recently, methods using additive noise models have been proposed for the case of continuous variables. In many situations, however, the variables of interest are discrete or even take only finitely many values. In this work we extend the notion of additive noise models to these cases. We prove that whenever the joint distribution $P^{(X,Y)}$ admits such a model in one direction, e.g. $Y = f(X) + N$ with $N \perp X$, it does not admit the reversed model $X = g(Y) + \tilde N$ with $\tilde N \perp Y$, as long as the model is chosen in a generic way. Based on these results, we propose an efficient new algorithm that is able to distinguish between cause and effect for a finite sample of discrete variables. In an extensive experimental study we show that this algorithm works on both synthetic and real data sets.

arXiv.org e-Print Archive
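A population-level sketch of the discrete additive-noise condition: for integer-valued variables, $Y = f(X) + N$ with $N \perp X$ holds exactly when every conditional $p(y \mid x)$ is a shifted copy of one common noise pmf. The check below assumes the joint pmf is known exactly (a dict from `(a, b)` pairs to probabilities) and covers only the integer case; it is not the paper's finite-sample algorithm, which also needs a regression step and an independence test. The helpers `make_joint`, `flip`, `admits_anm` and the toy pmfs are hypothetical.

```python
from collections import defaultdict

def make_joint(p_x, f, p_noise):
    """Exact joint pmf of (X, Y) with Y = f(X) + N, N independent of X (integer-valued)."""
    joint = defaultdict(float)
    for x, px in p_x.items():
        for n, pn in p_noise.items():
            joint[(x, f[x] + n)] += px * pn
    return dict(joint)

def flip(joint):
    """Swap the roles of the two variables."""
    return {(b, a): p for (a, b), p in joint.items()}

def conditionals(joint):
    """Map each value a to its conditional pmf {b: p(b | a)}, ignoring zero cells."""
    rows = defaultdict(dict)
    for (a, b), p in joint.items():
        if p > 0:
            rows[a][b] = p
    out = {}
    for a, row in rows.items():
        total = sum(row.values())
        out[a] = {b: p / total for b, p in row.items()}
    return out

def aligned(cond):
    """Shift a conditional pmf so its smallest support point sits at 0."""
    m = min(cond)
    return sorted((b - m, p) for b, p in cond.items())

def admits_anm(joint, tol=1e-9):
    """True if b = f(a) + N with N independent of a, i.e. all p(b | a) are translates."""
    conds = [aligned(c) for c in conditionals(joint).values()]
    ref = conds[0]
    for cur in conds[1:]:
        if len(cur) != len(ref):
            return False
        for (o1, p1), (o2, p2) in zip(ref, cur):
            if o1 != o2 or abs(p1 - p2) > tol:
                return False
    return True

# Hypothetical example: X in {0, 1, 2}, shift function f, binary noise.
joint = make_joint({0: 0.2, 1: 0.5, 2: 0.3}, {0: 0, 1: 2, 2: 1}, {0: 0.7, 1: 0.3})
print(admits_anm(joint), admits_anm(flip(joint)))  # prints: True False
```

The reversed direction fails here because $p(x \mid y = 0)$ has a single support point while $p(x \mid y = 1)$ has two, so no single noise pmf can generate both.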

Causal Inference by Stochastic Complexity

Authors: Budhathoki Kailash, Vreeken Jilles
Publication date: 01/01/2017

The algorithmic Markov condition states that the most likely causal direction between two random variables X and Y can be identified as the direction with the lowest Kolmogorov complexity. Due to the halting problem, however, this notion is not computable. We hence propose to do causal inference by stochastic complexity. That is, we propose to approximate Kolmogorov complexity via the Minimum Description Length (MDL) principle, using a score that is minimax optimal with regard to the model class under consideration. This means that even in an adversarial setting, such as when the true distribution is not in this class, we still obtain the optimal encoding for the data relative to the class. We instantiate this framework, which we call CISC, for pairs of univariate discrete variables, using the class of multinomial distributions. Experiments show that CISC is highly accurate on synthetic, benchmark, and real-world data, outperforming the state of the art, and that it scales extremely well with sample size and domain size.

arXiv.org e-Print Archive
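The CISC decision can be sketched as comparing $SC(X) + SC(Y \mid X)$ with $SC(Y) + SC(X \mid Y)$, where $SC$ is the normalized-maximum-likelihood stochastic complexity (in bits) of a multinomial model and the conditional term sums the complexity of the effect values within each group of equal cause values. This is a sketch under those assumptions, not the authors' code: the parametric complexity $\mathcal{C}(K, n)$ is computed with the linear-time recurrence $\mathcal{C}(K+2, n) = \mathcal{C}(K+1, n) + \tfrac{n}{K}\,\mathcal{C}(K, n)$ of Kontkanen and Myllymäki, and each domain size is taken as the number of distinct observed values, which may differ from the paper's choice. The data in the usage line are hypothetical.

```python
import math
from collections import Counter

def nml_complexity(K, n):
    """Parametric complexity C(K, n) of a K-ary multinomial over n observations."""
    if n == 0 or K == 1:
        return 1.0
    # C(2, n) = sum_h binom(n, h) (h/n)^h ((n-h)/n)^(n-h), summed in log space
    c2 = 0.0
    for h in range(n + 1):
        log_term = math.lgamma(n + 1) - math.lgamma(h + 1) - math.lgamma(n - h + 1)
        if h > 0:
            log_term += h * math.log(h / n)
        if h < n:
            log_term += (n - h) * math.log((n - h) / n)
        c2 += math.exp(log_term)
    prev, cur = 1.0, c2  # C(1, n), C(2, n)
    for k in range(1, K - 1):  # C(k + 2) = C(k + 1) + n / k * C(k)
        prev, cur = cur, cur + n / k * prev
    return cur

def stochastic_complexity(values, K):
    """SC in bits: -log2 P_ML(values) + log2 C(K, n)."""
    n = len(values)
    counts = Counter(values)
    neg_loglik = -sum(c * math.log2(c / n) for c in counts.values())
    return neg_loglik + math.log2(nml_complexity(K, n))

def conditional_sc(cause, effect):
    """SC(effect | cause): sum of SC over the effect values within each cause group."""
    K = len(set(effect))
    groups = {}
    for c, e in zip(cause, effect):
        groups.setdefault(c, []).append(e)
    return sum(stochastic_complexity(g, K) for g in groups.values())

def cisc(x, y):
    """Pick the direction with the smaller total description length."""
    to_y = stochastic_complexity(x, len(set(x))) + conditional_sc(x, y)
    to_x = stochastic_complexity(y, len(set(y))) + conditional_sc(y, x)
    if to_y < to_x:
        return "X->Y"
    if to_x < to_y:
        return "Y->X"
    return "undecided"

x = [0, 0, 1, 1, 2, 2, 0, 1, 2, 2]  # hypothetical discrete pair
y = [0, 0, 1, 1, 0, 0, 0, 1, 0, 1]
print(cisc(x, y))
```

Working in log space for $\mathcal{C}(2, n)$ avoids overflowing the binomial coefficients for large $n$, and the recurrence then makes the total cost linear in $n + K$.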

Towards a Learning Theory of Cause-Effect Inference

Authors: Lopez-Paz David, Muandet Krikamol, Schölkopf Bernhard, Tolstikhin Ilya
Publication date: 18/05/2015

We pose causal inference as the problem of learning to classify probability distributions. In particular, we assume access to a collection $\{(S_i, l_i)\}_{i=1}^n$, where each $S_i$ is a sample drawn from the probability distribution of $X_i \times Y_i$, and $l_i$ is a binary label indicating whether "$X_i \to Y_i$" or "$X_i \leftarrow Y_i$". Given these data, we build a causal inference rule in two steps. First, we featurize each $S_i$ using the kernel mean embedding associated with some characteristic kernel. Second, we train a binary classifier on such embeddings to distinguish between causal directions. We present generalization bounds showing the statistical consistency and learning rates of the proposed approach, and provide a simple implementation that achieves state-of-the-art cause-effect inference. Furthermore, we extend our ideas to infer causal relationships between more than two variables.

arXiv.org e-Print Archive
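The first step of the rule above, featurizing each sample $S_i$ by a kernel mean embedding, can be approximated with random Fourier features, whose inner products approximate a Gaussian kernel. This is a sketch under stated assumptions: a Gaussian kernel with unit bandwidth on standardized data, a hypothetical `D = 2000` features per embedding, and separate embeddings of the empirical distributions of $X$, $Y$ and $(X, Y)$ concatenated into one vector. The second step would be any off-the-shelf binary classifier trained on the stacked vectors with labels $l_i$; it is omitted here.

```python
import numpy as np

def make_rff(dim, num_features, sigma, rng):
    """Random Fourier feature map whose inner products approximate the Gaussian
    kernel exp(-||u - v||^2 / (2 sigma^2)) on dim-dimensional inputs."""
    W = rng.normal(0.0, 1.0 / sigma, size=(num_features, dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return lambda V: np.sqrt(2.0 / num_features) * np.cos(V @ W.T + b)

def standardize(v):
    """Zero mean, unit variance, so one bandwidth suits every sample."""
    return (v - v.mean()) / v.std()

def featurize(x, y, maps):
    """Approximate mean embeddings of the empirical distributions of X, Y and (X, Y)."""
    x, y = standardize(x), standardize(y)
    phi_x, phi_y, phi_xy = maps
    return np.concatenate([
        phi_x(x[:, None]).mean(axis=0),
        phi_y(y[:, None]).mean(axis=0),
        phi_xy(np.column_stack([x, y])).mean(axis=0),
    ])

D = 2000  # hypothetical number of random features per embedding
rng = np.random.default_rng(0)
# The same feature maps must be reused for every sample S_i.
maps = (make_rff(1, D, 1.0, rng), make_rff(1, D, 1.0, rng), make_rff(2, D, 1.0, rng))

# One hypothetical sample S_i; a training set would stack many such vectors.
x = rng.normal(size=300)
y = np.tanh(x) + 0.1 * rng.normal(size=300)
features = featurize(x, y, maps)
print(features.shape)  # prints: (6000,)
```

Because each block is an average over the points of $S_i$, the feature vector does not depend on the order of the observations, which is what lets a fixed-length classifier operate on whole samples.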