We Are Not Your Real Parents: Telling Causal from Confounded using MDL
Given data over variables X and Y, we consider the problem of finding out whether X jointly causes Y or whether they are all confounded by an unobserved latent variable Z. To do so, we take an information-theoretic approach based on Kolmogorov complexity. In a nutshell, we follow the postulate that first encoding the true cause, and then the effects given that cause, results in a shorter description than any other encoding of the observed variables. The ideal score is not computable, and hence we have to approximate it. We propose to do so using the Minimum Description Length (MDL) principle. We compare the MDL scores under the models where X causes Y and where there exists a latent variable Z confounding both X and Y, and show that our scores are consistent. To find potential confounders we propose using latent factor modeling, in particular probabilistic PCA (PPCA). Empirical evaluation on both synthetic and real-world data shows that our method, CoCa, performs very well, even when the true generating process of the data is far from the assumptions made by the models we use. Moreover, it is robust, as its accuracy goes hand in hand with its confidence.
Editorial Comment on the Special Issue of "Information in Dynamical Systems and Complex Systems"
This special issue collects contributions from the participants of the
"Information in Dynamical Systems and Complex Systems" workshop, which cover a
wide range of important problems and new approaches that lie in the
intersection of information theory and dynamical systems. The contributions
include theoretical characterization and understanding of the different types
of information flow and causality in general stochastic processes, inference
and identification of coupling structure and parameters of system dynamics,
rigorous coarse-grain modeling of network dynamical systems, and exact
statistical testing of fundamental information-theoretic quantities such as the
mutual information. The collective efforts reported herein reflect a modern
perspective of the intimate connection between dynamical systems and
information flow, leading to the promise of better understanding and modeling
of natural complex systems and better/optimal design of engineering systems.
Group invariance principles for causal generative models
The postulate of independence of cause and mechanism (ICM) has recently led
to several new causal discovery algorithms. The interpretation of independence
and the way it is utilized, however, varies across these methods. Our aim in
this paper is to propose a group theoretic framework for ICM to unify and
generalize these approaches. In our setting, the cause-mechanism relationship
is assessed by comparing it against a null hypothesis through the application
of random generic group transformations. We show that the group theoretic view
provides a very general tool to study the structure of data generating
mechanisms with direct applications to machine learning. Comment: 16 pages, 6 figures
Reverse Engineering Gene Networks with ANN: Variability in Network Inference Algorithms
Motivation: Reconstructing the topology of a gene regulatory network is one
of the key tasks in systems biology. Despite the wide variety of proposed
methods, very little work has been dedicated to the assessment of their
stability properties. Here we present a methodical comparison of the
performance of a novel method (RegnANN) for gene network inference based on
multilayer perceptrons with three reference algorithms (ARACNE, CLR, KELLER),
focussing our analysis on the prediction variability induced by both the
network intrinsic structure and the available data.
Results: The extensive evaluation on both synthetic data and a selection of
gene modules of "Escherichia coli" indicates that all the algorithms suffer from
instability and variability issues with regard to the reconstruction of the
topology of the network. This instability makes it objectively very hard to
establish which method performs best. Nevertheless, RegnANN shows MCC
scores that compare very favorably with all the other inference methods tested.
Availability: The software for the RegnANN inference algorithm is distributed
under GPL3 and is available at the corresponding author's home page
(http://mpba.fbk.eu/grimaldi/regnann-supmat
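For readers unfamiliar with the MCC (Matthews correlation coefficient) scores mentioned in the abstract above, the following is a minimal sketch of how such a score could be computed when comparing a true network with an inferred one. It assumes both networks are flattened into equal-length lists of 0/1 edge indicators; the function name `mcc` and this edge encoding are illustrative assumptions, not taken from the RegnANN software.

```python
import math

def mcc(true_edges, predicted_edges):
    """Matthews correlation coefficient for two equal-length 0/1 sequences.

    Hypothetical helper: each position stands for one candidate edge,
    1 meaning "edge present" and 0 meaning "edge absent".
    """
    if len(true_edges) != len(predicted_edges):
        raise ValueError("sequences must have the same length")

    tp = fp = tn = fn = 0
    for t, p in zip(true_edges, predicted_edges):
        if t and p:
            tp += 1          # edge correctly predicted
        elif p:
            fp += 1          # edge predicted but absent in the true network
        elif t:
            fn += 1          # true edge that was missed
        else:
            tn += 1          # absent edge correctly left out

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: the score is set to 0 when it is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The score ranges from -1 (every edge decision wrong) through 0 (no better than chance) to 1 (perfect reconstruction), which makes it usable on sparse networks where absent edges far outnumber present ones.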
Coherent frequentism
By representing the range of fair betting odds according to a pair of
confidence set estimators, dual probability measures on parameter space called
frequentist posteriors secure the coherence of subjective inference without any
prior distribution. The closure of the set of expected losses corresponding to
the dual frequentist posteriors constrains decisions without arbitrarily
forcing optimization under all circumstances. This decision theory reduces to
those that maximize expected utility when the pair of frequentist posteriors is
induced by an exact or approximate confidence set estimator or when an
automatic reduction rule is applied to the pair. In such cases, the resulting
frequentist posterior is coherent in the sense that, as a probability
distribution of the parameter of interest, it satisfies the axioms of the
decision-theoretic and logic-theoretic systems typically cited in support of
the Bayesian posterior. Unlike the p-value, the confidence level of an interval
hypothesis derived from such a measure is suitable as an estimator of the
indicator of hypothesis truth since it converges in sample-space probability to
1 if the hypothesis is true or to 0 otherwise under general conditions. Comment: The confidence-measure theory of inference and decision is explicitly
extended to vector parameters of interest. The derivation of upper and lower
confidence levels from valid and nonconservative set estimators is formalized