1,907 research outputs found
Causal Inference by Stochastic Complexity
The algorithmic Markov condition states that the most likely causal direction
between two random variables X and Y can be identified as that direction with
the lowest Kolmogorov complexity. Due to the halting problem, however, this
notion is not computable.
We hence propose to do causal inference by stochastic complexity. That is, we
propose to approximate Kolmogorov complexity via the Minimum Description Length
(MDL) principle, using a score that is minimax optimal with regard to the
model class under consideration. This means that even in an adversarial
setting, such as when the true distribution is not in this class, we still
obtain the optimal encoding for the data relative to the class.
We instantiate this framework, which we call CISC, for pairs of univariate
discrete variables, using the class of multinomial distributions. Experiments
show that CISC is highly accurate on synthetic, benchmark, and real-world
data, outperforming the state of the art by a clear margin, and scales
extremely well with regard to sample and domain sizes.
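The core decision rule is compact enough to sketch. The following Python
sketch (our illustration; function names are ours, and ties and edge cases
are glossed over) computes the multinomial stochastic complexity via the
linear-time normalizing-sum recurrence of Kontkanen and Myllymäki, and
prefers the causal direction with the lower total code length,
SC(X) + SC(Y | X) versus SC(Y) + SC(X | Y).

```python
import math
from collections import Counter

def log_multinomial_complexity(n, K):
    """log2 of the multinomial NML normalizing sum C(n, K), via the
    linear-time recurrence C(n, K+2) = C(n, K+1) + (n/K) * C(n, K)
    of Kontkanen & Myllymaki (2007)."""
    if n == 0 or K <= 1:
        return 0.0
    c2 = sum(math.comb(n, h) * (h / n) ** h * ((n - h) / n) ** (n - h)
             for h in range(n + 1))
    prev, curr = 1.0, c2              # C(n, 1) = 1, C(n, 2) = c2
    for k in range(1, K - 1):
        prev, curr = curr, curr + (n / k) * prev
    return math.log2(curr)

def stochastic_complexity(xs, K):
    """NML code length: maximum-likelihood code length plus regret."""
    n = len(xs)
    loglik = -sum(c * math.log2(c / n) for c in Counter(xs).values())
    return loglik + log_multinomial_complexity(n, K)

def cisc_direction(x, y):
    """Prefer the direction with the lower total stochastic complexity."""
    Kx, Ky = len(set(x)), len(set(y))
    group = lambda a, b: [[bj for aj, bj in zip(a, b) if aj == v]
                          for v in set(a)]
    x_to_y = stochastic_complexity(x, Kx) + \
        sum(stochastic_complexity(g, Ky) for g in group(x, y))
    y_to_x = stochastic_complexity(y, Ky) + \
        sum(stochastic_complexity(g, Kx) for g in group(y, x))
    return "X -> Y" if x_to_y < y_to_x else "Y -> X"
```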
Determining Principal Component Cardinality through the Principle of Minimum Description Length
PCA (Principal Component Analysis) and its variants are ubiquitous
techniques for matrix dimension reduction and reduced-dimension
latent-factor extraction. One significant challenge in using PCA is the
choice of the number of principal components. The information-theoretic MDL
(Minimum Description Length) principle gives objective compression-based
criteria for model selection, but it is difficult to analytically apply its
modern definition, NML (Normalized Maximum Likelihood), to the problem of
PCA. This work shows a general reduction of NML problems to lower-dimension
problems. Applying this reduction, it bounds the NML of PCA in terms of the
NML of linear regression, which is known.
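The paper's NML bound for PCA is involved; as a simpler point of reference,
the classical Wax-Kailath MDL criterion below (a two-part-code relative of
the NML approach, not the bound derived in this work) selects the component
count from the covariance eigenvalues.

```python
import numpy as np

def mdl_num_components(X):
    """Wax-Kailath (1985) MDL estimate of the number of principal
    components: n*(p-k)*log(AM/GM of the smallest p-k eigenvalues)
    plus a (1/2)*k*(2p-k)*log(n) parameter cost, minimized over k."""
    n, p = X.shape
    lam = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending
    lam = np.maximum(lam, 1e-12)      # guard against numerical zeros
    best_k, best_len = 0, np.inf
    for k in range(p):
        tail = lam[k:]
        # log ratio of arithmetic to geometric mean of tail eigenvalues
        log_ratio = np.log(tail.mean()) - np.log(tail).mean()
        code_len = n * (p - k) * log_ratio + 0.5 * k * (2 * p - k) * np.log(n)
        if code_len < best_len:
            best_k, best_len = k, code_len
    return best_k

# Example: 500 samples of a rank-3 signal in 10 dimensions plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 10)) \
    + 0.1 * rng.normal(size=(500, 10))
print(mdl_num_components(X))          # typically recovers 3
```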
Results on the Redundancy of Universal Compression for Finite-Length Sequences
In this paper, we investigate the redundancy of universal coding schemes on
smooth parametric sources in the finite-length regime. We derive an upper
bound on the probability of the event that a sequence of length n, chosen
using Jeffreys' prior from the family of parametric sources with d unknown
parameters, is compressed with a redundancy smaller than
(1 - ε)(d/2) log n, for any ε > 0. Our results also confirm
that for large enough n and d, the average minimax redundancy provides a
good estimate for the redundancy of most sources. Our result may be used to
evaluate the performance of universal source coding schemes on finite-length
sequences. Additionally, we precisely characterize the minimax redundancy
for two-stage codes. We demonstrate that the two-stage assumption incurs a
negligible redundancy, especially when the number of source parameters is
large. Finally, we show that the redundancy is significant in the
compression of small sequences.
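To make the (d/2) log n redundancy scaling concrete, the short check below
(our own illustration, for the Bernoulli family with d = 1) compares the
exact Shtarkov/NML minimax regret against the first-order asymptotic
formula (d/2) log(n / 2π) + log ∫ √(det I(θ)) dθ.

```python
import math

def bernoulli_minimax_regret(n):
    """Exact minimax (Shtarkov / NML) regret of the Bernoulli family, in
    bits: log2 sum_k C(n,k) (k/n)^k ((n-k)/n)^(n-k), in log space."""
    def log_term(k):
        log_comb = (math.lgamma(n + 1) - math.lgamma(k + 1)
                    - math.lgamma(n - k + 1))
        log_ml = (k * math.log(k / n) if k else 0.0) + \
                 ((n - k) * math.log((n - k) / n) if n - k else 0.0)
        return log_comb + log_ml
    m = max(log_term(k) for k in range(n + 1))   # log-sum-exp for stability
    return (m + math.log(sum(math.exp(log_term(k) - m)
                             for k in range(n + 1)))) / math.log(2)

def asymptotic_regret(n, d=1):
    """(d/2) log2(n / 2 pi) + log2 of the Fisher-information integral,
    which equals pi for the Bernoulli family."""
    return 0.5 * d * math.log2(n / (2 * math.pi)) + math.log2(math.pi)

for n in (10, 100, 1000, 10000):
    print(n, round(bernoulli_minimax_regret(n), 3),
          round(asymptotic_regret(n), 3))
```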
Constraining Implicit Space with Minimum Description Length: An Unsupervised Attention Mechanism across Neural Network Layers
Inspired by the adaptation phenomenon of neuronal firing, we propose the
regularity normalization (RN) as an unsupervised attention mechanism (UAM)
which computes the statistical regularity in the implicit space of neural
networks under the Minimum Description Length (MDL) principle. Treating the
neural network optimization process as a partially observable model selection
problem, UAM constrains the implicit space by a normalization factor, the
universal code length. We compute this universal code incrementally across
neural network layers and demonstrate the flexibility to include data priors
such as top-down attention and other oracle information. Empirically, our
approach outperforms existing normalization methods in tackling limited,
imbalanced, and non-stationary input distributions in image classification,
classic control, procedurally generated reinforcement learning, generative
modeling, handwriting generation, and question answering tasks with various
neural network architectures. Lastly, UAM tracks dependency and critical
learning stages across layers and recurrent time steps of deep networks.
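A toy rendering of the idea, assuming a Gaussian model class per layer and
a two-part-code proxy in place of the paper's NML computation (all names
and constants below are ours, not the authors'):

```python
import numpy as np

class RegularityNorm:
    """Toy sketch (ours, NOT the paper's exact scheme): scale a layer's
    activations by an incrementally updated code-length estimate, using a
    Gaussian model class and a two-part-code proxy for the NML regret."""

    def __init__(self, dim, eps=1e-6):
        self.t = 0                      # minibatches seen so far
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.eps = eps

    def __call__(self, x):              # x: (batch, dim) activations
        self.t += 1
        v = self.var + self.eps
        # data cost: average negative log-likelihood (bits per element)
        # under the model fitted to activations seen *before* this batch
        nll = 0.5 * np.mean((x - self.mean) ** 2 / v
                            + np.log(2 * np.pi * v)) / np.log(2)
        # model cost: (d/2) log2 t, amortized over the batch elements
        code_len = nll + 0.5 * x.shape[1] * np.log2(self.t + 1) / x.size
        # update running statistics, then normalize by the code length
        lr = 1.0 / self.t
        self.mean = (1 - lr) * self.mean + lr * x.mean(axis=0)
        self.var = (1 - lr) * self.var + lr * x.var(axis=0)
        return x / max(code_len, self.eps)
```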
MDL Denoising Revisited
We refine and extend an earlier MDL denoising criterion for wavelet-based
denoising. We start by showing that the denoising problem can be reformulated
as a clustering problem, where the goal is to obtain separate clusters for
informative and non-informative wavelet coefficients, respectively. This
suggests two refinements, adding a code-length for the model index, and
extending the model in order to account for subband-dependent coefficient
distributions. A third refinement is the derivation of soft thresholding
inspired by predictive universal coding with weighted mixtures. We propose a
practical method incorporating all three refinements, which is shown to
achieve good performance and robustness in denoising both artificial and
natural signals.
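For orientation, the baseline that these refinements build on can be
sketched compactly: transform, then keep the k largest-magnitude wavelet
coefficients with k chosen by a Rissanen-style two-part code length. This
is our simplified rendering; the paper's refinements (model-index cost,
subband-dependent distributions, mixture-based soft thresholding) are
omitted.

```python
import numpy as np

def haar_dwt(x):
    """Full orthonormal Haar transform; len(x) must be a power of two."""
    c, details = np.asarray(x, dtype=float), []
    while len(c) > 1:
        s = (c[0::2] + c[1::2]) / np.sqrt(2.0)   # scaling coefficients
        d = (c[0::2] - c[1::2]) / np.sqrt(2.0)   # detail coefficients
        details.append(d)
        c = s
    return np.concatenate(details + [c])

def mdl_retained_count(coeffs):
    """Choose how many largest-magnitude coefficients to keep by minimizing
    a Rissanen-style two-part code length; the remaining coefficients are
    zeroed before the inverse transform."""
    n = len(coeffs)
    c2 = np.sort(np.asarray(coeffs) ** 2)[::-1]  # squared, descending
    csum = np.cumsum(c2)
    best_k, best_len = 1, np.inf
    for k in range(1, n - 1):
        s_in, s_out = csum[k - 1], csum[-1] - csum[k - 1]
        if s_out <= 0:
            break
        code_len = (0.5 * k * np.log(s_in / k)
                    + 0.5 * (n - k) * np.log(s_out / (n - k))
                    + 0.5 * np.log(k * (n - k)))
        if code_len < best_len:
            best_k, best_len = k, code_len
    return best_k
```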
Minimum Description Length Revisited
This is an up-to-date introduction to and overview of the Minimum Description Length (MDL) Principle, a theory of inductive inference that can be applied to general problems in statistics, machine learning, and pattern recognition. While MDL was originally based on data compression ideas, this introduction can be read without any knowledge thereof. It takes into account all major developments since 2007, the last time an extensive overview was written. These include new methods for model selection and averaging and hypothesis testing, as well as the first completely general definition of MDL estimators. Incorporating these developments, MDL can be seen as a powerful extension of both penalized likelihood and Bayesian approaches, in which penalization functions and prior distributions are replaced by more general luckiness functions, average-case methodology is replaced by a more robust worst-case approach, and in which methods classically viewed as highly distinct, such as AIC vs. BIC and cross-validation vs. Bayes, can to a large extent be viewed from a unified perspective.
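One concrete instance of the penalized-likelihood reading: with a crude
(d/2) log n cost for the quantized parameters, a two-part MDL code for
polynomial regression reduces to BIC. A minimal sketch (our illustration,
not taken from the text):

```python
import numpy as np

def two_part_mdl_degree(x, y, max_degree=8):
    """Pick a polynomial degree by a crude two-part code length:
    (n/2) log(RSS/n) for the data plus ((d+1)/2) log n for the parameters.
    With this quantization cost the criterion coincides with BIC."""
    n = len(x)
    best_d, best_len = 0, np.inf
    for d in range(max_degree + 1):
        coeffs = np.polyfit(x, y, d)
        rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
        code_len = 0.5 * n * np.log(rss / n) + 0.5 * (d + 1) * np.log(n)
        if code_len < best_len:
            best_d, best_len = d, code_len
    return best_d

# Example: noisy cubic data; the criterion typically recovers degree 3
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 200)
y = 2 * x ** 3 - x + 0.1 * rng.normal(size=x.size)
print(two_part_mdl_degree(x, y))
```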