21,548 research outputs found
Minimum Conditional Description Length Estimation for Markov Random Fields
In this paper we discuss a method, which we call Minimum Conditional
Description Length (MCDL), for estimating the parameters of a subset of sites
within a Markov random field. We assume that the edges are known for the
entire graph. Then, for a subset of nodes, we estimate the parameters for the
nodes and edges within that subset, as well as for the edges incident to a
node in it, by finding the exponential parameter for the subset that yields
the best compression conditioned on the values on its boundary. Our estimate
is derived from a temporally stationary sequence of observations on the
subset. We discuss how this method can also be applied to estimate a
spatially invariant parameter from a single configuration, and in so doing,
derive the Maximum Pseudo-Likelihood (MPL) estimate.
Comment: Information Theory and Applications (ITA) workshop, February 201
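The MPL estimate mentioned in the abstract above can be illustrated with a toy example. The sketch below fits a single nearest-neighbour Ising coupling on a small toroidal grid by maximizing the pseudo-likelihood, i.e. the product over sites of each site's conditional probability given its neighbours. The Ising model, the grid search, and all function names are illustrative assumptions of mine, not the paper's construction.

```python
import math

def neg_log_pseudo_likelihood(theta, grid):
    """Negative log pseudo-likelihood of a nearest-neighbour Ising coupling
    theta for a single +/-1 configuration on a toroidal n x n grid.
    (Toy model only; the paper's setting is more general.)"""
    n = len(grid)
    total = 0.0
    for i in range(n):
        for j in range(n):
            s = grid[i][j]
            nb = (grid[(i - 1) % n][j] + grid[(i + 1) % n][j]
                  + grid[i][(j - 1) % n] + grid[i][(j + 1) % n])
            # P(s | neighbours) = exp(theta*s*nb) / (2*cosh(theta*nb))
            total -= theta * s * nb - math.log(2 * math.cosh(theta * nb))
    return total

def mpl_estimate(grid, candidates):
    """MPL estimate of theta by grid search over candidate values."""
    return min(candidates, key=lambda t: neg_log_pseudo_likelihood(t, grid))
```

On a checkerboard configuration every site disagrees with all four neighbours, so the pseudo-likelihood is maximized at the most negative candidate coupling.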
Neighborhood radius estimation in Variable-neighborhood Random Fields
We consider random fields defined by finite-region conditional probabilities
depending on a neighborhood of the region which changes with the boundary
conditions. To predict the symbols within any finite region it is necessary
to inspect a random number of neighborhood symbols, which may change
according to their values. In analogy to the one-dimensional setting, we call
these
neighborhood symbols the context of the region. This framework is a natural
extension, to d-dimensional fields, of the notion of variable-length Markov
chains introduced by Rissanen (1983) in his classical paper. We define an
algorithm to estimate the radius of the smallest ball containing the context
based on a realization of the field. We prove the consistency of this
estimator. Our proofs are constructive and yield explicit upper bounds for the
probability of wrong estimation of the radius of the context.
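A one-dimensional analogue can convey the idea behind the radius estimator: grow the context one symbol at a time and stop as soon as the extra symbol no longer changes the empirical next-symbol distribution. The fixed-threshold stopping rule and all names below are simplifying assumptions of mine, not the paper's procedure.

```python
from collections import Counter, defaultdict

def cond_dist(seq, k):
    """Empirical distribution of the next symbol given each length-k context."""
    counts = defaultdict(Counter)
    for i in range(k, len(seq)):
        counts[tuple(seq[i - k:i])][seq[i]] += 1
    return {w: {a: c / sum(cnt.values()) for a, c in cnt.items()}
            for w, cnt in counts.items()}

def estimate_depth(seq, max_depth=4, threshold=0.15):
    """Smallest k such that extending every context by one more symbol moves
    the empirical next-symbol probabilities by at most `threshold`
    (binary alphabet assumed)."""
    for k in range(max_depth):
        shorter, longer = cond_dist(seq, k), cond_dist(seq, k + 1)
        if all(abs(p.get(a, 0.0) - shorter[w[1:]].get(a, 0.0)) <= threshold
               for w, p in longer.items() for a in (0, 1)):
            return k
    return max_depth
```

On a long sample from an order-1 binary chain, depth 0 is rejected (the marginal differs from the one-symbol conditional) while depth 1 is accepted.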
Divergence rates of Markov order estimators and their application to statistical estimation of stationary ergodic processes
Stationary ergodic processes with finite alphabets are estimated by finite
memory processes from a sample, an n-length realization of the process, where
the memory depth of the estimator process is also estimated from the sample
using penalized maximum likelihood (PML). Under some assumptions on the
continuity rate and the assumption of non-nullness, a rate of convergence in
$\bar{d}$-distance is obtained, with explicit constants. The result requires an
analysis of the divergence of PML Markov order estimators for not necessarily
finite memory processes. This divergence problem is investigated in more
generality for three information criteria: the Bayesian information criterion
with generalized penalty term yielding the PML, and the normalized maximum
likelihood and the Krichevsky-Trofimov code lengths. Lower and upper bounds on
the estimated order are obtained. The notion of consistent Markov order
estimation is generalized for infinite memory processes using the concept of
oracle order estimates, and generalized consistency of the PML Markov order
estimator is presented.
Comment: Published at http://dx.doi.org/10.3150/12-BEJ468 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
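The penalized-maximum-likelihood order estimator discussed above can be sketched concretely for a binary alphabet. The penalty constant 1/2 below (standard BIC) is one instance of the generalized penalty term the abstract refers to; the function names are mine.

```python
import math
from collections import Counter, defaultdict

def markov_log_lik(seq, k):
    """Maximized log-likelihood of an order-k Markov model for seq
    (empirical transition probabilities plugged in)."""
    trans = defaultdict(Counter)
    for i in range(k, len(seq)):
        trans[tuple(seq[i - k:i])][seq[i]] += 1
    ll = 0.0
    for cnt in trans.values():
        tot = sum(cnt.values())
        ll += sum(c * math.log(c / tot) for c in cnt.values())
    return ll

def pml_order(seq, alphabet_size, max_order):
    """PML order estimate: minimize -log-likelihood plus a BIC-type
    penalty (1/2) * free_params * log n over candidate orders."""
    n = len(seq)
    def score(k):
        free = alphabet_size ** k * (alphabet_size - 1)
        return -markov_log_lik(seq, k) + 0.5 * free * math.log(n)
    return min(range(max_order + 1), key=score)
```

For a sample from a strongly dependent order-1 chain, the likelihood gain of order 1 over order 0 dwarfs the penalty, while higher orders gain almost nothing, so the estimator returns 1.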
Bayesian Networks for Max-linear Models
We study Bayesian networks based on max-linear structural equations as
introduced in Gissibl and Klüppelberg [16] and provide a summary of their
independence properties. In particular we emphasize that distributions for such
networks are generally not faithful to the independence model determined by
their associated directed acyclic graph. In addition, we consider some of the
basic issues of estimation and discuss generalized maximum likelihood
estimation of the coefficients, using the concept of a generalized likelihood
ratio for non-dominated families as introduced by Kiefer and Wolfowitz [21].
Finally, we argue that the structure of a minimal network can asymptotically
be identified completely from observational data.
Comment: 18 pages
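A max-linear structural equation model evaluates each node as the maximum of its own innovation and coefficient-weighted parent values, taken in topological order. The sketch below shows this recursion; the node labels, coefficients, and fixed innovations are invented for illustration (in the setting of the abstract the innovations would typically be random, e.g. heavy-tailed).

```python
def max_linear(topo, parents, coef, Z):
    """Evaluate X_v = max(Z_v, max over parents u of coef[(u, v)] * X_u)
    along a topological order `topo` of the DAG."""
    X = {}
    for v in topo:
        X[v] = max([Z[v]] + [coef[(u, v)] * X[u] for u in parents[v]])
    return X
```

For example, with edges 1->2, 1->3, 2->3, the large coefficient on 1->2 propagates through node 2 and dominates node 3.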
Causal inference using the algorithmic Markov condition
Inferring the causal structure that links n observables is usually based upon
detecting statistical dependences and choosing simple graphs that make the
joint measure Markovian. Here we argue why causal inference is also possible
when only single observations are present.
We develop a theory of how to generate causal graphs explaining similarities
between single objects. To this end, we replace the notion of conditional
stochastic independence in the causal Markov condition with the vanishing of
conditional algorithmic mutual information and describe the corresponding
causal inference rules.
We explain why a consistent reformulation of causal inference in terms of
algorithmic complexity implies a new inference principle that takes into
account also the complexity of conditional probability densities, making it
possible to select among Markov equivalent causal graphs. This insight provides
a theoretical foundation of a heuristic principle proposed in earlier work.
We also discuss how to replace Kolmogorov complexity with decidable
complexity criteria. This can be seen as an algorithmic analog of replacing the
empirically undecidable question of statistical independence with practical
independence tests that are based on implicit or explicit assumptions on the
underlying distribution.
Comment: 16 figures
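One concrete way to replace Kolmogorov complexity with a decidable criterion, in the spirit of the abstract above, is to use the output length of a standard compressor. The proxy below for algorithmic mutual information, I(x:y) ~ C(x) + C(y) - C(xy), is a common compression-based stand-in, not the paper's specific proposal.

```python
import zlib

def C(x):
    """Compressed length: a computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(x, 9))

def alg_mutual_info(x, y):
    """Compression-based proxy for algorithmic mutual information:
    how many bytes does knowing one string save when encoding the other?"""
    return C(x) + C(y) - C(x + y)
```

Two copies of the same string share almost all their structure, so their estimated mutual information is large; a string paired with unrelated data shares almost none.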