Universal Estimation of Directed Information
Four estimators of the directed information rate between a pair of jointly
stationary ergodic finite-alphabet processes are proposed, based on universal
probability assignments. The first one is a Shannon--McMillan--Breiman type
estimator, similar to those used by Verd\'u (2005) and Cai, Kulkarni, and
Verd\'u (2006) for estimation of other information measures. We show the almost
sure and $L_1$ convergence properties of the estimator for any underlying
universal probability assignment. The other three estimators map universal
probability assignments to different functionals, each exhibiting relative
merits such as smoothness, nonnegativity, and boundedness. We establish the
consistency of these estimators in almost sure and $L_1$ senses, and derive
near-optimal rates of convergence in the minimax sense under mild conditions.
These estimators carry over directly to estimating other information measures
of stationary ergodic finite-alphabet processes, such as entropy rate and
mutual information rate, with near-optimal performance and provide alternatives
to classical approaches in the existing literature. Guided by these theoretical
results, the proposed estimators are implemented using the context-tree
weighting algorithm as the universal probability assignment. Experiments on
synthetic and real data are presented, demonstrating the potential of the
proposed schemes in practice and the utility of directed information estimation
in detecting and measuring causal influence and delay.
Comment: 23 pages, 10 figures, to appear in IEEE Transactions on Information Theory
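To make the plug-in construction concrete, the sketch below estimates the entropy rate of a binary sequence as $-\frac{1}{n}\log_2 q(x^n)$ for a universal probability assignment $q$. As a simplification, it uses the Krichevsky--Trofimov sequential estimator in place of context-tree weighting (KT is universal only for the memoryless class); the sequence `x` and all function names are illustrative, not from the paper.

```python
import math
import random

def kt_sequential_log2prob(bits):
    """Krichevsky-Trofimov sequential probability assignment for a binary,
    memoryless source: a simple example of a universal probability assignment.
    Returns log2 q(x^n), accumulated one symbol at a time."""
    counts = [0, 0]
    log2_q = 0.0
    for b in bits:
        p = (counts[b] + 0.5) / (counts[0] + counts[1] + 1.0)
        log2_q += math.log2(p)
        counts[b] += 1
    return log2_q

def entropy_rate_estimate(bits):
    """Shannon--McMillan--Breiman-type plug-in estimate: -(1/n) log2 q(x^n)."""
    return -kt_sequential_log2prob(bits) / len(bits)

random.seed(0)
x = [int(random.random() < 0.5) for _ in range(10000)]
print(entropy_rate_estimate(x))  # close to 1 bit/symbol for a fair coin
```

Replacing the KT assignment with a context-tree weighting assignment over the same interface would recover the memory-aware scheme the abstract describes.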
Justification of Logarithmic Loss via the Benefit of Side Information
We consider a natural measure of relevance: the reduction in optimal
prediction risk in the presence of side information. For any given loss
function, this relevance measure captures the benefit of side information for
performing inference on a random variable under this loss function. When such a
measure satisfies a natural data processing property, and the random variable
of interest has alphabet size greater than two, we show that it is uniquely
characterized by the mutual information, and the corresponding loss function
coincides with logarithmic loss. In doing so, our work provides a new
characterization of mutual information, and justifies its use as a measure of
relevance. When the alphabet is binary, we characterize the only admissible
forms the measure of relevance can assume while obeying the specified data
processing property. Our results naturally extend to measuring causal influence
between stochastic processes, where we unify different causal-inference
measures in the literature as instantiations of directed information.
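The central identity behind this characterization can be checked numerically: under logarithmic loss, the reduction in optimal prediction risk afforded by side information equals the mutual information. A minimal sketch, using a hypothetical 2x2 joint pmf chosen only for illustration:

```python
import math

def H(p):
    """Shannon entropy in bits; also the optimal (Bayes) log-loss risk."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Hypothetical joint pmf p(x, y) on {0,1} x {0,1}.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
px = [sum(v for (x, _), v in joint.items() if x == i) for i in (0, 1)]
py = [sum(v for (_, y), v in joint.items() if y == j) for j in (0, 1)]

# Optimal log-loss risk without side information: H(X).
risk_plain = H(px)
# Optimal log-loss risk given side information Y: H(X | Y).
risk_side = sum(py[j] * H([joint[(i, j)] / py[j] for i in (0, 1)])
                for j in (0, 1))

# The reduction in risk equals the mutual information I(X; Y).
reduction = risk_plain - risk_side
mi = sum(v * math.log2(v / (px[x] * py[y])) for (x, y), v in joint.items())
print(reduction, mi)  # the two quantities agree
```

The abstract's result runs in the other direction: among loss functions whose risk reduction obeys the data processing property, logarithmic loss is (for alphabets larger than two) the only one for which this identity can hold.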
Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes
We investigate approximating joint distributions of random processes with
causal dependence tree distributions. Such distributions are particularly
useful in providing a parsimonious representation when causal dynamics exist
among the processes. By extending the results of Chow and Liu on
dependence tree approximations, we show that the best causal dependence tree
approximation is the one which maximizes the sum of directed informations on
its edges, where best is defined in terms of minimizing the KL-divergence
between the original and the approximate distribution. Moreover, we describe a
low-complexity algorithm to efficiently pick this approximate distribution.
Comment: 9 pages, 15 figures
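The tree-selection step reduces to a maximum-weight directed spanning tree problem over directed-information edge weights. The sketch below solves it by brute force over parent assignments rather than the paper's low-complexity algorithm (Edmonds' algorithm handles larger graphs efficiently); the weight matrix is a hypothetical set of pairwise directed-information estimates, not data from the paper.

```python
from itertools import product

def best_causal_tree(weights, root=0):
    """Maximum-weight directed spanning tree (arborescence) rooted at `root`,
    found by brute force over parent assignments. weights[i][j] plays the
    role of an estimated directed information from process i to process j."""
    n = len(weights)
    nodes = [v for v in range(n) if v != root]
    best, best_w = None, float("-inf")
    for parents in product(range(n), repeat=len(nodes)):
        pm = dict(zip(nodes, parents))
        if any(pm[v] == v for v in nodes):
            continue  # no self-loops
        ok = True
        for v in nodes:  # every node must reach the root without cycling
            seen, cur = set(), v
            while cur != root:
                if cur in seen:
                    ok = False
                    break
                seen.add(cur)
                cur = pm[cur]
            if not ok:
                break
        if ok:
            w = sum(weights[pm[v]][v] for v in nodes)
            if w > best_w:
                best_w = w
                best = sorted((pm[v], v) for v in nodes)
    return best, best_w

# Hypothetical directed-information estimates for three processes.
weights = [
    [0.0, 0.5, 0.1],
    [0.0, 0.0, 0.6],
    [0.0, 0.2, 0.0],
]
tree, total = best_causal_tree(weights, root=0)
print(tree)  # the chain 0 -> 1 -> 2 wins here
```

Maximizing the summed edge weights is exactly the criterion the abstract derives: it is equivalent to minimizing the KL-divergence between the original joint distribution and the causal dependence tree approximation.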