Relative Information Loss in the PCA
In this work we analyze principal component analysis (PCA) as a deterministic
input-output system. We show that the relative information loss induced by
reducing the dimensionality of the data after performing the PCA is the same as
in dimensionality reduction without PCA. Finally, we analyze the case where the
PCA uses the sample covariance matrix to compute the rotation. If the rotation
matrix is not available at the output, we show that an infinite amount of
information is lost. The relative information loss is shown to decrease with
increasing sample size.

Comment: 9 pages, 4 figures; extended version of a paper accepted for
publication
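To make the setting concrete, here is a minimal sketch of the PCA pipeline the abstract refers to: the rotation is computed from the sample covariance matrix, and dimensionality is reduced by keeping the top-$k$ components. The function name and data are illustrative, not taken from the paper:

```python
import numpy as np

def pca_reduce(X, k):
    """Rotate data with eigenvectors of the sample covariance, keep k dims."""
    Xc = X - X.mean(axis=0)               # center the data
    C = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]     # re-sort descending by variance
    W = eigvecs[:, order[:k]]             # rotation restricted to k columns
    return Xc @ W                         # reduced-dimension output

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
Y = pca_reduce(X, 2)
print(Y.shape)  # (500, 2)
```

If only `Y` is observed and the rotation `W` is unknown, the receiver cannot undo the data-dependent rotation, which is the situation in which the abstract's infinite-information-loss result applies.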
Greedy Algorithms for Optimal Distribution Approximation
The approximation of a discrete probability distribution $t$ by an
$M$-type distribution $p$ is considered. The approximation error is
measured by the informational divergence $D(t\|p)$, which is an appropriate
measure, e.g., in the context of data compression. Properties of the optimal
approximation are derived and bounds on the approximation error are presented,
which are asymptotically tight. It is shown that $M$-type approximations that
minimize either $D(t\|p)$, or $D(p\|t)$, or the variational distance
$\|p-t\|_1$ can all be found by using specific instances of the same general
greedy algorithm.

Comment: 5 pages
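The abstract does not spell out the greedy algorithm itself. As a hedged sketch of one plausible instance (the variational-distance case), the $M$ probability units of size $1/M$ can be handed out one at a time to the index with the largest current deficit; the function name and example are illustrative:

```python
import numpy as np

def greedy_m_type_tv(t, M):
    """Greedy M-type approximation p_i = c_i / M of a distribution t:
    allocate one 1/M mass unit per step to the largest remaining deficit."""
    t = np.asarray(t, dtype=float)
    c = np.zeros_like(t)                 # integer counts, stored as floats
    for _ in range(M):
        i = np.argmax(t - c / M)         # largest deficit t_i - c_i/M
        c[i] += 1
    return c / M

p = greedy_m_type_tv([0.6, 0.4], M=3)
print(p)  # [0.66666667 0.33333333]
```

Minimizing the two divergences instead would only change the per-step gain criterion inside the loop, which is the sense in which the abstract describes them as instances of one general greedy algorithm.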
Information-Preserving Markov Aggregation
We present a sufficient condition for a non-injective function of a Markov
chain to be a second-order Markov chain with the same entropy rate as the
original chain. This permits an information-preserving state space reduction by
merging states or, equivalently, lossless compression of a Markov source on a
sample-by-sample basis. The cardinality of the reduced state space is bounded
from below by the node degrees of the transition graph associated with the
original Markov chain.
We also present an algorithm listing all possible information-preserving
state space reductions, for a given transition graph. We illustrate our results
by applying the algorithm to a bi-gram letter model of an English text.

Comment: 7 pages, 3 figures, 2 tables
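The paper's information-preserving condition is weaker than classical strong lumpability, but a lumpability check illustrates the kind of state merging involved: a partition of the state space is acceptable only if merged states behave identically toward the blocks. A minimal sketch with a hypothetical 3-state chain (matrix and partition are illustrative, not from the paper):

```python
import numpy as np

def block_sums(P, partition):
    """For each state, total transition probability into each block."""
    return np.array([[P[s, list(b)].sum() for b in partition]
                     for s in range(len(P))])

def is_strongly_lumpable(P, partition):
    """Strong lumpability: all states in a block share identical block-sums."""
    S = block_sums(P, partition)
    return all(np.allclose(S[list(b)], S[list(b)[0]]) for b in partition)

# Hypothetical chain; merging states {1, 2} keeps the process Markov here.
P = np.array([[0.50, 0.30, 0.20],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
print(is_strongly_lumpable(P, [[0], [1, 2]]))  # True
```

The paper's condition additionally requires the aggregated process to be second-order Markov with unchanged entropy rate, so that no information is lost by the merge; the sketch above only captures the simpler classical check.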