From the EM Algorithm to the CM-EM Algorithm for Global Convergence of Mixture Models
The Expectation-Maximization (EM) algorithm for mixture models often results
in slow or invalid convergence. The popular convergence proof affirms that the
likelihood increases with Q; Q is increasing in the M-step and non-decreasing
in the E-step. The author found that (1) Q may and should decrease in some
E-steps; (2) the Shannon channel from the E-step is improper, and hence the
expectation is improper. The author proposed the CM-EM algorithm (CM stands
for Channels' Matching), which adds a step to optimize the mixture ratios for
the proper Shannon channel and maximizes G, the average log-normalized
likelihood, in the M-step. Neal and Hinton's Maximization-Maximization (MM)
algorithm uses F instead of Q to speed up convergence. Maximizing G is similar
to maximizing F. The new convergence proof is similar to Beal's proof with the
variational method. It first proves that the minimum relative entropy equals
the minimum of R − G (where R is the Shannon mutual information), then uses
the variational and iterative methods that Shannon et al. use for
rate-distortion functions to prove the global convergence. Some examples show
that Q and F may and should decrease in some E-steps. For the same example,
the EM, MM, and CM-EM algorithms need about 36, 18, and 9 iterations
respectively.
Comment: 17 pages, 5 figures
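As a concrete illustration, here is a minimal sketch of a CM-EM-style
iteration for a one-dimensional two-component Gaussian mixture. The inner
loop that re-optimizes the mixture ratios before the M-step, and the weighted
maximum-likelihood parameter update standing in for the paper's
G-maximization, are assumptions reconstructed from this abstract, not the
authors' exact procedure.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
# Synthetic sample from an assumed two-component mixture
x = np.concatenate([rng.normal(-2.0, 1.0, 400), rng.normal(3.0, 1.0, 600)])

mu = np.array([-1.0, 1.0])      # initial component means
sigma = np.array([2.0, 2.0])    # initial component standard deviations
p_y = np.array([0.5, 0.5])      # initial mixture ratios P(y)

for step in range(50):
    # Component densities P(x|y) on the sample
    pdf = np.stack([norm.pdf(x, m, s) for m, s in zip(mu, sigma)], axis=1)
    # Extra CM step (assumed form): iterate the mixture ratios P(y) until
    # they are consistent with the Shannon channel P(y|x) they induce.
    for _ in range(10):
        post = pdf * p_y
        post /= post.sum(axis=1, keepdims=True)  # Shannon channel P(y|x)
        p_y = post.mean(axis=0)                  # P(y) <- sum_x P(x) P(y|x)
    # M-step: weighted maximum-likelihood update of the component parameters
    # (a stand-in for the paper's G-maximization)
    w = post / post.sum(axis=0)
    mu = (w * x[:, None]).sum(axis=0)
    sigma = np.sqrt((w * (x[:, None] - mu) ** 2).sum(axis=0))

print("means:", mu, "sigmas:", sigma, "ratios:", p_y)
```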
The Semantic Information Method for Maximum Mutual Information and Maximum Likelihood of Tests, Estimations, and Mixture Models
It is very difficult to solve the Maximum Mutual Information (MMI) or Maximum
Likelihood (ML) problem over all possible Shannon channels or uncertain rules
for choosing hypotheses, so we have to use iterative methods. According to the
Semantic Mutual Information (SMI) and the R(G) function proposed by Chenguang
Lu (1993), where R(G) is an extension of the information rate-distortion
function R(D) and G is the lower limit of the SMI, we can obtain a new
iterative algorithm for solving the MMI and ML for tests, estimations, and
mixture models. The SMI is defined as the average log-normalized likelihood.
The likelihood function is produced from the truth function and the prior by
semantic Bayesian inference. A group of truth functions constitutes a semantic
channel. By letting the semantic channel and the Shannon channel mutually
match and iterate, we can obtain the Shannon channel that maximizes the
Shannon mutual information and the average log-likelihood. This iterative
algorithm is called the Channels' Matching (CM) algorithm. Its convergence can
be intuitively explained and proved by the R(G) function. Several iterative
examples for tests, estimations, and mixture models show that the computation
of the CM algorithm is simple (it can be demonstrated in Excel files). For
most random examples, the number of iterations needed for convergence is close
to 5. For mixture models, the CM algorithm is similar to the EM algorithm;
however, the CM algorithm has better convergence and more potential
applications than the standard EM algorithm.
Comment: 21 pages, 10 figures
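To make the semantic Bayesian inference step concrete, the sketch below
produces a likelihood function from a truth function and a prior, as the
abstract describes. The Gaussian truth function, the discretized instance
space, and the parameter values are illustrative assumptions.

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 201)  # discretized instance space (assumed)
prior = np.exp(-0.5 * x**2)
prior /= prior.sum()             # assumed prior P(x)

def truth_fn(x, c, d):
    """Gaussian truth function T(theta|x), center c, width d (assumed form)."""
    return np.exp(-((x - c) ** 2) / (2.0 * d ** 2))

def semantic_bayes(prior, truth):
    """Semantic Bayesian inference:
    P(x|theta) = P(x) T(theta|x) / sum_x P(x) T(theta|x)."""
    joint = prior * truth
    return joint / joint.sum()

likelihood = semantic_bayes(prior, truth_fn(x, c=1.0, d=0.8))
# Log of the normalized likelihood; averaging such terms gives the SMI.
log_norm_likelihood = np.log(likelihood / prior)
```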
Fair Marriage Principle and Initialization Map for the EM Algorithm
The popular convergence theory of the EM algorithm holds that the observed
incomplete-data log-likelihood L and the complete-data log-likelihood Q are
positively correlated, so that we can maximize L by maximizing Q. The
Deterministic Annealing EM (DAEM) algorithm was hence proposed to avoid
locally maximal Q. This paper reaches different conclusions: 1) the popular
convergence theory is wrong; 2) a locally maximal Q can affect the convergence
speed, but cannot block global convergence; 3) as in marriage competition,
unfair competition between two components may vastly decrease the global
convergence speed; 4) local convergence exists because the sample is too small
and unfair competition exists; 5) an improved EM algorithm, called the
Channels' Matching (CM) EM algorithm, can accelerate global convergence. This
paper provides an initialization map, with the two means as the two axes, for
the binary Gaussian mixture example studied by the authors of the DAEM
algorithm. This map shows how fast the convergence is for different initial
means and why points in some areas are unsuitable as initial points. A
two-dimensional example indicates that a big sample or a fair initialization
can avoid local convergence. For more complicated mixture models, further
study is needed to convert the fair marriage principle into specific
initialization methods.
Comment: 13 pages, 9 figures
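The initialization-map idea can be sketched as follows: run standard EM on a
two-component Gaussian mixture from a grid of initial mean pairs and record
how many iterations each start needs. The data-generating parameters, the
convergence test, and the grid are illustrative assumptions, not the settings
of the DAEM example.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
# Synthetic sample from an assumed binary (two-component) Gaussian mixture
x = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(4.0, 1.0, 500)])

def em_iterations(mu_init, max_iter=200, tol=1e-6):
    """Run standard EM from given initial means; return iterations to converge."""
    mu = np.array(mu_init, dtype=float)
    sigma = np.array([1.0, 1.0])
    p = np.array([0.5, 0.5])
    ll_old = -np.inf
    for it in range(1, max_iter + 1):
        pdf = np.stack([norm.pdf(x, m, s) for m, s in zip(mu, sigma)], axis=1) * p
        ll = np.log(pdf.sum(axis=1)).mean()          # average log-likelihood
        post = pdf / pdf.sum(axis=1, keepdims=True)  # responsibilities P(y|x)
        p = post.mean(axis=0)
        w = post / post.sum(axis=0)
        mu = (w * x[:, None]).sum(axis=0)
        sigma = np.maximum(np.sqrt((w * (x[:, None] - mu) ** 2).sum(axis=0)), 1e-3)
        if ll - ll_old < tol:                        # simple convergence test
            return it
        ll_old = ll
    return max_iter

# Initialization map: iteration counts over a grid of initial mean pairs,
# with the two initial means as the two axes
grid = np.linspace(-2.0, 6.0, 17)
iter_map = [[em_iterations([a, b]) for b in grid] for a in grid]
```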