3 research outputs found

    From the EM Algorithm to the CM-EM Algorithm for Global Convergence of Mixture Models

    The Expectation-Maximization (EM) algorithm for mixture models often converges slowly or invalidly. The popular convergence proof affirms that the likelihood increases with Q: Q increases in the M-step and is non-decreasing in the E-step. The author found that (1) Q may and should decrease in some E-steps; (2) the Shannon channel from the E-step is improper, and hence the expectation is improper. The author proposed the CM-EM algorithm (CM means Channels' Matching), which adds a step to optimize the mixture ratios for the proper Shannon channel and maximizes G, the average log-normalized likelihood, in the M-step. Neal and Hinton's Maximization-Maximization (MM) algorithm uses F instead of Q to speed up convergence. Maximizing G is similar to maximizing F. The new convergence proof is similar to Beal's proof with the variational method. It first proves that the minimum relative entropy equals the minimum of R - G (where R is the Shannon mutual information), then uses the variational and iterative methods that Shannon et al. use for rate-distortion functions to prove global convergence. Some examples show that Q and F may and should decrease in some E-steps. For the same example, the EM, MM, and CM-EM algorithms need about 36, 18, and 9 iterations, respectively.
    Comment: 17 pages, 5 figures
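
    The abstract does not reproduce the paper's exact update rules, so the Python sketch below only illustrates the general idea: a standard 1-D Gaussian-mixture EM loop with an extra pass that re-optimizes the mixture ratios before the expectation. The function name, the 1-D setting, and the single ratio-matching pass per iteration are illustrative assumptions, not the author's implementation.

```python
import numpy as np
from scipy.stats import norm

def cm_em_sketch(x, means, sds, ratios, n_iter=50):
    """Illustrative 1-D Gaussian mixture EM loop with an extra step that
    re-optimizes the mixture ratios before the expectation, in the spirit
    of the CM-EM description above (a sketch, not the paper's updates)."""
    x = np.asarray(x, dtype=float)
    means, sds, ratios = (np.asarray(a, dtype=float) for a in (means, sds, ratios))
    for _ in range(n_iter):
        # Component densities P(x | y_j), shape (n_samples, n_components)
        lik = np.stack([norm.pdf(x, m, s) for m, s in zip(means, sds)], axis=1)

        # Added step (assumption): re-match the mixture ratios so the
        # predicted marginal agrees with the data before the E-step
        resp = lik * ratios
        resp /= resp.sum(axis=1, keepdims=True)
        ratios = resp.mean(axis=0)

        # E-step: responsibilities under the updated ratios
        resp = lik * ratios
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate means and standard deviations
        nk = resp.sum(axis=0)
        means = (resp * x[:, None]).sum(axis=0) / nk
        sds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)
    return means, sds, ratios
```

    Compared with plain EM, the only difference in this sketch is the ratio-matching pass before the E-step; how that step is actually defined and iterated is specified in the paper itself.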

    The Semantic Information Method for Maximum Mutual Information and Maximum Likelihood of Tests, Estimations, and Mixture Models

    It is very difficult to solve the Maximum Mutual Information (MMI) or Maximum Likelihood (ML) problem for all possible Shannon channels or uncertain rules of choosing hypotheses, so we have to use iterative methods. According to the Semantic Mutual Information (SMI) and the R(G) function proposed by Chenguang Lu (1993) (where R(G) is an extension of the information rate-distortion function R(D), and G is the lower limit of the SMI), we can obtain a new iterative algorithm for solving the MMI and ML for tests, estimations, and mixture models. The SMI is defined as the average log-normalized likelihood. The likelihood function is produced from the truth function and the prior by semantic Bayesian inference. A group of truth functions constitutes a semantic channel. Letting the semantic channel and the Shannon channel mutually match and iterate, we can obtain the Shannon channel that maximizes the Shannon mutual information and the average log-likelihood. This iterative algorithm is called the Channels' Matching algorithm, or the CM algorithm. The convergence can be intuitively explained and proved by the R(G) function. Several iterative examples for tests, estimations, and mixture models show that the computation of the CM algorithm is simple (it can be demonstrated in Excel files). For most random examples, the number of iterations needed for convergence is close to 5. For mixture models, the CM algorithm is similar to the EM algorithm; however, it has better convergence and more potential applications than the standard EM algorithm.
    Comment: 21 pages, 10 figures
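
    To make the semantic Bayesian inference and the "average log-normalized likelihood" concrete, here is a small numerical sketch. The discrete setting, the array shapes, the toy numbers, and the assumed form G = sum P(x,y) log[P(x|theta_y)/P(x)] are illustrative choices; the paper defines the SMI and the R(G) function in more generality.

```python
import numpy as np

def semantic_bayes(prior, truth):
    """Semantic Bayesian inference: produce likelihoods P(x | theta_j) from
    the prior P(x) and the truth functions T(theta_j | x).
    Assumed shapes: prior (n_x,), truth (n_x, n_y)."""
    joint = prior[:, None] * truth                    # P(x) * T(theta_j | x)
    return joint / joint.sum(axis=0, keepdims=True)   # normalize over x

def average_log_normalized_likelihood(p_xy, likelihood, prior):
    """An assumed discrete form of G, the average log-normalized likelihood:
    G = sum_{x,y} P(x, y) * log( P(x | theta_y) / P(x) )."""
    ratio = likelihood / prior[:, None]
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(ratio[mask])))

# Tiny toy example: 3 values of x, 2 hypotheses (all numbers are made up)
prior = np.array([0.5, 0.3, 0.2])
truth = np.array([[1.0, 0.1],
                  [0.4, 0.6],
                  [0.1, 1.0]])                        # fuzzy truth values in [0, 1]
likelihood = semantic_bayes(prior, truth)
p_xy = likelihood * np.array([0.6, 0.4])              # toy joint P(x, y) with assumed P(y)
G = average_log_normalized_likelihood(p_xy, likelihood, prior)
```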

    Fair Marriage Principle and Initialization Map for the EM Algorithm

    The popular convergence theory of the EM algorithm explains that the observed incomplete-data log-likelihood L and the complete-data log-likelihood Q are positively correlated, so that we can maximize L by maximizing Q. The Deterministic Annealing EM (DAEM) algorithm was hence proposed to avoid locally maximal Q. This paper provides different conclusions: 1) the popular convergence theory is wrong; 2) a locally maximal Q can affect the convergence speed but cannot block global convergence; 3) as in marriage competition, unfair competition between two components may vastly decrease the speed of global convergence; 4) local convergence exists because the sample is too small and unfair competition exists; 5) an improved EM algorithm, called the Channel Matching (CM) EM algorithm, can accelerate global convergence. This paper provides an initialization map, with the two initial means as its two axes, for the binary Gaussian mixture example studied by the authors of the DAEM algorithm. This map shows how fast convergence is for different initial means and why points in some areas are not suitable as initial points. A two-dimensional example indicates that a big sample or a fair initialization can avoid local convergence. For more complicated mixture models, further study is needed to convert the fair marriage principle into specific initialization methods.
    Comment: 13 pages and 9 figures
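
    The initialization map itself is a figure in the paper; the sketch below only shows one plausible way to compute such a map for a two-component 1-D Gaussian mixture: run plain EM from every grid point of initial means and record the iteration count. The fixed common variance, fixed mixture ratios, grid, and toy data are simplifying assumptions, not the authors' setup.

```python
import numpy as np
from scipy.stats import norm

def em_iterations(x, mu_init, sigma=1.0, ratios=(0.5, 0.5), tol=1e-4, max_iter=500):
    """Plain two-component 1-D Gaussian EM started from the given initial means;
    returns how many iterations it takes until the means stop moving.
    Fixed common variance and fixed ratios are simplifying assumptions."""
    mu = np.array(mu_init, dtype=float)
    pi = np.array(ratios, dtype=float)
    for it in range(1, max_iter + 1):
        lik = np.stack([pi[j] * norm.pdf(x, mu[j], sigma) for j in range(2)], axis=1)
        resp = lik / lik.sum(axis=1, keepdims=True)                    # E-step
        new_mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)    # M-step (means only)
        if np.max(np.abs(new_mu - mu)) < tol:
            return it
        mu = new_mu
    return max_iter

def initialization_map(x, grid):
    """Iteration counts over a grid of initial mean pairs (mu1, mu2)."""
    return np.array([[em_iterations(x, (m1, m2)) for m2 in grid] for m1 in grid])

# Toy data in the spirit of a binary Gaussian mixture; parameters are made up
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(2.0, 1.0, 300)])
counts = initialization_map(x, np.linspace(-6.0, 6.0, 13))
```

    Plotting counts against the two grid axes gives a rough analogue of the paper's map: regions of initial means that converge quickly versus regions that stall.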