
    Robust learning from incomplete data via parameterized t mixture models through eigenvalue decomposition

    Celeux and Govaert (1995, Pattern Recognition, 28, pp. 781-793) introduced a family of Gaussian mixture (GMIX) models in which the within-group covariance matrices are structured parsimoniously through a geometrically interpretable eigenvalue decomposition, an idea originally proposed by Banfield and Raftery (1993, Biometrics, 49, pp. 803-821). This thesis extends their ideas to present a novel class of multivariate t mixture (TMIX) models with fourteen parsimonious covariance structures for the unsupervised learning of heterogeneous multivariate data with possibly missing values. Computationally feasible EM-type algorithms are developed for parameter estimation of these models under a missing at random (MAR) mechanism. For ease of computation and theoretical development, two auxiliary indicator matrices are incorporated into the estimation procedure to extract exactly the positions of the observed and missing components of each observation. The practical usefulness of the proposed methodology is illustrated with real examples and with simulation studies under varying proportions of missing values.

    Contents
    1. Introduction
    2. Preliminaries
       2.1. The TMIX model
       2.2. Fourteen parsimonious parameterizations of Σi
       2.3. The F-G diagonalization algorithm
    3. ML estimation for 14 parsimonious TMIX models with missing information
    4. Some computational strategies
       4.1. Specification of starting values
       4.2. Convergence assessment
       4.3. Model selection
    5. Simulations
       5.1. Simulation 1
       5.2. Simulation 2
    6. Applications
       6.1. Wine recognition data
       6.2. Pima Indians diabetes data
    7. Concluding remarks
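The parsimonious structures mentioned in the abstract rest on the Banfield-Raftery decomposition of each within-group covariance matrix into volume, orientation, and shape, Σi = λi Di Ai Di', where λi is a scalar, Di an orthogonal eigenvector matrix, and Ai a diagonal matrix with unit determinant; constraining each part to be equal or variable across groups yields the fourteen models. A minimal sketch of this decomposition (our own illustration, not the thesis code):

```python
import numpy as np

def decompose_covariance(Sigma):
    """Split a covariance into (lam, D, A): volume, orientation, shape.

    lam = |Sigma|^(1/p) is the volume; D holds the eigenvectors
    (orientation); A holds the scaled eigenvalues (shape), with prod(A) = 1.
    """
    eigvals, D = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]          # canonical decreasing order
    eigvals, D = eigvals[order], D[:, order]
    lam = np.prod(eigvals) ** (1.0 / len(eigvals))
    A = eigvals / lam                          # unit-determinant shape
    return lam, D, A

def covariance_from_decomposition(lam, D, A):
    """Reassemble Sigma = lam * D @ diag(A) @ D.T."""
    return lam * D @ np.diag(A) @ D.T

Sigma = np.array([[4.0, 1.0],
                  [1.0, 2.0]])
lam, D, A = decompose_covariance(Sigma)
# The three parts reproduce the original covariance exactly.
assert np.allclose(covariance_from_decomposition(lam, D, A), Sigma)
```

Forcing, say, a common λ, D, or A across all groups is what trades flexibility for parameter economy in the fourteen models.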
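The abstract's auxiliary indicator matrices can be pictured as row selections from an identity matrix: one selector picks out an observation's observed coordinates, the other its missing ones, so sub-vectors and sub-matrices of the mean and covariance are obtained by matrix multiplication rather than ad hoc indexing. A minimal sketch of this device, with our own names O and M (the thesis's notation is not shown in the abstract):

```python
import numpy as np

def selection_matrices(x):
    """Return (O, M): selectors for the observed and missing positions of x.

    O stacks the rows of the p x p identity at observed positions, M the
    rows at missing (NaN) positions, so O @ x recovers the observed part.
    """
    p = len(x)
    identity = np.eye(p)
    observed = ~np.isnan(x)
    return identity[observed], identity[~observed]

x = np.array([1.2, np.nan, 3.4, np.nan])
O, M = selection_matrices(x)

# O @ x selects the observed components (NaNs zeroed first, since 0*NaN is NaN).
assert np.allclose(O @ np.nan_to_num(x), [1.2, 3.4])

# Sub-matrices of a covariance follow by congruence, e.g. the
# observed-observed block Sigma_oo = O @ Sigma @ O.T.
Sigma = np.diag([1.0, 2.0, 3.0, 4.0])
Sigma_oo = O @ Sigma @ O.T
assert np.allclose(Sigma_oo, np.diag([1.0, 3.0]))
```

Writing the E-step quantities through such selectors keeps the conditional expectations of the missing parts in closed matrix form, which is the computational convenience the abstract alludes to.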