1,166 research outputs found
Non-convex clustering using expectation maximization algorithm with rough set initialization
An integration of a minimal spanning tree (MST) based graph-theoretic technique with the expectation maximization (EM) algorithm, using rough set initialization, is described for non-convex clustering. EM provides the statistical model of the data and handles the associated uncertainties. Rough set theory speeds convergence and helps avoid the local minima problem, thereby enhancing the performance of EM. The MST step identifies non-convex clusters; since it is applied to the fitted Gaussians rather than to the original data points, its running time is very low. These features are demonstrated on real-life datasets, and comparisons with related methods are made in terms of a cluster quality measure and computation time.
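The Gaussians-then-MST step described above can be sketched as follows: after EM has fitted the mixture, build a minimum spanning tree over the component means and cut the longest edges to obtain non-convex clusters. This is an illustrative numpy sketch under those assumptions, not the paper's implementation; the helper names are invented.

```python
import numpy as np

def mst_edges(points):
    """Prim's algorithm: return the edges (i, j, weight) of a minimum
    spanning tree over `points` (n x d) under Euclidean distance."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    edges = []
    for _ in range(n - 1):
        # cheapest edge from the tree to a vertex outside it
        d = dist[in_tree][:, ~in_tree]
        i_idx = np.flatnonzero(in_tree)
        j_idx = np.flatnonzero(~in_tree)
        a, b = np.unravel_index(np.argmin(d), d.shape)
        edges.append((i_idx[a], j_idx[b], d[a, b]))
        in_tree[j_idx[b]] = True
    return edges

def mst_clusters(points, k):
    """Cut the k-1 longest MST edges; return a cluster label per point.
    Applied to Gaussian component means, this merges components into
    non-convex clusters."""
    keep = sorted(mst_edges(points), key=lambda e: e[2])[:len(points) - k]
    # union-find over the kept (short) edges
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for i, j, _ in keep:
        parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(len(points))})
    label = {r: c for c, r in enumerate(roots)}
    return np.array([label[find(i)] for i in range(len(points))])
```

On a toy set of six component means forming two chains, cutting the single long bridge edge recovers the two groups even though neither is convex around a single centroid.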
Automatic seed initialization for the expectation-maximization algorithm and its application in 3D medical imaging
Statistical partitioning of images into meaningful areas is the goal of all region-based segmentation algorithms. The clustering, or creation, of these meaningful partitions can be achieved in a number of ways, but in most cases it is achieved through the minimization or maximization of some function of the image intensity properties. These optimization schemes are commonly only locally convergent, so the initialization of the function's parameters plays a very important role in the final solution. In this paper we apply an automatically initialized expectation-maximization algorithm to partition the data in medical MRI images. We present an analysis, compare results against manual initialization, and apply the algorithm to some common medical image processing tasks.
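A minimal sketch of what automatic seeding for EM can look like on 1-D image intensities, assuming quantile-based seeds; the paper's actual seeding procedure is not reproduced here, and `em_1d` is an invented illustrative name.

```python
import numpy as np

def em_1d(x, k, n_iter=50):
    """EM for a 1-D Gaussian mixture over intensities, seeded
    automatically from evenly spaced quantiles instead of by hand."""
    x = np.asarray(x, dtype=float)
    # automatic seeding: k evenly spaced quantiles of the intensities
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)
    sigma = np.full(k, x.std() / k + 1e-6)
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities (log-domain for numerical stability)
        logp = (-0.5 * ((x[:, None] - mu) / sigma) ** 2
                - np.log(sigma) + np.log(pi))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted parameter updates
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        pi = nk / len(x)
    return mu, sigma, pi
```

On synthetic two-tissue intensities the quantile seeds land inside each mode, so the locally convergent EM iteration finds the correct component means without manual initialization.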
Likelihood adjusted semidefinite programs for clustering heterogeneous data
Clustering is a widely deployed unsupervised learning tool. Model-based
clustering is a flexible framework to tackle data heterogeneity when the
clusters have different shapes. Likelihood-based inference for mixture
distributions often involves non-convex and high-dimensional objective
functions, imposing difficult computational and statistical challenges. The
classic expectation-maximization (EM) algorithm is a computationally thrifty
iterative method that maximizes a surrogate function minorizing the
log-likelihood of observed data in each iteration, which however suffers from
bad local maxima even in the special case of the standard Gaussian mixture
model with common isotropic covariance matrices. On the other hand, recent
studies reveal that the unique global solution of a semidefinite programming
(SDP) relaxed K-means achieves the information-theoretically sharp threshold
for perfectly recovering the cluster labels under the standard Gaussian mixture
model. In this paper, we extend the SDP approach to a general setting by
integrating cluster labels as model parameters and propose an iterative
likelihood adjusted SDP (iLA-SDP) method that directly maximizes the
\emph{exact} observed likelihood in the presence of data heterogeneity. By
lifting the cluster assignment to group-specific membership matrices, iLA-SDP
avoids centroid estimation -- a key feature that allows exact recovery under
well-separatedness of the centroids without being trapped by their adversarial
configurations. Thus iLA-SDP is less sensitive than EM to initialization and
more stable on high-dimensional data. Our numerical experiments demonstrate that
iLA-SDP achieves lower mis-clustering errors than several widely used
clustering methods, including K-means, SDP, and EM algorithms.
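The classic EM baseline this abstract contrasts with, for a Gaussian mixture with a common isotropic covariance sigma^2 * I, can be sketched in a few lines of numpy. iLA-SDP itself requires a semidefinite-programming solver and is not reproduced here; this sketch also substitutes a simple deterministic initialization for illustration.

```python
import numpy as np

def em_isotropic_gmm(X, k, n_iter=100):
    """Classic EM for a Gaussian mixture with common isotropic
    covariance (sigma^2 * I) -- the surrogate-maximizing baseline
    that can get stuck in bad local maxima."""
    n, d = X.shape
    # simple deterministic init for illustration: evenly spaced rows
    mu = X[np.linspace(0, n - 1, k).astype(int)].copy()
    sigma2 = X.var()
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities under N(mu_j, sigma2 * I)
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        logp = -0.5 * sq / sigma2 + np.log(pi)
        logp -= logp.max(1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(1, keepdims=True)
        # M-step: closed-form updates of means, shared variance, weights
        nk = r.sum(0)
        mu = (r.T @ X) / nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
        sigma2 = (r * sq).sum() / (n * d)
        pi = nk / n
    return mu, sigma2, r.argmax(1)
```

With well-separated centroids this converges cleanly; the abstract's point is that under adversarial configurations or bad initialization it need not, which motivates the convex SDP-based alternative.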
Development of an R package to facilitate the learning of clustering techniques
This project explores the development of a tool, in the form of an R package, to ease the process of
learning clustering techniques: how they work and what their pros and cons are. The tool should provide
implementations of several different clustering techniques, with explanations, so that students can become
familiar with the characteristics of each algorithm by testing them against several different datasets
while deepening their understanding through the explanations. Additionally, these explanations
should adapt to the input data, making the tool suitable not only for self-regulated learning but for
teaching as well.
A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm
K-means is undoubtedly the most widely used partitional clustering algorithm.
Unfortunately, due to its gradient descent nature, this algorithm is highly
sensitive to the initial placement of the cluster centers. Numerous
initialization methods have been proposed to address this problem. In this
paper, we first present an overview of these methods with an emphasis on their
computational efficiency. We then compare eight commonly used linear time
complexity initialization methods on a large and diverse collection of data
sets using various performance criteria. Finally, we analyze the experimental
results using non-parametric statistical tests and provide recommendations for
practitioners. We demonstrate that popular initialization methods often perform
poorly and that there are in fact strong alternatives to these methods. Comment: 17 pages, 1 figure, 7 tables
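As a flavor of the linear-time-complexity seeding schemes such surveys compare, here is a sketch of the widely used k-means++ rule, in which each new center is sampled with probability proportional to its squared distance from the nearest center chosen so far. This is an illustrative sketch, not the paper's code.

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: the first center is uniform; each later
    center is drawn with probability proportional to the squared
    distance to the nearest already-chosen center."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # squared distance from each point to its nearest chosen center
        d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

When the data consist of tight, well-separated groups, points inside an already-seeded group get (near-)zero sampling weight, so the remaining centers land in the other groups, which is exactly why this initialization tends to beat uniform random seeding.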
Model-Based Multiple 3D Object Recognition in Range Data
Vision guided systems are relevant for many industrial application areas, including manufacturing, medicine, and service robots. A task common to these applications is detecting and localizing known objects in cluttered scenes. This amounts to solving the "chicken and egg" problem of data assignment and parameter estimation, that is, localizing an object and determining its pose. In this work, we consider computer vision techniques for the special scenario of industrial bin-picking applications, where the goal is to accurately estimate the positions of multiple instances of arbitrary, known objects that are randomly assembled in a bin. Although a priori knowledge of the objects simplifies the problem, model symmetries, mutual occlusion, noise, unstructured measurements and run-time constraints render the problem far from trivial. A common strategy is a two-step approach: a rough initial estimate of each object's position, followed by subsequent refinement steps. Established initialization procedures, however, only take single objects into account. Hence, they cannot resolve contextual constraints caused by multiple object instances and thus yield poor pose estimates in many settings. Inaccurate initial configurations, in turn, cause state-of-the-art refinement algorithms to fail to identify the objects' poses, so that the entire two-step approach is likely to fail. In this thesis, we propose a novel approach for jointly obtaining initial estimates of all object positions. Additionally, we investigate a new local, individual refinement procedure that overcomes the shortcomings of state-of-the-art approaches while yielding fast and accurate registration results as well as a large region of attraction.
Both stages are designed using advanced numerical techniques: large-scale convex programming and geometric optimization on the curved space of Euclidean transformations, respectively. They complement each other in that conflicting interpretations are resolved through non-local convex processing, followed by accurate non-convex local optimization from sufficiently good initializations. Exhaustive numerical evaluation on artificial and real-world measurements experimentally confirms the proposed two-step approach and demonstrates its robustness to noise, unstructured measurements and occlusions, as well as its potential to meet the run-time constraints of real-world industrial applications.
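The refinement stage optimizes over the curved space of Euclidean transformations; its core least-squares subproblem, rigidly aligning two matched point sets, has a closed-form SVD solution known as the Kabsch algorithm. The sketch below illustrates that standard building block only, not the thesis's actual refinement procedure.

```python
import numpy as np

def kabsch(P, Q):
    """Best rigid transform (R, t) mapping matched points P onto Q in
    the least-squares sense, via SVD (Kabsch algorithm)."""
    p0, q0 = P.mean(0), Q.mean(0)
    # cross-covariance of the centered point sets
    H = (P - p0).T @ (Q - q0)
    U, _, Vt = np.linalg.svd(H)
    # sign correction so R is a proper rotation, not a reflection
    D = np.eye(len(H))
    D[-1, -1] = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ D @ U.T
    t = q0 - R @ p0
    return R, t
```

Given a known correspondence, this recovers the pose exactly; the hard part the thesis addresses is obtaining correspondences and initializations good enough for such local steps to land in the right basin of attraction.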
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
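To give a concrete flavor of "linear, deterministic, and order-invariant", here is a sketch of a maximin-style seeding that starts from the point nearest the data mean and then repeatedly takes the point farthest from all chosen centers. This is an illustration in the spirit of the surveyed methods, not Su and Dy's algorithm; note that ties are broken by array index, the one place data ordering can still leak in.

```python
import numpy as np

def maximin_init(X, k):
    """Deterministic maximin seeding: first center is the point
    nearest the data mean; each later center is the point farthest
    from its nearest chosen center. One pass per center => O(nk)."""
    d_mean = ((X - X.mean(0)) ** 2).sum(1)
    idx = [int(np.argmin(d_mean))]
    d2 = ((X - X[idx[0]]) ** 2).sum(1)   # sq. dist. to nearest center
    for _ in range(k - 1):
        idx.append(int(np.argmax(d2)))
        d2 = np.minimum(d2, ((X - X[idx[-1]]) ** 2).sum(1))
    return X[idx]
```

Because every step depends only on geometry (argmin/argmax over distances), rerunning it on the same point set always yields the same centers, avoiding the multiple-restart practice the chapter criticizes for random linear methods.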