
    Non-convex clustering using expectation maximization algorithm with rough set initialization

    An integration of a minimal spanning tree (MST) based graph-theoretic technique and the expectation-maximization (EM) algorithm with rough set initialization is described for non-convex clustering. EM provides the statistical model of the data and handles the associated uncertainties. Rough set theory helps in faster convergence and in avoiding the local minima problem, thereby enhancing the performance of EM. The MST helps in determining non-convex clusters. Since it is applied to the Gaussians rather than the original data points, the time required is very low. These features are demonstrated on real-life datasets. Comparison with related methods is made in terms of a cluster quality measure and computation time.
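
    A minimal Python sketch of the two-stage idea (scikit-learn and SciPy assumed; the rough set initialization step is not detailed in the abstract, so the library's default EM initialization stands in for it, and mst_gmm_clusters is a hypothetical name):

        import numpy as np
        from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
        from scipy.spatial.distance import pdist, squareform
        from sklearn.mixture import GaussianMixture

        def mst_gmm_clusters(X, n_components=12, n_clusters=2, seed=0):
            # EM fits many small Gaussians that model the data and its uncertainty.
            gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X)
            # The MST is built on the Gaussian means, not the raw points,
            # which is why this stage is cheap: n_components << n_samples.
            mst = minimum_spanning_tree(squareform(pdist(gmm.means_))).toarray()
            # Cutting the n_clusters - 1 longest MST edges yields non-convex groups.
            if n_clusters > 1:
                mst[mst >= np.sort(mst[mst > 0])[-(n_clusters - 1)]] = 0
            _, comp = connected_components(mst, directed=False)
            # Each point inherits the group of its most responsible Gaussian.
            return comp[gmm.predict(X)]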

    Automatic seed initialization for the expectation-maximization algorithm and its application in 3D medical imaging

    Statistical partitioning of images into meaningful areas is the goal of all region-based segmentation algorithms. The clustering, or creation of these meaningful partitions, can be achieved in a number of ways, but in most cases it is achieved through the minimization or maximization of some function of the image intensity properties. Commonly, these optimization schemes are only locally convergent, so the initialization of the function's parameters plays a very important role in the final solution. In this paper we apply an automatically initialized expectation-maximization algorithm to partition the data in medical MRI images. We present an analysis, illustrate results against manual initialization, and apply the algorithm to some common medical image processing tasks.
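
    The abstract does not specify the seeding procedure, so the Python sketch below (scikit-learn assumed; segment_intensities and the quantile seeding are illustrative stand-ins, not the paper's method) shows the general pattern of replacing manual initialization with an automatic one before running EM:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        def segment_intensities(image, n_tissues=3, seed=0):
            # Flatten voxel intensities into a one-dimensional feature.
            x = image.reshape(-1, 1).astype(float)
            # Automatic seeding: initial means at evenly spaced intensity
            # quantiles instead of user-supplied estimates.
            init = np.quantile(x, np.linspace(0.1, 0.9, n_tissues)).reshape(-1, 1)
            gmm = GaussianMixture(n_components=n_tissues, means_init=init,
                                  random_state=seed).fit(x)
            # EM refines the seeds; the labels form the segmentation.
            return gmm.predict(x).reshape(image.shape)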

    Likelihood adjusted semidefinite programs for clustering heterogeneous data

    Clustering is a widely deployed unsupervised learning tool. Model-based clustering is a flexible framework for tackling data heterogeneity when the clusters have different shapes. Likelihood-based inference for mixture distributions often involves non-convex and high-dimensional objective functions, imposing difficult computational and statistical challenges. The classic expectation-maximization (EM) algorithm is a computationally thrifty iterative method that maximizes a surrogate function minorizing the log-likelihood of observed data in each iteration, but it suffers from bad local maxima even in the special case of the standard Gaussian mixture model with common isotropic covariance matrices. On the other hand, recent studies reveal that the unique global solution of a semidefinite programming (SDP) relaxed K-means achieves the information-theoretically sharp threshold for perfectly recovering the cluster labels under the standard Gaussian mixture model. In this paper, we extend the SDP approach to a general setting by integrating cluster labels as model parameters and propose an iterative likelihood adjusted SDP (iLA-SDP) method that directly maximizes the exact observed likelihood in the presence of data heterogeneity. By lifting the cluster assignment to group-specific membership matrices, iLA-SDP avoids centroid estimation, a key feature that allows exact recovery under well-separatedness of centroids without being trapped by their adversarial configurations. Thus iLA-SDP is less sensitive than EM to initialization and more stable on high-dimensional data. Our numerical experiments demonstrate that iLA-SDP achieves lower mis-clustering errors than several widely used clustering methods, including K-means, SDP, and EM algorithms.
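
    For reference, the SDP-relaxed K-means that the paper builds on can be written in a few lines of Python with cvxpy; this is a sketch of the standard relaxation (in the style of Peng and Wei), not of the iLA-SDP iteration itself:

        import cvxpy as cp
        import numpy as np

        def sdp_kmeans(X, k):
            # Relax the normalized cluster-membership matrix Z = H H^T to a
            # convex set; note that no centroids appear anywhere.
            n = X.shape[0]
            A = X @ X.T                       # inner-product affinities
            Z = cp.Variable((n, n), PSD=True)
            constraints = [Z >= 0, cp.sum(Z, axis=1) == 1, cp.trace(Z) == k]
            cp.Problem(cp.Maximize(cp.trace(A @ Z)), constraints).solve()
            # Rounding (e.g., spectral clustering on Z.value) recovers labels.
            return Z.value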

    Development of an R package to facilitate the learning of clustering techniques

    This project explores the development of a tool, in the form of an R package, to ease the process of learning clustering techniques: how they work and what their pros and cons are. This tool should provide implementations of several different clustering techniques, with explanations, in order to allow the student to become familiar with the characteristics of each algorithm by testing them against several different datasets while deepening their understanding of them through the explanations. Additionally, these explanations should adapt to the input data, making the tool suitable not only for self-regulated learning but also for teaching.

    A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

    K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.
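
    The sensitivity being measured can be reproduced in miniature with scikit-learn; the Python sketch below compares just two initializers on synthetic blobs, not the paper's eight methods or its data sets:

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.datasets import make_blobs
        from sklearn.metrics import adjusted_rand_score

        X, y = make_blobs(n_samples=500, centers=5, random_state=0)
        for init in ("random", "k-means++"):
            # n_init=1 exposes the effect of a single initial placement.
            scores = [adjusted_rand_score(
                          y, KMeans(n_clusters=5, init=init, n_init=1,
                                    random_state=s).fit_predict(X))
                      for s in range(20)]
            print(init, "ARI mean %.3f, worst %.3f" % (np.mean(scores), np.min(scores)))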

    Model-Based Multiple 3D Object Recognition in Range Data

    Vision-guided systems are relevant to many industrial application areas, including manufacturing, medicine, and service robotics. A task common to these applications consists of detecting and localizing known objects in cluttered scenes. This amounts to solving the "chicken-and-egg" problem of data assignment and parameter estimation, that is, localizing an object and determining its pose. In this work, we consider computer vision techniques for the special scenario of industrial bin-picking applications, where the goal is to accurately estimate the positions of multiple instances of arbitrary, known objects that are randomly assembled in a bin. Although a priori knowledge of the objects simplifies the problem, model symmetries, mutual occlusion, noise, unstructured measurements, and run-time constraints render the problem far from trivial. A common strategy is to apply a two-step approach consisting of a rough initial estimate of each object's position followed by subsequent refinement steps. Established initialization procedures, however, only take single objects into account. Hence, they cannot resolve contextual constraints caused by multiple object instances and thus yield poor estimates of the objects' poses in many settings. Inaccurate initial configurations, in turn, cause state-of-the-art refinement algorithms to fail to identify the objects' poses, so the entire two-step approach is likely to fail. In this thesis, we propose a novel approach for obtaining initial estimates of all object positions jointly. Additionally, we investigate a new local, individual refinement procedure that copes with the shortcomings of state-of-the-art approaches while yielding fast and accurate registration results as well as a large region of attraction. The two stages are designed using advanced numerical techniques: large-scale convex programming and geometric optimization on the curved space of Euclidean transformations, respectively. They complement each other in that conflicting interpretations are resolved through non-local convex processing, followed by accurate non-convex local optimization based on sufficiently good initializations. Exhaustive numerical evaluation on artificial and real-world measurements confirms the proposed two-step approach and demonstrates its robustness to noise, unstructured measurements, and occlusions, as well as its potential to meet the run-time constraints of real-world industrial applications.
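
    As a point of reference for the refinement stage, a single least-squares rigid alignment step between corresponding 3D points looks as follows in Python; this is the classical Kabsch/Procrustes solution, a generic stand-in rather than the thesis's optimization on the manifold of Euclidean transformations:

        import numpy as np

        def rigid_refine(model, scene):
            # Closed-form rotation R and translation t minimizing
            # sum_i || R @ model[i] + t - scene[i] ||^2.
            mu_m, mu_s = model.mean(axis=0), scene.mean(axis=0)
            U, _, Vt = np.linalg.svd((model - mu_m).T @ (scene - mu_s))
            # Guard against reflections so R stays a proper rotation.
            D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
            R = Vt.T @ D @ U.T
            t = mu_s - R @ mu_m
            return R, t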

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly.
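
    A rough Python sketch of a divisive, deterministic, order-invariant seeding in the spirit of Su and Dy's PCA-based method (pca_part_init is a hypothetical name, and details of the published method may differ):

        import numpy as np

        def pca_part_init(X, k):
            # Repeatedly bisect the cluster with the largest SSE along its
            # first principal axis, splitting at the projected mean.
            clusters = [X]
            while len(clusters) < k:
                sse = [((c - c.mean(axis=0)) ** 2).sum() for c in clusters]
                c = clusters.pop(int(np.argmax(sse)))
                centered = c - c.mean(axis=0)
                _, _, Vt = np.linalg.svd(centered, full_matrices=False)
                proj = centered @ Vt[0]
                clusters += [c[proj <= 0], c[proj > 0]]
            # The cluster means serve as initial centers, e.g. for
            # KMeans(n_clusters=k, init=centers, n_init=1).
            return np.vstack([c.mean(axis=0) for c in clusters])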