
    A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm

    K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear time complexity initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods. Comment: 17 pages, 1 figure, 7 tables
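    A minimal sketch (not from the paper) of why initialization matters: run k-means once per seed with purely random seeding versus k-means++ (one popular linear-time method) and compare the spread of the final sum of squared errors across seeds. The dataset, k, and seed count are arbitrary illustrative choices; scikit-learn is assumed.

```python
# Illustrative only: the eight methods compared in the paper are not reproduced here.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=2000, centers=8, random_state=0)

for init in ("random", "k-means++"):
    inertias = []
    for seed in range(20):
        km = KMeans(n_clusters=8, init=init, n_init=1, random_state=seed).fit(X)
        inertias.append(km.inertia_)
    # A wide best-to-worst spread across seeds signals unreliable initialization.
    print(f"{init:10s} best SSE={min(inertias):.0f} worst SSE={max(inertias):.0f}")
```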

    One-step preparation of cluster states in quantum dot molecules

    Cluster states, a special type of highly entangled state, are a universal resource for measurement-based quantum computation. Here, we propose an efficient one-step generation scheme for cluster states in semiconductor quantum dot molecules, where qubits are encoded in the singlet and triplet states of two coupled quantum dots. By applying a collective electric field, or by simultaneously adjusting the interdot bias voltages of all double-dot molecules, we obtain a switchable Ising-like interaction between any two adjacent quantum dot molecule qubits. The initialization, single-qubit measurement, and experimental parameters are discussed, showing that large cluster state preparation and one-way quantum computation are implementable in semiconductor quantum dots with present techniques. Comment: 5 pages, 3 figures
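    For intuition, the net effect of such an Ising-like coupling can be reproduced abstractly: a linear cluster state results from preparing every qubit in |+> and applying a controlled-Z between nearest neighbours. The sketch below (plain numpy, N = 4; a textbook construction, not the paper's physical scheme) builds the corresponding state vector.

```python
import numpy as np

N = 4
plus = np.array([1.0, 1.0]) / np.sqrt(2)

# Product state |+>^N, with qubit 0 as the most significant bit
state = plus
for _ in range(N - 1):
    state = np.kron(state, plus)

def cz(psi, i, j, n):
    """Apply a controlled-Z between qubits i and j of an n-qubit state."""
    out = psi.copy()
    for idx in range(2 ** n):
        # Pick up a -1 phase when both qubits are |1>
        if (idx >> (n - 1 - i)) & 1 and (idx >> (n - 1 - j)) & 1:
            out[idx] *= -1
    return out

for i in range(N - 1):
    state = cz(state, i, i + 1, N)

print(state)  # amplitudes of the 4-qubit linear cluster state
```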

    Quantum computing with antiferromagnetic spin clusters

    We show that a wide range of spin clusters with antiferromagnetic intracluster exchange interaction allows one to define a qubit. For these spin cluster qubits, initialization, quantum gate operation, and readout are possible using the same techniques as for single spins. Quantum gate operation for the spin cluster qubit does not require control over the intracluster exchange interaction. The electric and magnetic fields necessary to effect quantum gates need only be controlled on the length scale of the spin cluster rather than that of a single spin. Here, we calculate the energy gap separating the logical qubit states from the next excited state, as well as the matrix elements that determine quantum gate operation times. We discuss spin cluster qubits formed by one- and two-dimensional arrays of s=1/2 spins as well as clusters formed by spins s>1/2. We illustrate the advantages of spin cluster qubits for various suggested implementations of spin qubits and analyze the scaling of the decoherence time with spin cluster size. Comment: 15 pages, 7 figures; minor changes
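    The gap calculation can be mimicked for a toy case. The sketch below (my own illustration, not the paper's derivation) exactly diagonalizes a short open antiferromagnetic s=1/2 Heisenberg chain, H = J * sum_i S_i . S_{i+1}; for odd chain length the two degenerate lowest states play the role of the logical qubit, and the distance to the next level is the protecting gap.

```python
import numpy as np

J, N = 1.0, 5  # odd N leaves a twofold-degenerate (qubit-like) ground space
sx = np.array([[0, 1], [1, 0]]) / 2
sy = np.array([[0, -1j], [1j, 0]]) / 2
sz = np.array([[1, 0], [0, -1]]) / 2

def embed(op, site):
    """Place a single-spin operator at `site` within the N-spin space."""
    mats = [np.eye(2)] * N
    mats[site] = op
    out = mats[0]
    for m in mats[1:]:
        out = np.kron(out, m)
    return out

H = sum(J * (embed(s, i) @ embed(s, i + 1))
        for i in range(N - 1) for s in (sx, sy, sz))
levels = np.linalg.eigvalsh(H)
print("lowest levels:", np.round(levels[:4], 4))  # E0 ~ E1 < E2: gap = E2 - E0
```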

    Modelling and Verification of a Cluster-tree Formation Protocol Implementation for the IEEE 802.15.4 TSCH MAC Operation Mode

    Correct and efficient initialization of wireless sensor networks can be challenging in the face of the many uncertainties present in ad hoc wireless networks. In this paper we examine an implementation of the formation of a cluster-tree topology in a network operating on top of the TSCH MAC operation mode of the IEEE 802.15.4 standard, and investigate it using formal methods. We show how both the mCRL2 language and its toolset help us identify scenarios in which the implementation does not form a proper topology. More importantly, our analysis leads to the conclusion that the cluster-tree formation algorithm has superlinear time complexity, so it does not scale to large networks. Comment: In Proceedings MARS 2017, arXiv:1703.0581
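    As rough intuition for the scalability conclusion (a toy Python model invented for illustration, not the paper's mCRL2 specification), one can simulate round-based joining in which each associated node may admit at most one new child per slotframe, a crude stand-in for scarce advertisement slots. The random-tree topology and the admission rule are both assumptions.

```python
import random

def formation_rounds(n, seed=0):
    """Slotframes until all n nodes join, under a one-child-per-round rule."""
    rng = random.Random(seed)
    neighbors = {0: set()}
    for i in range(1, n):            # each node hears one earlier node
        p = rng.randrange(i)
        neighbors.setdefault(i, set()).add(p)
        neighbors[p].add(i)
    joined, rounds = {0}, 0          # node 0 acts as the PAN coordinator
    while len(joined) < n:
        admitted, busy = set(), set()
        for v in sorted(set(range(n)) - joined):
            for u in neighbors[v] & joined:
                if u not in busy:    # parent u spends its one slot on v
                    admitted.add(v)
                    busy.add(u)
                    break
        joined |= admitted
        rounds += 1
    return rounds

for n in (10, 100, 1000):
    print(n, formation_rounds(n))
```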

    Robust EM algorithm for model-based curve clustering

    Model-based clustering approaches concern the paradigm of exploratory data analysis that relies on the finite mixture model to automatically find a latent structure governing observed data. They are among the most popular and successful approaches in cluster analysis. The mixture density estimation is generally performed by maximizing the observed-data log-likelihood using the expectation-maximization (EM) algorithm. However, it is well known that the initialization of the EM algorithm is crucial. In addition, the standard EM algorithm requires the number of clusters to be known a priori. Some solutions have been provided in [31, 12] for model-based clustering with Gaussian mixture models for multivariate data. In this paper we focus on model-based curve clustering, where the data are curves rather than vectors, based on regression mixtures. We propose a new robust EM algorithm for clustering curves. We extend the model-based clustering approach presented in [31] for Gaussian mixture models to the case of curve clustering by regression mixtures, including polynomial regression mixtures as well as spline or B-spline regression mixtures. Our approach handles both the problem of initialization and that of choosing the optimal number of clusters as the EM learning proceeds, rather than in a two-stage scheme. This is achieved by optimizing a penalized log-likelihood criterion. A simulation study confirms the potential benefit of the proposed algorithm in terms of robustness to initialization and of finding the actual number of clusters. Comment: In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), 2013, Dallas, TX, USA
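    For concreteness, here is a stripped-down sketch of EM for a mixture of polynomial regressions over curves sharing common design points (plain numpy; K fixed, penalty omitted). The paper's robust variant instead optimizes a penalized log-likelihood so that spurious components are annihilated as learning proceeds.

```python
import numpy as np

def em_poly_mixture(x, Y, K=3, degree=2, n_iter=50, seed=0):
    """x: (T,) shared design points; Y: (n, T) observed curves."""
    rng = np.random.default_rng(seed)
    n, T = Y.shape
    X = np.vander(x, degree + 1)                 # (T, d) polynomial design
    beta = rng.normal(size=(K, degree + 1))      # per-component coefficients
    sigma2, pi = np.ones(K), np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities from per-curve Gaussian log-likelihoods
        logp = np.empty((n, K))
        for k in range(K):
            resid = Y - X @ beta[k]              # (n, T)
            logp[:, k] = (np.log(pi[k])
                          - 0.5 * T * np.log(2 * np.pi * sigma2[k])
                          - 0.5 * (resid ** 2).sum(axis=1) / sigma2[k])
        logp -= logp.max(axis=1, keepdims=True)  # stabilize before exponentiating
        tau = np.exp(logp)
        tau /= tau.sum(axis=1, keepdims=True)
        # M-step: weighted least squares per component, then noise and weights
        Xs, ys = np.tile(X, (n, 1)), Y.ravel()
        for k in range(K):
            w = np.repeat(tau[:, k], T)          # one weight per (curve, point)
            WX = Xs * w[:, None]
            beta[k] = np.linalg.solve(Xs.T @ WX, WX.T @ ys)
            resid = Y - X @ beta[k]
            sigma2[k] = (tau[:, k] @ (resid ** 2).sum(axis=1)) / (T * tau[:, k].sum())
        pi = tau.mean(axis=0)
    return pi, beta, sigma2
```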

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196
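    As a flavour of such methods, the sketch below loosely follows the variance-partitioning idea associated with Su and Dy (a reconstruction under my own assumptions, not a verbatim reimplementation): repeatedly split the part with the largest within-part SSE along its highest-variance dimension at the mean, then seed k-means with the k part means. The procedure is deterministic and invariant to the ordering of the data points.

```python
import numpy as np

def var_part_init(X, k):
    """Deterministic, order-invariant seeds: (k, dims) array of part means."""
    parts = [X]
    while len(parts) < k:
        sse = [((P - P.mean(axis=0)) ** 2).sum() for P in parts]
        P = parts.pop(int(np.argmax(sse)))       # most spread-out part
        d = int(np.argmax(P.var(axis=0)))        # highest-variance axis
        cut = P[:, d].mean()
        left, right = P[P[:, d] <= cut], P[P[:, d] > cut]
        if len(right) == 0:                      # degenerate split: halve instead
            left, right = np.array_split(P[np.argsort(P[:, d])], 2)
        parts += [left, right]
    return np.vstack([P.mean(axis=0) for P in parts])
```

    The resulting seeds can be handed to, e.g., scikit-learn via KMeans(n_clusters=k, init=var_part_init(X, k), n_init=1).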

    An Arithmetic-Based Deterministic Centroid Initialization Method for the k-Means Clustering Algorithm

    One of the greatest challenges in k-means clustering is positioning the initial cluster centers, or centroids, as close to optimal as possible, and doing so in an amount of time deemed reasonable. Traditional k-means uses a randomization process to initialize these centroids, and poor initialization can increase the number of clustering iterations required to reach convergence as well as the overall runtime. This research proposes a simple, arithmetic-based deterministic centroid initialization method which is much faster than randomized initialization. Preliminary experiments suggest that this collection of methods, referred to herein as the sharding centroid initialization algorithm family, often outperforms random initialization in terms of the number of iterations required for convergence and overall time-related metrics, and is competitive or better in terms of the reported mean sum of squared errors (SSE) metric. Surprisingly, the sharding algorithms often manage to report more advantageous mean SSE values in the very instances where their performance is slower than random initialization.
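    One plausible reading of the sharding idea described above (the thesis defines a family of variants, so the specifics here are assumptions): score each instance by the sum of its attribute values, sort the instances by that composite score, cut the sorted data into k equal shards, and take each shard's attribute-wise mean as an initial centroid. Purely arithmetic, hence deterministic and cheap.

```python
import numpy as np

def sharding_init(X, k):
    """Arithmetic-only seeding: (k, dims) array of shard means."""
    order = np.argsort(X.sum(axis=1))     # rank rows by their attribute sum
    shards = np.array_split(X[order], k)  # k contiguous, near-equal shards
    return np.vstack([s.mean(axis=0) for s in shards])
```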