A Comparative Study of Efficient Initialization Methods for the K-Means Clustering Algorithm
K-means is undoubtedly the most widely used partitional clustering algorithm.
Unfortunately, due to its gradient descent nature, this algorithm is highly
sensitive to the initial placement of the cluster centers. Numerous
initialization methods have been proposed to address this problem. In this
paper, we first present an overview of these methods with an emphasis on their
computational efficiency. We then compare eight commonly used linear time
complexity initialization methods on a large and diverse collection of data
sets using various performance criteria. Finally, we analyze the experimental
results using non-parametric statistical tests and provide recommendations for
practitioners. We demonstrate that popular initialization methods often perform
poorly and that there are in fact strong alternatives to these methods.
Comment: 17 pages, 1 figure, 7 tables
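As context for such a comparison, here is a minimal sketch of one widely used linear-complexity seeding rule, k-means++-style D² sampling. The abstract does not list the eight methods it compares, so this is an illustrative example of the class of methods discussed, not necessarily one of the paper's eight; all names are ours.

```python
import numpy as np

def dsquared_seeding(X, k, rng=None):
    """k-means++-style D^2 seeding sketch: each subsequent center is
    sampled with probability proportional to its squared distance from
    the nearest center chosen so far. One pass over the data per center."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]            # first center: uniform random
    for _ in range(k - 1):
        # squared distance of every point to its nearest chosen center
        d2 = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1),
                    axis=1)
        centers.append(X[rng.choice(n, p=d2 / d2.sum())])
    return np.array(centers)

# toy usage: three well-separated Gaussian blobs in 2-D
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.1, size=(50, 2))
               for m in ((0, 0), (5, 5), (0, 5))])
C = dsquared_seeding(X, 3, rng=0)
```

Because an already-chosen center has zero distance to itself, it can never be sampled again, so the k seeds are always distinct points.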
One-step preparation of cluster states in quantum dot molecules
Cluster states, a special type of highly entangled states, are a universal
resource for measurement-based quantum computation. Here, we propose an
efficient one-step generation scheme for cluster states in semiconductor
quantum dot molecules, where qubits are encoded in the singlet and triplet
states of two coupled quantum dots. By applying a collective electric field, or
by simultaneously adjusting the interdot bias voltages of all double-dot
molecules, we obtain a switchable Ising-like interaction between any two
adjacent quantum-molecule qubits. We discuss the initialization, the
single-qubit measurement, and the experimental parameters, showing that
large-scale cluster state preparation and one-way quantum computation are
implementable in semiconductor quantum dots with present techniques.
Comment: 5 pages, 3 figures
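As standard background (not the paper's specific derivation), a uniform Ising-like coupling applied for a fixed time realizes controlled-phase gates on all adjacent pairs simultaneously, which is what makes one-step cluster state generation possible:

```latex
\[
|\phi_C\rangle \;=\; \prod_{\langle a,b\rangle} S^{ab}
\bigotimes_{a} \frac{|0\rangle_a + |1\rangle_a}{\sqrt{2}},
\qquad
S^{ab} \;=\; \exp\!\Big[\,i\,\frac{\pi}{4}\,
\big(1-\sigma_z^{a}\big)\big(1-\sigma_z^{b}\big)\Big],
\]
```

where $S^{ab}$ is the controlled-Z gate generated by evolving under the Ising interaction, and the logical $|0\rangle$, $|1\rangle$ would here stand for the singlet/triplet encoding of each double-dot molecule.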
Quantum computing with antiferromagnetic spin clusters
We show that a wide range of spin clusters with antiferromagnetic
intracluster exchange interaction allows one to define a qubit. For these spin
cluster qubits, initialization, quantum gate operation, and readout are
possible using the same techniques as for single spins. Quantum gate operation
for the spin cluster qubit does not require control over the intracluster
exchange interaction. Electric and magnetic fields necessary to effect quantum
gates need only be controlled on the length scale of the spin cluster rather
than the scale for a single spin. Here, we calculate the energy gap separating
the logical qubit states from the next excited state and the matrix elements
which determine quantum gate operation times. We discuss spin cluster qubits
formed by one- and two-dimensional arrays of s=1/2 spins as well as clusters
formed by spins s>1/2. We illustrate the advantages of spin cluster qubits for
various suggested implementations of spin qubits and analyze the scaling of
decoherence time with spin cluster size.
Comment: 15 pages, 7 figures; minor changes
Modelling and Verification of a Cluster-tree Formation Protocol Implementation for the IEEE 802.15.4 TSCH MAC Operation Mode
Correct and efficient initialization of wireless sensor networks can be
challenging in the face of many uncertainties present in ad hoc wireless
networks. In this paper we examine an implementation for the formation of a
cluster-tree topology in a network which operates on top of the TSCH MAC
operation mode of the IEEE 802.15.4 standard, and investigate it using formal
methods. We show how both the mCRL2 language and toolset help us in identifying
scenarios where the implementation does not form a proper topology. More
importantly, our analysis leads to the conclusion that the cluster-tree
formation algorithm has a superlinear time complexity, so it does not scale
to large networks.
Comment: In Proceedings MARS 2017, arXiv:1703.0581
Robust EM algorithm for model-based curve clustering
Model-based clustering approaches concern the paradigm of exploratory data
analysis relying on the finite mixture model to automatically find a latent
structure governing observed data. They are one of the most popular and
successful approaches in cluster analysis. The mixture density estimation is
generally performed by maximizing the observed-data log-likelihood by using the
expectation-maximization (EM) algorithm. However, it is well-known that the EM
algorithm initialization is crucial. In addition, the standard EM algorithm
requires the number of clusters to be known a priori. Some solutions have been
provided in [31, 12] for model-based clustering with Gaussian mixture models
for multivariate data. In this paper we focus on model-based curve clustering
approaches, when the data are curves rather than vectorial data, based on
regression mixtures. We propose a new robust EM algorithm for clustering
curves. We extend the model-based clustering approach presented in [31] for
Gaussian mixture models, to the case of curve clustering by regression
mixtures, including polynomial regression mixtures as well as spline or
B-spline regression mixtures. Our approach both handles the problem of
initialization and the one of choosing the optimal number of clusters as the EM
learning proceeds, rather than in a two-fold scheme. This is achieved by
optimizing a penalized log-likelihood criterion. A simulation study confirms
the potential benefit of the proposed algorithm in terms of robustness
regarding initialization and finding the actual number of clusters.
Comment: In Proceedings of the 2013 International Joint Conference on Neural
Networks (IJCNN), 2013, Dallas, TX, US
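As an illustration of the underlying model class, the following is a plain (non-robust) EM sketch for a polynomial regression mixture. The paper's algorithm differs: it optimizes a penalized log-likelihood so that the number of clusters is selected while EM runs. All names and numerical choices here are illustrative assumptions, not the paper's code.

```python
import numpy as np

def em_poly_regression_mixture(x, Y, K, degree=1, n_iter=50, seed=0):
    """Plain EM for a mixture of K polynomial regressions fit to curves.
    Y is (n_curves, T): each row is one curve sampled at the points x."""
    rng = np.random.default_rng(seed)
    n_curves, T = Y.shape
    Phi = np.vander(x, degree + 1)                 # (T, d) polynomial design
    d = Phi.shape[1]
    beta = rng.normal(size=(K, d))                 # per-cluster coefficients
    sigma2 = np.ones(K)                            # per-cluster noise variance
    pi = np.full(K, 1.0 / K)                       # mixing proportions
    for _ in range(n_iter):
        # E-step: log-responsibility of cluster k for each whole curve
        logp = np.empty((n_curves, K))
        for k in range(K):
            resid = Y - Phi @ beta[k]              # (n_curves, T)
            logp[:, k] = (np.log(pi[k])
                          - 0.5 * T * np.log(2 * np.pi * sigma2[k])
                          - 0.5 * (resid ** 2).sum(1) / sigma2[k])
        logp -= logp.max(1, keepdims=True)         # stabilize the softmax
        tau = np.exp(logp)
        tau /= tau.sum(1, keepdims=True)
        # M-step: weighted least squares per cluster, then pi and sigma2
        for k in range(K):
            w = np.repeat(tau[:, k], T)            # curve weight on each sample
            Phi_all = np.tile(Phi, (n_curves, 1))  # stacked design, (n*T, d)
            Xw = Phi_all * w[:, None]
            A = Xw.T @ Phi_all + 1e-8 * np.eye(d)  # small ridge for stability
            beta[k] = np.linalg.solve(A, Xw.T @ Y.ravel())
            resid = Y - Phi @ beta[k]
            Nk = tau[:, k].sum() + 1e-12
            sigma2[k] = max((tau[:, k] @ (resid ** 2).sum(1)) / (Nk * T), 1e-8)
        pi = tau.mean(0) + 1e-12
        pi /= pi.sum()
    return beta, sigma2, pi, tau

# toy usage: two groups of noisy straight-line curves
x = np.linspace(0.0, 1.0, 20)
rng = np.random.default_rng(1)
Y = np.vstack([1.0 + 2.0 * x + rng.normal(0, 0.05, (30, 20)),
               3.0 - 1.0 * x + rng.normal(0, 0.05, (30, 20))])
beta, sigma2, pi, tau = em_poly_regression_mixture(x, Y, K=2)
```

Note that the E-step assigns whole curves, not individual samples, to clusters; this is what distinguishes curve clustering by regression mixtures from pointwise mixture modelling.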
Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm
Over the past five decades, k-means has become the clustering algorithm of
choice in many application domains primarily due to its simplicity, time/space
efficiency, and invariance to the ordering of the data points. Unfortunately,
the algorithm's sensitivity to the initial selection of the cluster centers
remains its most serious drawback. Numerous initialization methods have
been proposed to address this drawback. Many of these methods, however, have
time complexity superlinear in the number of data points, which makes them
impractical for large data sets. On the other hand, linear methods are often
random and/or sensitive to the ordering of the data points. These methods are
generally unreliable in that the quality of their results is unpredictable.
Therefore, it is common practice to perform multiple runs of such methods and
take the output of the run that produces the best results. Such a practice,
however, greatly increases the computational requirements of the otherwise
highly efficient k-means algorithm. In this chapter, we investigate the
empirical performance of six linear, deterministic (non-random), and
order-invariant k-means initialization methods on a large and diverse
collection of data sets from the UCI Machine Learning Repository. The results
demonstrate that two relatively unknown hierarchical initialization methods due
to Su and Dy outperform the remaining four methods with respect to two
objective effectiveness criteria. In addition, a recent method due to Erisoglu
et al. performs surprisingly poorly.
Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms
(Springer, 2014). arXiv admin note: substantial text overlap with
arXiv:1304.7465, arXiv:1209.196
An Arithmetic-Based Deterministic Centroid Initialization Method for the k-Means Clustering Algorithm
One of the greatest challenges in k-means clustering is positioning the initial cluster centers, or centroids, as close to optimal as possible, and doing so in an amount of time deemed reasonable. Traditional k-means utilizes a randomization process for initializing these centroids, and poor initialization can lead to an increased number of required clustering iterations to reach convergence, and a greater overall runtime. This research proposes a simple, arithmetic-based deterministic centroid initialization method which is much faster than randomized initialization. Preliminary experiments suggest that this collection of methods, referred to herein as the sharding centroid initialization algorithm family, often outperforms random initialization in terms of the required number of iterations for convergence and overall time-related metrics, and is competitive or better in terms of the reported mean sum of squared errors (SSE) metric. Surprisingly, the sharding algorithms often manage to report more advantageous mean SSE values in the instances where their performance is slower than random initialization.
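The abstract does not spell out the sharding procedure; the following is a hypothetical reading of the idea, with all names ours: rank instances by the sum of their attribute values, cut the ranking into k equal shards, and use each shard's mean as a starting centroid. The paper's actual family of variants may differ.

```python
import numpy as np

def sharding_init(X, k):
    """Hypothetical sketch of a sharding-style deterministic init:
    rank instances by the sum of their attribute values, split the
    ranking into k contiguous shards, and return each shard's mean
    as a starting centroid. One O(n log n) sort, no randomness."""
    order = np.argsort(X.sum(axis=1))        # rank rows by attribute-sum
    shards = np.array_split(X[order], k)     # k roughly equal shards
    return np.vstack([s.mean(axis=0) for s in shards])

# toy usage: 6 points in 2-D whose attribute-sums are already ordered
X = np.arange(12, dtype=float).reshape(6, 2)
C = sharding_init(X, 3)
# -> [[1., 2.], [5., 6.], [9., 10.]]
```

Being a deterministic function of the data (and invariant to row order after the sort), a rule like this needs only a single run, unlike randomized seeding.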