Fast Approximate k-Means via Cluster Closures
k-means, a simple and effective clustering algorithm, is one of the most
widely used algorithms in the multimedia and computer vision communities.
Traditional k-means is an iterative algorithm: in each iteration, new cluster
centers are computed and each data point is re-assigned to its nearest center.
The cluster re-assignment step becomes prohibitively expensive when the numbers
of data points and cluster centers are large.
In this paper, we propose a novel approximate k-means algorithm to greatly
reduce the computational complexity in the assignment step. Our approach is
motivated by the observation that most active points changing their cluster
assignments at each iteration are located on or near cluster boundaries. The
idea is to efficiently identify those active points by pre-assembling the data
into groups of neighboring points using multiple random spatial partition
trees, and to use the neighborhood information to construct a closure for each
cluster, so that only a small number of cluster candidates need to be
considered when assigning a data point to its nearest cluster. Using complexity
analysis, image data clustering, and applications to image retrieval, we show
that our approach outperforms state-of-the-art approximate k-means
algorithms in terms of clustering quality and efficiency.
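
The candidate-restricted assignment described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`random_partition_groups`, `closure_assign`, `approx_kmeans`), the use of random-projection median splits as the "random spatial partition trees", and all parameter defaults are assumptions made for the example. The initial exact assignment is also a simplification.

```python
import numpy as np

def random_partition_groups(X, leaf_size, rng):
    """Group neighboring points via recursive random-projection median splits
    (an assumed stand-in for one random spatial partition tree)."""
    groups = []
    stack = [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        if len(idx) <= leaf_size:
            groups.append(idx)
            continue
        direction = rng.normal(size=X.shape[1])
        order = np.argsort(X[idx] @ direction)
        half = len(idx) // 2
        stack.append(idx[order[:half]])
        stack.append(idx[order[half:]])
    return groups

def closure_assign(X, centers, labels, groups):
    """Reassign each point, comparing it only with clusters that currently
    own at least one point in a group shared with it."""
    candidates = [set() for _ in range(len(X))]
    for g in groups:
        present = set(labels[g].tolist())
        for i in g:
            candidates[i] |= present
    new_labels = labels.copy()
    for i, cand in enumerate(candidates):
        cand = np.fromiter(cand, dtype=int)
        d = ((centers[cand] - X[i]) ** 2).sum(axis=1)
        new_labels[i] = cand[np.argmin(d)]
    return new_labels

def approx_kmeans(X, k, n_trees=3, leaf_size=20, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    groups = []
    for _ in range(n_trees):
        groups.extend(random_partition_groups(X, leaf_size, rng))
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    # Simplification: one exact assignment to initialise the labels.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    for _ in range(n_iter):
        for c in range(k):
            members = X[labels == c]
            if len(members):  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
        labels = closure_assign(X, centers, labels, groups)
    return centers, labels
```

Because every point shares a group with itself, its current cluster is always among the candidates, so the restricted assignment step never increases the k-means objective for fixed centers.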
Moment Closure - A Brief Review
Moment closure methods appear in myriad scientific disciplines in the
modelling of complex systems. The goal is to achieve a closed form of a large,
usually even infinite, set of coupled differential (or difference) equations.
Each equation describes the evolution of one "moment", a suitable
coarse-grained quantity computable from the full state space. If the system is
too large for analytical and/or numerical methods, then one aims to reduce it
by finding a moment closure relation expressing "higher-order moments" in terms
of "lower-order moments". In this brief review, we focus on highlighting how
moment closure methods occur in different contexts. We also conjecture via a
geometric explanation why it has been difficult to rigorously justify many
moment closure approximations although they work very well in practice.
Comment: short survey paper (max 20 pages) for a broad audience in
mathematics, physics, chemistry and quantitative biology
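
As a small worked illustration of what a closure relation does (an example chosen here, not one taken from the review), consider a birth-death process for a population size $n$ with an assumed birth rate $b\,n$ and death rate $d\,n + c\,n^2$. The moment equations are not closed: each one involves the next higher moment.

```latex
\begin{align}
  \frac{d\langle n\rangle}{dt}
    &= (b-d)\,\langle n\rangle - c\,\langle n^2\rangle, \\
  \frac{d\langle n^2\rangle}{dt}
    &= 2(b-d)\,\langle n^2\rangle - 2c\,\langle n^3\rangle
       + (b+d)\,\langle n\rangle + c\,\langle n^2\rangle .
\end{align}
% Mean-field closure: replace the second moment by the squared mean,
% \langle n^2\rangle \approx \langle n\rangle^2, which closes the first equation:
\begin{equation}
  \frac{d\langle n\rangle}{dt}
    \approx (b-d)\,\langle n\rangle - c\,\langle n\rangle^2 .
\end{equation}
```

The closed equation is the familiar logistic equation. The approximation amounts to neglecting the variance $\langle n^2\rangle - \langle n\rangle^2$, which is the kind of step that is simple to apply and harder to justify rigorously.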
Fast k-means based on KNN Graph
In the era of big data, k-means clustering has been widely adopted as a basic
processing tool in various contexts. However, its computational cost can be
prohibitively high when the data size and the number of clusters are large. It
is well known that the processing bottleneck of k-means lies in the operation
of seeking the closest centroid in each iteration. In this paper, a novel
solution to the scalability issue of k-means is presented. In the proposal,
k-means is supported by an approximate k-nearest neighbors graph. In the
k-means iteration, each data sample is only compared to the clusters in which
its nearest neighbors reside. Since the number of nearest neighbors we consider
is much less than k, the processing cost in this step becomes minor and
independent of k. The processing bottleneck is therefore overcome. The most
interesting thing is that the k-nearest neighbor graph is constructed by
iteratively calling the fast k-means itself. Compared with existing fast
k-means variants, the proposed algorithm achieves a speed-up of hundreds to
thousands of times while maintaining high clustering quality. When tested on
10 million 512-dimensional data points, it takes only 5.2 hours to produce
1 million clusters. In contrast, traditional k-means would take 3 years to
complete clustering at the same scale.
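
The graph-supported assignment step described in this abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's code: `brute_force_knn` and `knn_graph_assign` are hypothetical names, and the exact brute-force neighbor search stands in for the approximate k-nearest neighbor graph, which the paper builds by other means.

```python
import numpy as np

def brute_force_knn(X, n_neighbors):
    """Exact k-nearest-neighbor lists (assumed stand-in for an approximate graph)."""
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is not its own neighbor
    return np.argsort(d, axis=1)[:, :n_neighbors]

def knn_graph_assign(X, centers, labels, knn):
    """Reassign each sample, comparing it only with its current cluster and
    the clusters in which its nearest neighbors reside."""
    new_labels = labels.copy()
    for i in range(len(X)):
        cand = np.unique(np.append(labels[knn[i]], labels[i]))
        d = ((centers[cand] - X[i]) ** 2).sum(axis=1)
        new_labels[i] = cand[d.argmin()]
    return new_labels
```

Each sample is compared with at most `n_neighbors + 1` centers, so the per-sample cost of this step depends on the neighbor count and not on the number of clusters.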
Why do ultrasoft repulsive particles cluster and crystallize? Analytical results from density functional theory
We demonstrate the accuracy of the hypernetted chain closure and of the
mean-field approximation for the calculation of the fluid-state properties of
systems interacting by means of bounded and positive-definite pair potentials
with oscillating Fourier transforms. Subsequently, we prove the validity of a
bilinear, random-phase density functional for arbitrary inhomogeneous phases of
the same systems. On the basis of this functional, we calculate analytically
the freezing parameters of the latter. We demonstrate explicitly that the
stable crystals feature a lattice constant that is independent of density and
whose value is dictated by the position of the negative minimum of the Fourier
transform of the pair potential. This property is equivalent to the existence
of clusters, whose population scales proportionally to the density. We
establish that regardless of the form of the interaction potential and of the
location on the freezing line, all cluster crystals have a universal Lindemann
ratio L = 0.189 at freezing. We further make an explicit link between the
aforementioned density functional and the harmonic theory of crystals. This
allows us to establish an equivalence between the emergence of clusters and the
existence of negative Fourier components of the interaction potential. Finally,
we make a connection between the class of models at hand and the system of
infinite-dimensional hard spheres, when the limits of interaction steepness and
space dimension are both taken to infinity in a particularly described fashion.
Comment: 19 pages, 5 figures, submitted to J. Chem. Phys; new version: minor
changes in structure of paper