
    Fast Approximate K-Means via Cluster Closures

    K-means, a simple and effective clustering algorithm, is one of the most widely used algorithms in the multimedia and computer vision communities. Traditional k-means is an iterative algorithm: in each iteration, new cluster centers are computed and each data point is re-assigned to its nearest center. This re-assignment step becomes prohibitively expensive when the numbers of data points and cluster centers are large. In this paper, we propose a novel approximate k-means algorithm that greatly reduces the computational complexity of the assignment step. Our approach is motivated by the observation that most active points, i.e. those that change their cluster assignments in a given iteration, lie on or near cluster boundaries. The idea is to identify these active points efficiently by pre-assembling the data into groups of neighboring points using multiple random spatial partition trees, and to use this neighborhood information to construct a closure for each cluster, so that only a small number of candidate clusters need to be considered when assigning a data point to its nearest cluster. Using complexity analysis, image data clustering, and applications to image retrieval, we show that our approach outperforms state-of-the-art approximate k-means algorithms in terms of clustering quality and efficiency.
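    The closure idea can be sketched in a few dozen lines of pure Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the "random spatial partition trees" are assumed here to be random projection trees split at the median, seeding uses farthest-first traversal, and all helper names (`rp_tree_leaves`, `neighbour_sets`, `closure_kmeans`, ...) are hypothetical. A point lies in the closure of cluster c exactly when one of its leaf-mates belongs to c, so the candidate set for a point is simply the set of its leaf-mates' clusters.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rp_tree_leaves(idx, points, leaf_size, rng):
    """Leaves of one random projection tree (assumed partition scheme):
    project onto a random direction and split at the median."""
    if len(idx) <= leaf_size:
        return [idx]
    direction = [rng.gauss(0.0, 1.0) for _ in points[0]]
    order = sorted(idx, key=lambda i: sum(p * d for p, d in zip(points[i], direction)))
    mid = len(order) // 2
    return (rp_tree_leaves(order[:mid], points, leaf_size, rng)
            + rp_tree_leaves(order[mid:], points, leaf_size, rng))

def neighbour_sets(points, n_trees, leaf_size, rng):
    """For each point, the union over several trees of the points sharing its leaf."""
    sets = [set() for _ in points]
    for _ in range(n_trees):
        for leaf in rp_tree_leaves(list(range(len(points))), points, leaf_size, rng):
            for i in leaf:
                sets[i].update(leaf)
    return sets

def farthest_first(points, k, rng):
    """Seeding by farthest-first traversal (an assumption, not from the paper)."""
    centers = [list(points[rng.randrange(len(points))])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    return centers

def closure_kmeans(points, k, n_iter=10, n_trees=3, leaf_size=8, seed=0):
    rng = random.Random(seed)
    neigh = neighbour_sets(points, n_trees, leaf_size, rng)
    centers = farthest_first(points, k, rng)
    dim = len(points[0])
    # One exact assignment to start; all later assignments are restricted.
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    for _ in range(n_iter):
        # Update step: each center becomes the mean of its current members.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for p, c in zip(points, labels):
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
        centers = [[s / counts[c] for s in sums[c]] if counts[c] else centers[c]
                   for c in range(k)]
        # Approximate assignment: only clusters whose closure contains the point,
        # i.e. the (old) labels of its leaf-mates, itself included, are compared.
        labels = [min({labels[j] for j in neigh[i]}, key=lambda c: sq_dist(p, centers[c]))
                  for i, p in enumerate(points)]
    return centers, labels

# Usage on two well-separated synthetic blobs.
rng_data = random.Random(1)
blob_a = [(rng_data.gauss(0.0, 0.5), rng_data.gauss(0.0, 0.5)) for _ in range(30)]
blob_b = [(rng_data.gauss(10.0, 0.5), rng_data.gauss(10.0, 0.5)) for _ in range(30)]
centers, labels = closure_kmeans(blob_a + blob_b, k=2)
print(labels)
```

    Because every point shares a leaf with itself, its current cluster is always a candidate. A point interior to a cluster usually sees only that cluster among its leaf-mates and costs a single distance evaluation, while boundary points see several clusters, which matches the observation that motivates the method.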

    Moment Closure - A Brief Review

    Moment closure methods appear in myriad scientific disciplines in the modelling of complex systems. The goal is to obtain a closed form of a large, usually even infinite, set of coupled differential (or difference) equations. Each equation describes the evolution of one "moment", a suitable coarse-grained quantity computable from the full state space. If the system is too large for analytical and/or numerical methods, one aims to reduce it by finding a moment closure relation that expresses "higher-order moments" in terms of "lower-order moments". In this brief review, we focus on highlighting how moment closure methods arise in different contexts. We also conjecture, via a geometric explanation, why it has been difficult to rigorously justify many moment closure approximations even though they work very well in practice. Comment: short survey paper (max. 20 pages) for a broad audience in mathematics, physics, chemistry and quantitative biology
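    As a concrete illustration of a closure relation, consider a stochastic logistic birth-death process with birth rate λn and death rate μn + λn²/K; this model is an assumption chosen for illustration, not an example taken from the review. The equation for E[n] involves E[n²], the equation for E[n²] involves E[n³], and so on without end. One common choice, the normal (Gaussian) closure, sets the third central moment to zero, i.e. E[n³] ≈ 3E[n]E[n²] − 2E[n]³, which closes the system at second order. The function names and parameter values below are hypothetical, and forward Euler is used only for brevity.

```python
def closed_moment_rhs(m1, m2, lam, mu, K):
    """First two raw-moment equations of a logistic birth-death process
    (birth lam*n, death mu*n + lam*n**2/K), closed at second order."""
    # Normal (Gaussian) closure: third central moment set to zero.
    m3 = 3.0 * m1 * m2 - 2.0 * m1 ** 3
    # dE[n]/dt = E[b(n) - d(n)]
    dm1 = (lam - mu) * m1 - (lam / K) * m2
    # dE[n^2]/dt = E[2n(b(n) - d(n)) + b(n) + d(n)]
    dm2 = (2.0 * (lam - mu) + lam / K) * m2 - 2.0 * (lam / K) * m3 + (lam + mu) * m1
    return dm1, dm2

def integrate(m1, var, lam=1.0, mu=0.5, K=100.0, dt=0.01, t_end=50.0):
    """Forward-Euler integration of the closed system; returns (mean, variance)."""
    m2 = var + m1 ** 2
    for _ in range(int(round(t_end / dt))):
        dm1, dm2 = closed_moment_rhs(m1, m2, lam, mu, K)
        m1, m2 = m1 + dt * dm1, m2 + dt * dm2
    return m1, m2 - m1 ** 2

mean, variance = integrate(10.0, 0.0)
print(f"mean ~ {mean:.2f}, variance ~ {variance:.2f}")
```

    With these parameters the closed system settles slightly below the deterministic equilibrium K(1 − μ/λ) = 50, because the variance contained in E[n²] pulls the mean equation down; a first-order (mean-field) closure that sets E[n²] ≈ E[n]² would miss this effect.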

    Fast k-means based on KNN Graph

    In the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost can be prohibitively high when the data size and the number of clusters are large. It is well known that the processing bottleneck of k-means lies in seeking the closest centroid for each sample in every iteration. In this paper, a novel solution to the scalability issue of k-means is presented. In the proposal, k-means is supported by an approximate k-nearest-neighbor graph. In each k-means iteration, a data sample is compared only to the clusters in which its nearest neighbors reside. Since the number of nearest neighbors considered is much smaller than k, the cost of this step becomes minor and independent of k, and the processing bottleneck is therefore overcome. Interestingly, the k-nearest-neighbor graph itself is constructed by iteratively calling the fast k-means. Compared with existing fast k-means variants, the proposed algorithm achieves speed-ups of hundreds to thousands of times while maintaining high clustering quality. When tested on 10 million 512-dimensional samples, it takes only 5.2 hours to produce 1 million clusters; by contrast, traditional k-means would take 3 years to cluster data at the same scale.
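    The graph-restricted assignment step can be sketched as follows. This is a minimal illustration, not the paper's code: the κ-nearest-neighbor graph is built here by exact brute force (the abstract says the paper builds an approximate graph by iterating the fast k-means itself), and the helper names `knn_graph`, `update_centers` and `knn_assign` are hypothetical. The point of the sketch is the candidate set: a sample is compared only with its own cluster and the clusters of its κ neighbors, and the function counts how many distance evaluations that costs.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn_graph(points, kappa):
    """Exact kappa-NN lists by brute force (a stand-in for an approximate graph)."""
    graph = []
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: sq_dist(p, points[j]))
        graph.append(others[:kappa])
    return graph

def update_centers(points, labels, centers):
    """Standard k-means update; an empty cluster keeps its previous center."""
    k, dim = len(centers), len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    return [[s / counts[c] for s in sums[c]] if counts[c] else centers[c]
            for c in range(k)]

def knn_assign(points, centers, labels, graph):
    """Compare each point only with its own cluster and its neighbours' clusters."""
    new_labels, evaluations = [], 0
    for i, p in enumerate(points):
        candidates = {labels[i]} | {labels[j] for j in graph[i]}
        evaluations += len(candidates)
        new_labels.append(min(candidates, key=lambda c: sq_dist(p, centers[c])))
    return new_labels, evaluations

# Usage: three synthetic blobs of 20 points each, seeded with one point per blob.
rng = random.Random(2)
true_centres = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in true_centres for _ in range(20)]
kappa = 5
graph = knn_graph(points, kappa)
centers = [list(points[0]), list(points[20]), list(points[40])]
labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c])) for p in points]
for _ in range(5):
    centers = update_centers(points, labels, centers)
    labels, evaluations = knn_assign(points, centers, labels, graph)
print(labels, evaluations)
```

    Each point costs at most κ + 1 distance evaluations per pass however many clusters exist, whereas exact assignment costs k per point; with k in the millions, as in the experiment reported above, that difference dominates the running time.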

    Why do ultrasoft repulsive particles cluster and crystallize? Analytical results from density functional theory

    We demonstrate the accuracy of the hypernetted chain closure and of the mean-field approximation for the calculation of the fluid-state properties of systems interacting by means of bounded and positive-definite pair potentials with oscillating Fourier transforms. Subsequently, we prove the validity of a bilinear, random-phase density functional for arbitrary inhomogeneous phases of the same systems. On the basis of this functional, we calculate the freezing parameters of these systems analytically. We demonstrate explicitly that the stable crystals feature a lattice constant that is independent of density and whose value is dictated by the position of the negative minimum of the Fourier transform of the pair potential. This property is equivalent to the existence of clusters whose population scales proportionally to the density. We establish that, regardless of the form of the interaction potential and of the location on the freezing line, all cluster crystals have a universal Lindemann ratio L = 0.189 at freezing. We further establish an explicit link between the aforementioned density functional and the harmonic theory of crystals. This allows us to establish an equivalence between the emergence of clusters and the existence of negative Fourier components of the interaction potential. Finally, we make a connection between the class of models at hand and the system of infinite-dimensional hard spheres, obtained when the interaction steepness and the space dimension are both taken to infinity in a particular, prescribed manner. Comment: 19 pages, 5 figures, submitted to J. Chem. Phys.; new version: minor changes in the structure of the paper
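    To make the role of the Fourier transform concrete, the sketch below evaluates the three-dimensional transform of a radial pair potential, φ̃(k) = (4π/k) ∫₀^∞ r sin(kr) φ(r) dr, by Simpson's rule and locates its deepest minimum. The potentials are an assumption chosen for illustration: the generalized exponential form φ(r) = ε exp[−(r/σ)ⁿ] with ε = σ = 1, taking n = 4 and the Gaussian case n = 2. The cutoff, grid and helper names (`radial_fourier`, `gem`, `deepest_minimum`) are hypothetical. According to the abstract, the position k* of the negative minimum sets the lattice constant of the cluster crystal, and clustering goes together with negative Fourier components.

```python
import math

def radial_fourier(phi, k, r_max=6.0, n=1200):
    """3D Fourier transform of a radial function by composite Simpson's rule:
    phi_hat(k) = (4*pi/k) * integral_0^r_max of r*sin(k*r)*phi(r) dr (n must be even)."""
    h = r_max / n
    total = 0.0
    for i in range(n + 1):
        r = i * h
        weight = 1 if i in (0, n) else (4 if i % 2 else 2)
        total += weight * r * math.sin(k * r) * phi(r)
    return 4.0 * math.pi / k * total * h / 3.0

def gem(n_exp):
    """Hypothetical helper: bounded potential exp(-r**n_exp) in units eps = sigma = 1."""
    return lambda r: math.exp(-r ** n_exp)

def deepest_minimum(phi, k_values):
    """Scan a k-grid and return (k, phi_hat(k)) at the smallest transform value."""
    value, k = min((radial_fourier(phi, k), k) for k in k_values)
    return k, value

k_grid = [0.05 * i for i in range(1, 301)]  # k*sigma from 0.05 to 15.0
k_star, min_gem4 = deepest_minimum(gem(4), k_grid)
_, min_gauss = deepest_minimum(gem(2), k_grid)
print(f"n = 4: minimum {min_gem4:.4f} at k*sigma = {k_star:.2f}")
print(f"n = 2: minimum {min_gauss:.2e}")
```

    For n = 4 the scan finds a clearly negative minimum at finite k*, while for the Gaussian the transform stays non-negative up to quadrature error, since its exact transform is π^(3/2) exp(−k²/4). This contrast is consistent with the stated equivalence between cluster formation and negative Fourier components.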