

Macrostate Data Clustering

Author: A. Pothen
A. Ulitsky
C.J. Alpert
D. Horn
D. Shalloway
Daniel Korenblum
David Shalloway
G. Milligan
K. Rose
L. Angelini
L. Giada
L. Kullmann
M. Blatt
M. Wong
O. Alter
R.B. Altman
S. Wiseman
S.T. Barnard
Publication venue: 'American Physical Society (APS)'
Publication date: 18/06/2003

We develop an effective nonhierarchical data clustering method using an analogy to the dynamic coarse graining of a stochastic system. Analyzing the eigensystem of an interitem transition matrix identifies fuzzy clusters corresponding to the metastable macroscopic states (macrostates) of a diffusive system. A "minimum uncertainty criterion" determines the linear transformation from eigenvectors to cluster-defining window functions. Eigenspectrum gap and cluster certainty conditions identify the proper number of clusters. The physically motivated fuzzy representation and associated uncertainty analysis distinguish macrostate clustering from spectral partitioning methods. Macrostate data clustering solves a variety of test cases that challenge other methods. Comment: keywords: cluster analysis, clustering, pattern recognition, spectral graph theory, dynamic eigenvectors, machine learning, macrostates, classification

arXiv.org e-Print Archive

Crossref

CERN Document Server

Stochastic Data Clustering

Author: Meyer Carl D.
Wessell Charles D.
Publication date: 01/01/2012

In 1961 Herbert Simon and Albert Ando published the theory behind the long-term behavior of a dynamical system that can be described by a nearly uncoupled matrix. Over the past fifty years this theory has been used in a variety of contexts, including queueing theory, brain organization, and ecology. In all these applications, the structure of the system is known and the point of interest is the various stages the system passes through on its way to some long-term equilibrium. This paper looks at this problem from the other direction. That is, we develop a technique for using the evolution of the system to tell us about its initial structure, and we use this technique to develop a new algorithm for data clustering. Comment: 23 pages
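The idea of reading structure off a system's evolution can be illustrated with a toy example. The sketch below is not the paper's algorithm. It assumes a hand-built, symmetric, nearly uncoupled stochastic matrix `P` with two blocks, a coupling strength `eps`, and a simple "split at the largest gap" rule; all of these are illustrative choices. After a few steps the probability mass has equalized within each block but has barely leaked between blocks, so items in the same block carry nearly identical values.

```python
def evolve(P, x, steps):
    """Advance a probability row vector x through `steps` transitions x <- x P."""
    n = len(P)
    for _ in range(steps):
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]
    return x


def split_at_largest_gap(x):
    """Label items 0 or 1 by cutting the sorted values at their largest gap."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    gaps = [x[order[k + 1]] - x[order[k]] for k in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    low = set(order[:cut + 1])
    return [0 if i in low else 1 for i in range(len(x))]


# Hypothetical test system: two blocks of three items, weakly coupled.
eps = 0.01
blocks = [0, 0, 0, 1, 1, 1]
P = [[(1 - eps) / 3 if blocks[i] == blocks[j] else eps / 3 for j in range(6)]
     for i in range(6)]

# Start with all probability mass on item 0 and run only a few steps.
x = evolve(P, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0], steps=5)
labels = split_at_largest_gap(x)
```

Running many more steps would drive `x` toward the uniform distribution and erase the gap, which is why the short-run behaviour, rather than the equilibrium, is the informative part in this toy setting.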

arXiv.org e-Print Archive

CiteSeerX

Gettysburg College


Seismic data clustering management system

Author: Banitsas K
Katsifarakis E
Konstantaras A
Maravelakis E
Skounakis E
Varley M
Publication venue: European Geosciences Union
Publication date: 01/01/2011

This is the abstract of the paper given at the conference. Copyright © 2011 The Authors. In recent years, seismic images have played an increasingly vital role in the study of earthquakes. The large volume of seismic data that has been accumulated has created the need to develop sophisticated systems to manage this kind of data. Seismic interpretation can play a much more active role in the evaluation of large volumes of data by providing, at an early stage, vital information relating to the framework of potential producing levels [1]. This work presents a novel method to manage and analyse seismic data. The data is initially turned into clustering maps using clustering techniques [2] [3] [4] [5] [6], in order to be analysed on the platform. These clustering maps can then be analysed with the user-friendly interface of Seismic 1, which is based on the .NET Framework architecture [7]. This allows the application to be ported to any Windows-based computer, and to many Linux-based environments through the Mono project [8], and to be run using No-Touch Deployment [7]. The platform supports two ways of processing seismic data. Firstly, a fast multifunctional version of the classical region-growing segmentation algorithm [9], [10] is applied to various areas of interest, permitting their precise definition and labelling. Moreover, this algorithm automatically allocates new earthquakes to a particular cluster based upon the magnitude of the centre of gravity of the existing clusters, or creates a new cluster if all centres of gravity are above a user-defined upper threshold. Secondly, a visual technique is used to record the behaviour of a cluster of earthquakes in a designated area. In this way, the system functions as a dynamic temporal simulator which depicts sequences of earthquakes on a map [11].
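The allocation rule for new earthquakes can be sketched as follows. This is a minimal illustration and not the Seismic 1 implementation. It assumes that the criterion is the planar Euclidean distance from the new event to each cluster's centre of gravity, that events are plain `(x, y)` pairs, and that the function name `assign_event` and the `threshold` parameter are hypothetical.

```python
import math


def assign_event(event, clusters, threshold):
    """Add `event` to the nearest cluster, or open a new one if none is close.

    event:     (x, y) coordinates of the new earthquake (assumed planar).
    clusters:  list of non-empty lists of (x, y) points; modified in place.
    threshold: user-defined upper limit on distance to a centre of gravity.
    Returns the index of the cluster that received the event.
    """
    best_index, best_dist = None, math.inf
    for index, members in enumerate(clusters):
        # Centre of gravity = mean position of the cluster's members.
        cx = sum(p[0] for p in members) / len(members)
        cy = sum(p[1] for p in members) / len(members)
        dist = math.hypot(event[0] - cx, event[1] - cy)
        if dist < best_dist:
            best_index, best_dist = index, dist
    if best_index is None or best_dist > threshold:
        clusters.append([event])  # every centre is too far away
        return len(clusters) - 1
    clusters[best_index].append(event)
    return best_index


clusters = [[(0.0, 0.0), (1.0, 0.0)], [(10.0, 10.0)]]
near = assign_event((0.5, 0.2), clusters, threshold=2.0)
far = assign_event((50.0, 50.0), clusters, threshold=2.0)
```

A real system would likely use geographic distance and could also weigh depth or magnitude; those refinements are omitted here.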

Brunel University Research Archive

Bipartite graph partitioning and data clustering

Author: Ding C.
Gu M.
He X.
Simon H.
Zha H.
Publication date: 01/01/2001

Many data types arising from data mining applications can be modeled as bipartite graphs; examples include terms and documents in a text corpus, customers and purchasing items in market basket analysis, and reviewers and movies in a movie recommender system. In this paper, we propose a new data clustering method based on partitioning the underlying bipartite graph. The partition is constructed by minimizing a normalized sum of edge weights between unmatched pairs of vertices of the bipartite graph. We show that an approximate solution to the minimization problem can be obtained by computing a partial singular value decomposition (SVD) of the associated edge weight matrix of the bipartite graph. We point out the connection of our clustering algorithm to correspondence analysis used in multivariate analysis. We also briefly discuss the issue of assigning data objects to multiple clusters. In the experimental results, we apply our clustering algorithm to the problem of document clustering to illustrate its effectiveness and efficiency. Comment: Proceedings of ACM CIKM 2001, the Tenth International Conference on Information and Knowledge Management, 2001
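A rough sketch of how a partial SVD can yield a two-way partition of a bipartite graph is given below. It is not the authors' exact procedure. The degree scaling of the weight matrix, the zero threshold on the second singular vectors, the power iteration used in place of a library SVD, and the small document-term matrix `W` are all assumptions made for illustration. Every row and column is assumed to have positive total weight.

```python
import math


def bipartition(W, iters=200):
    """Split rows and columns of a non-negative weight matrix into two groups."""
    m, n = len(W), len(W[0])
    d_row = [sum(W[i]) for i in range(m)]
    d_col = [sum(W[i][j] for i in range(m)) for j in range(n)]
    # Degree-scaled matrix A = D_row^{-1/2} W D_col^{-1/2} (assumed normalization).
    A = [[W[i][j] / math.sqrt(d_row[i] * d_col[j]) for j in range(n)]
         for i in range(m)]
    # The leading right singular vector of A is sqrt(d_col), normalized.
    total = math.sqrt(sum(d_col))
    v1 = [math.sqrt(d) / total for d in d_col]

    def deflate(v):
        """Remove the component along v1 and rescale to unit length."""
        dot = sum(a * b for a, b in zip(v, v1))
        v = [a - dot * b for a, b in zip(v, v1)]
        norm = math.sqrt(sum(a * a for a in v))
        return [a / norm for a in v]

    # Power iteration on A^T A, restricted to the complement of v1,
    # converges to the second right singular vector.
    v = deflate([float(j) for j in range(n)])
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        v = deflate([sum(A[i][j] * u[i] for i in range(m)) for j in range(n)])
    u = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    # The sign of each entry decides the side of the cut.
    return [int(x > 0) for x in u], [int(x > 0) for x in v]


# Hypothetical data: 4 documents (rows) by 5 terms (columns), two weakly linked topics.
W = [
    [2.0, 1.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 2.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 2.0],
]
doc_labels, term_labels = bipartition(W)
```

Documents and terms are labelled together, so each cluster pairs a set of documents with the terms that characterize it. For large sparse matrices a library routine for partial SVD would replace the hand-written power iteration.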

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

UNT Digital Library

Cost functions for pairwise data clustering

Author: Alon
Anderson
Angelini
Bishop
Bishop
Blatt
Dekel
Dempster
Duda
Giada
Hofman
Kirkpatrick
Kullmann
L. Angelini
L. Nitti
Linde
Luttrel
M. Pellicoro
Parisi
Rose
Rypley
S. Stramaglia
Utsugi
Yuille
Publication venue: 'Elsevier BV'
Publication date: 01/01/2001

Cost functions for non-hierarchical pairwise clustering are introduced, in the probabilistic autoencoder framework, by requiring maximal average similarity between the input and the output of the autoencoder. The partition provided by these cost functions identifies clusters with dense connected regions in data space; differences and similarities with respect to a well-known cost function for pairwise clustering are outlined. Comment: 5 pages, 4 figures

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Bari

Duality between Feature Selection and Data Clustering

Author: Al-Bashabsheh Ali
Chan Chung
Liu Tie
Zhou Qiaoqiao
Publication date: 05/10/2016

The feature-selection problem is formulated from an information-theoretic perspective. We show that the problem can be efficiently solved by an extension of the recently proposed info-clustering paradigm. This reveals the fundamental duality between feature selection and data clustering, which is a consequence of the more general duality between the principal partition and the principal lattice of partitions in combinatorial optimization.

arXiv.org e-Print Archive

Crossref