5,517 research outputs found
Fast Algorithm and Implementation of Dissimilarity Self-Organizing Maps
In many real world applications, data cannot be accurately represented by
vectors. In those situations, one possible solution is to rely on dissimilarity
measures that enable sensible comparison between observations. Kohonen's
Self-Organizing Map (SOM) has been adapted to data described only through their
dissimilarity matrix. This algorithm provides both nonlinear projection and
clustering of non-vector data. Unfortunately, the algorithm suffers from a high
cost that makes it quite difficult to use with voluminous data sets. In this
paper, we propose a new algorithm that provides an important reduction of the
theoretical cost of the dissimilarity SOM without changing its outcome (the
results are exactly the same as the ones obtained with the original algorithm).
Moreover, we introduce implementation methods that result in very short running
times. Improvements deduced from the theoretical cost model are validated on
simulated and real world data (a word list clustering problem). We also
demonstrate that the proposed implementation methods reduce the running time
of the fast algorithm by a factor of up to 3 compared with a standard implementation.
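For orientation, here is a minimal sketch of one naive batch epoch of a dissimilarity (median) SOM, written from the commonly used formulation in which each prototype is constrained to be one of the observations. It is not the fast algorithm proposed in the paper. The function name `dsom_epoch`, the Gaussian neighborhood, and the parameter `sigma` are assumptions made for illustration.

```python
import math

def dsom_epoch(D, prototypes, grid, sigma):
    """One naive batch epoch of a dissimilarity (median) SOM.

    D          -- symmetric N x N dissimilarity matrix (list of lists)
    prototypes -- list of M observation indices, one per map unit
    grid       -- list of M unit coordinates on the map (tuples)
    sigma      -- neighborhood width (Gaussian kernel is an assumption)
    """
    n, m = len(D), len(prototypes)

    # Affectation: assign each observation to the unit with the closest prototype.
    bmu = [min(range(m), key=lambda j: D[i][prototypes[j]]) for i in range(n)]

    def h(j, k):
        # Gaussian neighborhood between units j and k on the map grid.
        d2 = sum((a - b) ** 2 for a, b in zip(grid[j], grid[k]))
        return math.exp(-d2 / (2.0 * sigma ** 2))

    # Representation: for each unit, brute-force search over every observation
    # for the one minimizing the neighborhood-weighted sum of dissimilarities.
    new_prototypes = []
    for j in range(m):
        def cost(c):
            return sum(h(j, bmu[i]) * D[i][c] for i in range(n))
        new_prototypes.append(min(range(n), key=cost))
    return new_prototypes, bmu
```

The exhaustive search in the representation step evaluates every observation as a candidate prototype for every unit, which illustrates why a straightforward implementation becomes expensive on large data sets.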
Neural Networks for Complex Data
Artificial neural networks are simple and efficient machine learning tools.
Defined originally in the traditional setting of simple vector data, neural
network models have evolved to address more and more difficulties of complex
real world problems, ranging from time evolving data to sophisticated data
structures such as graphs and functions. This paper summarizes advances on
those themes from the last decade, with a focus on results obtained by members
of the SAMM team of Université Paris
Batch and median neural gas
Neural Gas (NG) is a very robust clustering algorithm for Euclidean data. It
suffers neither from the problem of local minima, as simple vector
quantization does, nor from topological restrictions, as the self-organizing
map does. Based on the cost function of NG, we introduce a batch
variant of NG which shows much faster convergence and which can be interpreted
as an optimization of the cost function by the Newton method. This formulation
has the additional benefit that, based on the notion of the generalized median
in analogy to Median SOM, a variant for non-vectorial proximity data can be
introduced. We prove convergence of batch and median versions of NG, SOM, and
k-means in a unified formulation, and we investigate the behavior of the
algorithms in several experiments.
Comment: In Special Issue after WSOM 05 Conference, 5-8 September, 2005, Pari
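As a rough illustration of the batch variant for Euclidean data, the sketch below performs one batch update in which every prototype becomes a rank-weighted mean of all data points. The weighting `exp(-rank / lam)` is the usual Neural Gas neighborhood function. The function name `batch_ng_step` and the parameter name `lam` are assumptions, and the median variant for proximity data is not covered.

```python
import math

def batch_ng_step(data, prototypes, lam):
    """One batch Neural Gas update for Euclidean data (lists of floats)."""
    dim = len(data[0])
    m = len(prototypes)
    num = [[0.0] * dim for _ in range(m)]  # weighted sums of data points
    den = [0.0] * m                        # sums of weights
    for x in data:
        # Squared Euclidean distance from x to every prototype.
        dists = [sum((a - b) ** 2 for a, b in zip(x, w)) for w in prototypes]
        # Rank prototypes by distance: rank 0 is the closest one.
        order = sorted(range(m), key=lambda j: dists[j])
        for rank, j in enumerate(order):
            h = math.exp(-rank / lam)
            den[j] += h
            for d in range(dim):
                num[j][d] += h * x[d]
    # Each prototype becomes the rank-weighted mean of all data points.
    return [[num[j][d] / den[j] for d in range(dim)] for j in range(m)]
```

With a very small `lam` only the closest prototype receives a non-negligible weight, so the step behaves like a k-means update. Larger values let every prototype move toward every point, in proportion to its rank.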
Batch kernel SOM and related Laplacian methods for social network analysis
Large graphs are natural mathematical models for describing the structure of
the data in a wide variety of fields, such as web mining, social networks,
information retrieval, biological networks, etc. For all these applications,
automatic tools are required to get a synthetic view of the graph and to reach
a good understanding of the underlying problem. In particular, discovering
groups of tightly connected vertices and understanding the relations between
those groups is very important in practice. This paper shows how a kernel
version of the batch Self Organizing Map can be used to achieve these goals via
kernels derived from the Laplacian matrix of the graph, especially when it is
used in conjunction with more classical methods based on the spectral analysis
of the graph. The proposed method is used to explore the structure of a
medieval social network modeled through a weighted graph that has been directly
built from a large corpus of agrarian contracts.
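For readers unfamiliar with the starting point of such kernels, the following minimal sketch builds the combinatorial Laplacian L = D - W of a weighted undirected graph. The abstract does not state which kernels are derived from it, so the sketch stops at the Laplacian itself. The function name `graph_laplacian` is hypothetical, and the weight matrix is assumed to be symmetric with a zero diagonal.

```python
def graph_laplacian(W):
    """Combinatorial Laplacian L = D - W of a weighted undirected graph.

    W is a symmetric matrix of non-negative edge weights (list of lists),
    assumed here to have a zero diagonal (no self-loops).
    """
    n = len(W)
    degrees = [sum(row) for row in W]  # weighted degree of each vertex
    return [
        [(degrees[i] if i == j else 0.0) - W[i][j] for j in range(n)]
        for i in range(n)
    ]
```

Every row of the resulting matrix sums to zero, which is a quick sanity check on the input weights.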
Finding groups in data: Cluster analysis with ants
We present in this paper a modification of Lumer and Faieta's algorithm for data clustering. This approach
mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers
clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus
on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on how the final
clustering is affected by the dissimilarity metric used during classification: Euclidean, Cosine,
and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more
conventional clustering methods such as k-means. Among the many bio-inspired techniques, ant
clustering algorithms have received special attention, especially because they still require much
investigation to improve performance, stability and other key features that would make such algorithms
mature tools for data mining.
As a case study, this paper focuses on the behavior of clustering procedures in those new approaches.
The proposed algorithm and its modifications are evaluated on a number of well-known benchmark
datasets. Empirical results clearly show that ant-based clustering algorithms perform well when
compared to other techniques.
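The three dissimilarity measures named in the abstract can be sketched as follows for purely numerical data. This is an illustrative sketch, not the paper's code. The function names are hypothetical, the Gower measure is restricted to its numeric form (mean range-normalized absolute difference), and the per-feature ranges are assumed to be computed beforehand over the whole data set.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_dissimilarity(x, y):
    """One minus the cosine similarity; assumes both vectors are nonzero."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)

def gower_numeric(x, y, ranges):
    """Gower dissimilarity for numeric features only.

    ranges[k] is the (max - min) of feature k over the whole data set.
    Constant features (range 0) are skipped; at least one feature is
    assumed to have a positive range.
    """
    terms = [abs(a - b) / r for a, b, r in zip(x, y, ranges) if r > 0]
    return sum(terms) / len(terms)
```

The Euclidean distance is sensitive to feature scale, the cosine measure compares only vector direction, and the Gower measure rescales each feature by its range, so the three can rank the same pairs of points differently.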