12,789 research outputs found
Soft topographic map for clustering and classification of bacteria
In this work a new method for clustering and building a
topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called âhousekeeping genesâ. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different
type strains can be visually inspected and verified. Two well known DNA alignement algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaprotebacteria
class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification
or erroneous annotations in the database
How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need?
In numerous applicative contexts, data are too rich and too complex to be
represented by numerical vectors. A general approach to extend machine learning
and data mining techniques to such data is to really on a dissimilarity or on a
kernel that measures how different or similar two objects are. This approach
has been used to define several variants of the Self Organizing Map (SOM). This
paper reviews those variants in using a common set of notations in order to
outline differences and similarities between them. It discusses the advantages
and drawbacks of the variants, as well as the actual relevance of the
dissimilarity/kernel SOM for practical applications
Multidimensional Urban Segregation - Toward A Neural Network Measure
We introduce a multidimensional, neural-network approach to reveal and
measure urban segregation phenomena, based on the Self-Organizing Map algorithm
(SOM). The multidimensionality of SOM allows one to apprehend a large number of
variables simultaneously, defined on census or other types of statistical
blocks, and to perform clustering along them. Levels of segregation are then
measured through correlations between distances on the neural network and
distances on the actual geographical map. Further, the stochasticity of SOM
enables one to quantify levels of heterogeneity across census blocks. We
illustrate this new method on data available for the city of Paris.Comment: NCAA S.I. WSOM+ 201
Batch and median neural gas
Neural Gas (NG) constitutes a very robust clustering algorithm given
euclidian data which does not suffer from the problem of local minima like
simple vector quantization, or topological restrictions like the
self-organizing map. Based on the cost function of NG, we introduce a batch
variant of NG which shows much faster convergence and which can be interpreted
as an optimization of the cost function by the Newton method. This formulation
has the additional benefit that, based on the notion of the generalized median
in analogy to Median SOM, a variant for non-vectorial proximity data can be
introduced. We prove convergence of batch and median versions of NG, SOM, and
k-means in a unified formulation, and we investigate the behavior of the
algorithms in several experiments.Comment: In Special Issue after WSOM 05 Conference, 5-8 september, 2005, Pari
How to improve robustness in Kohonen maps and display additional information in Factorial Analysis: application to text mining
This article is an extended version of a paper presented in the WSOM'2012
conference [1]. We display a combination of factorial projections, SOM
algorithm and graph techniques applied to a text mining problem. The corpus
contains 8 medieval manuscripts which were used to teach arithmetic techniques
to merchants. Among the techniques for Data Analysis, those used for
Lexicometry (such as Factorial Analysis) highlight the discrepancies between
manuscripts. The reason for this is that they focus on the deviation from the
independence between words and manuscripts. Still, we also want to discover and
characterize the common vocabulary among the whole corpus. Using the properties
of stochastic Kohonen maps, which define neighborhood between inputs in a
non-deterministic way, we highlight the words which seem to play a special role
in the vocabulary. We call them fickle and use them to improve both Kohonen map
robustness and significance of FCA visualization. Finally we use graph
algorithmic to exploit this fickleness for classification of words
Some Further Evidence about Magnification and Shape in Neural Gas
Neural gas (NG) is a robust vector quantization algorithm with a well-known
mathematical model. According to this, the neural gas samples the underlying
data distribution following a power law with a magnification exponent that
depends on data dimensionality only. The effects of shape in the input data
distribution, however, are not entirely covered by the NG model above, due to
the technical difficulties involved. The experimental work described here shows
that shape is indeed relevant in determining the overall NG behavior; in
particular, some experiments reveal richer and complex behaviors induced by
shape that cannot be explained by the power law alone. Although a more
comprehensive analytical model remains to be defined, the evidence collected in
these experiments suggests that the NG algorithm has an interesting potential
for detecting complex shapes in noisy datasets
Techniques for clustering gene expression data
Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state of the art applications which recognises these limitations and implements procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered
Batch kernel SOM and related Laplacian methods for social network analysis
Large graphs are natural mathematical models for describing the structure of
the data in a wide variety of fields, such as web mining, social networks,
information retrieval, biological networks, etc. For all these applications,
automatic tools are required to get a synthetic view of the graph and to reach
a good understanding of the underlying problem. In particular, discovering
groups of tightly connected vertices and understanding the relations between
those groups is very important in practice. This paper shows how a kernel
version of the batch Self Organizing Map can be used to achieve these goals via
kernels derived from the Laplacian matrix of the graph, especially when it is
used in conjunction with more classical methods based on the spectral analysis
of the graph. The proposed method is used to explore the structure of a
medieval social network modeled through a weighted graph that has been directly
built from a large corpus of agrarian contracts
- âŠ