Improved Methods for Cluster Identification and Visualization
Self-organizing maps (SOMs) are self-organized projections of high-dimensional data onto a low-dimensional, typically two-dimensional (2D), map wherein vector similarity is implicitly translated into topological closeness in the 2D projection. They are thus used for clustering and visualization of high-dimensional data. However, the results are often hard to interpret because the methods currently used for identifying and visualizing cluster boundaries in the resulting feature maps have drawbacks. In this thesis we introduce a new phase to the SOM that we refer to as the Cluster Reinforcement (CR) phase. The CR phase amplifies within-cluster similarity, so that cluster boundaries become much more evident. We also define a new Boundary (B) matrix that makes cluster boundaries easy to visualize, can be thresholded at various levels to make cluster hierarchies apparent, and can be overlaid directly onto maps of component planes (something that was not possible with previous methods). Together, the SOM, the CR phase, and the B-matrix comprise an automated method for improved identification and informative visualization of clusters in high-dimensional data. We demonstrate these methods on three data sets: the classic 13-dimensional binary-valued “animal” benchmark, an actual 60-dimensional binary-valued phonetic word clustering problem, and a 3-dimensional real-valued geographic data clustering problem related to the fuel efficiency of vehicle choice.
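For readers unfamiliar with the base algorithm, the projection described above can be sketched with the classic online SOM update. This is a minimal NumPy sketch, not the thesis's code: the function names `train_som` and `best_matching_units`, the Gaussian neighborhood, and the linear decay schedules are illustrative assumptions, and neither the CR phase nor the B-matrix is reproduced because the abstract does not define them.

```python
import numpy as np

def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=None, seed=0):
    """Online SOM (illustrative sketch): returns prototypes on a rows x cols grid."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # One prototype per map unit, initialised from randomly drawn data samples.
    weights = data[rng.integers(0, len(data), rows * cols)].copy()
    # 2D grid coordinates of each unit; "topological closeness" is measured here.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            x = data[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)              # linearly decaying learning rate (assumed schedule)
            sigma = sigma0 * (1.0 - frac) + 0.5  # shrinking neighborhood radius (assumed schedule)
            # Best-matching unit (BMU): prototype closest to x in input space.
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            # Gaussian neighborhood measured on the 2D grid, centred on the BMU.
            h = np.exp(-np.sum((grid - grid[bmu]) ** 2, axis=1) / (2.0 * sigma ** 2))
            # Pull the BMU and its grid neighbours towards x.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights.reshape(rows, cols, data.shape[1])

def best_matching_units(data, weights):
    """Grid coordinates (row, col) of the BMU for each input vector."""
    rows, cols, dim = weights.shape
    flat = weights.reshape(-1, dim)
    d2 = ((np.asarray(data, dtype=float)[:, None, :] - flat[None, :, :]) ** 2).sum(axis=2)
    return np.stack(np.unravel_index(np.argmin(d2, axis=1), (rows, cols)), axis=1)
```

Inputs that are similar in the original space end up on the same or nearby best-matching units, which is the "vector similarity translated into topological closeness" property the abstract relies on.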
Unsupervised classification of fully kinetic simulations of plasmoid instability using Self-Organizing Maps (SOMs)
The growing amount of data produced by simulations and observations of space
physics processes encourages the use of methods rooted in Machine Learning for
data analysis and physical discovery. We apply a clustering method based on
Self-Organizing Maps (SOM) to fully kinetic simulations of plasmoid
instability, with the aim of assessing its suitability as a reliable analysis
tool for both simulated and observed data. We obtain clusters that map well, a
posteriori, to our knowledge of the process: the clusters clearly identify the
inflow region, the inner plasmoid region, the separatrices, and regions
associated with plasmoid merging. SOM-specific analysis tools, such as feature
maps and Unified Distance Matrix, provide one with valuable insights into both
the physics at work and specific spatial regions of interest. The method
appears to be a promising option for the analysis of data, both from simulations
and from observations, and could also potentially be used to trigger the switch
to different simulation models or resolutions in coupled codes for space
simulations.
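The Unified Distance Matrix mentioned above can be approximated from a trained map's prototypes. The sketch below assumes a rectangular grid stored as a `(rows, cols, dim)` NumPy array (a hypothetical layout; the function name is also illustrative) and uses a simplified per-unit variant: each cell holds the mean Euclidean distance from a unit's prototype to its 4-connected neighbours. Some formulations additionally interleave cells holding the distance between each pair of adjacent units. High values mark likely cluster boundaries.

```python
import numpy as np

def u_matrix(weights):
    """Simplified U-matrix: mean distance from each prototype to its 4-connected neighbours."""
    rows, cols, _ = weights.shape
    u = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:  # skip neighbours outside the grid
                    dists.append(np.linalg.norm(weights[r, c] - weights[rr, cc]))
            u[r, c] = np.mean(dists)
    return u
```

For a toy 2 x 4 map whose left two columns hold prototype 0 and right two columns hold prototype 10, the cells next to the jump get the largest values, tracing the boundary between the two groups.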
Multiorder neurons for evolutionary higher-order clustering and growth
This letter proposes to use multiorder neurons for clustering irregularly shaped data arrangements. Multiorder neurons are an evolutionary extension of the use of higher-order neurons in clustering. Higher-order neurons parametrically model complex neuron shapes by replacing the classic synaptic weight with higher-order tensors. The multiorder neuron goes one step further and eliminates two problems associated with higher-order neurons. First, it uses evolutionary algorithms to select the best neuron order for a given problem. Second, it obtains more information about the underlying data distribution by identifying the correct order for a given cluster of patterns. Empirically, we observed that when the correlation of the clusters found with ground-truth information is used to measure clustering accuracy, the proposed evolutionary multiorder neuron method can be shown to outperform other related clustering methods. The simulation results on the Iris, Wine, and Glass data sets show significant improvement compared with the results obtained using self-organizing maps and higher-order neurons. The letter also proposes an intuitive model by which multiorder neurons can be grown, thereby determining the number of clusters in the data.
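To make "replacing the classic synaptic weight by higher-order tensors" concrete, the sketch below fixes the order at two: each neuron keeps a mean vector (the first-order weight) plus a matrix built from the covariance of its cluster, so its receptive field is an ellipsoid rather than a sphere. This is an illustrative assumption, not the letter's formulation; the class name `SecondOrderNeuron` and the helper `assign` are hypothetical, and the evolutionary search over neuron orders and the growth model are not shown.

```python
import numpy as np

class SecondOrderNeuron:
    """Hypothetical second-order neuron: mean vector plus an inverse-covariance shape matrix."""

    def __init__(self, points, eps=1e-6):
        points = np.asarray(points, dtype=float)
        self.center = points.mean(axis=0)      # first-order term (the classic weight)
        cov = np.cov(points, rowvar=False)     # second-order term describing the cluster shape
        # Small ridge keeps the matrix invertible for degenerate clusters.
        self.inv_shape = np.linalg.inv(cov + eps * np.eye(points.shape[1]))

    def distance(self, x):
        d = np.asarray(x, dtype=float) - self.center
        return float(d @ self.inv_shape @ d)   # squared Mahalanobis distance

def assign(points, neurons):
    """Index of the neuron with the smallest shape-aware distance for each point."""
    return np.array([np.argmin([n.distance(p) for n in neurons]) for p in points])
```

Because the distance is scaled by the inverse covariance, a point far out along an elongated cluster's major axis can still be "closer" than a point just off its minor axis, which is what lets such neurons fit irregularly shaped clusters.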
How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need?
In numerous applicative contexts, data are too rich and too complex to be
represented by numerical vectors. A general approach to extend machine learning
and data mining techniques to such data is to rely on a dissimilarity or on a
kernel that measures how different or similar two objects are. This approach
has been used to define several variants of the Self Organizing Map (SOM). This
paper reviews those variants using a common set of notations in order to
outline differences and similarities between them. It discusses the advantages
and drawbacks of the variants, as well as the actual relevance of the
dissimilarity/kernel SOM for practical applications.
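One well-known dissimilarity variant is the batch median SOM, in which every prototype must be one of the data objects, so only the dissimilarity matrix is needed. The sketch below illustrates that idea under stated assumptions and does not reproduce the paper's notation: it uses a 1D chain of units, a Gaussian neighborhood, and a hypothetical function name `median_som`.

```python
import numpy as np

def median_som(D, n_units, n_iter=20, sigma0=None, seed=0):
    """Batch median SOM sketch on a 1D chain of units, using only the dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    rng = np.random.default_rng(seed)
    protos = rng.choice(n, size=n_units, replace=False)  # prototypes are indices of data objects
    pos = np.arange(n_units, dtype=float)                # unit positions on the chain
    if sigma0 is None:
        sigma0 = n_units / 2.0
    for t in range(n_iter):
        sigma = max(sigma0 * (1.0 - t / n_iter), 0.5)    # shrinking neighborhood (assumed schedule)
        # Assignment step: each object goes to the unit whose prototype is least dissimilar.
        bmu = np.argmin(D[:, protos], axis=1)
        # Neighborhood weights between every pair of units on the chain.
        h = np.exp(-((pos[:, None] - pos[None, :]) ** 2) / (2.0 * sigma ** 2))
        # Representation step: cost[k, j] = sum_i h[k, bmu_i] * D[i, j];
        # unit k takes as prototype the object j with the smallest cost.
        cost = h[:, bmu] @ D
        protos = np.argmin(cost, axis=1)
    bmu = np.argmin(D[:, protos], axis=1)
    return protos, bmu
```

Restricting prototypes to observed objects is what lets the method accept any dissimilarity, at the cost of a search over all objects in each representation step; different units may also end up selecting the same object.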
Batch kernel SOM and related Laplacian methods for social network analysis
Large graphs are natural mathematical models for describing the structure of
the data in a wide variety of fields, such as web mining, social networks,
information retrieval, biological networks, etc. For all these applications,
automatic tools are required to get a synthetic view of the graph and to reach
a good understanding of the underlying problem. In particular, discovering
groups of tightly connected vertices and understanding the relations between
those groups is very important in practice. This paper shows how a kernel
version of the batch Self Organizing Map can be used to achieve these goals via
kernels derived from the Laplacian matrix of the graph, especially when it is
used in conjunction with more classical methods based on the spectral analysis
of the graph. The proposed method is used to explore the structure of a
medieval social network modeled through a weighted graph that has been directly
built from a large corpus of agrarian contracts.
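The abstract does not say which Laplacian-derived kernel is used. One standard choice is the heat (diffusion) kernel K = exp(-beta * L), with L = D - W the graph Laplacian of the weighted adjacency matrix W and D its diagonal degree matrix. The sketch below is an assumption for illustration, with hypothetical function names: it computes that kernel through a symmetric eigendecomposition and derives the kernel-induced distance between vertices that a kernel SOM would work with.

```python
import numpy as np

def laplacian(W):
    """Unnormalised graph Laplacian L = D - W for a symmetric weight matrix W."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W

def heat_kernel(W, beta=1.0):
    """Heat kernel exp(-beta * L) via L = V diag(lam) V^T."""
    lam, V = np.linalg.eigh(laplacian(W))
    # Scale each eigenvector column by exp(-beta * lambda), then recombine.
    return (V * np.exp(-beta * lam)) @ V.T

def kernel_distance(K):
    """Distance in the kernel feature space: sqrt(K_ii + K_jj - 2 K_ij)."""
    d = np.diag(K)
    return np.sqrt(np.maximum(d[:, None] + d[None, :] - 2.0 * K, 0.0))
```

Since every row of L sums to zero, each row of the heat kernel sums to one, and vertices inside a tightly connected group end up closer in the kernel distance than vertices separated by a sparse bridge.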
Soft topographic map for clustering and classification of bacteria
In this work, a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests, the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.
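The abstract states that the maps are optimized against evolutionary distances between aligned sequences but does not name the distance model. As an assumed example only, the Jukes-Cantor correction turns the proportion p of differing aligned sites into an estimated distance d = -3/4 ln(1 - 4p/3); a matrix of such pairwise distances could then drive a dissimilarity-based topographic map. The function name is hypothetical, and gap columns are simply skipped.

```python
import math

def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned nucleotide sequences (illustrative sketch)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    # Compare only columns where neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper()) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) columns")
    p = sum(a != b for a, b in pairs) / len(pairs)  # observed proportion of differing sites
    if p >= 0.75:
        return math.inf                              # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

For example, "AC-GT" and "ACTGA" share four gap-free columns, one of which differs, so p = 0.25 and the estimated distance is -3/4 ln(2/3), slightly larger than the raw 0.25 because the model accounts for unseen repeated substitutions.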
- …