27,511 research outputs found

    Improved Methods for Cluster Identification and Visualization

    Get PDF
    Self-organizing maps (SOMs) are self-organized projections of high dimensional data onto a low, typically two dimensional (2D), map wherein vector similarity is implicitly translated into topological closeness in the 2D projection. They are thus used for clustering and visualization of high dimensional data. However it is often challenging to interpret the results due to drawbacks of currently used methods for identifying and visualizing cluster boundaries in the resulting feature maps. In this thesis we introduce a new phase to the SOM that we refer to as the Cluster Reinforcement (CR) phase. The CR phase amplifies within-cluster similarity with the consequence that cluster boundaries become much more evident. We also define a new Boundary (B) matrix that makes cluster boundaries easy to visualize, can be thresholded at various levels to make cluster hierarchies apparent, and can be overlain directly onto maps of component planes (something that was not possible with previous methods). The combination of the SOM, CR phase and B-matrix comprise an automated method for improved identification and informative visualization of clusters in high dimensional data. We demonstrate these methods on three data sets: the classic 13- dimensional binary-valued “animal” benchmark test, actual 60-dimensional binaryvalued phonetic word clustering problem, and 3-dimensional real-valued geographic data clustering related to fuel efficiency of vehicle choice

    Unsupervised classification of fully kinetic simulations of plasmoid instability using Self-Organizing Maps (SOMs)

    Full text link
    The growing amount of data produced by simulations and observations of space physics processes encourages the use of methods rooted in Machine Learning for data analysis and physical discovery. We apply a clustering method based on Self-Organizing Maps (SOM) to fully kinetic simulations of plasmoid instability, with the aim of assessing its suitability as a reliable analysis tool for both simulated and observed data. We obtain clusters that map well, a posteriori, to our knowledge of the process: the clusters clearly identify the inflow region, the inner plasmoid region, the separatrices, and regions associated with plasmoid merging. SOM-specific analysis tools, such as feature maps and Unified Distance Matrix, provide one with valuable insights into both the physics at work and specific spatial regions of interest. The method appears as a promising option for the analysis of data, both from simulations and from observations, and could also potentially be used to trigger the switch to different simulation models or resolution in coupled codes for space simulations

    Multiorder neurons for evolutionary higher-order clustering and growth

    Get PDF
    This letter proposes to use multiorder neurons for clustering irregularly shaped data arrangements. Multiorder neurons are an evolutionary extension of the use of higher-order neurons in clustering. Higher-order neurons parametrically model complex neuron shapes by replacing the classic synaptic weight by higher-order tensors. The multiorder neuron goes one step further and eliminates two problems associated with higher-order neurons. First, it uses evolutionary algorithms to select the best neuron order for a given problem. Second, it obtains more information about the underlying data distribution by identifying the correct order for a given cluster of patterns. Empirically we observed that when the correlation of clusters found with ground truth information is used in measuring clustering accuracy, the proposed evolutionary multiorder neurons method can be shown to outperform other related clustering methods. The simulation results from the Iris, Wine, and Glass data sets show significant improvement when compared to the results obtained using self-organizing maps and higher-order neurons. The letter also proposes an intuitive model by which multiorder neurons can be grown, thereby determining the number of clusters in data

    How Many Dissimilarity/Kernel Self Organizing Map Variants Do We Need?

    Full text link
    In numerous applicative contexts, data are too rich and too complex to be represented by numerical vectors. A general approach to extend machine learning and data mining techniques to such data is to really on a dissimilarity or on a kernel that measures how different or similar two objects are. This approach has been used to define several variants of the Self Organizing Map (SOM). This paper reviews those variants in using a common set of notations in order to outline differences and similarities between them. It discusses the advantages and drawbacks of the variants, as well as the actual relevance of the dissimilarity/kernel SOM for practical applications

    Batch kernel SOM and related Laplacian methods for social network analysis

    Get PDF
    Large graphs are natural mathematical models for describing the structure of the data in a wide variety of fields, such as web mining, social networks, information retrieval, biological networks, etc. For all these applications, automatic tools are required to get a synthetic view of the graph and to reach a good understanding of the underlying problem. In particular, discovering groups of tightly connected vertices and understanding the relations between those groups is very important in practice. This paper shows how a kernel version of the batch Self Organizing Map can be used to achieve these goals via kernels derived from the Laplacian matrix of the graph, especially when it is used in conjunction with more classical methods based on the spectral analysis of the graph. The proposed method is used to explore the structure of a medieval social network modeled through a weighted graph that has been directly built from a large corpus of agrarian contracts

    Soft topographic map for clustering and classification of bacteria

    Get PDF
    In this work a new method for clustering and building a topographic representation of a bacteria taxonomy is presented. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. The proposed method generates topographic maps of the bacteria taxonomy, where relations among different type strains can be visually inspected and verified. Two well known DNA alignement algorithms are applied to the genomic sequences. Topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaprotebacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene have been retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database

    Batch and median neural gas

    Full text link
    Neural Gas (NG) constitutes a very robust clustering algorithm given euclidian data which does not suffer from the problem of local minima like simple vector quantization, or topological restrictions like the self-organizing map. Based on the cost function of NG, we introduce a batch variant of NG which shows much faster convergence and which can be interpreted as an optimization of the cost function by the Newton method. This formulation has the additional benefit that, based on the notion of the generalized median in analogy to Median SOM, a variant for non-vectorial proximity data can be introduced. We prove convergence of batch and median versions of NG, SOM, and k-means in a unified formulation, and we investigate the behavior of the algorithms in several experiments.Comment: In Special Issue after WSOM 05 Conference, 5-8 september, 2005, Pari
    • …
    corecore