1,733 research outputs found

    A combined measure for quantifying and qualifying the topology preservation of growing self-organizing maps

    Get PDF
    The Self-OrganizingMap (SOM) is a neural network model that performs an ordered projection of a high dimensional input space in a low-dimensional topological structure. The process in which such mapping is formed is defined by the SOM algorithm, which is a competitive, unsupervised and nonparametric method, since it does not make any assumption about the input data distribution. The feature maps provided by this algorithm have been successfully applied for vector quantization, clustering and high dimensional data visualization processes. However, the initialization of the network topology and the selection of the SOM training parameters are two difficult tasks caused by the unknown distribution of the input signals. A misconfiguration of these parameters can generate a feature map of low-quality, so it is necessary to have some measure of the degree of adaptation of the SOM network to the input data model. The topologypreservation is the most common concept used to implement this measure. Several qualitative and quantitative methods have been proposed for measuring the degree of SOM topologypreservation, particularly using Kohonen's model. In this work, two methods for measuring the topologypreservation of the Growing Cell Structures (GCSs) model are proposed: the topographic function and the topology preserving ma

    Visualization of clusters in geo-referenced data using three-dimensional self-organizing maps

    Get PDF
    Dissertação apresentada como requisito parcial para obtenção do grau de Mestre em Estatística e Gestão de InformaçãoThe Self-Organizing Map (SOM) is an artificial neural network that performs simultaneously vector quantization and vector projection. Due to this characteristic, the SOM is an effective method for clustering analysis via visualization. The SOM can be visualized through the output space, generally a regular two-dimensional grid of nodes, and through the input space, emphasizing the vector quantization process. Among all the strategies for visualizing the SOM, we are particularly interested in those that allow dealing with spatial dependency, linking the SOM to the geographic visualization with color. One possible approach, commonly used, is the cartographic representation of data with label colors defined from the output space of a two-dimensional SOM. However, in the particular case of geo-referenced data, it is possible to consider the use of a three-dimensional SOM for this purpose, thus adding one more dimension in the analysis. In this dissertation is presented a method for clustering geo-referenced data that integrates the visualization of both perspectives of a three dimensional SOM: linking its output space to the cartographic representation through a ordered set of colors; and exploring the use of frontiers among geo-referenced elements, computed according to the distances in the input space between their Best Matching Units

    Visual data mining: integrating machine learning with information visualization

    Get PDF
    Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. The exploration of heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. Most existing systems concentrate either on mining algorithms or on visualization techniques. Though visual methods developed in information visualization have been helpful, for improved understanding of a complex large high-dimensional dataset, there is a need for an effective projection of such a dataset onto a lower-dimension (2D or 3D) manifold. This paper introduces a flexible visual data mining framework which combines advanced projection algorithms developed in the machine learning domain and visual techniques developed in the information visualization domain. The framework follows Shneiderman’s mantra to provide an effective user interface. The advantage of such an interface is that the user is directly involved in the data mining process. We integrate principled projection methods, such as Generative Topographic Mapping (GTM) and Hierarchical GTM (HGTM), with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, billboarding, and user interaction facilities, to provide an integrated visual data mining framework. Results on a real life high-dimensional dataset from the chemoinformatics domain are also reported and discussed. Projection results of GTM are analytically compared with the projection results from other traditional projection methods, and it is also shown that the HGTM algorithm provides additional value for large datasets. The computational complexity of these algorithms is discussed to demonstrate their suitability for the visual data mining framework

    eXamine: a Cytoscape app for exploring annotated modules in networks

    Get PDF
    Background. Biological networks have growing importance for the interpretation of high-throughput "omics" data. Statistical and combinatorial methods allow to obtain mechanistic insights through the extraction of smaller subnetwork modules. Further enrichment analyses provide set-based annotations of these modules. Results. We present eXamine, a set-oriented visual analysis approach for annotated modules that displays set membership as contours on top of a node-link layout. Our approach extends upon Self Organizing Maps to simultaneously lay out nodes, links, and set contours. Conclusions. We implemented eXamine as a freely available Cytoscape app. Using eXamine we study a module that is activated by the virally-encoded G-protein coupled receptor US28 and formulate a novel hypothesis about its functioning

    Neural visualization of network traffic data for intrusion detection

    Get PDF
    This study introduces and describes a novel intrusion detection system (IDS) called MOVCIDS (mobile visualization connectionist IDS). This system applies neural projection architectures to detect anomalous situations taking place in a computer network. By its advanced visualization facilities, the proposed IDS allows providing an overview of the network traffic as well as identifying anomalous situations tackled by computer networks, responding to the challenges presented by volume, dynamics and diversity of the traffic, including novel (0-day) attacks. MOVCIDS provides a novel point of view in the field of IDSs by enabling the most interesting projections (based on the fourth order statistics; the kurtosis index) of a massive traffic dataset to be extracted. These projections are then depicted through a functional and mobile visualization interface, providing visual information of the internal structure of the traffic data. The interface makes MOVCIDS accessible from any mobile device to give more accessibility to network administrators, enabling continuous visualization, monitoring and supervision of computer networks. Additionally, a novel testing technique has been developed to evaluate MOVCIDS and other IDSs employing numerical datasets. To show the performance and validate the proposed IDS, it has been tested in different real domains containing several attacks and anomalous situations. In addition, the importance of the temporal dimension on intrusion detection, and the ability of this IDS to process it, are emphasized in this workJunta de Castilla and Leon project BU006A08, Business intelligence for production within the framework of the Instituto Tecnologico de Cas-tilla y Leon (ITCL) and the Agencia de Desarrollo Empresarial (ADE), and the Spanish Ministry of Education and Innovation project CIT-020000-2008-2. The authors would also like to thank the vehicle interior manufacturer, Grupo Antolin Ingenieria S. A., within the framework of the project MAGNO2008-1028-CENIT Project funded by the Spanish Government

    SOMvisua: Data Clustering and Visualization Based on SOM and GHSOM

    Get PDF
    Text in web pages is based on expert opinion of a large number of people including the views of authors. These views are based on cultural or community aspects, which make extracting information from text very difficult. Search in text usually finds text similarities between paragraphs in documents. This paper proposes a framework for data clustering and visualization called SOMvisua. SOMvisua is based on a graph representation of data input for Self-Organizing Map (SOM) and Growing Hierarchically Self-Organizing Map (GHSOM) algorithms. In SOMvisua, sentences from an input article are represented as graph model instead of vector space model. SOM and GHSOM clustering algorithms construct knowledge from this article

    Seeing is believing: the importance of visualization in real-world machine learning applications

    Get PDF
    The increasing availability of data sets with a huge amount of information, coded in many diff erent features, justifi es the research on new methods of knowledge extraction: the great challenge is the translation of the raw data into useful information that can be used to improve decisionmaking processes, detect relevant profi les, fi nd out relationships among features, etc. It is undoubtedly true that a picture is worth a thousand words, what makes visualization methods be likely the most appealing and one of the most relevant kinds of knowledge extration methods. At ESANN 2011, the special session "Seeing is believing: The importance of visualization in real-world machine learning applications" reflects some of the main emerging topics in the field. This tutorial prefaces the session, summarizing some of its contributions, while also providing some clues to the current state and the near future of visualization methods within the framework of Machine Learning.Postprint (published version

    Ranked centroid projection: A data visualization approach based on self-organizing maps

    Get PDF
    The Self-Organizing Map (SOM) is an unsupervised neural network model that provides topology-preserving mapping from high-dimensional input spaces onto a commonly two-dimensional output space. In this study, the clustering and visualization capabilities of the SOM, especially in the analysis of textual data, i.e. document collections, are reviewed and further developed. A novel clustering and visualization approach based on the SOM is proposed for the task of text data mining. The proposed approach first transforms the document space into a multi-dimensional vector space by means of document encoding. Then a growing hierarchical SOM (GHSOM) is trained and used as a baseline framework, which automatically produces maps with various levels of details. Following the training of the GHSOM, a novel projection method, namely the Ranked Centroid Projection (RCP), is applied to project the input vectors onto a hierarchy of two-dimensional output maps. The projection of the input vectors is treated as a vector interpolation into a two-dimensional regular map grid. A ranking scheme is introduced to select the nearest R units around the input vector in the original data space, the positions of which will be taken into account in computing the projection coordinates.The proposed approach can be used both as a data analysis tool and as a direct interface to the data. Its applicability has been demonstrated in this study using an illustrative data set and two real-world document clustering tasks, i.e. the SOM paper collection and the Anthrax paper collection. Based on the proposed approach, a software toolbox is designed for analyzing and visualizing document collections, which provides a user-friendly interface and several exploration and analysis functions.The presented SOM-based approach incorporates several unique features, such as the adaptive structure, the hierarchical training, the automatic parameter adjustment and the incremental clustering. Its advantages include the ability to convey a large amount of information in a limited space with comparatively low computation load, the potential to reveal conceptual relationships among documents, and the facilitation of perceptual inferences on both inter-cluster and within-cluster relationships
    corecore