
    Self-Organizing Maps for clustering and visualization of bipartite graphs

    National audience. Graphs (also frequently called networks) have attracted a burst of attention in recent years, with applications to social science, biology, and computer science. The present paper proposes a data mining method for visualizing and clustering the nodes of a particular class of graphs: bipartite graphs. The method is based on a self-organizing map algorithm and relies on an extension of this approach to data described by a dissimilarity matrix.

    Multiple dissimilarity SOM for clustering and visualizing graphs with node and edge attributes

    International audience. To understand how a graph G is structured and how the relations it models organize groups of entities, clustering and visualization can be combined to give the user a global overview of the graph in the form of a projected graph: a simplified graph in which each node corresponds to a cluster of nodes of the original graph G (with a size proportional to the number of nodes classified in that cluster) and each edge between two nodes has a width proportional to the number of links between the nodes of G classified in the two corresponding clusters. This approach becomes trickier when additional attributes (numerical variables or factors) describe the nodes of G, or when the edges of G are of different types and should be treated separately: the simplified representation should then reflect similarities for all sets of information. In this proposal, we present a variant of Self-Organizing Maps (SOM), recently published in (Olteanu & Villa-Vialaneix, 2015), which is adapted to data described by one or several (dis)similarities or kernels and which is able to combine clustering and visualization for this kind of graph.
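
    The projected graph described in this abstract can be illustrated with a minimal sketch. It assumes the clustering is already known (here a hand-written dictionary rather than the output of a SOM), and the function name `project_graph` and the toy data are hypothetical, not taken from the paper.

```python
from collections import Counter


def project_graph(edges, cluster_of):
    """Collapse a graph onto its clusters.

    edges: iterable of (u, v) pairs (undirected, no duplicates assumed).
    cluster_of: dict mapping each node to a cluster label.
    Returns (sizes, widths): the number of nodes per cluster, and the number
    of original edges between each unordered pair of distinct clusters.
    """
    sizes = Counter(cluster_of.values())
    widths = Counter()
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:  # edges inside a single cluster are not drawn
            widths[tuple(sorted((cu, cv)))] += 1
    return dict(sizes), dict(widths)


# Toy example (hypothetical data): six nodes grouped into three clusters.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("a", "d")]
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2}
sizes, widths = project_graph(edges, cluster_of)
print(sizes)   # node sizes of the projected graph
print(widths)  # edge widths of the projected graph
```

    In a drawing, `sizes` would drive the area of each projected node and `widths` the thickness of each projected edge.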

    Utiliser SOMbrero pour la classification et la visualisation de graphes [Using SOMbrero for clustering and visualizing graphs]

    International audience. Graphs have attracted a burst of attention in recent years, with applications to social science, biology, and computer science. In the present paper, we illustrate how self-organizing maps (SOM) can be used to shed light on the structure of a graph by combining a clustering of its vertices with the visualization of a simplified graph. In particular, we present the R package SOMbrero, which implements a stochastic version of the so-called relational algorithm: the method is able to process any dissimilarity data, and several dissimilarities adapted to graphs are described and compared. The use of the package is illustrated on two real-world datasets. The first, included in the package itself, is small enough to allow a full investigation of how the choice of the dissimilarity used to measure the proximity between vertices influences the results. The second comes from an application in biology and is based on a large bipartite graph of chemical reactions with several thousand vertices.

    On-line relational and multiple relational SOM

    International audience. In some applications, and in order to address real-world situations better, data may be more complex than simple numerical vectors. In some cases, data are known only through their pairwise dissimilarities or through multiple dissimilarities, each of them describing a particular feature of the data set. Several variants of the Self-Organizing Map (SOM) algorithm were introduced to generalize the original algorithm to the framework of dissimilarity data. Whereas median SOM is based on a rough representation of the prototypes, relational SOM represents these prototypes by a virtual linear combination of all elements in the data set, referring to a pseudo-Euclidean framework. In the present article, an on-line version of relational SOM is introduced and studied. As in the Euclidean framework, this on-line algorithm provides a better organization and is much less sensitive to prototype initialization than standard (batch) relational SOM. In a more general case, this stochastic version allows us to integrate an additional stochastic gradient descent step in the algorithm, which can tune the respective weights of several dissimilarities in an optimal way: the resulting multiple relational SOM thus has the ability to integrate several sources of data of different types, or to build a consensus between several dissimilarities describing the same data. The algorithms introduced in this manuscript are tested on several data sets, including categorical data and graphs. On-line relational SOM is currently available in the R package SOMbrero, which can be downloaded at http://sombrero.r-forge.r-project.org or tested directly on its Web User Interface at http://shiny.nathalievilla.org/sombrero
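
    The idea of representing a prototype as a "virtual linear combination" of the observations can be made concrete with a small sketch. It is not the SOMbrero implementation: it assumes coefficients that sum to one, a one-dimensional grid, a Gaussian neighborhood and a fixed learning rate, and the names `relational_distance` and `online_step` are hypothetical. The distance formula used is the standard identity that holds exactly when the dissimilarities are squared Euclidean distances.

```python
import math


def relational_distance(delta, beta, i):
    """Dissimilarity between observation i and the prototype sum_j beta[j] * x_j.

    delta: symmetric dissimilarity matrix (list of lists).
    beta: prototype coefficients, assumed to sum to 1.
    Uses (delta @ beta)[i] - 0.5 * beta' @ delta @ beta.
    """
    n = len(beta)
    linear = sum(delta[i][j] * beta[j] for j in range(n))
    quad = sum(beta[j] * delta[j][k] * beta[k] for j in range(n) for k in range(n))
    return linear - 0.5 * quad


def online_step(delta, betas, grid, i, lr, radius):
    """Present observation i once: find the winner, then update all prototypes."""
    winner = min(range(len(betas)),
                 key=lambda u: relational_distance(delta, betas[u], i))
    for u, beta in enumerate(betas):
        # Gaussian neighborhood on a 1-D grid (an assumption of this sketch).
        h = math.exp(-((grid[u] - grid[winner]) ** 2) / (2 * radius ** 2))
        step = lr * h
        for j in range(len(beta)):
            target = 1.0 if j == i else 0.0  # indicator vector of observation i
            beta[j] += step * (target - beta[j])
    return winner


# Toy data (hypothetical): three points on a line at 0, 2 and 4,
# known only through their squared pairwise distances.
points = [0.0, 2.0, 4.0]
delta = [[(a - b) ** 2 for b in points] for a in points]
betas = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]  # prototypes located at 1 and 3
grid = [0, 1]                                # positions of the two map units

d_before = relational_distance(delta, betas[0], 0)
winner = online_step(delta, betas, grid, i=2, lr=0.5, radius=1.0)
```

    Because each update is a convex move toward an indicator vector, the coefficients of every prototype keep summing to one, so the prototypes never need to be expressed in coordinates.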

    Statistique et Big Data Analytics; Volumétrie, L'Attaque des Clones [Statistics and Big Data Analytics; Volume, Attack of the Clones]

    This article assumes that the reader already has a statistician's skills and expertise in unsupervised learning (NMF, k-means, SVD) and supervised learning (regression, CART, random forest). What skills and knowledge must a statistician acquire to reach the "Volume" scale of big data? After a quick overview of the different strategies available, especially those imposed by Hadoop, the algorithms of some available learning methods are outlined in order to understand how they are adapted to the strong constraints of the Map-Reduce functionalities.

    Analyse de données pour des graphes étiquetés [Data analysis for labeled graphs]

    International audience. We propose a data mining method for a graph whose vertices are labeled. Two approaches are described and illustrated on a real-world data set: they yield a representation of the graph that combines information about its structure with the values of its labels. This visualization can be used for interpretation, providing more nuanced information about the characterization of the graph's vertices.

    sexy-rgtk: a package for programming RGtk2 GUI in a user-friendly manner

    National audience. There are many different ways to program Graphical User Interfaces (GUI) in R. Lawrence and Verzani (2012) provide an overview of the available methods, describing ways to program R GUIs with RGtk2, qtbase and tcltk. More recently, the package shiny, for building interactive web applications, was also released (the first version was published in December 2012). By automatically indexing all objects and methods available in RGtk2, we developed a method for creating GTK2-based GUIs in a friendlier and more compact manner. Widgets are accessible through simple functions and options, as is more natural for an R programmer.

    A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph

    International audience. Flexible and efficient variants of the Self-Organizing Map algorithm have been proposed for non-vector data, including, for example, the dissimilarity SOM (also called the Median SOM) and several kernelized versions of SOM. Whereas the former is a generalization of the batch version of the SOM algorithm to data described by a dissimilarity measure, the various versions of the latter are stochastic SOMs. We introduce here a batch version of the kernel SOM and show how it is related to the dissimilarity SOM. Finally, an application to the classification of the vertices of a graph is proposed, and the algorithms are tested and compared on a simulated data set.

    Analysis of the influence of a network on the values of its nodes: the use of spatial indexes

    National audience. A growing amount of data is modeled by a graph, which can sometimes be weighted: social networks, biological networks, and so on. In many situations, additional information related to each node of the graph is provided with these relational data: this can be membership of a given social group (for social networks) or of a given protein family (for protein interaction networks). In this case, an important question is whether the value of this additional variable is influenced by the network. This paper presents exploratory tools to address this question, based on tests coming from the field of spatial statistics. The use of these tests is illustrated on several examples, all coming from the social network framework.
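
    The abstract does not say which spatial tests are used. As one possible illustration, the sketch below computes Moran's I, a classical spatial autocorrelation index, on an unweighted graph and assesses it with a permutation test. The choice of Moran's I, the function names and the toy data are assumptions made for this example, not details taken from the paper.

```python
import random


def morans_i(edges, values):
    """Moran's I for node values on an undirected, unweighted graph.

    edges: list of (i, j) index pairs, each undirected edge listed once.
    values: list of numerical node values (not all equal).
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Each undirected edge counts twice, since w_ij = w_ji = 1.
    cross = 2 * sum(z[i] * z[j] for i, j in edges)
    s0 = 2 * len(edges)  # sum of all weights
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_p_value(edges, values, n_perm=999, seed=0):
    """One-sided p-value: how often shuffled values reach the observed index."""
    rng = random.Random(seed)
    observed = morans_i(edges, values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between values and network
        if morans_i(edges, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy example (hypothetical data): a path 0-1-2-3 with values grouped by side.
edges = [(0, 1), (1, 2), (2, 3)]
values = [1.0, 1.0, 0.0, 0.0]
index = morans_i(edges, values)
p_value = permutation_p_value(edges, values)
```

    A positive index means that neighboring nodes tend to carry similar values. On a graph this small the permutation test has little power, so the example only shows the mechanics.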