6 research outputs found

    Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

    Notions of community quality underlie network clustering. While studies of network clustering are increasingly common, the relationship between different cluster quality metrics is not precisely understood. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets ranging in size from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, a clustering algorithm can score 0.4 out of 1 on modularity but 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on information recovery metrics. Our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best-performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performed better than Infomap in nearly all the tests in our study, contradicting previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
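    To make the two metric families above concrete, here is a minimal pure-Python sketch; it is not the paper's code. It assumes a simple undirected, unweighted graph given as an edge list and a clustering given as one label per node (node ids 0..n-1). Modularity uses the standard Newman-Girvan form, conductance is computed for one cluster against the rest of the graph (how the study aggregates it across clusters is not stated here), and NMI uses arithmetic-mean normalization. The "variant of normalized mutual information used in previous work" is not reproduced, and all function names are illustrative.

```python
import math
from collections import Counter


def modularity(edges, labels):
    """Newman-Girvan modularity: sum over clusters c of e_c/m - (d_c / 2m)^2."""
    m = len(edges)
    internal = Counter()    # e_c: number of edges inside cluster c
    degree_sum = Counter()  # d_c: sum of node degrees in cluster c
    for u, v in edges:
        degree_sum[labels[u]] += 1
        degree_sum[labels[v]] += 1
        if labels[u] == labels[v]:
            internal[labels[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


def coverage(edges, labels):
    """Fraction of edges whose two endpoints lie in the same cluster."""
    return sum(labels[u] == labels[v] for u, v in edges) / len(edges)


def conductance(edges, labels, cluster):
    """Cut edges leaving `cluster`, divided by min(vol(cluster), vol(rest))."""
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = labels[u] == cluster, labels[v] == cluster
        vol_in += u_in + v_in               # degree contributions inside the cluster
        vol_out += (not u_in) + (not v_in)  # degree contributions outside it
        cut += u_in != v_in                 # edge crosses the cluster boundary
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0    # undefined if one side is empty; 0.0 by convention here


def nmi(true, pred):
    """NMI with arithmetic-mean normalization: 2 I(T;P) / (H(T) + H(P))."""
    n = len(true)
    t_counts, p_counts = Counter(true), Counter(pred)
    joint = Counter(zip(true, pred))
    mi = sum(c / n * math.log(c * n / (t_counts[a] * p_counts[b])) for (a, b), c in joint.items())
    h = -sum(c / n * math.log(c / n) for c in list(t_counts.values()) + list(p_counts.values()))
    return 2 * mi / h if h else 1.0  # both clusterings trivial (one cluster each): treat as identical


def _pairs(k):
    """Number of unordered pairs among k items."""
    return k * (k - 1) / 2


def ari(true, pred):
    """Adjusted Rand index from pair counts (Hubert-Arabie form); assumes at least 2 nodes."""
    index = sum(_pairs(c) for c in Counter(zip(true, pred)).values())
    sum_t = sum(_pairs(c) for c in Counter(true).values())
    sum_p = sum(_pairs(c) for c in Counter(pred).values())
    expected = sum_t * sum_p / _pairs(len(true))
    max_index = (sum_t + sum_p) / 2
    return 1.0 if max_index == expected else (index - expected) / (max_index - expected)


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the single edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    found = [0, 0, 0, 1, 1, 1]           # clustering returned by some algorithm
    crosscut_truth = [0, 1, 2, 0, 1, 2]  # hypothetical ground truth unrelated to the structure
    print("modularity ", modularity(edges, found))
    print("coverage   ", coverage(edges, found))
    print("conductance", conductance(edges, found, 0))
    print("NMI / ARI vs itself       ", nmi(found, found), ari(found, found))
    print("NMI / ARI vs crosscut truth", nmi(crosscut_truth, found), ari(crosscut_truth, found))
```

    On two triangles joined by one edge, splitting the triangles gives modularity 6/7 - 1/2 (about 0.357), coverage 6/7, and conductance 1/7 per triangle. Scored against a hypothetical ground truth that cuts across both triangles, the same clustering gets NMI 0 (and a negative ARI), a small illustration of how stand-alone and information recovery metrics can disagree.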

    Node, Node-Link, and Node-Link-Group Diagrams: An Evaluation

    Effectively showing the relationships between objects in a dataset is one of the main tasks in information visualization. Typically there is a well-defined notion of distance between pairs of objects, and traditional approaches such as principal component analysis or multidimensional scaling are used to place the objects as points in 2D space so that similar objects are close to each other. In another typical setting, the dataset is visualized as a network graph, where related nodes are connected by links. More recently, datasets have also been visualized as maps, where, in addition to nodes and links, there is an explicit representation of groups and clusters. We consider these three techniques, characterized by a progressive increase in the amount of encoded information: node diagrams, node-link diagrams, and node-link-group diagrams. We assess these three types of diagrams with a controlled experiment that covers nine different tasks falling broadly into three categories: node-based tasks, network-based tasks, and group-based tasks. Our findings indicate that adding links, or links and group representations, does not negatively impact performance (time and accuracy) on node-based tasks. Similarly, adding group representations does not negatively impact performance on network-based tasks. Node-link-group diagrams outperform the others on group-based tasks. These conclusions contradict the results of other studies in similar but subtly different settings. Taken together, however, such results can have significant implications for the design of standard and domain-specific visualization tools. Index Terms: graphs, networks, maps, scatter plots

    Visualization of Knowledge on Specific Topics Using Corporate Email

    Graduation Project (Master's in Computing with an emphasis in Information Systems), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2014. Understanding who within a corporation has knowledge of a given topic is key to effective decision making that makes the best use of resources and strengthens collaboration. Drawing on social network analysis studies, this project evaluates information visualization techniques, selects the most suitable visualization, and develops improvements to it.

    Visualizing Graphs as Maps with Contiguous Regions (Short Papers)

    Relational datasets that include clustering information can be visualized with tools such as BubbleSets, LineSets, SOM, and GMap. The countries (the regions representing clusters) in SOM-based and GMap-based visualizations are fragmented, i.e., each is represented by several disconnected regions. While BubbleSets and LineSets have contiguous regions, these regions may overlap, even when the input clustering is non-overlapping. We describe two methods for creating non-fragmented and non-overlapping maps within the GMap framework. The first approach achieves contiguity by preserving the given embedding and creating a clustering based on geometric proximity. The second approach achieves contiguity by preserving the clustering information. The methods are quantitatively evaluated using embedding and clustering metrics, and their usefulness is demonstrated with several real-world datasets and a fully functional online system at gmap.cs.arizona.edu.
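    The fragmentation problem described above can be made concrete with a small check. This sketch is hypothetical and is not the GMap implementation: it assumes the map has been rasterized into a grid where each cell holds one country label (or None for background), and it counts the 4-connected regions of each country with a breadth-first flood fill. A country with more than one region is fragmented in the sense used above. Because each cell holds a single label, this representation cannot express the overlap that BubbleSets and LineSets allow. The names `regions_per_country` and `is_contiguous` are illustrative.

```python
from collections import deque


def regions_per_country(grid):
    """Count 4-connected regions for each country label in a rasterized map.

    grid: list of equal-length rows; each cell holds a country label, or None for background.
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    counts = {}
    for r in range(rows):
        for c in range(cols):
            label = grid[r][c]
            if label is None or seen[r][c]:
                continue
            # Unvisited cell: it starts a new region of this country, so flood-fill it.
            counts[label] = counts.get(label, 0) + 1
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx] and grid[ny][nx] == label:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return counts


def is_contiguous(grid):
    """True if every country occupies exactly one connected region (no fragmentation)."""
    return all(n == 1 for n in regions_per_country(grid).values())


if __name__ == "__main__":
    fragmented = [
        ["A", "A", "B"],
        ["B", "B", "B"],
        ["A", None, "B"],  # this lower-left "A" cell is cut off from the other "A" cells
    ]
    print(regions_per_country(fragmented), is_contiguous(fragmented))
```

    In the example, country A is split into two regions by country B, so the map is fragmented; a layout in which every country forms a single region would pass `is_contiguous`.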