Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale

Author: Katy Börner
Scott Emmons
Mike Gallant
Stephen Kobourov
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 08/07/2016

Notions of community quality underlie network clustering. While studies surrounding network clustering are increasingly common, a precise understanding of the relationship between different cluster quality metrics is lacking. In this paper, we examine the relationship between stand-alone cluster quality metrics and information recovery metrics through a rigorous analysis of four widely used network clustering algorithms: Louvain, Infomap, label propagation, and smart local moving. We consider the stand-alone quality metrics of modularity, conductance, and coverage, and we consider the information recovery metrics of adjusted Rand score, normalized mutual information, and a variant of normalized mutual information used in previous work. Our study includes both synthetic graphs and empirical data sets of sizes varying from 1,000 to 1,000,000 nodes. We find significant differences among the results of the different cluster quality metrics. For example, clustering algorithms can return a value of 0.4 out of 1 on modularity but score 0 out of 1 on information recovery. We find conductance, though imperfect, to be the stand-alone quality metric that best indicates performance on information recovery metrics. Our study shows that the variant of normalized mutual information used in previous work cannot be assumed to differ only slightly from traditional normalized mutual information. Smart local moving is the best performing algorithm in our study, but discrepancies between cluster evaluation metrics prevent us from declaring it absolutely superior. Louvain performed better than Infomap in nearly all the tests in our study, contradicting the results of previous work in which Infomap was superior to Louvain. We find that although label propagation performs poorly when clusters are less clearly defined, it scales efficiently and accurately to large graphs with well-defined clusters.
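To make the two families of metrics in this abstract concrete, the following is a minimal sketch in plain Python. It is not the authors' code. The function names `cluster_metrics` and `nmi` are hypothetical, the graph is assumed to be undirected and unweighted, conductance is computed per cluster as cut edges divided by the smaller side's volume (the paper may aggregate or rescale it differently), and the normalized mutual information uses the arithmetic-mean normalization, which is only one of several conventions.

```python
from collections import Counter, defaultdict
from math import log


def cluster_metrics(edges, membership):
    """Stand-alone quality metrics for one partition of an undirected graph.

    edges: list of (u, v) pairs; membership: dict mapping node -> cluster id.
    Returns (modularity, coverage, per-cluster conductance dict).
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in the cluster
    cut = defaultdict(int)     # edges leaving the cluster
    volume = defaultdict(int)  # sum of node degrees in the cluster
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            intra[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1
    clusters = set(membership.values())
    # Coverage: fraction of edges that fall inside clusters.
    coverage = sum(intra.values()) / m
    # Modularity: intra-cluster edge fraction minus its expected value.
    modularity = sum(intra[c] / m - (volume[c] / (2 * m)) ** 2 for c in clusters)
    # Conductance (assumed definition): cut / min(vol(C), vol(rest)).
    conductance = {}
    for c in clusters:
        denom = min(volume[c], 2 * m - volume[c])
        conductance[c] = cut[c] / denom if denom else 0.0
    return modularity, coverage, conductance


def nmi(labels_a, labels_b):
    """Normalized mutual information, 2*I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(
        (nij / n) * log(nij * n / (count_a[a] * count_b[b]))
        for (a, b), nij in joint.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())
    if entropy_a + entropy_b == 0:
        return 1.0  # both partitions are a single cluster
    return 2 * mutual / (entropy_a + entropy_b)


# Toy example: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q, cov, cond = cluster_metrics(edges, membership)
print(q, cov, cond)
```

On this toy graph the modularity is 5/14, the coverage is 6/7, and each triangle has conductance 1/7. The stand-alone metrics need only the graph and the partition, whereas `nmi` needs a ground-truth labeling to compare against, which is why the two families can disagree in the way the abstract reports.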

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

The University of Arizona

Node, Node-Link, and Node-Link-Group Diagrams: An Evaluation

Author: Bahador Saket
Katy Börner
Paolo Simonetto
Stephen Kobourov
Publication date: 07/04/2014

Effectively showing the relationships between objects in a dataset is one of the main tasks in information visualization. Typically there is a well-defined notion of distance between pairs of objects, and traditional approaches such as principal component analysis or multi-dimensional scaling are used to place the objects as points in 2D space, so that similar objects are close to each other. In another typical setting, the dataset is visualized as a network graph, where related nodes are connected by links. More recently, datasets are also visualized as maps, where in addition to nodes and links, there is an explicit representation of groups and clusters. We consider these three techniques, characterized by a progressive increase in the amount of encoded information: node diagrams, node-link diagrams, and node-link-group diagrams. We assess these three types of diagrams with a controlled experiment that covers nine different tasks falling broadly into three categories: node-based tasks, network-based tasks, and group-based tasks. Our findings indicate that adding links, or links and group representations, does not negatively impact performance (time and accuracy) on node-based tasks. Similarly, adding group representations does not negatively impact performance on network-based tasks. Node-link-group diagrams outperform the others on group-based tasks. These conclusions contradict results in other studies, in similar but subtly different settings. Taken together, however, such results can have significant implications for the design of standard and domain-specific visualization tools.
Index Terms: graphs, networks, maps, scatter plots

arXiv.org e-Print Archive

CiteSeerX

Visualization of Knowledge on Specific Topics Using Corporate Email (original title: Visualización de conocimiento en temas específicos mediante el uso del correo electrónico corporativo)

Author: Berny Alvarado-Brenes
Publication venue: Instituto Tecnológico de Costa Rica.
Publication date: 01/11/2014

Graduation Project (Master's in Computing with an emphasis in Information Systems), Instituto Tecnológico de Costa Rica, Escuela de Ingeniería en Computación, 2014. Understanding who within a corporation has knowledge of a given topic is key to effective decision making that makes the best use of resources and strengthens collaboration. Drawing on studies of social network analysis, information visualization techniques are evaluated, the most suitable visualization is selected, and improvements to it are developed.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional del Instituto Tecnologico de Costa Rica

Visualizing Graphs as Maps with Contiguous Regions

Author: Paolo Simonetto
Sergey Pupyrev
Stephen G. Kobourov
Editors: J. Kennedy
M. Hlawitschka
N. Elmqvist

Relational datasets that include clustering information can be visualized with tools such as BubbleSets, LineSets, SOM, and GMap. The countries in SOM-based and GMap-based visualizations are fragmented, i.e., they are represented by several disconnected regions. While BubbleSets and LineSets have contiguous regions, these regions may overlap, even when the input clustering is non-overlapping. We describe two methods for creating non-fragmented and non-overlapping maps within the GMap framework. The first approach achieves contiguity by preserving the given embedding and creating a clustering based on geometric proximity. The second approach achieves contiguity by preserving the clustering information. The methods are quantitatively evaluated using embedding and clustering metrics, and their usefulness is demonstrated with several real-world datasets and a fully functional online system at gmap.cs.arizona.edu.
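The first approach in this abstract keeps the node positions fixed and derives clusters from geometric proximity. The abstract does not say which procedure is used, so the sketch below swaps in plain k-means on 2D coordinates purely as an illustration of the idea. The function name `kmeans_2d`, the choice of the first `k` points as initial centroids, and the fixed iteration count are all assumptions made here, not details from the paper or from GMap.

```python
def kmeans_2d(points, k, iterations=20):
    """Cluster fixed 2D positions by proximity using basic k-means.

    points: list of (x, y) tuples (the given embedding, left unchanged).
    Returns (labels, centroids), where labels[i] is the cluster of points[i].
    """
    # Assumption: deterministic start from the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Two well-separated groups of node positions.
positions = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans_2d(positions, k=2)
print(labels)
```

Because every point is assigned to its nearest centroid, each cluster occupies one convex cell of the plane, which is the property that makes proximity-based clustering a natural fit for drawing contiguous, non-overlapping regions.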

CiteSeerX