Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies
surrounding network clustering are increasingly common, a precise understanding
of the relationship between different cluster quality metrics is lacking. In
this paper, we examine the relationship between stand-alone cluster quality
metrics and information recovery metrics through a rigorous analysis of four
widely-used network clustering algorithms -- Louvain, Infomap, label
propagation, and smart local moving. We consider the stand-alone quality
metrics of modularity, conductance, and coverage, and we consider the
information recovery metrics of adjusted Rand score, normalized mutual
information, and a variant of normalized mutual information used in previous
work. Our study includes both synthetic graphs and empirical data sets of sizes
varying from 1,000 to 1,000,000 nodes.
We find significant differences among the results of the different cluster
quality metrics. For example, clustering algorithms can return a value of 0.4
out of 1 on modularity but score 0 out of 1 on information recovery. We find
conductance, though imperfect, to be the stand-alone quality metric that best
indicates performance on information recovery metrics. Our study shows that the
variant of normalized mutual information used in previous work cannot be
assumed to differ only slightly from traditional normalized mutual information.
Smart local moving is the best performing algorithm in our study, but
discrepancies between cluster evaluation metrics prevent us from declaring it
absolutely superior. Louvain performed better than Infomap in nearly all the
tests in our study, contradicting the results of previous work in which Infomap
was superior to Louvain. We find that although label propagation performs
poorly when clusters are less clearly defined, it scales efficiently and
accurately to large graphs with well-defined clusters.
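The three stand-alone quality metrics named in this abstract can be illustrated on a toy graph. The sketch below assumes common textbook definitions for an undirected, unweighted graph (modularity as the sum over clusters of the intra-cluster edge fraction minus the squared volume fraction, coverage as the fraction of intra-cluster edges, and conductance as cut size divided by the smaller side's volume); the paper may use different variants. The function name `cluster_quality` and the example graph are hypothetical.

```python
from collections import defaultdict


def cluster_quality(edges, labels):
    """Return (modularity, coverage, per-cluster conductance).

    Assumes an undirected, unweighted graph given as a list of (u, v)
    pairs and a dict mapping each node to its cluster label.
    """
    m = len(edges)
    degree = defaultdict(int)
    intra = defaultdict(int)  # edges with both endpoints in the cluster
    cut = defaultdict(int)    # edges with exactly one endpoint in the cluster
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
        else:
            cut[labels[u]] += 1
            cut[labels[v]] += 1

    # Volume of a cluster = sum of the degrees of its nodes.
    volume = defaultdict(int)
    for node, d in degree.items():
        volume[labels[node]] += d

    clusters = set(labels.values())
    modularity = sum(
        intra[c] / m - (volume[c] / (2 * m)) ** 2 for c in clusters
    )
    coverage = sum(intra[c] for c in clusters) / m
    conductance = {}
    for c in clusters:
        denom = min(volume[c], 2 * m - volume[c])
        conductance[c] = cut[c] / denom if denom else 0.0
    return modularity, coverage, conductance


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
modularity, coverage, conductance = cluster_quality(edges, labels)
print(modularity, coverage, conductance)
```

Under these definitions, lower conductance and higher modularity or coverage indicate better-separated clusters. None of the three uses ground-truth labels, which is what separates them from information recovery metrics such as the adjusted Rand score.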
Clustering Partially Observed Graphs via Convex Optimization
This paper considers the problem of clustering a partially observed
unweighted graph---i.e., one where for some node pairs we know there is an edge
between them, for some others we know there is no edge, and for the remaining
we do not know whether or not there is an edge. We want to organize the nodes
into disjoint clusters so that there is relatively dense (observed)
connectivity within clusters, and sparse across clusters.
We take a novel yet natural approach to this problem, by focusing on finding
the clustering that minimizes the number of "disagreements"---i.e., the sum of
the number of (observed) missing edges within clusters, and (observed) present
edges across clusters. Our algorithm uses convex optimization; its basis is a
reduction of disagreement minimization to the problem of recovering an
(unknown) low-rank matrix and an (unknown) sparse matrix from their partially
observed sum. We evaluate the performance of our algorithm on the classical
Planted Partition/Stochastic Block Model. Our main theorem provides sufficient
conditions for the success of our algorithm as a function of the minimum
cluster size, edge density and observation probability; in particular, the
results characterize the tradeoff between the observation probability and the
edge density gap. When there are a constant number of clusters of equal size,
our results are optimal up to logarithmic factors.
Comment: This is the final version published in Journal of Machine Learning
Research (JMLR). Partial results appeared in International Conference on
Machine Learning (ICML) 201
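The "disagreements" objective described in this abstract can be made concrete with a small counting routine. This is a minimal sketch of the objective only, not of the paper's convex optimization algorithm. It assumes observations are stored as a dict from node pairs to booleans, with unobserved pairs simply absent; the name `count_disagreements` and the example data are hypothetical.

```python
def count_disagreements(observed, labels):
    """Count observed pairs that contradict a clustering.

    observed: dict mapping (u, v) -> True (edge seen) or False (non-edge seen).
              Pairs that were never observed are left out of the dict.
    labels:   dict mapping each node to its cluster label.
    """
    disagreements = 0
    for (u, v), has_edge in observed.items():
        same_cluster = labels[u] == labels[v]
        if same_cluster and not has_edge:
            disagreements += 1  # observed missing edge within a cluster
        elif not same_cluster and has_edge:
            disagreements += 1  # observed present edge across clusters
    return disagreements


# Four nodes; pairs (0, 2) and (1, 3) are unobserved and therefore omitted.
observed = {
    (0, 1): True,   # within cluster "a", edge present: agrees
    (2, 3): False,  # within cluster "b", edge missing: disagrees
    (1, 2): True,   # across clusters, edge present: disagrees
    (0, 3): False,  # across clusters, edge missing: agrees
}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
n_disagreements = count_disagreements(observed, labels)
print(n_disagreements)
```

Unobserved pairs contribute nothing to the count, which is why the objective stays well-defined on a partially observed graph.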
Quantifying and minimizing risk of conflict in social networks
Controversy, disagreement, conflict, polarization and opinion divergence in social networks have been the subject of much recent research. In particular, researchers have addressed the question of how such concepts can be quantified given people’s prior opinions, and how they can be optimized by influencing the opinion of a small number of people or by editing the network’s connectivity.
Here, rather than optimizing such concepts given a specific set of prior opinions, we study whether they can be optimized in the average case and in the worst case over all sets of prior opinions. In particular, we derive the worst-case and average-case conflict risk of networks, and we propose algorithms for optimizing these.
For some measures of conflict, these are non-convex optimization problems with many local minima. We provide a theoretical and empirical analysis of the nature of some of these local minima, and show how they are related to existing organizational structures.
Empirical results show how a small number of edits to a network quickly decreases its conflict risk, both average-case and worst-case. Furthermore, they show that minimizing average-case conflict risk often does not reduce worst-case conflict risk. Minimizing worst-case conflict risk, on the other hand, while computationally more challenging, is generally effective at minimizing both worst-case and average-case conflict risk.
A Deep Neural Network for Pixel-Level Electromagnetic Particle Identification in the MicroBooNE Liquid Argon Time Projection Chamber
We have developed a convolutional neural network (CNN) that can make a
pixel-level prediction of objects in image data recorded by a liquid argon time
projection chamber (LArTPC) for the first time. We describe the network design,
training techniques, and software tools developed to train this network. The
goal of this work is to develop a complete deep neural network based data
reconstruction chain for the MicroBooNE detector. We show the first
demonstration of a network's validity on real LArTPC data using MicroBooNE
collection plane images. The demonstration is performed for stopping muon and
charged-current neutral pion data samples.
Graph matching: relax or not?
We consider the problem of exact and inexact matching of weighted undirected
graphs, in which a bijective correspondence is sought to minimize a quadratic
weight disagreement. This computationally challenging problem is often relaxed
as a convex quadratic program, in which the space of permutations is replaced
by the space of doubly-stochastic matrices. However, the applicability of such
a relaxation is poorly understood. We define a broad class of friendly graphs
characterized by an easily verifiable spectral property. We prove that for
friendly graphs, the convex relaxation is guaranteed to find the exact
isomorphism or certify its inexistence. This result is further extended to
approximately isomorphic graphs, for which we develop an explicit bound on the
amount of weight disagreement under which the relaxation is guaranteed to find
the globally optimal approximate isomorphism. We also show that in many cases,
the graph matching problem can be further harmlessly relaxed to a convex
quadratic program with only n separable linear equality constraints, which is
substantially more efficient than the standard relaxation involving 2n equality
and n^2 inequality constraints. Finally, we show that our results are still
valid for unfriendly graphs if additional information in the form of seeds or
attributes is allowed, with the latter satisfying an easy to verify spectral
characteristic.
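To make the "quadratic weight disagreement" concrete, the sketch below scores a candidate node correspondence by the sum of squared differences between matched edge weights, then finds the best correspondence by brute force on a three-node example. This is an assumed reading of the objective, and the exhaustive search is shown only because it is easy to verify on tiny inputs; it is not the convex relaxation the abstract analyzes. The names `weight_disagreement` and `best_matching` are hypothetical.

```python
from itertools import permutations


def weight_disagreement(A, B, perm):
    """Sum of squared weight differences when node i of A maps to perm[i] of B."""
    n = len(A)
    return sum(
        (A[i][j] - B[perm[i]][perm[j]]) ** 2
        for i in range(n)
        for j in range(n)
    )


def best_matching(A, B):
    """Exhaustive search over all n! correspondences (tiny graphs only)."""
    n = len(A)
    return min(permutations(range(n)), key=lambda p: weight_disagreement(A, B, p))


# A is a weighted path 0 -(1)- 1 -(2)- 2; B is the same graph with its
# nodes relabeled by 0 -> 2, 1 -> 0, 2 -> 1.
A = [[0, 1, 0],
     [1, 0, 2],
     [0, 2, 0]]
B = [[0, 2, 1],
     [2, 0, 0],
     [1, 0, 0]]
perm = best_matching(A, B)
score = weight_disagreement(A, B, perm)
print(perm, score)
```

The factorial cost of the exhaustive search is what motivates replacing permutations with doubly-stochastic matrices in the relaxed problem.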