The Ubiquity of Large Graphs and Surprising Challenges of Graph Processing: Extended Survey
Graph processing is becoming increasingly prevalent across many application
domains. In spite of this prevalence, there is little research about how graphs
are actually used in practice. We performed an extensive study that consisted
of an online survey of 89 users, a review of the mailing lists, source
repositories, and whitepapers of a large suite of graph software products, and
in-person interviews with 6 users and 2 developers of these products. Our
online survey aimed at understanding: (i) the types of graphs users have; (ii)
the graph computations users run; (iii) the types of graph software users use;
and (iv) the major challenges users face when processing their graphs. We
describe the participants' responses to our questions, highlighting common
patterns and challenges. Based on our interviews and survey of the rest of our
sources, we were able to answer some new questions that were raised by
participants' responses to our online survey and understand the specific
applications that use graph data and software. Our study revealed surprising
facts about graph processing in practice. In particular, real-world graphs
represent a very diverse range of entities and are often very large;
scalability and visualization are undeniably the most pressing challenges faced
by participants; and data integration, recommendations, and fraud detection are
very popular applications supported by existing graph software. We hope these
findings can guide future research.
The Future is Big Graphs! A Community View on Graph Processing Systems
Graphs are by nature unifying abstractions that can leverage
interconnectedness to represent, explore, predict, and explain real- and
digital-world phenomena. Although real users and consumers of graph instances
and graph workloads understand these abstractions, future problems will require
new abstractions and systems. What needs to happen in the next decade for big
graph processing to continue to succeed?
Comment: 12 pages, 3 figures, collaboration between the large-scale systems
and data management communities, work started at the Dagstuhl Seminar 19491
on Big Graph Processing Systems, to be published in the Communications of the
AC
Non-linear Attributed Graph Clustering by Symmetric NMF with PU Learning
We consider the clustering problem of attributed graphs. The challenge is to
design an effective and efficient clustering method that precisely
captures the hidden relationship between the topology and the attributes in
real-world graphs. We propose Non-linear Attributed Graph Clustering by
Symmetric Non-negative Matrix Factorization with Positive Unlabeled Learning.
Our method has three features: 1) it learns a non-linear
projection function between the different cluster assignments of the topology
and the attributes of graphs so as to capture the complicated relationship
between the topology and the attributes in real-world graphs, 2) it leverages
positive unlabeled learning to account for the effect of partially observed
positive edges in the cluster assignment, and 3) it achieves efficient
computational complexity, bounded in terms of the vertex size, the attribute
size, the number of clusters, and the number of iterations for learning the
cluster assignment. We conducted extensive experiments with various clustering
methods on various real datasets to validate that our method outperforms
existing clustering methods in clustering quality.
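
To make the symmetric NMF building block named in the abstract above concrete, here is a minimal sketch. It is not the authors' method: it factorizes only the adjacency matrix A as H H^T with H >= 0, using the damped multiplicative update commonly used for symmetric NMF, and it leaves out the attributes, the non-linear projection, and the positive unlabeled learning entirely. The function names, the toy graph, and the parameters (beta, iters, seed) are assumptions chosen for illustration.

```python
import random


def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def transpose(X):
    """Return the transpose of a dense matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def loss(A, H):
    """Squared Frobenius error ||A - H H^T||_F^2."""
    R = matmul(H, transpose(H))
    return sum((a - r) ** 2 for ra, rr in zip(A, R) for a, r in zip(ra, rr))


def symmetric_nmf(A, k, iters=200, beta=0.5, eps=1e-9, seed=0):
    """Approximate a non-negative symmetric matrix A by H H^T with H >= 0.

    Uses the damped multiplicative update
        H <- H * (1 - beta + beta * (A H) / (H H^T H)),
    applied element-wise. With iters=0 the random initial H is returned.
    """
    rng = random.Random(seed)
    n = len(A)
    H = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    for _ in range(iters):
        AH = matmul(A, H)                           # n x k
        HHtH = matmul(H, matmul(transpose(H), H))   # n x k
        H = [[H[i][c] * (1 - beta + beta * AH[i][c] / (HHtH[i][c] + eps))
              for c in range(k)]
             for i in range(n)]
    return H


# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5}, no edges between.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]
A = [[0.0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1.0

H0 = symmetric_nmf(A, k=2, iters=0)   # random starting point (same seed)
H = symmetric_nmf(A, k=2, iters=200)  # factor after 200 updates
initial_loss, final_loss = loss(A, H0), loss(A, H)

# Read a hard cluster label for each vertex from the largest entry in its row.
labels = [max(range(2), key=lambda c: row[c]) for row in H]
```

Each row of H can be read as a soft cluster membership for one vertex, which is why taking the largest entry per row gives a cluster assignment. Dense list-of-lists matrices keep the sketch dependency-free but are only practical for very small graphs.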
Parallel Batch-Dynamic Graph Connectivity
In this paper, we study batch parallel algorithms for the dynamic
connectivity problem, a fundamental problem that has received considerable
attention in the sequential setting. The best-known sequential algorithm
for dynamic connectivity is the elegant level-set algorithm of Holm, de
Lichtenberg and Thorup (HDT), which achieves $O(\log^2 n)$ amortized time per
edge insertion or deletion, and $O(\log n / \log \log n)$ time per query, where
$n$ is the number of vertices. We
design a parallel batch-dynamic connectivity algorithm that is work-efficient
with respect to the HDT algorithm for small batch sizes, and is asymptotically
faster when the average batch size is sufficiently large. Given a sequence of
batched updates, our algorithm's expected amortized work per edge insertion
and deletion is bounded in terms of the average batch size of all deletions,
and its depth is bounded w.h.p. Our algorithm answers a batch of connectivity
queries with a bound on expected work and a bound on depth that holds w.h.p.
To the best of our knowledge, our algorithm
is the first parallel batch-dynamic algorithm for connectivity.
Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
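
To illustrate the batch interface the abstract above describes (batches of edge updates followed by batches of connectivity queries), here is a minimal sequential sketch built on a union-find structure. It is not the paper's algorithm and not the HDT algorithm: it supports edge insertions only, handles no deletions, and runs sequentially. The class and function names are assumptions chosen for illustration.

```python
class UnionFind:
    """Disjoint-set forest with path compression and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, v):
        # First pass: walk up to the root of v's tree.
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every vertex on the path directly at the root.
        while self.parent[v] != root:
            self.parent[v], v = root, self.parent[v]
        return root

    def union(self, u, v):
        ru, rv = self.find(u), self.find(v)
        if ru == rv:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]


def batch_insert(uf, edges):
    """Apply a batch of edge insertions, one edge at a time."""
    for u, v in edges:
        uf.union(u, v)


def batch_connected(uf, queries):
    """Answer a batch of (u, v) connectivity queries."""
    return [uf.find(u) == uf.find(v) for u, v in queries]


uf = UnionFind(6)
batch_insert(uf, [(0, 1), (1, 2), (3, 4)])
answers = batch_connected(uf, [(0, 2), (2, 3), (3, 4), (5, 5)])
# answers == [True, False, True, True]
```

Union-find cannot undo a union, which is why supporting deletions requires a structure such as the level-based one the abstract refers to.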