Assessing the Computational Complexity of Multi-Layer Subgraph Detection
Multi-layer graphs consist of several graphs (layers) over the same vertex
set. They are motivated by real-world problems where entities (vertices) are
associated via multiple types of relationships (edges in different layers). We
chart the border of computational (in)tractability for the class of subgraph
detection problems on multi-layer graphs, including fundamental problems such
as maximum matching, finding certain clique relaxations (motivated by community
detection), and path problems. While we mostly encounter hardness results, sometimes
even for two or three layers, we also identify some islands of tractability.
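To make the setting concrete, here is a minimal Python sketch (not from the paper, and only one of several plausible problem variants): a multi-layer graph stored as one edge set per layer over a shared vertex set, plus a brute-force search for the largest edge set that is a matching and is present in every layer. The names `is_matching` and `max_simultaneous_matching` are illustrative.

```python
from itertools import combinations

def is_matching(edges):
    """True if no two edges share an endpoint."""
    seen = set()
    for u, v in edges:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True

def max_simultaneous_matching(layers):
    """Largest edge set that is a matching and appears in every layer.
    `layers` is a list of edge sets over a common vertex set.
    Brute force, exponential time; for illustration only."""
    common = set.intersection(*(set(map(frozenset, layer)) for layer in layers))
    common = [tuple(e) for e in common]
    for k in range(len(common), 0, -1):
        for cand in combinations(common, k):
            if is_matching(cand):
                return list(cand)
    return []

# Two layers over the vertex set {1, 2, 3, 4}.
layers = [{(1, 2), (3, 4), (2, 3)}, {(1, 2), (2, 3)}]
print(max_simultaneous_matching(layers))  # e.g. [(1, 2)]
```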
StructMatrix: large-scale visualization of graphs by means of structure detection and dense matrices
Given a large-scale graph with millions of nodes and edges, how can we reveal
macro patterns of interest, such as cliques, bipartite cores, stars, and chains?
Furthermore, how can we visualize such patterns together to gain insights from
the graph and support sound decision-making? Although there are many algorithmic
and visual techniques to analyze graphs, none of the existing approaches is
able to present the structural information of graphs at large scale. Hence,
this paper describes StructMatrix, a methodology aimed at highly scalable visual
inspection of graph structures with the goal of revealing macro patterns of
interest. StructMatrix combines algorithmic structure detection and adjacency
matrix visualization to present cardinality, distribution, and relationship
features of the structures found in a given graph. We performed experiments in
real, large-scale graphs with up to one million nodes and millions of edges.
StructMatrix revealed that graphs of high relevance (e.g., Web, Wikipedia and
DBLP) have characterizations that reflect the nature of their corresponding
domains; these findings have not previously been reported in the literature. We
expect our technique to bring deeper insights into large graph mining and to
support its use in decision making.
Comment: To appear, 8 pages. To be published at the Fifth IEEE ICDM Workshop on
Data Mining in Networks, 2015, as Hugo Gualdron, Robson Cordeiro, Jose Rodrigues
(2015), StructMatrix: Large-scale visualization of graphs by means of structure
detection and dense matrices. In: The Fifth IEEE ICDM Workshop on Data Mining in
Networks, pp. 1--8, IEEE.
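As a rough illustration of the structure-detection half of such an approach (not the authors' code), the sketch below classifies small connected components of a graph as cliques, stars, or chains using networkx; bipartite cores and the dense adjacency-matrix rendering are omitted.

```python
import networkx as nx

def classify(component):
    """Label a connected subgraph as one of the macro patterns of interest."""
    n, m = component.number_of_nodes(), component.number_of_edges()
    degrees = sorted(d for _, d in component.degree())
    if m == n * (n - 1) // 2:
        return "clique"
    if n >= 3 and degrees == [1] * (n - 1) + [n - 1]:
        return "star"
    if n >= 3 and degrees == [1, 1] + [2] * (n - 2):
        return "chain"
    return "other"

G = nx.Graph([(1, 2), (2, 3), (1, 3),      # triangle (clique)
              (4, 5), (4, 6), (4, 7),      # star centred at 4
              (8, 9), (9, 10), (10, 11)])  # chain
for nodes in nx.connected_components(G):
    print(sorted(nodes), classify(G.subgraph(nodes)))
```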
Intrinsically Dynamic Network Communities
Community finding algorithms for networks have recently been extended to
dynamic data. Most of these recent methods aim at exhibiting community
partitions from successive graph snapshots and thereafter connecting or
smoothing these partitions using clever time-dependent features and sampling
techniques. These approaches nonetheless achieve longitudinal rather than
dynamic community detection. We assume that communities are fundamentally
defined by the repetition of interactions among a set of nodes over time.
According to this definition, analyzing the data by considering successive
snapshots induces a significant loss of information: we suggest that it blurs
essentially dynamic phenomena, such as communities based on repeated
inter-temporal interactions, nodes switching from one community to another over
time, or the possibility that a community survives even though its members are
entirely replaced over a longer time period. We propose a formalism that
aims at tackling this issue in the context of time-directed datasets (such as
citation networks), and present several illustrations on both empirical and
synthetic dynamic networks. We eventually introduce intrinsically dynamic
metrics to qualify temporal community structure and emphasize their possible
role as estimators of the quality of community detection, taking into
account the fact that various empirical contexts may call for distinct
`community' definitions and detection criteria.
Comment: 27 pages, 11 figures.
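As a toy sketch of the underlying intuition (not the paper's formalism), the snippet below keeps only the ties whose interactions recur in at least `min_repeats` distinct time steps, so that candidate communities are built from repeated inter-temporal interactions rather than from per-snapshot partitions; the function name and threshold are illustrative.

```python
from collections import defaultdict

def repeated_interaction_graph(timed_edges, min_repeats=2):
    """Return the pairs of nodes that interact in at least `min_repeats` time steps.
    `timed_edges` is an iterable of (t, u, v) interaction events."""
    steps = defaultdict(set)
    for t, u, v in timed_edges:
        steps[frozenset((u, v))].add(t)
    return {tuple(sorted(pair)) for pair, ts in steps.items() if len(ts) >= min_repeats}

events = [(0, "a", "b"), (1, "a", "b"), (1, "b", "c"), (2, "a", "b"), (2, "c", "d")]
print(repeated_interaction_graph(events))  # {('a', 'b')} -- only the repeated tie survives
```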
Finding the Hierarchy of Dense Subgraphs using Nucleus Decompositions
Finding dense substructures in a graph is a fundamental graph mining
operation, with applications in bioinformatics, social networks, and
visualization to name a few. Yet most standard formulations of this problem
(like clique, quasiclique, k-densest subgraph) are NP-hard. Furthermore, the
goal is rarely to find the "true optimum", but to identify many (if not all)
dense substructures, understand their distribution in the graph, and ideally
determine relationships among them. Current dense subgraph finding algorithms
usually optimize some objective, and only find a few such subgraphs without
providing any structural relations. We define the nucleus decomposition of a
graph, which represents the graph as a forest of nuclei. Each nucleus is a
subgraph where smaller cliques are present in many larger cliques. The forest
of nuclei is a hierarchy by containment, where the edge density increases as we
proceed towards leaf nuclei. Sibling nuclei can have limited intersections,
which enables discovering overlapping dense subgraphs. With the right
parameters, the nucleus decomposition generalizes the classic notions of
k-cores and k-truss decompositions. We give provably efficient algorithms for
nucleus decompositions, and empirically evaluate their behavior in a variety of
real graphs. The tree of nuclei consistently gives a global, hierarchical
snapshot of dense substructures, and outputs dense subgraphs of higher quality
than other state-of-the-art solutions. Our algorithm can process graphs with
tens of millions of edges in less than an hour.
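For intuition, the sketch below computes the k-core decomposition, the simplest special case of the nucleus decomposition mentioned above (vertices, i.e. 1-cliques, contained in edges, i.e. 2-cliques), using the standard minimum-degree peeling. It is a quadratic-time illustration, not the authors' provably efficient algorithm.

```python
import networkx as nx

def core_numbers(G):
    """Return {vertex: core number} by repeatedly removing a minimum-degree vertex."""
    H = G.copy()
    core, k = {}, 0
    while H.number_of_nodes() > 0:
        v, d = min(H.degree(), key=lambda nd: nd[1])
        k = max(k, d)          # the peeling level never decreases
        core[v] = k
        H.remove_node(v)
    return core

G = nx.Graph([(1, 2), (2, 3), (1, 3), (3, 4)])  # triangle with a pendant vertex
print(core_numbers(G))  # {4: 1, 1: 2, 2: 2, 3: 2} (order depends on removals)
```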
Algorithm-Level Optimizations for Scalable Parallel Graph Processing
Efficiently processing large graphs is challenging, since parallel graph algorithms suffer from
poor scalability and performance due to many factors, including heavy communication and load-imbalance.
Furthermore, it is difficult to express graph algorithms, as users need to understand
and effectively utilize the underlying execution of the algorithm on the distributed system. The
performance of graph algorithms depends not only on the characteristics of the system (such as
latency, available RAM, etc.), but also on the characteristics of the input graph (small-world scale-free,
mesh, long-diameter, etc.), and characteristics of the algorithm (sparse computation vs. dense
communication). The best execution strategy, therefore, often heavily depends on the combination
of input graph, system and algorithm.
Fine-grained expression exposes maximum parallelism in the algorithm and allows the user to
concentrate on a single vertex, making it easier to express parallel graph algorithms. However,
this often loses information about the machine, making it difficult to extract performance and
scalability from fine-grained algorithms.
To address these issues, we present a model for expressing parallel graph algorithms using a
fine-grained expression. Our model decouples the algorithm-writer from the underlying details
of the system, graph, and execution and tuning of the algorithm. We also present various graph
paradigms that optimize the execution of graph algorithms for various types of input graphs and
systems. We show our model is general enough to allow graph algorithms to use the various graph
paradigms for the best/fastest execution, and demonstrate good performance and
scalability for a variety of graphs, algorithms, and systems, scaling to 100,000+ cores.
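A single-machine toy version of this fine-grained, vertex-centric style: the algorithm writer supplies only a per-vertex operator (here a BFS relaxation), while the runner that schedules it, a serial worklist in this sketch and a tuned distributed engine in the thesis, is a separate concern. Names such as `bfs_operator` and `run` are illustrative, not the authors' API.

```python
from collections import deque

INF = float("inf")

def bfs_operator(v, dist, graph):
    """Relax v's out-neighbours; return the vertices whose state changed."""
    changed = set()
    for w in graph[v]:
        if dist[v] + 1 < dist[w]:
            dist[w] = dist[v] + 1
            changed.add(w)
    return changed

def run(graph, source, operator):
    """Serial worklist runner; the operator knows nothing about scheduling."""
    dist = {v: INF for v in graph}
    dist[source] = 0
    worklist = deque([source])
    while worklist:
        worklist.extend(operator(worklist.popleft(), dist, graph))
    return dist

graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(run(graph, 0, bfs_operator))  # {0: 0, 1: 1, 2: 1, 3: 2}
```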