Enumerating Maximal Bicliques from a Large Graph using MapReduce
We consider the enumeration of maximal bipartite cliques (bicliques) from a
large graph, a task central to many practical data mining problems in social
network analysis and bioinformatics. We present novel parallel algorithms for
the MapReduce platform, and an experimental evaluation using Hadoop MapReduce.
Our algorithm is based on clustering the input graph into smaller sized
subgraphs, followed by processing different subgraphs in parallel. Our
algorithm uses two ideas that enable it to scale to large graphs: (1) the
redundancy in work between different subgraph explorations is minimized through
a careful pruning of the search space, and (2) the load on different reducers
is balanced through the use of an appropriate total order among the vertices.
Our evaluation shows that the algorithm scales to large graphs with millions of
edges and tens of millions of maximal bicliques. To our knowledge, this is
the first work on maximal biclique enumeration for graphs of this scale.
Comment: A preliminary version of the paper was accepted at the Proceedings of
the 3rd IEEE International Congress on Big Data 201
Enumerating Top-k Quasi-Cliques
Quasi-cliques are dense incomplete subgraphs of a graph that generalize the
notion of cliques. Enumerating quasi-cliques from a graph is a robust way to
detect densely connected structures with applications to bio-informatics and
social network analysis. However, enumerating quasi-cliques in a graph is a
challenging problem, even harder than the problem of enumerating cliques. We
consider the enumeration of top-k degree-based quasi-cliques, and make the
following contributions: (1) We show that even the problem of detecting if a
given quasi-clique is maximal (i.e., not contained within another quasi-clique)
is NP-hard. (2) We present a novel heuristic algorithm, KernelQC, to enumerate the
k largest quasi-cliques in a graph. Our method is based on identifying kernels
of extremely dense subgraphs within a graph, followed by growing subgraphs
around these kernels, to arrive at quasi-cliques with the required densities.
(3) Experimental results show that our algorithm accurately enumerates
quasi-cliques from a graph, is much faster than current state-of-the-art
methods for quasi-clique enumeration (often more than three orders of magnitude
faster), and can scale to larger graphs than current methods.
Comment: 10 page
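As a minimal illustration of the objects being enumerated, the sketch below checks whether a vertex set is a degree-based quasi-clique. It assumes the common definition in which every vertex of the set must be adjacent to at least ceil(gamma * (|S| - 1)) other vertices of the set; the abstract does not spell out its exact definition, so the threshold form, the function name is_degree_quasi_clique, and the toy graph are all assumptions. This is not the KernelQC algorithm, only a membership test.

```python
from math import ceil

def is_degree_quasi_clique(adj, nodes, gamma):
    """Check the (assumed) degree-based quasi-clique condition.

    adj:   dict mapping each vertex to the set of its neighbours (no self-loops)
    nodes: candidate vertex set S
    gamma: density threshold in (0, 1]
    """
    s = set(nodes)
    # Minimum number of neighbours each vertex needs inside S.
    need = ceil(gamma * (len(s) - 1))
    return all(len(adj[v] & s) >= need for v in s)

# Toy graph: a 4-clique on a, b, c, d with the edge b-d removed.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

print(is_degree_quasi_clique(adj, ["a", "b", "c", "d"], 0.6))  # every vertex has >= 2 neighbours in S
print(is_degree_quasi_clique(adj, ["a", "b", "c", "d"], 1.0))  # gamma = 1 demands a full clique
```

With gamma = 1 the condition reduces to an ordinary clique, which is why quasi-cliques are described as a generalization of cliques.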
Where Graph Topology Matters: The Robust Subgraph Problem
Robustness is a critical measure of the resilience of large networked
systems, such as transportation and communication networks. Most prior works
focus on the global robustness of a given graph at large, e.g., by measuring
its overall vulnerability to external attacks or random failures. In this
paper, we turn attention to local robustness and pose a novel problem along
the lines of subgraph mining: given a large graph, how can we find its most robust
local subgraph (RLS)?
We define a robust subgraph as a subset of nodes with high communicability
among them, and formulate the RLS-PROBLEM of finding a subgraph of given size
with maximum robustness in the host graph. Our formulation is related to the
recently proposed general framework for the densest subgraph problem, but
differs from it substantially in that, besides the number of edges in the
subgraph, robustness also concerns the placement of edges, i.e., the
subgraph topology. We show that the RLS-PROBLEM is NP-hard and propose two
heuristic algorithms based on top-down and bottom-up search strategies.
Further, we present modifications of our algorithms to handle three practical
variants of the RLS-PROBLEM. Experiments on synthetic and real-world graphs
demonstrate that we find subgraphs with larger robustness than the densest
subgraphs even at lower densities, suggesting that the existing approaches are
not suitable for the new problem setting.
Comment: 13 pages, 10 Figures, 3 Tables, to appear at SDM 2015 (9 pages only
Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications
Multilayer networks are a powerful paradigm to model complex systems, where
multiple relations occur between the same entities. Despite the keen interest
in a variety of tasks, algorithms, and analyses in this type of network, the
problem of extracting dense subgraphs has remained largely unexplored so far.
In this work we study the problem of core decomposition of a multilayer
network. The multilayer context is much more challenging, as no total order exists
among multilayer cores; rather, they form a lattice whose size is exponential
in the number of layers. In this setting we devise three algorithms which
differ in the way they visit the core lattice and in their pruning techniques.
We then move a step forward and study the problem of extracting the
inner-most (also known as maximal) cores, i.e., the cores that are not
dominated by any other core in terms of their core index in all the layers.
Inner-most cores are typically orders of magnitude fewer than all the cores.
Motivated by this, we devise an algorithm that effectively exploits the
maximality property and extracts inner-most cores directly, without first
computing a complete decomposition.
Finally, we showcase the multilayer core-decomposition tool in a variety of
scenarios and problems. We start by considering the problem of densest-subgraph
extraction in multilayer networks. We introduce a definition of multilayer
densest subgraph that trades off between high density and the number of layers in
which the high density holds, and exploit multilayer core decomposition to
approximate this problem with quality guarantees. As further applications, we
show how to utilize multilayer core decomposition to speed up the extraction of
frequent cross-graph quasi-cliques and to generalize the community-search
problem to the multilayer setting.
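To make the notion of a multilayer core concrete, here is a minimal sketch that computes a single core for a given vector of per-layer degree thresholds by repeated peeling. It assumes a multilayer core is the maximal vertex set in which every vertex has at least k[l] neighbours in layer l; the function name multilayer_core and the two-layer toy network are hypothetical, and this naive fixpoint loop is not one of the three algorithms mentioned in the abstract.

```python
def multilayer_core(layers, k):
    """Return the maximal vertex set in which every vertex has at least
    k[l] neighbours (inside the set) in layer l, for every layer l.

    layers: list of dicts, one per layer, mapping vertex -> set of neighbours;
            all layers are assumed to share the same vertex set
    k:      list of minimum degrees, one entry per layer
    """
    alive = set(layers[0])
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            # Drop v as soon as it violates the threshold in any layer.
            if any(len(adj[v] & alive) < kl for adj, kl in zip(layers, k)):
                alive.remove(v)
                changed = True
    return alive

# Layer 1: triangle a-b-c with d attached to a. Layer 2: path a-b-c-d.
layer1 = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
layer2 = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
layers = [layer1, layer2]

print(sorted(multilayer_core(layers, [1, 1])))  # all four vertices survive
print(sorted(multilayer_core(layers, [2, 1])))  # d is peeled away
print(sorted(multilayer_core(layers, [2, 2])))  # layer 2 is a path, so nothing survives
```

Because the threshold is a vector rather than a single number, vectors such as (2, 1) and (1, 2) are incomparable, which is the reason the cores form a lattice instead of a nested chain.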
Shared-memory Graph Truss Decomposition
We present PKT, a new shared-memory parallel algorithm and OpenMP
implementation for the truss decomposition of large sparse graphs. A k-truss is
a dense subgraph definition that can be considered a relaxation of a clique.
Truss decomposition refers to a partitioning of all the edges in the graph
based on their k-truss membership. The truss decomposition of a graph has many
applications. We show that our new approach PKT consistently outperforms other
truss decomposition approaches for a collection of large sparse graphs and on a
24-core shared-memory server. PKT is based on a recently proposed algorithm for
k-core decomposition.
Comment: 10 pages, conference submissio
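The idea of truss decomposition can be sketched with a simple serial peeling loop. The sketch assumes the usual definition, in which a k-truss is a maximal subgraph whose every edge lies in at least k - 2 triangles of that subgraph, and the truss number of an edge is the largest k for which it belongs to a k-truss. This is not PKT and is not parallel; the function name truss_decomposition and the toy graph are hypothetical, and the repeated minimum scan is chosen for brevity rather than speed.

```python
def truss_decomposition(edges):
    """Return a dict mapping each edge (as a sorted tuple) to its truss number.

    edges: iterable of (u, v) pairs of an undirected simple graph,
           with no duplicate edges and no self-loops
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def key(u, v):
        # Canonical representation of an undirected edge.
        return (u, v) if u < v else (v, u)

    # Support of an edge = number of triangles that contain it.
    support = {key(u, v): len(adj[u] & adj[v]) for u, v in edges}
    truss = {}
    k = 2
    while support:
        # Peel the edge with the smallest remaining support.
        e = min(support, key=support.get)
        k = max(k, support[e] + 2)
        truss[e] = k
        u, v = e
        # Every triangle through e disappears, so its other two edges lose support.
        for w in adj[u] & adj[v]:
            support[key(u, w)] -= 1
            support[key(v, w)] -= 1
        adj[u].discard(v)
        adj[v].discard(u)
        del support[e]
    return truss

# Toy graph: a 4-clique on a, b, c, d plus a pendant edge d-e.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"),
         ("b", "d"), ("c", "d"), ("d", "e")]
truss = truss_decomposition(edges)
print(truss[("d", "e")])  # pendant edge lies in no triangle
print(truss[("a", "b")])  # clique edge lies in two triangles
```

The structure mirrors k-core peeling, with edges and triangle counts taking the place of vertices and degrees.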
On a registration-based approach to sensor network localization
We consider a registration-based approach for localizing sensor networks from
range measurements. This is based on the assumption that one can find
overlapping cliques spanning the network. That is, for each sensor, one can
identify geometric neighbors for which all inter-sensor ranges are known. Such
cliques can be efficiently localized using multidimensional scaling. However,
since each clique is localized in some local coordinate system, we are required
to register them in a global coordinate system. In other words, our approach is
based on transforming the localization problem into a problem of registration.
In this context, the main contributions are as follows. First, we describe an
efficient method for partitioning the network into overlapping cliques. Second,
we study the problem of registering the localized cliques, and formulate a
necessary rigidity condition for uniquely recovering the global sensor
coordinates. In particular, we present a method for efficiently testing
rigidity, and a proposal for augmenting the partitioned network to enforce
rigidity. A recently proposed semidefinite relaxation of global registration is
used for registering the cliques. We present simulation results on random and
structured sensor networks to demonstrate that the proposed method compares
favourably with state-of-the-art methods in terms of run-time, accuracy, and
scalability.