Enumerating Top-k Quasi-Cliques
Quasi-cliques are dense incomplete subgraphs of a graph that generalize the
notion of cliques. Enumerating quasi-cliques from a graph is a robust way to
detect densely connected structures with applications to bio-informatics and
social network analysis. However, enumerating quasi-cliques in a graph is a
challenging problem, even harder than the problem of enumerating cliques. We
consider the enumeration of top-k degree-based quasi-cliques, and make the
following contributions: (1) We show that even the problem of detecting if a
given quasi-clique is maximal (i.e. not contained within another quasi-clique)
is NP-hard; (2) We present a novel heuristic algorithm KernelQC to enumerate the
k largest quasi-cliques in a graph. Our method is based on identifying kernels
of extremely dense subgraphs within a graph, followed by growing subgraphs
around these kernels, to arrive at quasi-cliques with the required densities;
(3) Experimental results show that our algorithm accurately enumerates
quasi-cliques from a graph, is much faster than current state-of-the-art
methods for quasi-clique enumeration (often more than three orders of magnitude
faster), and can scale to larger graphs than current methods.
Comment: 10 page
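The abstract defines degree-based quasi-cliques only informally. The Python sketch below assumes the common γ-quasi-clique definition (every vertex of S is adjacent to at least γ(|S| − 1) other vertices of S) and an undirected graph stored as a dict of neighbour sets. `grow_from_kernel` is a hypothetical, simplified greedy stand-in for the kernel-growing step: the kernel is supplied by the caller rather than detected, and this is not the KernelQC algorithm itself.

```python
def is_quasi_clique(S, adj, gamma):
    """Degree-based gamma-quasi-clique test: every vertex of S must be adjacent
    to at least gamma * (|S| - 1) other vertices of S."""
    S = set(S)
    need = gamma * (len(S) - 1)
    return all(len(adj[v] & S) >= need for v in S)


def grow_from_kernel(kernel, adj, gamma):
    """Hypothetical greedy growth step: repeatedly add the outside vertex with
    the most neighbours in S, as long as S remains a gamma-quasi-clique."""
    S = set(kernel)
    while True:
        # Vertices adjacent to at least one member of S but not yet in S.
        frontier = set().union(*(adj[v] for v in S)) - S
        # Most-connected candidates first; ties broken by vertex id for determinism.
        ranked = sorted(frontier, key=lambda c: (-len(adj[c] & S), c))
        for c in ranked:
            if is_quasi_clique(S | {c}, adj, gamma):
                S.add(c)
                break
        else:
            return S  # no candidate preserves the density requirement
```

Starting from a dense kernel keeps each membership test cheap and biases growth toward well-connected vertices; a greedy order like this can miss larger quasi-cliques, which is why it is only a heuristic.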
Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications
Multilayer networks are a powerful paradigm to model complex systems, where
multiple relations occur between the same entities. Despite the keen interest
in a variety of tasks, algorithms, and analyses in this type of network, the
problem of extracting dense subgraphs has remained largely unexplored so far.
In this work we study the problem of core decomposition of a multilayer
network. The multilayer context is much more challenging, as no total order exists
among multilayer cores; rather, they form a lattice whose size is exponential
in the number of layers. In this setting we devise three algorithms which
differ in the way they visit the core lattice and in their pruning techniques.
We then move a step forward and study the problem of extracting the
inner-most (also known as maximal) cores, i.e., the cores that are not
dominated by any other core in terms of their core index in all the layers.
There are typically orders of magnitude fewer inner-most cores than cores overall.
Motivated by this, we devise an algorithm that effectively exploits the
maximality property and extracts inner-most cores directly, without first
computing a complete decomposition.
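To make the core lattice concrete, here is a Python sketch under assumptions of our own: each layer is an undirected adjacency dict over a shared vertex set, and the core for an index vector k = (k_1, ..., k_L) is obtained by the usual peeling (repeatedly delete any vertex whose degree in some layer ℓ falls below k_ℓ). `inner_most_core_indices` is a deliberately naive baseline that tests every vector up to each layer's maximum degree; it has none of the lattice-visiting strategies or pruning described above, and it reports index vectors rather than deduplicated vertex sets.

```python
from itertools import product


def multilayer_core(layers, k):
    """Vertex set of the core for index vector k: the largest set in which every
    vertex has degree >= k[i] in layer i. `layers` is a list of adjacency dicts
    (vertex -> set of neighbours); every layer must list every vertex as a key."""
    alive = set(layers[0])
    deg = [{v: len(adj[v] & alive) for v in alive} for adj in layers]
    stack = [v for v in alive if any(deg[i][v] < k[i] for i in range(len(layers)))]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already peeled
        alive.remove(v)
        # Removing v lowers the degree of its surviving neighbours in every layer.
        for i, adj in enumerate(layers):
            for u in adj[v]:
                if u in alive:
                    deg[i][u] -= 1
                    if deg[i][u] < k[i]:
                        stack.append(u)
    return alive


def inner_most_core_indices(layers):
    """Naive baseline: test every index vector in the lattice, keep those with a
    non-empty core, and return the vectors not dominated by another one."""
    bounds = [max((len(n) for n in adj.values()), default=0) for adj in layers]
    nonempty = [k for k in product(*(range(b + 1) for b in bounds))
                if multilayer_core(layers, k)]

    def dominates(a, b):
        return a != b and all(x >= y for x, y in zip(a, b))

    return [k for k in nonempty if not any(dominates(o, k) for o in nonempty)]
```

The number of vectors tested grows as the product of the per-layer maximum degrees, i.e. exponentially in the number of layers, which illustrates why visiting the lattice cleverly and extracting inner-most cores directly matters.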
Finally, we showcase the multilayer core-decomposition tool in a variety of
scenarios and problems. We start by considering the problem of densest-subgraph
extraction in multilayer networks. We introduce a definition of multilayer
densest subgraph that trades off between high density and the number of layers in
which the high density holds, and exploit multilayer core decomposition to
approximate this problem with quality guarantees. As further applications, we
show how to utilize multilayer core decomposition to speed up the extraction of
frequent cross-graph quasi-cliques and to generalize the community-search
problem to the multilayer setting.
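The abstract does not spell out the densest-subgraph objective. One formulation consistent with its description, assumed here rather than taken from the paper, scores a vertex set S by choosing the non-empty layer subset that maximizes (minimum density of S over the chosen layers) × (number of chosen layers)^β, where density is |E_ℓ[S]| / |S| and β ≥ 0 is a hypothetical knob controlling how strongly extra layers are rewarded. The Python sketch only evaluates this score for a given S; it does not search for the densest subgraph or use core decomposition.

```python
def layer_density(S, edges):
    """Average-degree density |E_l[S]| / |S| of vertex set S in one layer,
    given that layer as a list of undirected edges (each listed once)."""
    S = set(S)
    inside = sum(1 for u, v in edges if u in S and v in S)
    return inside / len(S)


def multilayer_density(S, layers, beta):
    """Assumed objective: max over non-empty layer subsets L of
    min_{l in L} density_l(S) * |L| ** beta.
    For a fixed size r the best subset is simply the r densest layers,
    so sorting the densities avoids enumerating all 2^|layers| subsets."""
    dens = sorted((layer_density(S, E) for E in layers), reverse=True)
    # dens[r - 1] is the minimum density among the r densest layers.
    return max(d * r ** beta for r, d in enumerate(dens, start=1))
```

With β = 0 the score reduces to the single best layer's density; larger β favours sets that are reasonably dense in many layers at once.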
Enumerating Maximal Bicliques from a Large Graph using MapReduce
We consider the enumeration of maximal bipartite cliques (bicliques) from a
large graph, a task central to many practical data mining problems in social
network analysis and bioinformatics. We present novel parallel algorithms for
the MapReduce platform, and an experimental evaluation using Hadoop MapReduce.
Our algorithm is based on clustering the input graph into smaller sized
subgraphs, followed by processing different subgraphs in parallel. Our
algorithm uses two ideas that enable it to scale to large graphs: (1) the
redundancy in work between different subgraph explorations is minimized through
a careful pruning of the search space, and (2) the load on different reducers
is balanced through the use of an appropriate total order among the vertices.
Our evaluation shows that the algorithm scales to large graphs with millions of
edges and tens of millions of maximal bicliques. To our knowledge, this is
the first work on maximal biclique enumeration for graphs of this scale.
Comment: A preliminary version of the paper was accepted at the Proceedings of
the 3rd IEEE International Congress on Big Data 201
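The abstract does not give the algorithm's details, so the single-machine Python sketch below only mimics the two ideas it names, under assumptions of our own. Each simulated "reducer" receives the 2-hop neighbourhood of one vertex as its cluster (every biclique containing v lies within distance 2 of v), and duplicate output is avoided by keeping a biclique only in the reducer of its smallest vertex. The total order (degree, then vertex id) is a hypothetical choice. Local enumeration uses a simple closure computation (each side is the common neighbourhood of the other) that is exponential in the worst case; it is not the paper's pruning strategy and uses no Hadoop API. Sides are not required to be independent sets.

```python
def local_maximal_bicliques(adj, U):
    """All maximal bicliques {A, B} of the subgraph induced by vertex set U.
    Each side equals the common neighbourhood of the other, so every side is an
    intersection of vertex neighbourhoods."""
    nbrs = {frozenset(adj[v] & U) for v in U}
    closed = {n for n in nbrs if n}
    frontier = set(closed)
    while frontier:  # close the family under pairwise intersection
        new = {a & n for a in frontier for n in nbrs} - closed
        new.discard(frozenset())
        closed |= new
        frontier = new
    result = set()
    for A in closed:
        # B = common neighbours of all of A inside U; non-empty by construction.
        B = frozenset.intersection(*(frozenset(adj[v] & U) for v in A))
        result.add(frozenset((A, B)))  # unordered pair {A, B}
    return result


def enumerate_maximal_bicliques(adj):
    """Simulated map/reduce: one 'reducer' per vertex v works on v's 2-hop
    cluster and keeps only bicliques whose smallest vertex (in a total order)
    is v, so every maximal biclique of the graph is emitted exactly once."""
    # Hypothetical total order: lower degree first, vertex id breaks ties.
    rank = {v: (len(adj[v]), v) for v in adj}
    out = set()
    for v in adj:
        cluster = {v} | adj[v] | {w for u in adj[v] for w in adj[u]}
        for bc in local_maximal_bicliques(adj, cluster):
            A, B = tuple(bc)
            if min(A | B, key=rank.__getitem__) == v:
                out.add(bc)
    return out
```

Ordering by degree tends to hand the expensive high-degree vertices few bicliques to own, which is one simple way to spread work across reducers; the paper's actual order may differ.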
The most persistent soft-clique in a set of sampled graphs
When searching for characteristic subpatterns in potentially noisy graph data, it appears self-evident that having multiple observations would be better than having just one. However, it turns out that the inconsistencies introduced when different graph instances have different edge sets pose a serious challenge. In this work we address this challenge for the problem of finding maximum weighted cliques. We introduce the concept of the most persistent soft-clique: a subset of vertices that 1) is almost fully or at least densely connected, 2) occurs in all or almost all graph instances, and 3) has the maximum weight. We present a measure of clique-ness that essentially counts the number of edges missing to make a subset of vertices into a clique. With this measure, we show that the problem of finding the most persistent soft-clique can be cast either as a) a max-min two-person game optimization problem, or b) a min-min soft margin optimization problem. Both formulations lead to the same solution when a partial Lagrangian method is used to solve the optimization problems. Through experiments on synthetic data and on real social network data, we show that the proposed method is able to reliably find soft cliques in graph data, even when the data are distorted by random noise or unreliable observations.
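The paper solves a continuous optimization problem; the brute-force Python sketch below is a much simpler combinatorial stand-in, intended only to make the three requirements concrete on tiny inputs. The missing-edge count mirrors the clique-ness measure described above, while the tolerance `max_missing`, the support threshold `min_support`, and the representation of each graph instance as a set of vertex pairs are hypothetical choices of ours.

```python
from itertools import combinations


def missing_edges(S, edges):
    """Number of vertex pairs in S not joined by an edge (0 means S is a clique)."""
    return sum(1 for u, v in combinations(sorted(S), 2)
               if (u, v) not in edges and (v, u) not in edges)


def most_persistent_soft_clique(vertices, weights, instances, max_missing, min_support):
    """Exhaustive search for the maximum-weight vertex subset that misses at most
    `max_missing` edges in at least a `min_support` fraction of the instances."""
    best, best_w = None, float("-inf")
    for r in range(1, len(vertices) + 1):
        for S in combinations(vertices, r):
            # Fraction of graph instances in which S is "almost" a clique.
            support = sum(1 for E in instances
                          if missing_edges(S, E) <= max_missing) / len(instances)
            if support >= min_support:
                w = sum(weights[v] for v in S)
                if w > best_w:
                    best, best_w = set(S), w
    return best, best_w
```

The search visits all 2^n subsets, so it is useful only as a reference point on toy data; the optimization formulations in the abstract exist precisely to avoid this enumeration.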
Truss Decomposition in Massive Networks
The k-truss is a type of cohesive subgraph proposed recently for the study
of networks. While computing most kinds of cohesive subgraphs is NP-hard,
there exists a polynomial-time algorithm for computing the k-truss. Compared with
the k-core, which is also efficient to compute, the k-truss represents the "core" of a
k-core: it keeps the key information of the k-core while filtering out its less
important information. However, existing algorithms for computing
k-truss are inefficient for handling today's massive networks. We first improve
the existing in-memory algorithm for computing k-truss in networks of moderate
size. Then, we propose two I/O-efficient algorithms to handle massive networks
that cannot fit in main memory. Our experiments on real datasets verify the
efficiency of our algorithms and the value of k-truss.
Comment: VLDB201
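For reference, here is a minimal in-memory Python sketch of the standard peeling approach to truss decomposition: compute each edge's support (the number of triangles containing it), then, for k = 2, 3, ..., repeatedly remove edges whose support is at most k − 2 and assign them truss number k. It assumes a simple undirected graph given as an edge list, and it is neither the paper's improved in-memory algorithm nor either of its I/O-efficient algorithms.

```python
from collections import defaultdict


def truss_decomposition(edges):
    """Return {edge: truss number}, where an edge's truss number is the largest k
    such that it belongs to the k-truss (every edge in >= k - 2 triangles).
    Edges are reported as (min, max) vertex pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def key(u, v):
        return (u, v) if u < v else (v, u)

    # Support = number of triangles through the edge = common neighbours.
    support = {key(u, v): len(adj[u] & adj[v]) for u, v in edges}
    truss = {}
    k = 2
    while support:
        stack = [e for e, s in support.items() if s <= k - 2]
        while stack:
            u, v = stack.pop()
            if (u, v) not in support:
                continue  # already removed
            del support[(u, v)]
            truss[(u, v)] = k
            # Every triangle (u, v, w) disappears: update its two other edges.
            for w in adj[u] & adj[v]:
                for e in (key(u, w), key(v, w)):
                    support[e] -= 1
                    if support[e] <= k - 2:
                        stack.append(e)
            adj[u].discard(v)
            adj[v].discard(u)
        k += 1
    return truss
```

Because `adj` only ever holds edges that are still present, each triangle is counted and removed exactly once, so supports never drop below zero.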