7,397 research outputs found
Extraction and classification of dense communities in the Web
The World Wide Web (WWW) is rapidly becoming important for society as a medium for sharing data, information and services, and there is a growing interest in tools for understanding collective behaviors and emerging phenomena in the WWW. In this paper we focus on the problem of searching and classifying communities in the web. Loosely speaking a community is a group of pages related to a common interest. More formally communities have been associated in the computer science literature with the existence of a locally dense sub-graph of the web-graph (where web pages are nodes and hyper-links are arcs of the web-graph) The core of our contribution is a new scalable algorithm for finding relatively dense subgraphs in massive graphs. We apply our algorithm on web-graphs built on three publicly available large crawls of the web (with raw sizes up to 120M nodes and 1G arcs). The effectiveness of our algorithm in finding dense subgraphs is demonstrated experimentally by embedding artificial communities in the web-graph and counting how many of these are blindly found. Effectiveness increases with the size and density of the communities: it is close to 100% for dense communities of a hundred nodes or more. Moreover it is still about 80% even for small communities of twenty nodes and density at 50% of the arcs present. We complete our Community Watch system by clustering the communities found in the web-graph into homogeneous groups by topic and labelling each group by representative keywords
Core Decomposition in Multilayer Networks: Theory, Algorithms, and Applications
Multilayer networks are a powerful paradigm to model complex systems, where
multiple relations occur between the same entities. Despite the keen interest
in a variety of tasks, algorithms, and analyses in this type of network, the
problem of extracting dense subgraphs has remained largely unexplored so far.
In this work we study the problem of core decomposition of a multilayer
network. The multilayer context is much challenging as no total order exists
among multilayer cores; rather, they form a lattice whose size is exponential
in the number of layers. In this setting we devise three algorithms which
differ in the way they visit the core lattice and in their pruning techniques.
We then move a step forward and study the problem of extracting the
inner-most (also known as maximal) cores, i.e., the cores that are not
dominated by any other core in terms of their core index in all the layers.
Inner-most cores are typically orders of magnitude less than all the cores.
Motivated by this, we devise an algorithm that effectively exploits the
maximality property and extracts inner-most cores directly, without first
computing a complete decomposition.
Finally, we showcase the multilayer core-decomposition tool in a variety of
scenarios and problems. We start by considering the problem of densest-subgraph
extraction in multilayer networks. We introduce a definition of multilayer
densest subgraph that trades-off between high density and number of layers in
which the high density holds, and exploit multilayer core decomposition to
approximate this problem with quality guarantees. As further applications, we
show how to utilize multilayer core decomposition to speed-up the extraction of
frequent cross-graph quasi-cliques and to generalize the community-search
problem to the multilayer setting
Detecting High Log-Densities -- an O(n^1/4) Approximation for Densest k-Subgraph
In the Densest k-Subgraph problem, given a graph G and a parameter k, one
needs to find a subgraph of G induced on k vertices that contains the largest
number of edges. There is a significant gap between the best known upper and
lower bounds for this problem. It is NP-hard, and does not have a PTAS unless
NP has subexponential time algorithms. On the other hand, the current best
known algorithm of Feige, Kortsarz and Peleg, gives an approximation ratio of
n^(1/3-epsilon) for some specific epsilon > 0 (estimated at around 1/60).
We present an algorithm that for every epsilon > 0 approximates the Densest
k-Subgraph problem within a ratio of n^(1/4+epsilon) in time n^O(1/epsilon). In
particular, our algorithm achieves an approximation ratio of O(n^1/4) in time
n^O(log n). Our algorithm is inspired by studying an average-case version of
the problem where the goal is to distinguish random graphs from graphs with
planted dense subgraphs. The approximation ratio we achieve for the general
case matches the distinguishing ratio we obtain for this planted problem.
At a high level, our algorithms involve cleverly counting appropriately
defined trees of constant size in G, and using these counts to identify the
vertices of the dense subgraph. Our algorithm is based on the following
principle. We say that a graph G(V,E) has log-density alpha if its average
degree is Theta(|V|^alpha). The algorithmic core of our result is a family of
algorithms that output k-subgraphs of nontrivial density whenever the
log-density of the densest k-subgraph is larger than the log-density of the
host graph.Comment: 23 page
- …