Approximate Closest Community Search in Networks
Recently, there has been significant interest in the study of the community
search problem in social and information networks: given one or more query
nodes, find densely connected communities containing the query nodes. However,
most existing studies do not address the "free rider" issue, that is, nodes far
away from query nodes and irrelevant to them are included in the detected
community. Some state-of-the-art models have attempted to address this issue,
but not only are their formulated problems NP-hard, they do not admit any
approximations without restrictive assumptions, which may not always hold in
practice.
In this paper, given an undirected graph G and a set of query nodes Q, we
study community search using the k-truss based community model. We formulate
our problem of finding a closest truss community (CTC), as finding a connected
k-truss subgraph with the largest k that contains Q, and has the minimum
diameter among such subgraphs. We prove this problem is NP-hard. Furthermore,
it is NP-hard to approximate the problem within a factor $(2-\varepsilon)$, for
any $\varepsilon > 0$. However, we develop a greedy algorithmic framework,
which first finds a CTC containing Q, and then iteratively removes from the
graph the nodes furthest from Q. The method achieves a 2-approximation to the
optimal solution. To further improve the efficiency, we make use of a compact
truss index and develop efficient algorithms for k-truss identification and
maintenance as nodes get eliminated. In addition, using bulk deletion
optimization and local exploration strategies, we propose two more efficient
algorithms. One of them trades some approximation quality for efficiency while
the other is a very efficient heuristic. Extensive experiments on 6 real-world
networks show the effectiveness and efficiency of our community model and
search algorithms.
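The greedy framework described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it takes k as a given parameter instead of searching for the largest k, recomputes the k-truss from scratch instead of using a truss index, and all function names (`k_truss`, `greedy_closest_truss`, and so on) are hypothetical. It assumes integer node ids and a symmetric adjacency dictionary.

```python
from collections import deque
from itertools import combinations

def from_edges(edges):
    """Build a symmetric adjacency dict {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def k_truss(adj, k):
    """Peel edges contained in fewer than k-2 triangles until none remain."""
    g = {u: set(vs) for u, vs in adj.items()}
    changed = True
    while changed:
        changed = False
        for u in list(g):
            for v in list(g[u]):
                # support of (u, v) = number of common neighbors
                if u < v and len(g[u] & g[v]) < k - 2:
                    g[u].discard(v)
                    g[v].discard(u)
                    changed = True
    return {u: vs for u, vs in g.items() if vs}  # drop isolated nodes

def bfs_dist(adj, src):
    """Hop distances from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def component_with(adj, Q):
    """Connected component containing all of Q, or None if Q is missing or split."""
    if any(q not in adj for q in Q):
        return None
    reach = bfs_dist(adj, next(iter(Q)))
    if any(q not in reach for q in Q):
        return None
    return {u: set(adj[u]) for u in reach}

def greedy_closest_truss(adj, Q, k):
    """Repeatedly delete the node furthest from Q, keeping the best k-truss seen."""
    Q = set(Q)
    g = component_with(k_truss(adj, k), Q)
    best, best_qd = None, float("inf")
    while g is not None:
        dists = [bfs_dist(g, q) for q in Q]
        # query distance of u: its largest distance to any query node
        qdist = {u: max(d[u] for d in dists) for u in g}
        far = max(qdist, key=qdist.get)
        if qdist[far] < best_qd:
            best, best_qd = {u: set(vs) for u, vs in g.items()}, qdist[far]
        if far in Q:
            break  # deleting a query node cannot yield a valid community
        pruned = {u: vs - {far} for u, vs in g.items() if u != far}
        g = component_with(k_truss(pruned, k), Q)  # restore the truss property
    return best

# Two 4-cliques sharing node 3; query node 0 lies in the first clique.
edges = list(combinations([0, 1, 2, 3], 2)) + list(combinations([3, 4, 5, 6], 2))
community = greedy_closest_truss(from_edges(edges), {0}, k=4)
print(sorted(community))  # [0, 1, 2, 3]
```

In the toy graph, the second clique is a "free rider": it belongs to the same connected 4-truss but is far from the query node, and the greedy deletion discards it.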
D4M 3.0: Extended Database and Language Capabilities
The D4M tool was developed to address many of today's data needs. This tool
is used by hundreds of researchers to perform complex analytics on unstructured
data. Over the past few years, the D4M toolbox has evolved to support
connectivity with a variety of new database engines, including SciDB.
D4M-Graphulo provides the ability to do graph analytics in the Apache Accumulo
database. Finally, an implementation using the Julia programming language is
also now available. In this article, we describe some of our latest additions
to the D4M toolbox and our upcoming D4M 3.0 release. We show through
benchmarking and scaling results that we can achieve fast SciDB ingest using
the D4M-SciDB connector, that using Graphulo can enable graph algorithms on
scales that can be memory limited, and that the Julia implementation of D4M
achieves performance comparable to or exceeding that of the existing MATLAB(R)
implementation. Comment: IEEE HPEC 201
Exploring Communities in Large Profiled Graphs
Given a graph $G$ and a vertex $q \in G$, the community search (CS) problem
aims to efficiently find a subgraph of $G$ whose vertices are closely related
to $q$. Communities are prevalent in social and biological networks, and can be
used in product advertisement and social event recommendation. In this paper,
we study profiled community search (PCS), where CS is performed on a profiled
graph. This is a graph in which each vertex has labels arranged in a
hierarchical manner. Extensive experiments show that PCS can identify
communities with themes that are common to their vertices, and is more
effective than existing CS approaches. As a naive solution for PCS is highly
expensive, we have also developed a tree index, which facilitates efficient and
online solutions for PCS.
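To make the idea of a shared theme concrete, here is a minimal sketch of one way to compute the labels common to a group of vertices in a profiled graph. It covers only the label side of the problem and ignores structural cohesiveness. The representation is an assumption made for illustration: the hierarchy is a hypothetical child-to-parent dictionary, and the function names `ancestor_closure` and `common_theme` do not come from the paper.

```python
def ancestor_closure(labels, parent):
    """Expand a label set so that it also contains every ancestor in the hierarchy."""
    closed = set()
    for label in labels:
        # walk up toward the root, stopping early at labels already visited
        while label is not None and label not in closed:
            closed.add(label)
            label = parent.get(label)
    return closed

def common_theme(community, profiles, parent):
    """Labels (with ancestors) shared by every vertex in the community."""
    closed = [ancestor_closure(profiles[v], parent) for v in community]
    return set.intersection(*closed) if closed else set()

# Hypothetical label hierarchy: CS is the root; ML and DB are children; DL is under ML.
parent = {"CS": None, "ML": "CS", "DB": "CS", "DL": "ML"}
profiles = {"a": {"DL"}, "b": {"ML", "DB"}, "c": {"ML"}}

theme = common_theme(["a", "b", "c"], profiles, parent)
print(sorted(theme))  # ['CS', 'ML']
```

Because each expanded profile is closed under ancestors, the intersection is itself a subtree containing the root, so it can be read as the most specific theme that all three vertices share.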
Top-L Most Influential Community Detection Over Social Networks (Technical Report)
In many real-world applications such as social network analysis and online
marketing/advertising, community detection is a fundamental task to
identify communities (subgraphs) in social networks with high structural
cohesiveness. While previous works focus on detecting communities alone, they
do not consider the collective influences of users in these communities on
other user nodes in social networks. Inspired by this, in this paper, we
investigate the influence propagation from some seed communities and their
influential effects that result in the influenced communities. We propose a
novel problem, named Top-L most Influential Community DEtection (TopL-ICDE)
over social networks, which aims to retrieve top-L seed communities with the
highest influences, having high structural cohesiveness, and containing
user-specified query keywords. In order to efficiently tackle the TopL-ICDE
problem, we design effective pruning strategies to filter out false alarms of
seed communities and propose an effective index mechanism to facilitate
efficient Top-L community retrieval. We develop an efficient TopL-ICDE
answering algorithm by traversing the index and applying our proposed pruning
strategies. We also formulate and tackle a variant of TopL-ICDE, named
diversified top-L most influential community detection (DTopL-ICDE), which
returns a set of L diversified communities with the highest diversity score
(i.e., collaborative influences by L communities). We prove that DTopL-ICDE is
NP-hard, and propose an efficient greedy algorithm with our designed diversity
score pruning. Through extensive experiments, we verify the efficiency and
effectiveness of our proposed TopL-ICDE and DTopL-ICDE approaches over
real/synthetic social networks under various parameter settings.