D4M 3.0: Extended Database and Language Capabilities
The D4M tool was developed to address many of today's data needs. This tool
is used by hundreds of researchers to perform complex analytics on unstructured
data. Over the past few years, the D4M toolbox has evolved to support
connectivity with a variety of new database engines, including SciDB.
D4M-Graphulo provides the ability to do graph analytics in the Apache Accumulo
database. Finally, an implementation using the Julia programming language is
also now available. In this article, we describe some of our latest additions
to the D4M toolbox and our upcoming D4M 3.0 release. We show through
benchmarking and scaling results that we can achieve fast SciDB ingest using
the D4M-SciDB connector, that using Graphulo can enable graph algorithms on
scales that can be memory limited, and that the Julia implementation of D4M
matches or exceeds the performance of the existing MATLAB(R) implementation.
Comment: IEEE HPEC 201
Efficient Truss Maintenance in Evolving Networks
The truss was proposed to study social network data represented by graphs. A
k-truss of a graph is a cohesive subgraph in which each edge is contained in
at least k-2 triangles within the subgraph. While the truss has been shown to
be a superior model of close relationships in social networks, and efficient
algorithms for finding trusses have been studied extensively, very little
attention has been paid to truss maintenance. However, most social networks
evolve over time, and it may be infeasible to recompute trusses from scratch
every time the up-to-date k-trusses are needed. In this paper, we discuss how
to maintain trusses in a graph with dynamic updates. We first discuss a set of
properties of truss maintenance, then propose algorithms for maintaining
trusses under edge deletions and insertions, and finally discuss truss index
maintenance. We test the proposed techniques on real datasets. The
experimental results show the promise of our work.
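To make the k-truss definition above concrete, here is a minimal Python sketch that computes a k-truss from scratch by repeatedly deleting edges that lie in fewer than k-2 triangles. It is only an illustration of the definition, not the maintenance algorithm of the paper; the function name `k_truss` and the edge-list input format are assumptions made for this example.

```python
from collections import defaultdict

def k_truss(edges, k):
    """Return the edges (as frozensets) of the k-truss of an undirected graph.

    Illustrative from-scratch peeling; assumes `edges` is an iterable of
    (u, v) pairs with hashable node labels.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Common neighbours of u and v = triangles through edge (u, v).
                if len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
    return {frozenset((u, v)) for u in adj for v in adj[u]}

# A 4-clique on nodes 1..4 with one pendant edge (4, 5).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (4, 5)]
print(len(k_truss(edges, 4)))  # 6: only the clique edges survive
```

Rerunning this peeling after every update is the recomputation that the abstract describes as potentially infeasible for evolving networks.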
Querying cohesive subgraphs by keywords
© 2018 IEEE. The keyword search problem has been widely studied as a way to retrieve related substructures from graphs for a keyword set. However, existing well-studied approaches aim at finding compact trees/subgraphs containing the keywords and ignore a critical measure, density, which reflects how strongly and stably the keyword nodes are connected in the substructure. In this paper, we study the problem of finding a cohesive subgraph containing the query keywords based on the k-Truss model, and formulate it as the minimal dense truss search problem, i.e., finding a minimal subgraph with maximum trussness covering the keywords. We first propose an efficient algorithm to find the dense truss with the maximum trussness containing the keywords based on a novel hybrid KT-Index (Keyword-Truss Index). Then, we develop a novel refinement approach to extract the minimal dense truss based on the anti-monotonicity property of k-Truss. Experimental studies on real datasets show the superior performance of our method.
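The first step described above (finding a truss of maximum trussness that covers the keywords) can be illustrated with a naive, index-free baseline. The sketch below is not the KT-Index method and does not perform the minimality refinement; it assumes the result must be connected, and the names `edge_trussness`, `max_truss_covering`, and `node_keywords` are hypothetical.

```python
from collections import defaultdict

def edge_trussness(edges):
    """Map each edge (frozenset) to the largest k whose k-truss contains it."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    # Support of an edge = number of triangles it currently lies in.
    support = {frozenset((u, v)): len(adj[u] & adj[v])
               for u in adj for v in adj[u]}
    truss, k = {}, 2
    while support:
        low = [e for e, s in support.items() if s <= k - 2]
        if not low:
            k += 1
            continue
        e = low[0]
        u, v = tuple(e)
        for w in adj[u] & adj[v]:  # each triangle through e loses one edge
            support[frozenset((u, w))] -= 1
            support[frozenset((v, w))] -= 1
        adj[u].discard(v)
        adj[v].discard(u)
        truss[e] = k
        del support[e]
    return truss

def max_truss_covering(edges, node_keywords, query):
    """Return (k, nodes) for the largest k such that one connected component
    of the k-truss covers every query keyword, or None if none does."""
    truss = edge_trussness(edges)
    query = set(query)
    for k in sorted(set(truss.values()), reverse=True):
        adj = defaultdict(set)
        for e, t in truss.items():
            if t >= k:
                u, v = tuple(e)
                adj[u].add(v)
                adj[v].add(u)
        seen = set()
        for start in list(adj):
            if start in seen:
                continue
            comp, stack = {start}, [start]
            while stack:  # depth-first traversal of one component
                x = stack.pop()
                for y in adj[x]:
                    if y not in comp:
                        comp.add(y)
                        stack.append(y)
            seen |= comp
            covered = set().union(*(node_keywords.get(n, ()) for n in comp))
            if query <= covered:
                return k, comp
    return None
```

This baseline recomputes components for every candidate k, which is the kind of repeated work a precomputed index is meant to avoid.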
Approximate Closest Community Search in Networks
Recently, there has been significant interest in the study of the community
search problem in social and information networks: given one or more query
nodes, find densely connected communities containing the query nodes. However,
most existing studies do not address the "free rider" issue, that is, nodes far
away from query nodes and irrelevant to them are included in the detected
community. Some state-of-the-art models have attempted to address this issue,
but not only are their formulated problems NP-hard, they do not admit any
approximations without restrictive assumptions, which may not always hold in
practice.
In this paper, given an undirected graph G and a set of query nodes Q, we
study community search using the k-truss based community model. We formulate
our problem of finding a closest truss community (CTC), as finding a connected
k-truss subgraph with the largest k that contains Q, and has the minimum
diameter among such subgraphs. We prove this problem is NP-hard. Furthermore,
it is NP-hard to approximate the problem within a factor (2 - ε), for
any ε > 0. However, we develop a greedy algorithmic framework,
which first finds a CTC containing Q and then iteratively removes from the
graph the nodes furthest from Q. The method achieves a 2-approximation to the
optimal solution. To further improve efficiency, we make use of a compact
truss index and develop efficient algorithms for k-truss identification and
maintenance as nodes get eliminated. In addition, using bulk deletion
optimization and local exploration strategies, we propose two more efficient
algorithms. One of them trades some approximation quality for efficiency while
the other is a very efficient heuristic. Extensive experiments on 6 real-world
networks show the effectiveness and efficiency of our community model and
search algorithms.
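The greedy framework outlined in this abstract can be sketched in a few dozen lines of Python. This is a simplified reading of the abstract under stated assumptions, not the authors' implementation: it uses no truss index, recomputes distances from scratch, removes one furthest node per round, and returns the intermediate subgraph with the smallest query distance. All function names are hypothetical.

```python
from collections import defaultdict, deque

def peel_to_k_truss(adj, k):
    """In place: delete edges lying in fewer than k - 2 triangles."""
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

def bfs_dist(adj, src):
    """Hop distances from src to every reachable node."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def component_with_query(adj, query):
    """Adjacency of the component holding all query nodes, else None."""
    q0 = next(iter(query))
    if q0 not in adj or not adj[q0]:
        return None
    reach = bfs_dist(adj, q0)
    if not all(q in reach for q in query):
        return None
    return {u: set(adj[u]) for u in reach}

def greedy_ctc(edges, query):
    """Return (k, nodes, query_distance) or None if Q is not connected."""
    query = set(query)
    base = defaultdict(set)
    for u, v in edges:
        if u != v:
            base[u].add(v)
            base[v].add(u)
    # Step 1: largest k whose k-truss has one component containing Q.
    # An edge of a k-truss has endpoints of degree >= k - 1, so k <= max degree + 1.
    k_max = max((len(n) for n in base.values()), default=0) + 1
    for k in range(k_max, 1, -1):
        adj = {u: set(n) for u, n in base.items()}
        peel_to_k_truss(adj, k)
        graph = component_with_query(adj, query)
        if graph is not None:
            break
    else:
        return None
    # Step 2: repeatedly drop the node furthest from Q, restore the k-truss
    # property, and remember the snapshot with the smallest query distance.
    best, best_dist = None, float("inf")
    while graph is not None:
        dists = [bfs_dist(graph, q) for q in query]
        qdist = {u: max(d[u] for d in dists) for u in graph}
        far = max(qdist, key=qdist.get)
        if qdist[far] < best_dist:
            best, best_dist = set(graph), qdist[far]
        for v in graph.pop(far):
            graph[v].discard(far)
        peel_to_k_truss(graph, k)
        graph = component_with_query(graph, query)
    return k, best, best_dist
```

For example, on two 4-cliques {1, 2, 3, 4} and {4, 5, 6, 7} that share node 4, a query Q = {1} starts from the whole 4-truss and shrinks to the clique containing node 1, which excludes the distant "free rider" nodes 5, 6, and 7.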