Sequential Changepoint Approach for Online Community Detection
We present new algorithms for detecting the emergence of a community in large
networks from sequential observations. The networks are modeled using
Erdős-Rényi random graphs with edges forming between nodes in the community
with higher probability. Based on statistical changepoint detection
methodology, we develop three algorithms: the Exhaustive Search (ES), the
mixture, and the Hierarchical Mixture (H-Mix) methods. Performance of these
methods is evaluated by the average run length (ARL), which captures the
frequency of false alarms, and the detection delay. Numerical comparisons show
that the ES method performs the best; however, it is exponentially complex. The
mixture method is polynomially complex by exploiting the fact that the size of
the community is typically small in a large network. However, it may react to a
group of active edges that do not form a community. This issue is resolved by
the H-Mix method, which is based on a dendrogram decomposition of the network.
We present an asymptotic analytical expression for ARL of the mixture method
when the threshold is large. Numerical simulation verifies that our
approximation is accurate even in the non-asymptotic regime. Hence, it can be
used to determine a desired threshold efficiently. Finally, numerical examples
show that the mixture and the H-Mix methods can both detect a community quickly
with a lower complexity than the ES method.
Comment: Submitted to 2014 INFORMS Workshop on Data Mining and Analytics and
an IEEE journal
On combinatorial optimisation in analysis of protein-protein interaction and protein folding networks
Abstract: Protein-protein interaction networks and protein folding networks represent prominent research topics at the intersection of bioinformatics and network science. In this paper, we present a study of these networks from a combinatorial optimisation point of view. Using a combination of classical heuristics and stochastic optimisation techniques, we were able to identify several interesting combinatorial properties of biological networks of the COSIN project. We obtained optimal or near-optimal solutions to maximum clique and chromatic number problems for these networks. We also explore patterns of both non-overlapping and overlapping cliques in these networks. Optimal or near-optimal solutions to partitioning of these networks into non-overlapping cliques and to the maximum independent set problem were discovered. Maximal cliques are explored by enumerative techniques. Domination in these networks is briefly studied, too. Applications and extensions of our findings are discussed.
Additive Approximation Algorithms for Modularity Maximization
The modularity is a quality function in community detection, which was
introduced by Newman and Girvan (2004). Community detection in graphs is now
often conducted through modularity maximization: given an undirected graph
G = (V, E), we are asked to find a partition of V that maximizes
the modularity. Although numerous algorithms have been developed to date, most
of them have no theoretical approximation guarantee. Recently, to overcome this
issue, the design of modularity maximization algorithms with provable
approximation guarantees has attracted significant attention in the computer
science community.
In this study, we further investigate the approximability of modularity
maximization. More specifically, we propose a polynomial-time
(cos((3 - √5)π/4) - (1 + √5)/8)-additive approximation algorithm for the
modularity maximization problem. Note here that
cos((3 - √5)π/4) - (1 + √5)/8 < 0.42084
holds. This improves the current best additive approximation error of 0.4672,
which was recently provided by Dinh, Li, and Thai (2015). Interestingly, our
analysis also demonstrates that the proposed algorithm obtains a nearly-optimal
solution for any instance with a very high modularity value. Moreover, we
propose a polynomial-time 0.16598-additive approximation algorithm for the
maximum modularity cut problem. It should be noted that this is the first
non-trivial approximability result for the problem. Finally, we demonstrate
that our approximation algorithm can be extended to some related problems.
Comment: 23 pages, 4 figures
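The objective that this abstract refers to, the Newman-Girvan modularity of a partition, can be illustrated with a minimal sketch. It assumes an unweighted, undirected graph given as an edge list. The function name `modularity` and the input layout are illustrative choices rather than anything taken from the paper, and the code only evaluates the objective for a given partition. It does not implement the approximation algorithms the abstract proposes.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition (illustrative helper).

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph with no edges")

    internal = defaultdict(int)    # edges with both endpoints in community c
    degree_sum = defaultdict(int)  # sum of node degrees in community c

    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1

    # Q = sum over communities of (m_c / m) - (D_c / (2m))^2
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_blocks = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {v: "a" for v in range(6)}
```

On this toy graph, splitting at the bridge gives a modularity of 5/14, while putting every node in one community gives 0.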
Fast Shortest Path Distance Estimation in Large Networks
We study the problem of preprocessing a large graph so that point-to-point shortest-path queries can be answered very fast. Computing shortest paths is a well studied problem, but exact algorithms do not scale to huge graphs encountered on the web, social networks, and other applications.
In this paper we focus on approximate methods for distance estimation, in particular using landmark-based distance indexing. This approach involves selecting a subset of nodes as landmarks and computing (offline) the distances from each node in the graph to those landmarks. At runtime, when the distance between a pair of nodes is needed, we can estimate it quickly by combining the precomputed distances of the two nodes to the landmarks.
We prove that selecting the optimal set of landmarks is an NP-hard problem, and thus heuristic solutions need to be employed. Given a budget of memory for the index, which translates directly into a budget of landmarks, different landmark selection strategies can yield dramatically different results in terms of accuracy. A number of simple methods that scale well to large graphs are therefore developed and experimentally compared. The simplest methods choose central nodes of the graph, while the more elaborate ones select central nodes that are also far away from one another. The efficiency of the suggested techniques is tested experimentally using five different real world graphs with millions of edges; for a given accuracy, they require as much as 250 times less space than the current approach in the literature which considers selecting landmarks at random.
Finally, we study applications of our method in two problems arising naturally in large-scale networks, namely, social search and community detection.
Yahoo! Research (internship)
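The landmark-based indexing that this abstract describes can be sketched as follows. The sketch rests on several assumptions: the graph is unweighted and stored as an adjacency dictionary, landmarks are chosen by highest degree as one simple stand-in for "central nodes", and the estimate is the triangle-inequality upper bound d(s, l) + d(l, t) minimised over landmarks. All function names are hypothetical, and the paper's own selection strategies and estimators may differ.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def top_degree_landmarks(adj, k):
    """Pick the k highest-degree nodes (an assumed notion of centrality)."""
    return sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]


def build_index(adj, landmarks):
    """Offline step: one BFS per landmark."""
    return {l: bfs_distances(adj, l) for l in landmarks}


def estimate_distance(index, s, t):
    """Query step: smallest d(s, l) + d(l, t) over all landmarks l.

    This is an upper bound on the true distance; it returns infinity
    when no landmark reaches both s and t.
    """
    best = float("inf")
    for dist in index.values():
        if s in dist and t in dist:
            best = min(best, dist[s] + dist[t])
    return best


# Path graph 0 - 1 - 2 - 3 - 4 with the middle node as the only landmark.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
index = build_index(path, [2])
```

With a single landmark at node 2, the estimate for the pair (0, 4) is exact because the landmark lies on their shortest path, while the estimate for (0, 1) overshoots the true distance of 1. This illustrates why the choice of landmarks affects accuracy.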
Approximate Closest Community Search in Networks
Recently, there has been significant interest in the study of the community
search problem in social and information networks: given one or more query
nodes, find densely connected communities containing the query nodes. However,
most existing studies do not address the "free rider" issue, that is, nodes far
away from query nodes and irrelevant to them are included in the detected
community. Some state-of-the-art models have attempted to address this issue,
but not only are their formulated problems NP-hard, they do not admit any
approximations without restrictive assumptions, which may not always hold in
practice.
In this paper, given an undirected graph G and a set of query nodes Q, we
study community search using the k-truss based community model. We formulate
our problem of finding a closest truss community (CTC), as finding a connected
k-truss subgraph with the largest k that contains Q, and has the minimum
diameter among such subgraphs. We prove this problem is NP-hard. Furthermore,
it is NP-hard to approximate the problem within a factor (2 - ε), for
any ε > 0. However, we develop a greedy algorithmic framework,
which first finds a CTC containing Q, and then iteratively removes from the
graph the nodes furthest from Q. The method achieves a 2-approximation to the
optimal solution. To further improve the efficiency, we make use of a compact
truss index and develop efficient algorithms for k-truss identification and
maintenance as nodes get eliminated. In addition, using bulk deletion
optimization and local exploration strategies, we propose two more efficient
algorithms. One of them trades some approximation quality for efficiency while
the other is a very efficient heuristic. Extensive experiments on 6 real-world
networks show the effectiveness and efficiency of our community model and
search algorithms.
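The k-truss model underlying this abstract requires every edge of the subgraph to lie in at least k - 2 triangles of that subgraph. The sketch below computes the maximal k-truss by repeatedly peeling edges with too little triangle support. It is a simple fixed-point iteration written for clarity, with a hypothetical function name. It is not the truss index or the closest-truss-community search described in the abstract, and it ignores the query nodes and the diameter objective.

```python
def k_truss(edges, k):
    """Edges of the maximal k-truss of an undirected simple graph.

    edges: iterable of (u, v) pairs; self-loops are ignored.
    Returns a set of frozenset({u, v}) edges, each contained in at
    least k - 2 triangles formed by the returned edges.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                # Support of (u, v) = number of common neighbours.
                if v in adj[u] and len(adj[u] & adj[v]) < k - 2:
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True

    return {frozenset((u, v)) for u in adj for v in adj[u]}


# A 4-clique on {0, 1, 2, 3} with a pendant edge (3, 4).
graph = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
```

In this toy graph, each clique edge lies in two triangles, so the 4-truss keeps the six clique edges and drops the pendant edge, and no edge survives for k = 5.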
Spectral Graph Forge: Graph Generation Targeting Modularity
Community structure is an important property that captures inhomogeneities
common in large networks, and modularity is one of the most widely used metrics
for such community structure. In this paper, we introduce a principled
methodology, the Spectral Graph Forge, for generating random graphs that
preserve community structure from a real network of interest, in terms of
modularity. Our approach leverages the fact that the spectral structure of
matrix representations of a graph encodes global information about community
structure. The Spectral Graph Forge uses a low-rank approximation of the
modularity matrix to generate synthetic graphs that match a target modularity
within a user-selectable degree of accuracy, while allowing other aspects of
structure to vary. We show that the Spectral Graph Forge outperforms
state-of-the-art techniques in terms of accuracy in targeting the modularity
and randomness of the realizations, while also preserving other local
structural properties and node attributes. We discuss extensions of the
Spectral Graph Forge to target other properties beyond modularity, and its
applications to anonymization.