Decoding communities in networks
According to a recent information-theoretical proposal, the problem of
defining and identifying communities in networks can be interpreted as a
classical communication task over a noisy channel: memberships of nodes are
information bits erased by the channel, edges and non-edges in the network are
parity bits introduced by the encoder but degraded through the channel, and a
community identification algorithm is a decoder. This interpretation is
exactly equivalent to the one underlying well-known statistical inference
algorithms for community detection; the only difference is that a noisy
channel replaces a stochastic network model.
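To make the channel picture concrete, here is a minimal toy sketch. It rests on several assumptions that are not taken from the paper:

- There are two communities.
- There is one parity bit per node pair, set to 1 when both nodes share a community; a received 1 is observed as an edge.
- A binary symmetric channel flips each bit with probability `eps`.

The membership bits themselves are never received (they are "erased"), so the decoder sees only the noisy pair bits. Under these assumptions the network is a planted partition with within-group edge probability `1 - eps` and between-group edge probability `eps`. The names `encode`, `binary_symmetric_channel` and `ml_decode` are hypothetical. The exhaustive decoder is only feasible for a handful of nodes and is not the decoding technique used in the paper.

```python
import random
from itertools import combinations, product

def encode(memberships):
    """Toy encoder: one parity bit per node pair, 1 if the two nodes share a community.
    The membership bits themselves are never transmitted (they are 'erased')."""
    n = len(memberships)
    return {(i, j): int(memberships[i] == memberships[j])
            for i, j in combinations(range(n), 2)}

def binary_symmetric_channel(bits, eps, rng):
    """Flip each parity bit independently with probability eps."""
    return {pair: bit ^ int(rng.random() < eps) for pair, bit in bits.items()}

def ml_decode(received, n):
    """Exhaustive maximum-likelihood decoder for a BSC with eps < 0.5, which reduces
    to minimum Hamming distance over all two-community assignments.
    Node 0 is pinned to community 0 to remove the label-swap symmetry."""
    best, best_dist = None, None
    for rest in product((0, 1), repeat=n - 1):
        candidate = (0,) + rest
        codeword = encode(candidate)
        dist = sum(codeword[pair] != bit for pair, bit in received.items())
        if best_dist is None or dist < best_dist:
            best, best_dist = candidate, dist
    return best

if __name__ == "__main__":
    rng = random.Random(0)
    truth = (0, 0, 0, 0, 1, 1, 1, 1)  # node 0 in community 0, matching the decoder's convention
    received = binary_symmetric_channel(encode(truth), eps=0.1, rng=rng)
    edges = [pair for pair, bit in received.items() if bit]  # the observed network
    print(len(edges), "edges observed; decoded memberships:", ml_decode(received, len(truth)))
```

Each of the n(n-1)/2 pair bits depends on only two membership bits, so the code is highly redundant. That redundancy is what allows a decoder to recover memberships even when some edges are flipped.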
However, this different perspective makes it possible to draw on the rich
toolset of coding theory to gain new insights into the problem of community
detection. In this paper, we illustrate two main
applications of standard coding-theoretical methods to community detection.
First, we leverage a state-of-the-art decoding technique to generate a family
of quasi-optimal community detection algorithms. Second, and more importantly,
we show that Shannon's noisy-channel coding theorem can be invoked to
establish a lower bound, here named the decodability bound, for the maximum
amount of noise tolerable by an ideal decoder to achieve perfect detection of
communities. When computed for well-established synthetic benchmarks, the
decodability bound accurately explains the performance achieved by the best
existing community detection algorithms, suggesting that little room for
their improvement potentially remains.
Comment: 9 pages, 5 figures + Appendix
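The sketch below illustrates the kind of rate-versus-capacity argument Shannon's theorem supports. It uses a toy setting that is our own assumption, not the paper's benchmark model:

- n membership bits are encoded into n(n-1)/2 pair bits, giving a code rate R = 2/(n-1).
- Each pair bit passes through a binary symmetric channel with flip probability eps, whose capacity is C = 1 - H2(eps).

No code achieves vanishing error probability when R > C. The largest tolerable noise therefore satisfies H2(eps) = 1 - R, and the sketch finds that value by bisection. This does not reproduce the paper's decodability bound, which is computed for synthetic benchmarks with their own noise model. The function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Binary entropy H2(p) in bits, with H2(0) = H2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_capacity(eps):
    """Capacity (bits per channel use) of a binary symmetric channel."""
    return 1.0 - binary_entropy(eps)

def toy_rate(n):
    """Rate of the toy code: n membership bits carried by n(n-1)/2 pair bits."""
    return n / (n * (n - 1) / 2)

def max_tolerable_noise(n, tol=1e-12):
    """Largest eps in [0, 0.5] with toy_rate(n) <= bsc_capacity(eps), or None
    if even a noiseless channel cannot carry that rate."""
    if n < 2:
        raise ValueError("need at least two nodes")
    rate = toy_rate(n)
    if rate > bsc_capacity(0.0):
        return None
    lo, hi = 0.0, 0.5  # capacity decreases monotonically on this interval
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bsc_capacity(mid) >= rate:
            lo = mid
        else:
            hi = mid
    return lo
```

For n = 5 the rate is 0.5 and the threshold is about eps = 0.11. As n grows, the rate shrinks and the threshold approaches 0.5, which reflects how dense pairwise redundancy makes this toy problem easy.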
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly available real-world graph (the Hyperlink Web
graph, with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature reports results on much smaller graphs, and the work on
the Hyperlink graph uses distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
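A back-of-the-envelope check of the in-memory claim follows. It assumes a plain compressed-sparse-row (CSR) layout without further compression, laid out as follows:

- Offsets are 8 bytes each (needed because 128 billion exceeds 2^32), with one offset per vertex plus one.
- Neighbor IDs are 4 bytes each (sufficient because 3.5 billion is below 2^32).
- Each of the 128 billion edges is stored once.

These layout choices are assumptions and not the paper's actual representation. `csr_bytes` is a hypothetical helper.

```python
def csr_bytes(num_vertices, num_edges, offset_bytes=8, vertex_id_bytes=4):
    """Size of a plain CSR graph: num_vertices + 1 offsets plus one neighbor ID per stored edge."""
    return (num_vertices + 1) * offset_bytes + num_edges * vertex_id_bytes

if __name__ == "__main__":
    n, m = 3_500_000_000, 128_000_000_000
    assert n < 2**32  # 32-bit vertex IDs are enough for 3.5 billion vertices
    total = csr_bytes(n, m)
    print(f"{total / 1e9:.1f} GB")  # prints 540.0 GB, under one terabyte
```

Under these assumptions, storing both directions of every edge would push the total past 1 TB.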
This paper shows that theoretically efficient parallel graph algorithms can
scale to the largest publicly available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that our implementations outperform existing
state-of-the-art implementations in running time on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly available as the Graph-Based Benchmark Suite
(GBBS).
Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
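A common building block in shared-memory graph processing is the frontier-based traversal. The sketch below applies it to breadth-first search over a CSR adjacency.

- Each round expands the whole frontier at once.
- A parallel implementation would process a frontier's vertices concurrently.
- The visited check would typically be an atomic compare-and-swap.

This Python version is sequential, and `build_csr` and `frontier_bfs` are hypothetical names. It is not GBBS code.

```python
def build_csr(n, edges):
    """Build a CSR adjacency (offsets, neighbors) for an undirected edge list."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    offsets = [0] * (n + 1)
    for v in range(n):
        offsets[v + 1] = offsets[v] + degree[v]
    neighbors = [0] * offsets[n]
    cursor = offsets[:-1]  # slicing copies: next free slot for each vertex
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors

def frontier_bfs(offsets, neighbors, source):
    """Level-synchronous BFS returning a parent array (-1 for unreachable vertices)."""
    n = len(offsets) - 1
    parent = [-1] * n
    parent[source] = source
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:  # a parallel version would split this loop across cores
            for v in neighbors[offsets[u]:offsets[u + 1]]:
                if parent[v] == -1:  # parallel code would use compare-and-swap here
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent

if __name__ == "__main__":
    offsets, neighbors = build_csr(6, [(0, 1), (1, 2), (2, 3), (0, 4)])
    print(frontier_bfs(offsets, neighbors, 0))  # prints [0, 0, 1, 2, 0, -1]
```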
Streaming Graph Challenge: Stochastic Block Partition
An important objective for analyzing real-world graphs is to achieve scalable
performance on large, streaming graphs. A challenging and relevant example is
the graph partition problem. As a combinatorial problem, graph partition is
NP-hard, but existing relaxation methods provide reasonable approximate
solutions that can be scaled for large graphs. Competitive benchmarks and
challenges have proven to be an effective means to advance state-of-the-art
performance and foster community collaboration. This paper describes a graph
partition challenge with a baseline partition algorithm of sub-quadratic
complexity. The algorithm employs rigorous Bayesian inferential methods based
on a statistical model that captures characteristics of the real-world graphs.
This strong foundation enables the algorithm to address limitations of
well-known graph partition approaches such as modularity maximization. This
paper describes various aspects of the challenge, including (1) the data sets
and streaming graph generator, (2) the baseline partition algorithm with
pseudocode, (3) an argument for the correctness of parallelizing the Bayesian
inference, (4) different parallel computation strategies such as node-based
parallelism and matrix-based parallelism, (5) evaluation metrics for partition
correctness and computational requirements, (6) preliminary timing of a
Python-based demonstration code and the open source C++ code, and (7)
considerations for partitioning the graph in a streaming fashion. Data sets,
source code for the algorithm, and metrics, with detailed documentation, are
available at GraphChallenge.org.
Comment: To be published in 2017 IEEE High Performance Extreme Computing
Conference (HPEC)
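The text does not spell out the statistical model. Assuming it is a degree-corrected stochastic block model, the sketch below scores a partition with the Karrer-Newman profile log-likelihood, the sum over ordered group pairs (r, s) of m_rs log(m_rs / (k_r k_s)). Here m_rs counts edges between groups r and s (internal edges twice), and k_r is the total degree of group r. This is a simpler, non-Bayesian relative of the kind of objective such inference works with, not the challenge baseline itself. `dcsbm_log_likelihood` is a hypothetical name.

```python
import math
from collections import Counter

def dcsbm_log_likelihood(edges, groups):
    """Karrer-Newman degree-corrected SBM log-likelihood (up to constants).
    edges: list of undirected (u, v) pairs; groups: group label for each node."""
    m = Counter()      # m[(r, s)]: edge endpoints between groups, symmetric
    kappa = Counter()  # kappa[r]: total degree of group r
    for u, v in edges:
        r, s = groups[u], groups[v]
        m[(r, s)] += 1
        m[(s, r)] += 1  # an internal edge therefore adds 2 to m[(r, r)]
        kappa[r] += 1
        kappa[s] += 1
    # Empty group pairs are absent from the Counter, matching 0 * log 0 = 0.
    return sum(count * math.log(count / (kappa[r] * kappa[s]))
               for (r, s), count in m.items())

if __name__ == "__main__":
    two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    print(dcsbm_log_likelihood(two_triangles, [0, 0, 0, 1, 1, 1]))  # planted split
    print(dcsbm_log_likelihood(two_triangles, [0, 0, 1, 0, 1, 1]))  # scrambled split
```

The planted split scores higher than the scrambled one. Maximizing this likelihood alone tends to favor ever more groups, which is one reason Bayesian treatments add a penalty for model complexity.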
A Multi-view Context-aware Approach to Android Malware Detection and Malicious Code Localization
Existing Android malware detection approaches use a variety of features such
as security sensitive APIs, system calls, control-flow structures and
information flows in conjunction with Machine Learning classifiers to achieve
accurate detection. Each of these feature sets provides a unique semantic
perspective (or view) of apps' behaviours with inherent strengths and
limitations. That is, some views are better suited to detecting certain attacks
but may be unable to characterise several others. Most of the
existing malware detection approaches use only one (or a selected few) of the
aforementioned feature sets, which prevents them from detecting a vast majority
of attacks. Addressing this limitation, we propose MKLDroid, a unified
framework that systematically integrates multiple views of apps for performing
comprehensive malware detection and malicious code localisation. The rationale
is that, while a malware app can disguise itself in some views, disguising in
every view while maintaining malicious intent will be much harder.
MKLDroid uses a graph kernel to capture structural and contextual information
from apps' dependency graphs and identify malicious code patterns in each view.
Subsequently, it employs Multiple Kernel Learning (MKL) to find a weighted
combination of the views which yields the best detection accuracy. Besides
multi-view learning, MKLDroid's unique and salient trait is its ability to
locate fine-grained malicious code portions in dependency graphs (e.g.,
methods/classes). Through our large-scale experiments on several datasets
(including apps collected in the wild), we demonstrate that MKLDroid
consistently outperforms three state-of-the-art techniques in terms of accuracy while
maintaining comparable efficiency. In our malicious code localisation
experiments on a dataset of repackaged malware, MKLDroid was able to identify
all the malicious classes with 94% average recall.
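A minimal sketch of the multi-view idea follows, assuming each app is represented by one labelled dependency graph per view. The abstract does not give the exact form of MKLDroid's graph kernel, so a plain Weisfeiler-Lehman subtree kernel stands in for it. The per-view kernels are combined with fixed non-negative weights, whereas MKL would learn those weights from labelled training data. View names and function names are hypothetical.

```python
from collections import Counter

def wl_features(adj, labels, iterations=2):
    """Weisfeiler-Lehman subtree features: counts of (iteration, label) pairs.
    Each round refines a node's label with the sorted labels of its neighbours;
    nested tuples are kept instead of compressed integer IDs for simplicity."""
    current = dict(labels)
    features = Counter((0, lab) for lab in current.values())
    for it in range(1, iterations + 1):
        current = {v: (current[v], tuple(sorted(current[u] for u in adj[v])))
                   for v in adj}
        features.update((it, lab) for lab in current.values())
    return features

def wl_kernel(g1, g2, iterations=2):
    """Dot product of WL feature counts; graphs are (adjacency dict, label dict)."""
    f1 = wl_features(*g1, iterations=iterations)
    f2 = wl_features(*g2, iterations=iterations)
    return sum(count * f2[key] for key, count in f1.items())

def multi_view_kernel(app_a, app_b, weights):
    """Weighted sum of per-view kernels; app_* map a view name to a graph.
    Non-negative weights keep the combination a valid (PSD) kernel."""
    return sum(w * wl_kernel(app_a[view], app_b[view]) for view, w in weights.items())

if __name__ == "__main__":
    app1 = {"view_a": ({0: [1], 1: [0, 2], 2: [1]}, {0: "A", 1: "B", 2: "C"}),
            "view_b": ({0: [1], 1: [0]}, {0: "x", 1: "y"})}
    app2 = {"view_a": ({0: [1], 1: [0]}, {0: "A", 1: "B"}),
            "view_b": ({0: [1], 1: [0]}, {0: "x", 1: "y"})}
    print(multi_view_kernel(app1, app2, {"view_a": 0.5, "view_b": 2.0}))
```

A non-negative weighted sum of valid kernels is itself a valid kernel, so a kernel classifier such as an SVM can train directly on the combined kernel. Setting a view's weight to zero simply drops that view.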