Distributed Triangle Counting in the Graphulo Matrix Math Library
Triangle counting is a key algorithm for large graph analysis. The Graphulo
library provides a framework for implementing graph algorithms on the Apache
Accumulo distributed database. In this work we adapt two algorithms for
counting triangles, one that uses the adjacency matrix and another that also
uses the incidence matrix, to the Graphulo library for server-side processing
inside Accumulo. Cloud-based experiments show a similar performance profile for
these different approaches on the family of power law Graph500 graphs, for
which data skew is an increasingly severe bottleneck. These results motivate the design of
skew-aware hybrid algorithms that we propose for future work.
Comment: Honorable mention in the 2017 IEEE HPEC's Graph Challenge
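The two formulations named in the abstract above, one using only the adjacency matrix and one that also uses the incidence matrix, can be illustrated with a minimal single-machine sketch. This is not Graphulo's API and not the paper's server-side implementation; the function names are hypothetical, and the graph is assumed to be simple and undirected, with each edge listed once and vertices labeled 0..n-1.

```python
def build_adjacency(edges, n):
    """Dense 0/1 adjacency matrix of a simple undirected graph."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = A[v][u] = 1
    return A


def count_triangles_adjacency(edges, n):
    """Adjacency-only count: sum of A * (A @ A) elementwise, divided by 6.

    (A @ A)[i][j] is the number of common neighbours of i and j, so
    masking by A keeps only wedges closed by an edge. Each triangle is
    then seen once per ordered vertex pair, i.e. 6 times.
    """
    A = build_adjacency(edges, n)
    total = 0
    for i in range(n):
        for j in range(n):
            if A[i][j]:
                total += sum(A[i][k] * A[k][j] for k in range(n))
    return total // 6


def count_triangles_incidence(edges, n):
    """Adjacency-plus-incidence count: entries of A @ E equal to 2.

    For edge e = (u, w), (A @ E)[v][e] = A[v][u] + A[v][w]. A value of 2
    means v is adjacent to both endpoints, so {v, u, w} is a triangle.
    Each triangle is seen once per edge, i.e. 3 times.
    """
    A = build_adjacency(edges, n)
    hits = 0
    for u, w in edges:
        for v in range(n):
            if A[v][u] + A[v][w] == 2:
                hits += 1
    return hits // 3
```

Both functions should agree on any simple graph. The dense loops are for clarity only; a distributed version would work on sparse rows, which is where the skew of high-degree vertices shows up.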
Algorithmic Complexity of Power Law Networks
It has been experimentally observed that the majority of real-world networks
follow a power law degree distribution. The aim of this paper is to study the
algorithmic complexity of such "typical" networks. The contribution of this
work is twofold.
First, we define a deterministic condition for checking whether a graph has a
power law degree distribution and experimentally validate it on real-world
networks. This definition allows us to derive interesting properties of power
law networks. We observe that for exponents of the degree distribution in the
range such networks exhibit double power law phenomenon that was
observed for several real-world networks. Our observation indicates that this
phenomenon could be explained by just pure graph theoretical properties.
The second aim of our work is to give a novel theoretical explanation why
many algorithms run faster on real-world data than what is predicted by
algorithmic worst-case analysis. We show how to exploit the power law degree
distribution to design faster algorithms for a number of classical P-time
problems including transitive closure, maximum matching, determinant, PageRank
and matrix inverse. Moreover, we deal with the problems of counting triangles
and finding a maximum clique. Previously, it has only been shown that these
problems can be solved very efficiently on power law graphs when these graphs
are random, e.g., drawn at random from some distribution. However, it is
unclear how to relate such a theoretical analysis to real-world graphs, which
are fixed. Instead, we show that the randomness assumption can be
replaced with a simple condition on the degrees of adjacent vertices, which can
be used to obtain similar results. As a result, in some range of power law
exponents, we are able to solve the maximum clique problem in polynomial time,
although in general power law networks the problem is NP-complete.
Performance bounds for expander-based compressed sensing in Poisson noise
This paper provides performance bounds for compressed sensing in the presence
of Poisson noise using expander graphs. The Poisson noise model is appropriate
for a variety of applications, including low-light imaging and digital
streaming, where the signal-independent and/or bounded noise models used in the
compressed sensing literature are no longer applicable. In this paper, we
develop a novel sensing paradigm based on expander graphs and propose a MAP
algorithm for recovering sparse or compressible signals from Poisson
observations. The geometry of the expander graphs and the positivity of the
corresponding sensing matrices play a crucial role in establishing the bounds
on the signal reconstruction error of the proposed algorithm. We support our
results with experimental demonstrations of reconstructing average packet
arrival rates and instantaneous packet counts at a router in a communication
network, where the arrivals of packets in each flow follow a Poisson process.
Comment: revised version; accepted to IEEE Transactions on Signal Processing
Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning
The number of triangles is a computationally expensive graph statistic which
is frequently used in complex network analysis (e.g., transitivity ratio), in
various random graph models (e.g., exponential random graph model) and in
important real world applications such as spam detection, uncovering of the
hidden thematic structure of the Web and link recommendation. Counting
triangles in graphs with millions and billions of edges requires algorithms
which run fast, use a small amount of space, provide accurate estimates of the
number of triangles and preferably are parallelizable.
In this paper we present an efficient triangle counting algorithm which can
be adapted to the semistreaming model. The key idea of our algorithm is to
combine the sampling algorithm of Tsourakakis et al. with the partitioning of
the set of vertices into a high-degree and a low-degree subset, as
in the work of Alon, Yuster and Zwick, treating each set appropriately. We obtain a
running time
and an approximation (multiplicative error), where is the number
of vertices, the number of edges and the maximum number of
triangles an edge is contained in.
Furthermore, we show how this algorithm can be adapted to the semistreaming
model with space usage and a constant number of passes (three) over the graph
stream. We apply our methods to various networks with several million edges
and obtain excellent results. Finally, we propose a random projection based
method for triangle counting and provide a sufficient condition to obtain an
estimate with low variance.
Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models
for the Web Graph (WAW 2010)
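The two ingredients named in the abstract above, edge sampling and a split of the vertices by degree, can each be sketched in isolation. The code below is an illustration under assumptions, not the paper's combined algorithm and without its running-time or error guarantees: it keeps each edge independently with probability p and rescales the exact count on the sample by 1/p^3 (each triangle survives with probability p^3). The function names, the `threshold` parameter, and the fixed `seed` are hypothetical.

```python
import random


def normalize(edges):
    """Drop self-loops and duplicates; return a sorted list of (min, max) pairs."""
    return sorted({(min(u, v), max(u, v)) for u, v in edges if u != v})


def exact_triangles(edges):
    """Exact count by intersecting the neighbour sets of each edge's endpoints."""
    edges = normalize(edges)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle is found once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3


def sampled_triangle_estimate(edges, p, seed=0):
    """Keep each edge with probability p, count exactly, rescale by 1 / p**3."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    kept = [e for e in normalize(edges) if rng.random() < p]
    return exact_triangles(kept) / p ** 3


def split_by_degree(edges, threshold):
    """Partition vertices into (low, high) sets; high means degree > threshold."""
    degree = {}
    for u, v in normalize(edges):
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    high = {v for v, d in degree.items() if d > threshold}
    return set(degree) - high, high
```

With p = 1 the estimate reduces to the exact count; smaller p trades accuracy for a smaller sampled graph. How the low-degree and high-degree sets are then treated differently is the substance of the paper and is not reproduced here.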