Adaptive Partitioning for Large-Scale Dynamic Graphs
Abstract: In recent years, large-scale graph processing has gained increasing attention, with the most recent systems placing particular emphasis on latency. One technique for improving runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph so as to minimize the number of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid introducing a large number of cut edges, which would gradually degrade computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by continuously adapting the graph partitions as the graph changes. We present a novel, highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios how adapting graph partitioning reduces execution time by over 50% compared to commonly used hash partitioning.
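The objective stated in this abstract (few cut edges, balanced load) can be illustrated with a small sketch. It is not the paper's implementation: the function names, the modulo-based hash, and the 8-vertex ring graph are all assumptions chosen for illustration. It only compares the edge cut of a hash partitioning with that of a locality-aware one at equal balance.

```python
from collections import Counter


def hash_partition(vertices, k):
    """Assign each integer vertex id to one of k partitions by id modulo k."""
    return {v: v % k for v in vertices}


def cut_edges(edges, assignment):
    """Count edges whose endpoints are assigned to different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])


def partition_sizes(assignment):
    """Number of vertices per partition, used here as a simple load measure."""
    return Counter(assignment.values())


# Hypothetical toy graph: a ring of 8 vertices, split across 2 machines.
vertices = range(8)
edges = [(i, (i + 1) % 8) for i in vertices]

hashed = hash_partition(vertices, 2)
# A locality-aware alternative: two contiguous halves of the ring.
contiguous = {v: 0 if v < 4 else 1 for v in vertices}

print(cut_edges(edges, hashed), partition_sizes(hashed))
print(cut_edges(edges, contiguous), partition_sizes(contiguous))
```

Both assignments place four vertices on each machine, but on this ring the hash partitioning cuts every edge while the contiguous split cuts only two.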
Community Detection via Semi-Synchronous Label Propagation Algorithms
A recently introduced novel community detection strategy is based on a label
propagation algorithm (LPA) which uses the diffusion of information in the
network to identify communities. Studies of LPAs showed that the strategy is
effective in finding a good community structure. The label propagation step can be
performed in parallel on all nodes (synchronous model) or sequentially
(asynchronous model). Both models have drawbacks: termination is not
guaranteed in the first case, and performance can be worse in the
second. In this paper, we present a semi-synchronous version of LPA which
aims to combine the advantages of both synchronous and asynchronous models. We
prove that our models always converge to a stable labeling. Moreover, we
experimentally investigate the effectiveness of the proposed strategy, comparing
its performance with the asynchronous model in terms of quality,
efficiency, and stability. Tests show that the proposed protocol does not harm
the quality of the partitioning. Moreover, it is quite efficient: each
propagation step is highly parallelizable, and it is more stable than the
asynchronous model because it uses only a small amount of randomization.
Comment: In Proc. of The International Workshop on Business Applications of
Social Network Analysis (BASNA '10)
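The difference between the synchronous and asynchronous models mentioned in this abstract can be sketched as follows. This is not the semi-synchronous algorithm the paper proposes. It only illustrates the two baseline update rules, under simplifying assumptions that are not taken from the source: ties are broken deterministically by picking the smallest label, the asynchronous step visits nodes in sorted order rather than random order, and every node has at least one neighbor. All function names are hypothetical.

```python
from collections import Counter


def most_frequent_label(labels, neighbors):
    """Most common label among the neighbors; ties go to the smallest label.

    Assumes `neighbors` is non-empty (no isolated nodes).
    """
    counts = Counter(labels[n] for n in neighbors)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def synchronous_step(graph, labels):
    """All nodes update at once, reading only labels from the previous step."""
    return {v: most_frequent_label(labels, graph[v]) for v in graph}


def asynchronous_step(graph, labels):
    """Nodes update one at a time and see updates made earlier in the same step."""
    labels = dict(labels)
    for v in sorted(graph):
        labels[v] = most_frequent_label(labels, graph[v])
    return labels


# Two nodes joined by one edge, each starting with its own label.
graph = {0: [1], 1: [0]}
initial = {0: 0, 1: 1}

sync_once = synchronous_step(graph, initial)
sync_twice = synchronous_step(graph, sync_once)
async_once = asynchronous_step(graph, initial)

print(sync_once, sync_twice)  # the two labels swap, then swap back
print(async_once)             # both nodes end up with the same label
```

On this two-node graph the synchronous updates oscillate between two labelings and never settle, which is the termination problem the abstract refers to, while a single asynchronous sweep reaches a stable labeling.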
TAPER: query-aware, partition-enhancement for large, heterogenous, graphs
Graph partitioning has long been seen as a viable approach to address Graph
DBMS scalability. A partitioning, however, may introduce extra query processing
latency unless it is sensitive to a specific query workload, and optimised to
minimise inter-partition traversals for that workload. Additionally, it should
also be possible to incrementally adjust the partitioning in reaction to
changes in the graph topology, the query workload, or both. Because of their
complexity, current partitioning algorithms fall short of one or both of these
requirements, as they are designed for offline use and as one-off operations.
The TAPER system aims to address both requirements, whilst leveraging existing
partitioning algorithms. TAPER takes any given initial partitioning as a
starting point, and iteratively adjusts it by swapping chosen vertices across
partitions, heuristically reducing the probability of inter-partition
traversals for a given workload of pattern matching queries. Iterations are
inexpensive thanks to time and space optimisations in the underlying support
data structures. We evaluate TAPER on two different large test graphs and over
realistic query workloads. Our results indicate that, given a hash-based
partitioning, TAPER reduces the number of inter-partition traversals by around
80%; given an unweighted METIS partitioning, by around 30%. These reductions
are achieved within 8 iterations and with the additional advantage of being
workload-aware and usable online.
Comment: 12 pages, 11 figures, unpublished
Bayesian clustering in decomposable graphs
In this paper we propose a class of prior distributions on decomposable
graphs, allowing for improved modeling flexibility. While existing methods
solely penalize the number of edges, the proposed work empowers practitioners
to control clustering, level of separation, and other features of the graph.
Emphasis is placed on a particular prior distribution which derives its
motivation from the class of product partition models; the properties of this
prior relative to existing priors are examined through theory and simulation. We
then demonstrate the use of graphical models in the field of agriculture,
showing how the proposed prior distribution alleviates the inflexibility of
previous approaches in properly modeling the interactions between the yield of
different crop varieties.
Comment: 3 figures, 1 table