Dynamic Balanced Graph Partitioning
This paper initiates the study of the classic balanced graph partitioning
problem from an online perspective: Given an arbitrary sequence of pairwise
communication requests between nodes, with patterns that may change over
time, the objective is to service these requests efficiently by partitioning
the nodes into clusters, each of size , such that frequently
communicating nodes are located in the same cluster. The partitioning can be
updated dynamically by migrating nodes between clusters. The goal is to devise
online algorithms which jointly minimize the amount of inter-cluster
communication and migration cost.
The problem features interesting connections to other well-known online
problems. For example, scenarios with generalize online paging, and
scenarios with constitute a novel online variant of maximum matching. We
present several lower bounds and algorithms for settings both with and without
cluster-size augmentation. In particular, we prove that any deterministic
online algorithm has a competitive ratio of at least , even with significant
augmentation. Our main algorithmic contributions are an -competitive deterministic algorithm for the general setting with
constant augmentation, and a constant competitive algorithm for the maximum
matching variant.
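The objective described in this abstract, jointly minimizing inter-cluster communication and migration cost, can be made concrete with a small cost-accounting sketch. Everything here is an illustrative assumption rather than the paper's formal model: the function name `total_cost`, the unit charge per inter-cluster request, the per-migration charge `alpha`, and the optional cluster-size bound `k` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Optional, Sequence, Tuple

Node = Hashable


def total_cost(
    initial: Dict[Node, int],
    requests: Sequence[Tuple[Node, Node]],
    migrations: Optional[Dict[int, List[Tuple[Node, int]]]] = None,
    alpha: float = 1.0,
    k: Optional[int] = None,
) -> float:
    """Replay a request sequence against a changing partition.

    Assumed cost model (not taken from the paper): a request between
    nodes in different clusters costs 1, and moving one node to another
    cluster costs `alpha`. `migrations[t]` lists the (node, target_cluster)
    moves performed just before request number t is served.
    """
    cluster = dict(initial)
    migrations = migrations or {}
    cost = 0.0
    for t, (u, v) in enumerate(requests):
        # Apply the moves scheduled for this step and charge for each one.
        for node, target in migrations.get(t, []):
            if cluster[node] != target:
                cluster[node] = target
                cost += alpha
        # Optionally enforce the assumed cluster-size bound k.
        if k is not None and max(Counter(cluster.values()).values()) > k:
            raise ValueError(f"cluster size bound {k} violated at step {t}")
        # Serving a request is free inside a cluster, costly across clusters.
        if cluster[u] != cluster[v]:
            cost += 1.0
    return cost
```

An online algorithm would have to choose the entries of `migrations` without seeing future requests; this sketch only evaluates a given schedule after the fact.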
Window-based Streaming Graph Partitioning Algorithm
In recent years, the scale of graph datasets has increased to such a
degree that a single machine can no longer process large graphs
efficiently. Efficient graph partitioning is therefore necessary for such
large-graph applications. Traditional graph partitioning generally loads the whole
graph into memory before partitioning; this is not only
time consuming but also creates memory bottlenecks. These issues of
memory limitation and high time complexity can be resolved using
stream-based graph partitioning. A streaming graph partitioning algorithm reads
each vertex once and assigns it to a partition. This is also
called a one-pass algorithm. This paper proposes an efficient window-based
streaming graph partitioning algorithm called WStream. The WStream algorithm is
an edge-cut partitioning algorithm, which distributes vertices among the
partitions. Our results suggest that the WStream algorithm is able to partition
large graph data efficiently while keeping the load balanced across
partitions and communication to a minimum. Evaluation results with real
workloads also demonstrate the effectiveness of the proposed algorithm, which
achieves a significant reduction in load imbalance and edge-cut across different
ranges of datasets.
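The one-pass idea described in this abstract can be illustrated with a generic greedy streaming partitioner. This is not WStream, whose window mechanism the abstract does not detail; it is a minimal sketch that assumes each vertex arrives together with its neighbor list, and the names `stream_partition`, `edge_cut`, `num_parts`, and `capacity` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Sequence, Tuple

Vertex = Hashable


def stream_partition(
    stream: Iterable[Tuple[Vertex, Sequence[Vertex]]],
    num_parts: int,
    capacity: int,
) -> Dict[Vertex, int]:
    """Assign each streamed vertex to a partition in a single pass.

    Greedy rule (an assumed heuristic, not the WStream rule): prefer the
    partition that already holds the most neighbors of the vertex,
    discounted by how full that partition is; break ties toward the
    emptier partition.
    """
    assignment: Dict[Vertex, int] = {}
    sizes: List[int] = [0] * num_parts
    for vertex, neighbors in stream:
        best, best_key = None, None
        for p in range(num_parts):
            if sizes[p] >= capacity:
                continue  # a full partition cannot accept more vertices
            placed = sum(1 for n in neighbors if assignment.get(n) == p)
            score = placed * (1.0 - sizes[p] / capacity)
            key = (score, -sizes[p])
            if best_key is None or key > best_key:
                best, best_key = p, key
        if best is None:
            raise ValueError("all partitions are full")
        assignment[vertex] = best
        sizes[best] += 1
    return assignment


def edge_cut(edges: Iterable[Tuple[Vertex, Vertex]],
             assignment: Dict[Vertex, int]) -> int:
    """Count edges whose endpoints ended up in different partitions."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])
```

The `capacity` bound is what keeps the load balanced, while the neighbor count is what keeps the edge-cut low; a window-based method would presumably buffer several vertices before deciding, which this sketch does not attempt.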
UnifyDR: A Generic Framework for Unifying Data and Replica Placement
The advent of (big) data management applications operating at Cloud scale has led to extensive research on the data placement problem. The key objective of data placement is to obtain a partitioning (possibly allowing for replicas) of a set of data-items into distributed nodes that minimizes the overall network communication cost. Although replication is intrinsic to data placement, it has seldom been studied in combination with the latter. On the contrary, most of the existing solutions treat them as two independent problems, and employ a two-phase approach: (1) data placement, followed by (2) replica placement. We address this by proposing a new paradigm, CDR, with the objective of combining data and replica placement as a single joint optimization problem. Specifically, we study two variants of the CDR problem: (1) CDR-Single, where the objective is to minimize the communication cost alone, and (2) CDR-Multi, which performs a multi-objective optimization to also minimize traffic and storage costs. To unify data and replica placement, we propose a generic framework called UnifyDR, which leverages overlapping correlation clustering to assign a data-item to multiple nodes, thereby facilitating data and replica placement to be performed jointly. We establish the generic nature of UnifyDR by portraying its ability to address the CDR problem in two real-world use-cases, that of join-intensive online analytical processing (OLAP) queries and a location-based online social network (OSN) service. The effectiveness and scalability of UnifyDR are showcased by experiments performed on data generated using the TPC-DS benchmark and a trace of the Gowalla OSN for the OLAP queries and OSN service use-case, respectively.
Empirically, the presented approach obtains an improvement of approximately 35% in terms of the evaluated metrics and a speed-up of 8 times in comparison to state-of-the-art techniques. This work was supported by the Agentschap Innoveren & Ondernemen (VLAIO) Strategic Fundamental Research (SBO) under Grant 150038 (DiSSeCt).