Inc-part: incremental partitioning for load balancing in large-scale behavioral simulations
Large-scale behavioral simulations are widely used to study real-world multi-agent systems. Such programs normally run in discrete time-steps or ticks, with simulated space decomposed into domains that are distributed over a set of workers to achieve parallelism. A distinguishing feature of behavioral simulations is their frequent and high-volume group migration, the phenomenon in which simulated objects traverse domains in groups at massive scale in each tick. This results in continual and significant load imbalance among domains. To tackle this problem, traditional load balancing approaches either require excessive load re-profiling and redistribution, which lead to high computation/communication costs, or perform poorly because their statically partitioned data domains cannot reflect load changes brought by group migration. In this paper, we propose an effective and low-cost load balancing scheme, named Inc-part, based on a key observation that an object is unlikely to move a long distance (across many domains) within a single tick. This localized mobility property allows one to efficiently estimate the load of a dynamic domain incrementally, based on merely the load changes occurring in its neighborhood. The domains experiencing significant load changes are then partitioned or merged, and redistributed to redress load imbalance among the workers. Experiments on a 64-node (1,024-core) platform show that Inc-part can attain excellent load balance with dramatically lowered costs compared to state-of-the-art solutions
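The incremental idea in this abstract can be illustrated with a minimal sketch. This is not the Inc-part implementation: the function names, the dictionary-based load table, and the fixed split threshold are all assumptions made for illustration. It only shows how per-domain load can be updated from the migrations seen in one tick, without re-profiling every domain.

```python
from collections import defaultdict

def apply_migrations(load, migrations):
    """Update per-domain object counts in place from one tick's migrations.

    load: dict mapping domain id -> number of objects (hypothetical layout)
    migrations: iterable of (src, dst, count) between neighbouring domains
    Returns the set of domains whose load actually changed.
    """
    delta = defaultdict(int)
    for src, dst, count in migrations:
        delta[src] -= count
        delta[dst] += count
    changed = set()
    for dom, d in delta.items():
        if d != 0:
            load[dom] = load.get(dom, 0) + d
            changed.add(dom)
    return changed

def domains_to_split(load, changed, threshold):
    """Among the changed domains only, pick those above an assumed threshold."""
    return sorted(d for d in changed if load[d] > threshold)

# Four domains in a row; objects move only between neighbours in this tick.
load = {0: 100, 1: 100, 2: 100, 3: 100}
changed = apply_migrations(load, [(0, 1, 40), (2, 1, 30), (3, 2, 30)])
hot = domains_to_split(load, changed, threshold=150)
```

Only domains touched by a migration are examined, so the cost per tick follows the amount of movement rather than the total number of domains.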
Integrative Dynamic Reconfiguration in a Parallel Stream Processing Engine
Load balancing, operator instance collocations and horizontal scaling are
critical issues in Parallel Stream Processing Engines to achieve low data
processing latency, optimized cluster utilization and minimized communication
cost respectively. In previous work, these issues are typically tackled
separately and independently. We argue that these problems are tightly coupled
in the sense that they all need to determine the allocations of workloads and
migrate computational states at runtime. Optimizing them independently would
result in suboptimal solutions. Therefore, in this paper, we investigate how
these three issues can be modeled as one integrated optimization problem. In
particular, we first consider jobs where workload allocations have little
effect on the communication cost, and model the problem of load balance as a
Mixed-Integer Linear Program. Afterwards, we present an extended solution
called ALBIC, which supports general jobs. We implement the proposed techniques
on top of Apache Storm, an open-source Parallel Stream Processing Engine. The
extensive experimental results over both synthetic and real datasets show that
our techniques clearly outperform existing approaches
The Simulation Model Partitioning Problem: an Adaptive Solution Based on Self-Clustering (Extended Version)
This paper is about partitioning in parallel and distributed simulation, that
is, decomposing the simulation model into a number of components and
properly allocating them on the execution units. An adaptive solution based on
self-clustering, that considers both communication reduction and computational
load-balancing, is proposed. The implementation of the proposed mechanism is
tested using a simulation model that is challenging both in terms of structure
and dynamicity. Various configurations of the simulation model and the
execution environment have been considered. The obtained performance results
are analyzed using a reference cost model. The results demonstrate that the
proposed approach is promising and that it can reduce the simulation execution
time in both parallel and distributed architectures
Maximizing Service Reliability in Distributed Computing Systems with Random Node Failures: Theory and Implementation
In distributed computing systems (DCSs) where server nodes can fail permanently with nonzero probability, the system performance can be assessed by means of the service reliability, defined as the probability of serving all the tasks queued in the DCS before all the nodes fail. This paper presents a rigorous probabilistic framework to analytically characterize the service reliability of a DCS in the presence of communication uncertainties and stochastic topological changes due to node deletions. The framework considers a system composed of heterogeneous nodes with stochastic service and failure times and a communication network imposing random tangible delays. The framework also permits arbitrarily specified, distributed load-balancing actions to be taken by the individual nodes in order to improve the service reliability. The presented analysis is based upon a novel use of the concept of stochastic regeneration, which is exploited to derive a system of difference-differential equations characterizing the service reliability. The theory is further utilized to optimize certain load-balancing policies for maximal service reliability; the optimization is carried out by means of an algorithm that scales linearly with the number of nodes in the system. The analytical model is validated using both Monte Carlo simulations and experimental data collected from a DCS testbed
Adaptive Load Balancing: A Study in Multi-Agent Learning
We study the process of multi-agent reinforcement learning in the context of
load balancing in a distributed system, without use of either central
coordination or explicit communication. We first define a precise framework in
which to study adaptive load balancing, important features of which are its
stochastic nature and the purely local information available to individual
agents. Given this framework, we show illuminating results on the interplay
between basic adaptive behavior parameters and their effect on system
efficiency. We then investigate the properties of adaptive load balancing in
heterogeneous populations, and address the issue of exploration vs.
exploitation in that context. Finally, we show that naive use of communication
may not improve, and might even harm system efficiency.
Comment: See http://www.jair.org/ for any accompanying file
A survey of self organisation in future cellular networks
This article surveys the literature over the period of the last decade on the emerging field of self organisation as applied to wireless cellular communication networks. Self organisation has been extensively studied and applied in ad hoc networks, wireless sensor networks and autonomic computer networks; however, in the context of wireless cellular networks, this is the first attempt to put in perspective the various efforts in the form of a tutorial/survey. We provide a comprehensive survey of the existing literature, projects and standards in self organising cellular networks. Additionally, we also aim to present a clear understanding of this active research area, identifying a clear taxonomy and guidelines for design of self organising mechanisms. We compare the strengths and weaknesses of existing solutions and highlight the key research areas for further development. This paper serves as a guide and a starting point for anyone willing to delve into research on self organisation in wireless cellular communication networks
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids
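As a rough illustration of what "receiver-initiated" means here, the sketch below simulates workers that ask for work only when they run idle. It is a toy model and not the paper's algorithm: the round-based loop, the "take half from the most loaded peer" rule, and the name `simulate` are assumptions chosen for brevity.

```python
from collections import deque

def simulate(task_counts, balance=True):
    """Return the number of rounds needed to drain all queues.

    In each round, every idle worker (the receiver) requests half of the
    queue of the currently most loaded peer; then every worker that has
    work processes one task.
    """
    queues = [deque(range(n)) for n in task_counts]
    rounds = 0
    while any(queues):
        if balance:
            for q in queues:
                if q:
                    continue  # only idle workers initiate a transfer
                donor = max(range(len(queues)), key=lambda i: len(queues[i]))
                take = len(queues[donor]) // 2
                for _ in range(take):
                    q.append(queues[donor].pop())
        for q in queues:
            if q:
                q.popleft()  # process one task
        rounds += 1
    return rounds

# All eight tasks start on worker 0; the other three workers are idle.
unbalanced = simulate([8, 0, 0, 0], balance=False)
balanced = simulate([8, 0, 0, 0], balance=True)
```

Because transfers are triggered by the idle side, busy workers pay no coordination cost until someone actually needs work, which suits search trees whose workload cannot be predicted in advance.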
Hybrid static/dynamic scheduling for already optimized dense matrix factorization
We present the use of a hybrid static/dynamic scheduling strategy of the task
dependency graph for direct methods used in dense numerical linear algebra.
This strategy provides a balance of data locality, load balance, and low
dequeue overhead. We show that the usage of this scheduling in
communication-avoiding dense factorization leads to significant performance gains. On a
48-core AMD Opteron NUMA machine, our experiments show that we can achieve up to
64% improvement over a version of CALU that uses fully dynamic scheduling, and
up to 30% improvement over the version of CALU that uses fully static
scheduling. On a 16-core Intel Xeon machine, our hybrid static/dynamic
scheduling approach is up to 8% faster than the version of CALU that uses a
fully static scheduling or fully dynamic scheduling. Our algorithm leads to
speedups over the corresponding routines for computing LU factorization in
well-known libraries. On the 48-core AMD NUMA machine, our best implementation is up
to 110% faster than MKL, while on the 16-core Intel Xeon machine, it is up to
82% faster than MKL. Our approach also shows significant speedups compared with
PLASMA on both of these systems
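The hybrid static/dynamic idea can be sketched as a small makespan simulation. This is an assumption-laden toy, not CALU's scheduler: it ignores task dependencies, data locality, and dequeue overhead, and the `static_fraction` parameter and round-robin static assignment are hypothetical choices. It only shows the mechanism of fixing part of the schedule in advance and handing out the rest on demand.

```python
import heapq

def makespan(costs, workers, static_fraction):
    """Finish time when the first part of the task list is assigned
    statically (round-robin) and the rest is taken dynamically by
    whichever worker becomes free first."""
    n_static = int(len(costs) * static_fraction)
    finish = [0] * workers
    for i, c in enumerate(costs[:n_static]):
        finish[i % workers] += c  # static part: owner fixed in advance
    heap = [(t, w) for w, t in enumerate(finish)]
    heapq.heapify(heap)
    for c in costs[n_static:]:
        t, w = heapq.heappop(heap)  # dynamic part: earliest-free worker
        heapq.heappush(heap, (t + c, w))
    return max(t for t, _ in heap)

costs = [4, 1, 4, 1, 2, 2]
fully_static = makespan(costs, workers=2, static_fraction=1.0)
hybrid = makespan(costs, workers=2, static_fraction=0.5)
fully_dynamic = makespan(costs, workers=2, static_fraction=0.0)
```

In this toy model the fully dynamic schedule always balances best because queue contention and cache effects are not modeled; the abstract's point is that on real NUMA hardware those costs make an intermediate split preferable.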