Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning, together with applications and future research directions.
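The objective such algorithms optimize can be made concrete with a toy sketch (all names hypothetical, not from the survey): grow one block of a bisection by breadth-first search, then count the edge cut, the quantity balanced partitioners minimize subject to a balance constraint. Production partitioners use multilevel schemes rather than this single greedy pass.

```python
from collections import deque

def bfs_bisection(adj):
    """Grow one part via BFS until it holds half the vertices,
    then report the edge cut -- a toy stand-in for balanced
    graph bisection."""
    n = len(adj)
    part = {next(iter(adj))}
    queue = deque(part)
    while queue and len(part) < n // 2:
        u = queue.popleft()
        for v in adj[u]:
            if v not in part and len(part) < n // 2:
                part.add(v)
                queue.append(v)
    # Edges with exactly one endpoint inside the grown part.
    cut = sum(1 for u in part for v in adj[u] if v not in part)
    return part, cut

# A 6-cycle splits into two 3-node paths, cutting exactly 2 edges.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
part, cut = bfs_bisection(cycle)
```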
A Practical Study of Self-Stabilization for Prefix-Tree Based Overlay Networks
Service discovery is crucial to the development of fully decentralized computational grids. Among the significant body of work produced by the convergence of peer-to-peer (P2P) systems and grids, a new kind of overlay network, based on prefix trees, has emerged. In particular, the Distributed Lexicographic Placement Table (DLPT) approach is a decentralized, dynamic service-discovery system. Fault tolerance within the DLPT approach is achieved through best-effort policies relying on formal self-stabilization results. Self-stabilization means that the tree can become transiently inconsistent but is guaranteed to autonomously converge to a correct topology, in finite time, after arbitrary crashes. During convergence, however, the tree may not be able to process queries correctly. In this paper, we present simulation results with two objectives. First, we investigate the benefit of self-stabilization for such architectures. Second, again through simulation, we explore a simple Time-To-Live policy that avoids useless query processing during convergence.
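The prefix-tree idea can be illustrated with a toy analogue (not the DLPT protocol; all names are hypothetical): services are registered under lexicographic keys in a trie, and a lookup that exhausts its Time-To-Live gives up rather than wandering a possibly still-converging tree.

```python
class PrefixNode:
    """Minimal trie node; a toy analogue of a node in a
    prefix-tree overlay, responsible for one label of a key."""
    def __init__(self):
        self.children = {}
        self.services = set()

def register(root, key, service):
    node = root
    for ch in key:
        node = node.children.setdefault(ch, PrefixNode())
    node.services.add(service)

def lookup(root, key, ttl):
    """Walk the tree one hop per label; give up once the TTL is
    exhausted, mimicking a policy that avoids useless routing
    while the tree may be inconsistent."""
    node = root
    for ch in key:
        if ttl == 0 or ch not in node.children:
            return None
        node = node.children[ch]
        ttl -= 1
    return node.services or None

root = PrefixNode()
register(root, "dgemm", "node-17")
```

A lookup with enough TTL resolves the key; one with too small a TTL returns empty-handed instead of continuing to route.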
Dynamic load balancing of parallel road traffic simulation
The objective of this research was to investigate, develop, and evaluate dynamic
load-balancing strategies for the parallel execution of microscopic road traffic
simulations. Urban road traffic simulation presents an irregular and dynamically
varying computational load for a parallel processor system. The dynamic nature
of road traffic simulation leads to uneven load distribution during simulation,
even for a system that starts with an even distribution. Load balancing is a
potential way of achieving improved performance: work is reallocated from highly
loaded processors to lightly loaded ones, reducing overall computation time.
In dynamic load balancing, workloads are adjusted continually or periodically
throughout the computation.
In this thesis, load-balancing strategies were evaluated and several
load-balancing policies developed. A load index and a profitability-determination
algorithm were developed and used to enhance two load-balancing algorithms.
One algorithm uses local communication and distributed load evaluation between
neighbouring partitions (the diffusion algorithm); the other uses both local and
global communication with centralized decision making (the MaS algorithm). The
enhanced algorithms were implemented and integrated with a research parallel
traffic simulator, and the performance of the simulator, optimized with the two
modified dynamic load-balancing strategies, was studied.
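The diffusion algorithm mentioned above can be sketched as a first-order diffusion round, in which each processor exchanges load only with its neighbouring partitions (an illustrative generic version; the thesis's variant additionally uses its load index and profitability test):

```python
def diffusion_step(load, neighbors, alpha):
    """One synchronous diffusion round: each processor i moves a
    fraction alpha of the load difference across each edge (i, j).
    Total load is conserved; repeated rounds even it out."""
    new = load[:]
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            new[i] += alpha * (load[j] - load[i])
    return new

# Four partitions in a ring, one heavily loaded at the start.
neighbors = [(1, 3), (0, 2), (1, 3), (0, 2)]
load = [100.0, 0.0, 0.0, 0.0]
for _ in range(50):
    load = diffusion_step(load, neighbors, alpha=0.25)
# After enough rounds every partition holds ~25.0 units.
```

Only local communication is needed per round, which is why diffusion scales well, at the cost of slower convergence than centralized schemes such as the MaS algorithm.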
Serving Graph Neural Networks With Distributed Fog Servers For Smart IoT Services
Graph Neural Networks (GNNs) have attracted growing interest in miscellaneous
applications owing to their outstanding ability to extract latent
representations from graph structures. To render GNN-based services for
IoT-driven smart applications, traditional model-serving paradigms usually
resort to the cloud by fully uploading geo-distributed input data to remote
datacenters. However, our empirical measurements reveal the significant
communication overhead of such cloud-based serving and highlight the profound
potential of the emerging fog computing paradigm. To maximize the architectural
benefits of fog computing, we present Fograph, a novel distributed real-time
GNN inference framework that leverages the diverse, dynamic resources of
multiple fog nodes in proximity to IoT data sources. By introducing
heterogeneity-aware execution planning and GNN-specific compression techniques,
Fograph tailors its design to the unique characteristics of GNN serving in fog
environments. Prototype-based evaluation and a case study demonstrate that
Fograph significantly outperforms state-of-the-art cloud serving and fog
deployment, with up to 5.39x execution speedup and 6.84x throughput improvement.
Comment: Accepted by IEEE/ACM Transactions on Networking
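The workload such a framework distributes is dominated by neighbourhood aggregation. A generic sketch of one message-passing round (plain Python for clarity; this is not Fograph's code, and all names are hypothetical):

```python
def mean_aggregate(features, adj):
    """One GNN message-passing round: each node averages its
    neighbours' feature vectors. Partitioning the graph splits
    exactly this computation across servers."""
    out = {}
    for v, nbrs in adj.items():
        dim = len(features[v])
        out[v] = [sum(features[u][k] for u in nbrs) / len(nbrs)
                  for k in range(dim)]
    return out

# Tiny star graph: "a" is connected to "b" and "c".
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
features = {"a": [0.0, 2.0], "b": [2.0, 0.0], "c": [4.0, 4.0]}
out = mean_aggregate(features, adj)
```

Because each node needs its neighbours' features, cutting the graph across servers creates exactly the cross-boundary traffic that placement and compression aim to reduce.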
SQUARE: Scalable Quorum-Based Atomic Memory with Local Reconfiguration
Internet applications require more and more resources to satisfy unpredictable client needs. In particular, such applications must ensure quality of service despite bursts of load. Distributed, dynamic, self-organizing systems present an inherent adaptiveness that can face unpredictable bursts of load. Nevertheless, quality of service, and more particularly data consistency, remains hard to achieve in such systems, since participants (i.e., nodes) can crash, leave, and join the system at arbitrary times. Atomic consistency guarantees that any read operation returns the last value written, and it is preserved under data composition. To guarantee atomicity in a message-passing model, mutually intersecting sets of nodes (a.k.a. quorums) are used. The solution presented here, namely SQUARE, provides scalability, load balancing, fault tolerance, and self-adaptiveness, while ensuring atomic consistency. We specify our solution, prove it correct, and analyse it through simulations.
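The role of intersecting quorums can be illustrated with a minimal majority-quorum register (a sketch of the general technique, in the style of timestamped quorum replication, not the SQUARE protocol itself; all names are hypothetical):

```python
import random

class Replica:
    """One node holding a timestamped copy of the register."""
    def __init__(self):
        self.ts, self.value = 0, None

def majority(replicas):
    """Any two majorities of the same replica set intersect --
    the quorum property atomic consistency relies on."""
    k = len(replicas) // 2 + 1
    return random.sample(replicas, k)

def write(replicas, value, ts):
    """Install the value at a majority; stale replicas may remain."""
    for r in majority(replicas):
        if ts > r.ts:
            r.ts, r.value = ts, value

def read(replicas):
    """Return the highest-timestamped value in a majority, which
    must include at least one replica of the latest completed write."""
    freshest = max(majority(replicas), key=lambda r: r.ts)
    return freshest.value

replicas = [Replica() for _ in range(5)]
write(replicas, "v1", ts=1)
write(replicas, "v2", ts=2)
```

Even though each operation contacts only 3 of the 5 replicas, every read quorum overlaps every write quorum, so a read always observes the latest completed write.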
An Overview of Process Mapping Techniques and Algorithms in High-Performance Computing
Due to the advent of modern hardware architectures of high-performance computers, the way a parallel application is laid out is of paramount importance for performance. This chapter surveys several techniques and algorithms that efficiently address this issue: the mapping of the application's virtual topology (for instance, its communication pattern) onto the physical topology. Such strategies can significantly improve the application's overall execution time. The chapter concludes by listing a series of open issues and problems.
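The mapping problem can be made concrete with a toy hop-bytes formulation (hypothetical names; real mapping tools use scalable heuristics rather than the brute-force search used here for illustration):

```python
from itertools import permutations

def mapping_cost(comm, dist, placement):
    """Hop-bytes metric: communication volume between each process
    pair, weighted by the distance between the cores they land on."""
    n = len(comm)
    return sum(comm[i][j] * dist[placement[i]][placement[j]]
               for i in range(n) for j in range(n))

# Toy instance: 4 processes on a 4-core line (distance = |a - b|).
# Processes 0-1 and 2-3 communicate heavily (weight 8).
comm = [[0, 8, 0, 1],
        [8, 0, 1, 0],
        [0, 1, 0, 8],
        [1, 0, 8, 0]]
dist = [[abs(a - b) for b in range(4)] for a in range(4)]

# Exhaustive search over all placements, feasible only at toy scale.
best = min(permutations(range(4)),
           key=lambda p: mapping_cost(comm, dist, p))
```

A good mapping places heavily communicating process pairs on nearby cores, so the best placement's cost never exceeds that of an arbitrary one such as the identity.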
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
GROMACS is a widely used package for biomolecular simulation, and over the
last two decades it has evolved from small-scale efficiency to advanced
heterogeneous acceleration and multi-level parallelism targeting some of the
largest supercomputers in the world. Here, we describe some of the ways we have
been able to realize this through the use of parallelization on all levels,
combined with a constant focus on absolute performance. Release 4.6 of GROMACS
uses SIMD acceleration on a wide range of architectures, GPU offloading
acceleration, and both OpenMP and MPI parallelism within and between nodes,
respectively. The recent work on acceleration made it necessary to revisit the
fundamental algorithms of molecular simulation, including the concept of
neighbor searching, and we discuss the present and future challenges we see for
exascale simulation, in particular very fine-grained task parallelism. We also
discuss the software management, code peer review, and continuous integration
testing required for a project of this complexity.
Comment: EASC 2014 conference proceedings
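The neighbor-searching concept mentioned above can be illustrated with a minimal cell-list scheme (a generic Python sketch of the technique, not GROMACS code):

```python
from collections import defaultdict
from itertools import product

def cell_list_neighbors(points, cutoff):
    """Bin particles into cubic cells of side `cutoff`, then compare
    each particle only against the 27 surrounding cells -- the
    classic near-linear alternative to all-pairs neighbor search."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[tuple(int(c // cutoff) for c in p)].append(idx)
    pairs = set()
    for cell, members in cells.items():
        for d in product((-1, 0, 1), repeat=3):
            other = tuple(c + o for c, o in zip(cell, d))
            for i in members:
                for j in cells.get(other, ()):
                    if i < j and sum((a - b) ** 2 for a, b in
                                     zip(points[i], points[j])) <= cutoff ** 2:
                        pairs.add((i, j))
    return pairs

# Two atoms within the cutoff, one far away and never even compared.
atoms = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 3.0, 3.0)]
```

Distant particles fall into cells that are never compared, which is what makes the search scale to large systems.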