A Tuned and Scalable Fast Multipole Method as a Preeminent Algorithm for Exascale Systems
Among the algorithms that are likely to play a major role in future exascale
computing, the fast multipole method (FMM) appears as a rising star. Our
recent work showed scaling of an FMM on GPU clusters, with problem
sizes on the order of billions of unknowns. That work led to an extremely
parallel FMM, scaling to thousands of GPUs or tens of thousands of CPUs. This
paper reports on a campaign of performance tuning and scalability studies
using multi-core CPUs, on the Kraken supercomputer. All kernels in the FMM were
parallelized using OpenMP, and a test using 10^7 particles randomly distributed
in a cube showed 78% efficiency on 8 threads. Tuning of the
particle-to-particle kernel using SIMD instructions resulted in 4x speed-up of
the overall algorithm on single-core tests with 10^3 - 10^7 particles. Parallel
scalability was studied in both strong and weak scaling. The strong scaling
test used 10^8 particles and resulted in 93% parallel efficiency on 2048
processes for the non-SIMD code and 54% for the SIMD-optimized code (which was
still 2x faster). The weak scaling test used 10^6 particles per process, and
resulted in 72% efficiency on 32,768 processes, with the largest calculation
taking about 40 seconds to evaluate more than 32 billion unknowns. This work
builds up evidence for our view that FMM is poised to play a leading role in
exascale computing, and we end the paper with a discussion of the features that
make it a particularly favorable algorithm for the emerging heterogeneous and
massively parallel architectural landscape.
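The figures quoted in this abstract can be cross-checked with a little arithmetic. The sketch below assumes that parallel efficiency is measured against a single-worker baseline, which the abstract does not state; the function names are illustrative only.

```python
def implied_speedup(efficiency, workers):
    """Speed-up implied by a parallel efficiency.

    Assumption (not stated in the abstract): efficiency is defined as
    speedup / workers relative to a one-worker baseline.
    """
    return efficiency * workers


def weak_scaling_unknowns(per_process, processes):
    """Total problem size when every process holds the same number of unknowns."""
    return per_process * processes


if __name__ == "__main__":
    # OpenMP test: 78% efficiency on 8 threads.
    print("implied speed-up on 8 threads:", implied_speedup(0.78, 8))
    # Strong scaling: 93% efficiency on 2048 processes (non-SIMD code).
    print("implied speed-up on 2048 processes:", implied_speedup(0.93, 2048))
    # Weak scaling: 10^6 particles per process on 32,768 processes.
    print("total unknowns:", weak_scaling_unknowns(10**6, 32768))  # 32768000000
```

The last value matches the abstract's "more than 32 billion unknowns" for the largest run.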
Coordinated Self-Adaptation in Large-Scale Peer-to-Peer Overlays
Self-adaptive systems typically rely on a closed control loop that detects when the current behavior deviates too much from the optimal one, determines new optimal values for system parameters, and applies changes to the system configuration. In decentralized systems, implementing each of these steps is challenging, especially when nodes need to coordinate their local configurations. In this paper, we propose a decentralized method to automatically tune global system parameters in a coordinated manner. We use gossip-based protocols to continuously monitor system properties and to disseminate parameter updates. We show that this method, applied to a decentralized resource selection service, allows the system to adapt quickly to changes in workload types and node properties while incurring only negligible communication overhead.
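The abstract does not say which gossip protocol is used. As a generic illustration of gossip-based monitoring, the sketch below implements push-pull averaging, in which each node repeatedly averages its local estimate with a randomly chosen peer. It is a single-process simulation, not the paper's method, and all names are hypothetical.

```python
import random


def gossip_average(values, rounds, seed=0):
    """Simulate push-pull gossip averaging over a fully connected overlay.

    Assumes at least two nodes. In every round each node contacts one
    random peer and both adopt the mean of their two estimates. The
    global sum is preserved, so all estimates drift towards the global
    average.
    """
    rng = random.Random(seed)
    state = list(values)
    n = len(state)
    for _ in range(rounds):
        for i in range(n):
            # Pick a peer j != i uniformly at random.
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            mean = (state[i] + state[j]) / 2.0
            state[i] = state[j] = mean
    return state


if __name__ == "__main__":
    loads = [0.0, 10.0, 20.0, 30.0]  # hypothetical per-node measurements
    print(gossip_average(loads, rounds=50))
```

After enough rounds every node holds an estimate close to the global mean without any node having collected all the measurements.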
Solving key design issues for massively multiplayer online games on peer-to-peer architectures
Massively Multiplayer Online Games (MMOGs) are increasing in both popularity and
scale on the Internet and are predominantly implemented by Client/Server architectures.
While such a classical approach to distributed system design offers many benefits, it suffers
from significant technical and commercial drawbacks, primarily reliability and scalability
costs. This realisation has sparked recent research interest in adapting MMOGs
to Peer-to-Peer (P2P) architectures.
This thesis identifies six key design issues to be addressed by P2P MMOGs, namely
interest management, event dissemination, task sharing, state persistency, cheating mitigation,
and incentive mechanisms. Design alternatives for each issue are systematically
compared, and their interrelationships discussed. How well representative P2P MMOG
architectures fulfil the design criteria is also evaluated. It is argued that although P2P
MMOG architectures are developing rapidly, their support for task sharing and incentive
mechanisms still needs to be improved.
The design of a novel framework for P2P MMOGs, Mediator, is presented. It employs a
self-organising super-peer network over a P2P overlay infrastructure, and addresses the
six design issues in an integrated system. The Mediator framework is extensible, as it
supports flexible policy plug-ins and can accommodate the introduction of new super-peer
roles. Key components of this framework have been implemented and evaluated
with a simulated P2P MMOG.
As the Mediator framework relies on super-peers for computational and administrative
tasks, membership management is crucial, e.g. to allow the system to recover from
super-peer failures. A new technology for this, namely Membership-Aware Multicast
with Bushiness Optimisation (MAMBO), has been designed, implemented and evaluated.
It reuses the communication structure of a tree-based application-level multicast
to track group membership efficiently. Evaluation of a demonstration application shows
that MAMBO is able to quickly detect and handle peers joining and leaving. Compared
to a conventional supervision architecture, MAMBO is more scalable, and yet incurs
lower communication overhead. Besides MMOGs, MAMBO is suitable for other P2P
applications, such as collaborative computing and multimedia streaming.
This thesis also presents the design, implementation and evaluation of a novel task
mapping infrastructure for heterogeneous P2P environments, Deadline-Driven Auctions
(DDA). DDA is primarily designed to support NPC host allocation in P2P MMOGs, and
specifically in the Mediator framework. However, it can also support the sharing of computational
and interactive tasks with various deadlines in general P2P applications. Experimental
and analytical results demonstrate that DDA efficiently allocates computing
resources for large numbers of real-time NPC tasks in a simulated P2P MMOG with approximately
1000 players. Furthermore, DDA supports gaming interactivity by keeping
the communication latency among NPC hosts and ordinary players low. It also supports
flexible matchmaking policies, and can motivate application participants to contribute
resources to the system.
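The abstract gives no detail on how a Deadline-Driven Auction selects a host. Purely as an illustration of auction-based task mapping under a deadline, the sketch below assumes that each candidate host submits a bid with an estimated finish time and a price, and that the cheapest bid meeting the deadline wins. The `Bid` type, its fields, and the selection rule are hypothetical and should not be read as the thesis's actual mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bid:
    host: str           # hypothetical identifier of the bidding peer
    finish_time: float  # estimated completion time, in seconds
    price: float        # cost the host asks for running the task


def pick_winner(bids, deadline):
    """Return the cheapest bid that meets the deadline, or None.

    Ties on price are broken in favour of the earlier finish time.
    """
    feasible = [b for b in bids if b.finish_time <= deadline]
    if not feasible:
        return None
    return min(feasible, key=lambda b: (b.price, b.finish_time))


if __name__ == "__main__":
    bids = [
        Bid("peer-a", finish_time=0.08, price=5.0),
        Bid("peer-b", finish_time=0.03, price=7.0),
        Bid("peer-c", finish_time=0.20, price=1.0),  # misses the deadline
    ]
    print(pick_winner(bids, deadline=0.10))
```

Filtering on the deadline before comparing prices keeps real-time tasks away from hosts that are cheap but too slow.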
A Paradigm for Scalable, Transactional, and Efficient Spatial Indexes
With large volumes of geo-tagged data collected in various applications, spatial query processing becomes essential. Query engines depend on efficient indexes to expedite processing. There are three main challenges: scaling out to accommodate large volumes of spatial data, supporting transactional primitives for strong consistency guarantees, and adapting to highly dynamic workloads. This thesis proposes a paradigm for scalable, transactional, and efficient spatial indexes to significantly reduce development efforts in designing and comparing multiple spatial indexes.
This thesis first introduces a distributed and transactional key value store called DTranx to persist the spatial indexes. DTranx follows the SEDA architecture to exploit high concurrency in multi-core environments and it adopts a hybrid of optimistic concurrency control and two-phase commit protocols to narrow down the critical sections of distributed locking during transaction commits. Moreover, DTranx integrates a persistent memory based write-ahead log to reduce durability overhead and combines a garbage collection mechanism without affecting normal transactions. To maintain high throughput for search workloads when databases are constantly updated, snapshot transactions are introduced.
Then, a paradigm is presented with a set of intuitive APIs and a Mempool runtime to reduce development efforts. Mempool transparently synchronizes local states of data structures with DTranx and it handles two critical tasks: address translation and transparent server synchronization, of which the latter includes transaction construction and data synchronization. Furthermore, a dynamic partitioning strategy is integrated into DTranx to generate partitioning and replication plans that reduce inter-server communications and balance resource usage.
Lastly, single-threaded data structures BTree and RTree are converted into distributed versions within two weeks. The BTree and RTree achieve 253.07 kops/sec and 77.83 kops/sec throughput, respectively, for pure search operations in a 25-server cluster.
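To illustrate the optimistic concurrency control mentioned in this abstract, the sketch below shows commit-time validation against a versioned key-value store. It is a single-process toy with hypothetical names (`VersionedStore`, `read`, `commit`), not DTranx's API, and it omits the two-phase commit, logging, and distribution that the thesis describes.

```python
class VersionedStore:
    """Toy key-value store with per-key version counters."""

    def __init__(self):
        self._data = {}  # key -> (value, version)

    def read(self, key):
        """Return (value, version); a missing key reads as (None, 0)."""
        return self._data.get(key, (None, 0))

    def commit(self, read_set, write_set):
        """Validate and apply a transaction optimistically.

        read_set maps each key to the version the transaction observed.
        The commit aborts (returns False) if any observed version is
        stale; otherwise the writes are applied and versions are bumped.
        """
        for key, seen_version in read_set.items():
            if self.read(key)[1] != seen_version:
                return False  # another transaction changed this key
        for key, value in write_set.items():
            _, version = self.read(key)
            self._data[key] = (value, version + 1)
        return True


if __name__ == "__main__":
    store = VersionedStore()
    print(store.commit({}, {"node:1": "root"}))           # first write succeeds
    print(store.commit({"node:1": 0}, {"node:1": "x"}))   # stale read, aborts
    print(store.commit({"node:1": 1}, {"node:1": "x"}))   # fresh read, commits
```

Because no locks are held while a transaction executes, conflicts are detected only at commit time, which is what keeps the critical section short.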