Search CORE

609 research outputs found

A Tuned and Scalable Fast Multipole Method as a Preeminent Algorithm for Exascale Systems

Author: Bergman K
Chandramowlishwaran A
Hamada T
Lorena A Barba
Rahimian A
Rio Yokota
Warren M
Yokota R
Publication venue: 'SAGE Publications'
Publication date: 16/10/2011
Field of study

Among the algorithms that are likely to play a major role in future exascale computing, the fast multipole method (FMM) appears as a rising star. Our previous recent work showed scaling of an FMM on GPU clusters, with problem sizes in the order of billions of unknowns. That work led to an extremely parallel FMM, scaling to thousands of GPUs or tens of thousands of CPUs. This paper reports on a a campaign of performance tuning and scalability studies using multi-core CPUs, on the Kraken supercomputer. All kernels in the FMM were parallelized using OpenMP, and a test using 10^7 particles randomly distributed in a cube showed 78% efficiency on 8 threads. Tuning of the particle-to-particle kernel using SIMD instructions resulted in 4x speed-up of the overall algorithm on single-core tests with 10^3 - 10^7 particles. Parallel scalability was studied in both strong and weak scaling. The strong scaling test used 10^8 particles and resulted in 93% parallel efficiency on 2048 processes for the non-SIMD code and 54% for the SIMD-optimized code (which was still 2x faster). The weak scaling test used 10^6 particles per process, and resulted in 72% efficiency on 32,768 processes, with the largest calculation taking about 40 seconds to evaluate more than 32 billion unknowns. This work builds up evidence for our view that FMM is poised to play a leading role in exascale computing, and we end the paper with a discussion of the features that make it a particularly favorable algorithm for the emerging heterogeneous and massively parallel architectural landscape

arXiv.org e-Print Archive

Crossref

Coordinated Self-Adaptation in Large-Scale Peer-to-Peer Overlays

Author: Napper J.M.
Pierre G.E.O.
Sacha J.
Stratan C.
Publication venue: Amsterdam, the Netherlands
Publication date: 01/01/2010
Field of study

Self-adaptive systems typically rely on a closed control loop which detects when the current behavior deviates too much from the optimal one, determines new optimal values for system parameters, and applies changes to the system configuration. In decentralized systems, implementing each of these steps is challenging, especially when nodes need to coordinate their local configurations. In this paper, we propose a decentralized method to automatically tune global system parameters in a coordinated manner. We use gossip-based protocols to continuously monitor system properties and to disseminate parameter updates. We show that this method applied to a decentralized resource selection service allows the system to quickly adapt to changes in workload types and node properties, and only incurs a negligible communication overhead

VU Research Portal

Solving key design issues for massively multiplayer online games on peer-to-peer architectures

Author: Fan Lu
Publication venue: Mathematical and Computer Sciences
Publication date: 01/01/2009
Field of study

Massively Multiplayer Online Games (MMOGs) are increasing in both popularity and scale on the Internet and are predominantly implemented by Client/Server architectures. While such a classical approach to distributed system design offers many benefits, it suffers from significant technical and commercial drawbacks, primarily reliability and scalability costs. This realisation has sparked recent research interest in adapting MMOGs to Peer-to-Peer (P2P) architectures. This thesis identifies six key design issues to be addressed by P2P MMOGs, namely interest management, event dissemination, task sharing, state persistency, cheating mitigation, and incentive mechanisms. Design alternatives for each issue are systematically compared, and their interrelationships discussed. How well representative P2P MMOG architectures fulfil the design criteria is also evaluated. It is argued that although P2P MMOG architectures are developing rapidly, their support for task sharing and incentive mechanisms still need to be improved. The design of a novel framework for P2P MMOGs, Mediator, is presented. It employs a self-organising super-peer network over a P2P overlay infrastructure, and addresses the six design issues in an integrated system. The Mediator framework is extensible, as it supports flexible policy plug-ins and can accommodate the introduction of new superpeer roles. Key components of this framework have been implemented and evaluated with a simulated P2P MMOG. As the Mediator framework relies on super-peers for computational and administrative tasks, membership management is crucial, e.g. to allow the system to recover from super-peer failures. A new technology for this, namely Membership-Aware Multicast with Bushiness Optimisation (MAMBO), has been designed, implemented and evaluated. It reuses the communication structure of a tree-based application-level multicast to track group membership efficiently. Evaluation of a demonstration application shows i that MAMBO is able to quickly detect and handle peers joining and leaving. Compared to a conventional supervision architecture, MAMBO is more scalable, and yet incurs less communication overheads. Besides MMOGs, MAMBO is suitable for other P2P applications, such as collaborative computing and multimedia streaming. This thesis also presents the design, implementation and evaluation of a novel task mapping infrastructure for heterogeneous P2P environments, Deadline-Driven Auctions (DDA). DDA is primarily designed to support NPC host allocation in P2P MMOGs, and specifically in the Mediator framework. However, it can also support the sharing of computational and interactive tasks with various deadlines in general P2P applications. Experimental and analytical results demonstrate that DDA efficiently allocates computing resources for large numbers of real-time NPC tasks in a simulated P2P MMOG with approximately 1000 players. Furthermore, DDA supports gaming interactivity by keeping the communication latency among NPC hosts and ordinary players low. It also supports flexible matchmaking policies, and can motivate application participants to contribute resources to the system

CiteSeerX

ROS: The Research Output Service. Heriot-Watt University Edinburgh

Recommended from our members

A Paradigm for Scalable, Transactional, and Efficient Spatial Indexes

Author: Gao Ning
Publication venue: University of Colorado Boulder
Publication date: 01/01/2018
Field of study

With large volumes of geo-tagged data collected in various applications, spatial query pro- cessing becomes essential. Query engines depend on efficient indexes to expedite processing. There are three main challenges: scaling out to accommodate large volumes of spatial data, support- ing transactional primitives for strong consistency guarantees, and adapting to highly dynamic workloads. This thesis proposes a paradigm for scalable, transactional, and efficient spatial indexes to significantly reduce development efforts in designing and comparing multiple spatial indexes.This thesis first introduces a distributed and transactional key value store called DTranx to persist the spatial indexes. DTranx follows the SEDA architecture to exploit high concurrency in multi-core environments and it adopts a hybrid of optimistic concurrency control and two-phase commit protocols to narrow down the critical sections of distributed locking during transaction com- mits. Moreover, DTranx integrates a persistent memory based write-ahead log to reduce durability overhead and combines a garbage collection mechanism without affecting normal transactions. To maintain high throughput for search workloads when databases are constantly updated, snapshot transactions are introduced.Then, a paradigm is presented with a set of intuitive APIs and a Mempool runtime to re- duce development efforts. Mempool transparently synchronizes local states of data structures with DTranx and it handles two critical tasks: address translation and transparent server synchroniza- tion, of which the latter includes transaction construction and data synchronization. Furthermore, a dynamic partitioning strategy is integrated into DTranx to generate partitioning and replication plans that reduce inter-server communications and balance resource usage.Lastly, single-threaded data structures BTree and RTree are converted into distributed versions within two weeks. The BTree and RTree achieve 253.07 kops/sec and 77.83 kops/sec through- put respectively for pure search operations in a 25-server cluster

CU Scholar Institutional Repository