7,458 research outputs found
Parallel Discrete Event Simulation with Erlang
Discrete Event Simulation (DES) is a widely used technique in which the state
of the simulator is updated by events happening at discrete points in time
(hence the name). DES is used to model and analyze many kinds of systems,
including computer architectures, communication networks, street traffic, and
others. Parallel and Distributed Simulation (PADS) aims at improving the
efficiency of DES by partitioning the simulation model across multiple
processing elements, in order to enabling larger and/or more detailed studies
to be carried out. The interest on PADS is increasing since the widespread
availability of multicore processors and affordable high performance computing
clusters. However, designing parallel simulation models requires considerable
expertise, the result being that PADS techniques are not as widespread as they
could be. In this paper we describe ErlangTW, a parallel simulation middleware
based on the Time Warp synchronization protocol. ErlangTW is entirely written
in Erlang, a concurrent, functional programming language specifically targeted
at building distributed systems. We argue that writing parallel simulation
models in Erlang is considerably easier than using conventional programming
languages. Moreover, ErlangTW allows simulation models to be executed either on
single-core, multicore and distributed computing architectures. We describe the
design and prototype implementation of ErlangTW, and report some preliminary
performance results on multicore and distributed architectures using the well
known PHOLD benchmark.Comment: Proceedings of ACM SIGPLAN Workshop on Functional High-Performance
Computing (FHPC 2012) in conjunction with ICFP 2012. ISBN: 978-1-4503-1577-
Parallel and Distributed Simulation from Many Cores to the Public Cloud (Extended Version)
In this tutorial paper, we will firstly review some basic simulation concepts
and then introduce the parallel and distributed simulation techniques in view
of some new challenges of today and tomorrow. More in particular, in the last
years there has been a wide diffusion of many cores architectures and we can
expect this trend to continue. On the other hand, the success of cloud
computing is strongly promoting the everything as a service paradigm. Is
parallel and distributed simulation ready for these new challenges? The current
approaches present many limitations in terms of usability and adaptivity: there
is a strong need for new evaluation metrics and for revising the currently
implemented mechanisms. In the last part of the paper, we propose a new
approach based on multi-agent systems for the simulation of complex systems. It
is possible to implement advanced techniques such as the migration of simulated
entities in order to build mechanisms that are both adaptive and very easy to
use. Adaptive mechanisms are able to significantly reduce the communication
cost in the parallel/distributed architectures, to implement load-balance
techniques and to cope with execution environments that are both variable and
dynamic. Finally, such mechanisms will be used to build simulations on top of
unreliable cloud services.Comment: Tutorial paper published in the Proceedings of the International
Conference on High Performance Computing and Simulation (HPCS 2011). Istanbul
(Turkey), IEEE, July 2011. ISBN 978-1-61284-382-
Evaluating Cache Coherent Shared Virtual Memory for Heterogeneous Multicore Chips
The trend in industry is towards heterogeneous multicore processors (HMCs),
including chips with CPUs and massively-threaded throughput-oriented processors
(MTTOPs) such as GPUs. Although current homogeneous chips tightly couple the
cores with cache-coherent shared virtual memory (CCSVM), this is not the
communication paradigm used by any current HMC. In this paper, we present a
CCSVM design for a CPU/MTTOP chip, as well as an extension of the pthreads
programming model, called xthreads, for programming this HMC. Our goal is to
evaluate the potential performance benefits of tightly coupling heterogeneous
cores with CCSVM
Modeling the Internet of Things: a simulation perspective
This paper deals with the problem of properly simulating the Internet of
Things (IoT). Simulating an IoT allows evaluating strategies that can be
employed to deploy smart services over different kinds of territories. However,
the heterogeneity of scenarios seriously complicates this task. This imposes
the use of sophisticated modeling and simulation techniques. We discuss novel
approaches for the provision of scalable simulation scenarios, that enable the
real-time execution of massively populated IoT environments. Attention is given
to novel hybrid and multi-level simulation techniques that, when combined with
agent-based, adaptive Parallel and Distributed Simulation (PADS) approaches,
can provide means to perform highly detailed simulations on demand. To support
this claim, we detail a use case concerned with the simulation of vehicular
transportation systems.Comment: Proceedings of the IEEE 2017 International Conference on High
Performance Computing and Simulation (HPCS 2017
A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics
Mesoscopic simulations of hydrocarbon flow in source shales are challenging,
in part due to the heterogeneous shale pores with sizes ranging from a few
nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid
and fluid-solid interactions in nano- to micro-scale shale pores, which are
physically and chemically sophisticated, must be captured. To address those
challenges, we present a GPU-accelerated package for simulation of flow in
nano- to micro-pore networks with a many-body dissipative particle dynamics
(mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the
code offloads all intensive workloads on GPUs. Other advancements, such as
smart particle packing and no-slip boundary condition in complex pore
geometries, are also implemented for the construction and the simulation of the
realistic shale pores from 3D nanometer-resolution stack images. Our code is
validated for accuracy and compared against the CPU counterpart for speedup. In
our benchmark tests, the code delivers nearly perfect strong scaling and weak
scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge
National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU
benchmark on ORNL's SummitDev and IBM's AC922 suggests that the host-to-device
NVLink can boost performance over PCIe by a remarkable 40\%. Lastly, we
demonstrate, through a flow simulation in realistic shale pores, that the CPU
counterpart requires 840 Power9 cores to rival the performance delivered by our
package with four V100 GPUs on ORNL's Summit architecture. This simulation
package enables quick-turnaround and high-throughput mesoscopic numerical
simulations for investigating complex flow phenomena in nano- to micro-porous
rocks with realistic pore geometries
RITSim: distributed systemC simulation
Parallel or distributed simulation is becoming more than a novel way to speedup design evaluation; it is becoming necessary for simulating modern processors in a reasonable timeframe. As architectural features become faster, smaller, and more complex, designers are interested in obtaining detailed and accurate performance and power estimations. Uniprocessor simulators may not be able to meet such demands. The RITSim project uses SystemC to model a processor microarchitecture and memory subsystem in great detail. SystemC is a C++ library built on a discrete-event simulation kernel. Many projects have successfully implemented parallel discrete-event simulation (PDES) frameworks to distribute simulation among several hosts. The field promises significant simulation speedup, possibly leading to faster turnaround time in design space exploration and commercial production. However, parallel implementation of such simulators is not an easy task. It requires modification of the simulation kernel for effective partitioning and synchronization. This thesis explores PDES techniques and presents a distributed version of the SystemC simulation environment. With minimal user interaction, SystemC models can executed on a cluster of workstations using a message-passing library such as the Message Passing Interface (MPI). The implementation is designed for transparency; distribution and synchronization happen with little intervention by the model author. Modification of SystemC is fashioned to promote maintainability with future releases. Furthermore, only freely available libraries are used for maximum flexibility and portability
Shared memory as a basis for conservative distributed architectural simulation
Journal ArticleThis paper describes experience in parallelizing an execution-driven architectural simulation system used in the development and evaluation of the Avalanche distributed architecture. It reports on a specific application of conservative distributed simulation on a shared memory platform. Various communication-intensive synchronization algorithms are described and evaluated. Performance results on a bus-based shared memory platform are reported, and extension and scalability of the implementation to larger distributed shared memory configurations are discussed. Also addressed are specific characteristics of architectural simulations that contribute to decisions relating to the conservatism of the approach and to the achievable performance
- …