
    Parallel DaSSF Discrete-Event Simulation without Shared Memory

    The Dartmouth implementation of the Scalable Simulation Framework (DaSSF) is a discrete-event simulator used primarily in the simulation of networks. It achieves high performance through parallel processing. DaSSF 1.22 requires shared memory between all processors in order to operate, which limits the number of processors available and the hardware platforms that can exploit parallelism. We are interested in extending parallel DaSSF operation to architectures without shared memory. We explore the requirements of this by implementing parallel DaSSF using MPI as the sole form of interaction between processors. The approaches used to achieve this can be abstracted and applied to the current version of DaSSF. This would allow parallel simulation using shared memory by processors within a single machine, and also at a higher level between separate machines using distributed memory.
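
    A minimal sketch of the idea, not DaSSF code: two MPI ranks exchange a timestamped event, with MPI as the only channel between processors. The event_t layout and the message tag are illustrative assumptions.

    ```c
    /* Sketch: two MPI ranks exchanging one timestamped event.
     * Hypothetical event layout; MPI is the sole interaction. */
    #include <mpi.h>
    #include <stdio.h>

    typedef struct { double timestamp; int type; } event_t;

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {
            event_t ev = { 1.5, 42 };              /* event due at t = 1.5 */
            MPI_Send(&ev, (int)sizeof ev, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            event_t ev;
            MPI_Recv(&ev, (int)sizeof ev, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received event type %d at t=%g\n", ev.type, ev.timestamp);
        }
        MPI_Finalize();
        return 0;
    }
    ```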

    Parallel discrete event simulation: A shared memory approach

    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
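
    For illustration, a hedged C sketch of the conservative rule behind Chandy-Misra: a logical process (LP) may only advance to the minimum clock over its input channels, and a null message promises a time equal to that bound plus the LP's lookahead. The lp_t structure, channel count, and values are assumptions, not the paper's code.

    ```c
    /* Sketch of the conservative bound at one logical process (LP). */
    #include <float.h>
    #include <stdio.h>

    #define NCHAN 4   /* assumed number of input channels */

    typedef struct {
        double chan_clock[NCHAN];  /* timestamp of last message per input channel */
        double lookahead;          /* minimum delay this LP adds (e.g. service time) */
    } lp_t;

    /* The LP may safely process events up to the minimum input-channel clock. */
    static double safe_time(const lp_t *lp) {
        double t = DBL_MAX;
        for (int i = 0; i < NCHAN; i++)
            if (lp->chan_clock[i] < t)
                t = lp->chan_clock[i];
        return t;
    }

    /* A null message promises no event earlier than safe time + lookahead. */
    static double null_message_time(const lp_t *lp) {
        return safe_time(lp) + lp->lookahead;
    }

    int main(void) {
        lp_t lp = { .chan_clock = {2.0, 3.5, 1.0, 4.0}, .lookahead = 0.5 };
        printf("safe up to t=%g, null message carries t=%g\n",
               safe_time(&lp), null_message_time(&lp));
        return 0;
    }
    ```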

    Analysis and Optimization of a Demographic Simulator for Parallel Environments

    In recent years, the advent of multi-core machines has led to the need to adapt current simulation solutions to modern hardware architectures. In this poster, we present a solution that exploits multicore shared-memory capacities in Yades, a parallel tool for running socio-demography dynamic simulations. We propose to abandon the single-threaded programming approach adopted in Yades by using ROOT-Sim, a library that applies discrete event simulation to parallel environments by profiting from shared-memory capabilities. As a result of this new approach, our results show an improvement in Yades’ performance and scalability.

    A Co-Simulation Environment for Mixed Signal, Multi-Domain System Level Design Exploration

    This thesis presents a system-level co-simulation environment for mixed domain design exploration. By employing shared memory IPC (Inter-Process Communication) and utilizing PDES (Parallel Discrete Event Simulation) techniques, we examine two methods of synchronization: lock-step and dynamic. We then compare the performance of these two methods on a series of test systems as well as real designs using the Chatoyant MOEMS (Micro-Electro Mechanical Systems) simulator and the mixed HDL (Hardware Description Language) simulator from Model Technology, ModelSim. The results collected are used to ascertain which method provides the best overall performance with the least overhead.
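
    A hedged sketch of lock-step synchronization over shared-memory IPC in the spirit described here (not the actual Chatoyant/ModelSim coupling): two processes share a segment, and each must finish its time step before either may start the next. The segment layout and step count are assumptions.

    ```c
    /* Sketch: lock-step barrier between two co-simulators over shared memory. */
    #include <semaphore.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    typedef struct {
        sem_t a_done, b_done;   /* process-shared semaphores */
        double signal_value;    /* data exchanged each time step */
    } shm_t;

    int main(void) {
        shm_t *shm = mmap(NULL, sizeof *shm, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        sem_init(&shm->a_done, 1, 0);              /* pshared = 1 */
        sem_init(&shm->b_done, 1, 0);

        if (fork() == 0) {                         /* simulator B */
            for (int step = 0; step < 3; step++) {
                sem_wait(&shm->a_done);            /* wait for A's step */
                printf("B step %d sees %g\n", step, shm->signal_value);
                sem_post(&shm->b_done);            /* release A */
            }
            return 0;
        }
        for (int step = 0; step < 3; step++) {     /* simulator A */
            shm->signal_value = step * 0.1;        /* this step's output */
            sem_post(&shm->a_done);
            sem_wait(&shm->b_done);                /* lock-step: wait for B */
        }
        wait(NULL);
        return 0;
    }
    ```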

    Time warp on a shared memory multiprocessor

    A variation of the Time Warp parallel discrete event simulation mechanism is presented that is optimized for execution on a shared memory multiprocessor. In particular, the direct cancellation mechanism is proposed, which eliminates the need for anti-messages and provides an efficient mechanism for cancelling erroneous computations. The mechanism thereby eliminates many of the overheads associated with conventional, message-based implementations of Time Warp. More importantly, this mechanism effects rapid repairs of the parallel computation when an error is discovered. Initial performance measurements of an implementation of the mechanism executing on a BBN Butterfly multiprocessor are presented. These measurements indicate that the mechanism achieves good performance, particularly for many workloads where conservative clock synchronization algorithms perform poorly. Speedups as high as 56.8 using 64 processors were obtained. However, our studies also indicate that state saving overheads represent a significant stumbling block for many parallel simulations using Time Warp.
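
    A rough sketch of the direct-cancellation idea under stated assumptions: each event keeps direct pointers to the events it scheduled, so a rollback can invalidate descendants in place rather than sending anti-messages. The structure below is illustrative, not the paper's implementation.

    ```c
    /* Sketch: direct cancellation without anti-messages. */
    #include <stdbool.h>

    #define MAX_CAUSED 8

    typedef struct event {
        double ts;                          /* virtual time of the event */
        bool cancelled;                     /* set directly by the canceller */
        struct event *caused[MAX_CAUSED];   /* events this event scheduled */
        int ncaused;
    } event_t;

    /* On rollback, invalidate everything this event (transitively) caused. */
    static void cancel_descendants(event_t *ev) {
        for (int i = 0; i < ev->ncaused; i++) {
            event_t *child = ev->caused[i];
            if (!child->cancelled) {
                child->cancelled = true;    /* no anti-message is sent */
                cancel_descendants(child);
            }
        }
        ev->ncaused = 0;
    }

    int main(void) {
        event_t parent = { .ts = 1.0 }, child = { .ts = 2.0 };
        parent.caused[parent.ncaused++] = &child;
        cancel_descendants(&parent);        /* rolling back parent kills child */
        return child.cancelled ? 0 : 1;
    }
    ```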

    The virtual time machine

    Existing multiprocessors and multicomputers require the programmer or compiler to perform data dependence analysis at compile time. We propose a parallel computer that performs this task at runtime. In particular, the Virtual Time Machine (VTM) detects violations of data dependence constraints as they occur, and automatically recovers from them. A sophisticated memory system that is addressed using both a spatial and a temporal coordinate is used to efficiently implement this mechanism. Initially targeted at discrete event simulation applications, many of the ideas used in the machine architecture have direct application in the more general realm of parallel computation. The long term goal of this work is to develop a general purpose parallel computer that will support a wide range of parallel programming paradigms. This paper outlines the motivations behind the VTM architecture, the underlying computation model, a proposed implementation, and initial performance results. A recurring theme that pervades the entire paper is our contention that existing shared memory and message-based machines do not pay adequate attention to the dimension of time. We argue that this architectural deficiency is the underlying reason behind many difficult problems in parallel computation today.
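
    Reading the space-time memory idea as software (the paper proposes hardware), a hypothetical cell keeps multiple versions indexed by virtual time, and a read at time t returns the version with the largest write time not exceeding t:

    ```c
    /* Sketch: a software stand-in for one space-time memory word. */
    #include <stdio.h>

    #define MAX_VERSIONS 16

    typedef struct {
        double t[MAX_VERSIONS];   /* write times, kept sorted ascending */
        int    v[MAX_VERSIONS];   /* values written at those times */
        int    n;
    } st_word_t;

    /* A read at virtual time rt sees the latest write at or before rt. */
    static int st_read(const st_word_t *w, double rt) {
        int val = 0;              /* assumed initial value */
        for (int i = 0; i < w->n && w->t[i] <= rt; i++)
            val = w->v[i];
        return val;
    }

    int main(void) {
        st_word_t w = { .t = {1.0, 3.0}, .v = {10, 30}, .n = 2 };
        printf("%d\n", st_read(&w, 2.5));   /* prints 10: version written at t=1.0 */
        return 0;
    }
    ```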

    Parallel Discrete Event Simulation with Erlang

    Discrete Event Simulation (DES) is a widely used technique in which the state of the simulator is updated by events happening at discrete points in time (hence the name). DES is used to model and analyze many kinds of systems, including computer architectures, communication networks, street traffic, and others. Parallel and Distributed Simulation (PADS) aims at improving the efficiency of DES by partitioning the simulation model across multiple processing elements, in order to enable larger and/or more detailed studies to be carried out. Interest in PADS is increasing with the widespread availability of multicore processors and affordable high performance computing clusters. However, designing parallel simulation models requires considerable expertise, the result being that PADS techniques are not as widespread as they could be. In this paper we describe ErlangTW, a parallel simulation middleware based on the Time Warp synchronization protocol. ErlangTW is entirely written in Erlang, a concurrent, functional programming language specifically targeted at building distributed systems. We argue that writing parallel simulation models in Erlang is considerably easier than using conventional programming languages. Moreover, ErlangTW allows simulation models to be executed on single-core, multicore, and distributed computing architectures. We describe the design and prototype implementation of ErlangTW, and report some preliminary performance results on multicore and distributed architectures using the well known PHOLD benchmark.
    Comment: Proceedings of the ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC 2012), in conjunction with ICFP 2012. ISBN: 978-1-4503-1577-
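
    For a flavor of the PHOLD workload mentioned above (sketched here in C for brevity, though ErlangTW itself is Erlang): each processed event schedules exactly one successor at the current time plus an exponential delay, addressed to a random LP. The single-event population, LP count, and mean delay are simplifying assumptions.

    ```c
    /* Sketch of PHOLD's core rule with a population of one event. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NLP 4   /* assumed number of logical processes */

    /* Exponentially distributed delay with the given mean. */
    static double exp_delay(double mean) {
        return -mean * log(1.0 - (double)rand() / ((double)RAND_MAX + 1.0));
    }

    int main(void) {
        double now = 0.0;
        for (int i = 0; i < 10; i++) {      /* each event spawns the next one */
            now += exp_delay(1.0);          /* timestamp = now + random delay */
            int lp = rand() % NLP;          /* uniformly random destination LP */
            printf("event %d -> LP %d at t=%.3f\n", i, lp, now);
        }
        return 0;
    }
    ```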

    Lock-free Scheduling of Logical Processes in Parallel Simulation

    With fixed lookahead information in a simulation model, the overhead of asynchronous conservative parallel simulation lies in the mechanism used for propagating time updates in order for logical processes to safely advance their local simulation clocks. Studies have shown that a good scheduling algorithm should preferentially schedule processes containing events on the critical path. This paper introduces a lock-free algorithm for scheduling logical processes in conservative parallel discrete-event simulation on shared-memory multiprocessor machines. The algorithm uses fetch&add operations that help avoid inefficiencies associated with using locks. The lock-free algorithm is robust. Experiments show that, compared with the scheduling algorithm using locks, the lock-free algorithm exhibits better performance when the number of logical processes assigned to each processor is small or when the workload becomes significant. In models with a large number of logical processes, our algorithm shows only a modest increase in execution time due to the overhead of extra bookkeeping.
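
    A minimal C11 sketch of the fetch&add idea, not the paper's full algorithm: worker threads claim ready logical processes by atomically incrementing a shared counter, so no lock guards the schedule. The ready array and its size are illustrative assumptions.

    ```c
    /* Sketch: lock-free claiming of logical processes via fetch&add. */
    #include <stdatomic.h>
    #include <stdio.h>

    #define NLP 64

    static int ready[NLP];             /* ids of LPs ready to run this pass */
    static atomic_int head;            /* next unclaimed slot in ready[] */

    /* Claim the next ready LP with one fetch&add; -1 when none remain. */
    static int claim_lp(int nready) {
        int slot = atomic_fetch_add(&head, 1);
        return slot < nready ? ready[slot] : -1;
    }

    int main(void) {
        for (int i = 0; i < NLP; i++) ready[i] = i;
        int lp;
        while ((lp = claim_lp(NLP)) != -1)      /* each worker runs this loop */
            printf("worker schedules LP %d\n", lp);
        return 0;
    }
    ```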

    Optimizing simulation on shared-memory platforms: The smart cities case

    Modern advancements in computing architectures have been accompanied by new emergent paradigms to run Parallel Discrete Event Simulation models efficiently. Indeed, many new paradigms to effectively use the available underlying hardware have been proposed in the literature. Among these, the Share-Everything paradigm targets massively-parallel shared-memory machines, supporting speculative simulation while taking into account the limits and benefits of this family of architectures. Previous results have shown how this paradigm outperforms traditional speculative strategies (such as data-separated Time Warp systems) whenever the granularity of executed events is small. In this paper, we show the performance implications of this simulation-engine organization when the simulation models have a variable granularity. To this end, we have selected a traffic model tailored for smart cities-oriented simulation. Our assessment illustrates the effects of the various tuning parameters related to the approach, opening the way to a deeper understanding of this innovative paradigm.
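
    A hedged sketch of the share-everything dispatch rule described above: any worker thread may pick the globally lowest-timestamp pending event, whichever simulation object it targets. A coarse lock keeps the sketch short; the engines discussed here rely on finer-grained concurrent structures.

    ```c
    /* Sketch: any worker extracts the globally minimum-timestamp event. */
    #include <float.h>
    #include <pthread.h>

    #define MAXEV 1024

    typedef struct { double ts; int obj; } event_t;

    static event_t pool[MAXEV];            /* globally shared pending set */
    static int npool;
    static pthread_mutex_t pool_mtx = PTHREAD_MUTEX_INITIALIZER;

    /* Extract the minimum-timestamp event; returns 0 when the pool is empty. */
    static int next_event(event_t *out) {
        pthread_mutex_lock(&pool_mtx);
        int best = -1;
        double best_ts = DBL_MAX;
        for (int i = 0; i < npool; i++)
            if (pool[i].ts < best_ts) { best_ts = pool[i].ts; best = i; }
        if (best >= 0) {
            *out = pool[best];
            pool[best] = pool[--npool];    /* compact the array */
        }
        pthread_mutex_unlock(&pool_mtx);
        return best >= 0;
    }

    int main(void) {
        pool[npool++] = (event_t){ 2.0, 1 };
        pool[npool++] = (event_t){ 1.0, 0 };
        event_t ev;
        while (next_event(&ev)) { /* any worker thread could run this loop */ }
        return 0;
    }
    ```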

    A Non-Blocking Priority Queue for the Pending Event Set

    The large diffusion of shared-memory multi-core machines has impacted the way Parallel Discrete Event Simulation (PDES) engines are built. While they were originally conceived as data-partitioned platforms, where each thread is in charge of managing a subset of simulation objects, nowadays the trend is to shift towards share-everything settings. In this scenario, any thread can (in principle) take care of CPU-dispatching pending events bound to whichever simulation object, which helps to fully share the load across the available CPU-cores. Hence, a fundamental aspect to be tackled is to provide an efficient globally-shared pending events’ set from which multiple worker threads can concurrently extract events to be processed, and into which they can concurrently insert newly produced events to be processed in the future. To cope with this aspect, we present the design and implementation of a concurrent non-blocking pending events’ set data structure, which can be seen as a variant of a classical calendar queue. Early experimental data collected with a synthetic stress test are reported, showing excellent scalability of our proposal on a machine equipped with 32 CPU-cores.
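
    A rough sketch of the calendar-queue flavor, under stated assumptions: events hash into fixed-width time buckets, and insertion links a node at a bucket head with compare-and-swap rather than a lock. The paper's structure is more refined (ordered buckets, non-blocking extraction too); this only illustrates the CAS-based insert.

    ```c
    /* Sketch: CAS-based insert into a bucketed (calendar-style) event set. */
    #include <stdatomic.h>
    #include <stdlib.h>

    #define NBUCKETS 64
    #define BUCKET_WIDTH 1.0    /* assumed virtual-time width of each bucket */

    typedef struct node {
        double ts;                   /* event timestamp */
        struct node *_Atomic next;
    } node_t;

    static node_t *_Atomic bucket[NBUCKETS];

    static void enqueue(double ts) {
        node_t *n = malloc(sizeof *n);
        n->ts = ts;
        int b = (int)(ts / BUCKET_WIDTH) % NBUCKETS;
        node_t *old = atomic_load(&bucket[b]);
        do {                         /* retry loop replaces a lock */
            atomic_store(&n->next, old);
        } while (!atomic_compare_exchange_weak(&bucket[b], &old, n));
    }

    int main(void) {
        enqueue(3.7);
        enqueue(0.2);   /* concurrent calls from many threads remain safe */
        return 0;
    }
    ```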