2,319 research outputs found

A program-driven parallel machine simulation environment

Author: Chou Chien-chun
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

In recent years, discrete-event simulation has become a popular analytical tool for studying the hardware architecture of distributed-memory multicomputers and shared-memory multiprocessors. Once a hardware architecture prototype has been completed, a complete and detailed machine simulation environment can be used to evaluate the architecture's efficiency under real operating systems and application software. In this article, we discuss the development and implementation of a program-executable simulation of a Transputer network multicomputer and of 80x86 series multiprocessors, and how they can be operated. Because the simulated computer systems are extremely complex, parallel discrete-event simulation is also used to shorten the simulation run time. In practice, the simulator runs across many network-connected workstations: some are in charge of computing, while others manage memory, which makes it simpler to set up a parallel machine simulation environment. In addition to providing an environment in which programs can execute, the simulator also measures the time spent running them, so that the feasibility of running these application programs on a hardware system can be evaluated.

Conference type: International
Conference date: 14/12/1998 - 16/12/1998
Conference location: Tainan, Taiwan

Tamkang University Institutional Repository

CYCLIC: A Locality-Preserving Load-Balancing Algorithm for PDES on Shared Memory Multiprocessors

Author: García Maria Isabel
García-Dopico Antonio
Pérez Antonio
Rodríguez Santiago
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 24/01/2013

This paper presents a new load-balancing algorithm for shared memory multiprocessors that is currently being applied to the parallel simulation of logic circuits, specifically VHDL simulations. The algorithm exploits two usual characteristics of these simulations, cyclicity and predictability, to obtain a good load balance while preserving the locality of references. It is useful not only for logic circuit simulation but also for any system with a cyclic execution pattern, that is, repetition over time, which makes the future behavior of the tasks predictable. An example is Parallel Discrete Event Simulation (PDES), where several tasks are repeatedly executed in response to certain events. A comparison between the proposed algorithm and other load-balancing algorithms found in the literature reveals consistently better execution times, with improvements in both load balancing and locality of references that can be of help on current multicore desktop computers.

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

Optimising Simulation Data Structures for the Xeon Phi

Author: Chimeh Mozhgan K.
Cockshott Paul
Publication date: 01/01/2016

In this paper, we propose a lock-free architecture to accelerate logic gate circuit simulation using SIMD multi-core machines. We evaluate its performance on different test circuits simulated on the Intel Xeon Phi and two other machines. We compare this software/hardware combination with the reported performance of GPU and other multi-core simulation platforms. We also compare the lock-free architecture with a leading commercial simulator running on the same Intel hardware.

Enlighten

Wait-Free Global Virtual Time Computation in Shared Memory Time-Warp Systems

Author: PELLEGRINI ALESSANDRO
QUAGLIA Francesco
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

Global Virtual Time (GVT) is a powerful abstraction used to discriminate which events belong (and which do not belong) to the past history of a parallel/distributed computation. For high performance simulation systems based on the Time Warp synchronization protocol, where concurrent simulation objects are allowed to process their events speculatively and causal consistency is achieved via rollback/recovery techniques, GVT is used to determine which portion of the simulation can be considered as committed. Hence it is the basis for memory recovery (e.g. of obsolete logs that were taken in order to support state recoverability) and for non-revocable operations (e.g. I/O). For shared memory implementations of simulation platforms based on the Time Warp protocol, the reference GVT algorithm is the one presented by Fujimoto and Hybinette [1]. However, this algorithm relies on critical sections, which make it non-wait-free and can hamper scalability. In this article we present a wait-free shared memory GVT algorithm that requires no critical section. Rather, correct coordination across the processes while computing the GVT value is achieved via atomic memory operations, namely compare-and-swap. The price paid by our proposal is an increase in the number of GVT computation phases, as opposed to the single phase required by the proposal in [1]. However, as we show via the results of an experimental study, the wait-free nature of the phases carried out in our GVT algorithm pays off by reducing the actual cost compared with that incurred by the proposal in [1].
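To make the role of GVT concrete, the following is a minimal sequential sketch in Python, not the wait-free algorithm of the paper and not the algorithm in [1]. It assumes GVT is taken as the minimum over all local virtual times and the timestamps of messages still in flight, and that each checkpoint holds the state after processing the event with that timestamp. The function names `compute_gvt` and `fossil_collect` are hypothetical, and the compare-and-swap coordination described in the abstract is not shown.

```python
from itertools import chain


def compute_gvt(local_virtual_times, in_transit_timestamps):
    """Return the minimum over all LVTs and in-flight message timestamps.

    Assumption: both arguments are a consistent snapshot taken at one instant.
    Returns +inf when there is nothing left to process.
    """
    return min(chain(local_virtual_times, in_transit_timestamps),
               default=float("inf"))


def fossil_collect(state_log, gvt):
    """Drop checkpoints that can never be needed by a rollback.

    state_log is a list of (timestamp, snapshot) pairs sorted by timestamp.
    No rollback can target a time below gvt, so only the latest checkpoint
    strictly before gvt (plus everything after it) has to be kept.
    """
    keep_from = 0
    for i, (timestamp, _) in enumerate(state_log):
        if timestamp < gvt:
            keep_from = i
        else:
            break
    return state_log[keep_from:]


# Three processes with their local virtual times, plus two in-flight messages.
gvt = compute_gvt([12.0, 7.5, 9.0], [6.0, 20.0])
log = [(1.0, "s1"), (4.0, "s4"), (5.5, "s5"), (8.0, "s8")]
trimmed = fossil_collect(log, gvt)
```

Here the in-flight message with timestamp 6.0 bounds the GVT, so the checkpoints at 1.0 and 4.0 become reclaimable while the one at 5.5 is kept as the rollback base.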

ART

Archivio della ricerca- Università di Roma La Sapienza

Simulation of 1+1 dimensional surface growth and lattices gases using GPUs

Author: Henrik Schulz
Géza Ódor
Gergely Ódor
Máté Ferenc Nagy
Publication venue: Elsevier BV
Publication date: 01/01/2011

Restricted solid-on-solid surface growth models can be mapped onto binary lattice gases. We show that efficient simulation algorithms can be realized on GPUs either by CUDA or by OpenCL programming. We consider a deposition/evaporation model following Kardar-Parisi-Zhang growth in 1+1 dimensions related to the Asymmetric Simple Exclusion Process and show that, for sizes that fit into the shared memory of GPUs, one can achieve the maximum parallelization speedup (~100x for a Quadro FX 5800 graphics card with respect to a single 2.67 GHz CPU). This permits us to study the effect of quenched columnar disorder, which requires extremely long simulation times. We compare the CUDA realization with an OpenCL implementation designed for processor clusters via MPI. A two-lane traffic model with randomized turning points is also realized and its dynamical behavior has been investigated.

Comment: 20 pages, 12 figures, 1 table, to appear in Comp. Phys. Com
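The mapping between a restricted solid-on-solid surface and a binary lattice gas can be illustrated with a small sequential Python sketch. It is not the authors' GPU code. It assumes periodic boundaries, height steps of exactly ±1 between neighbours, and the convention (an assumption here, since conventions differ between papers) that a down-step is a particle and an up-step is a hole. All function names are hypothetical.

```python
def heights_to_gas(heights):
    """Map a periodic height profile to site occupations (1 = particle).

    Bond i joins columns i and i+1; a down-step (-1) is a particle,
    an up-step (+1) is a hole.
    """
    n = len(heights)
    gas = []
    for i in range(n):
        step = heights[(i + 1) % n] - heights[i]
        if step not in (-1, 1):
            raise ValueError("neighbouring heights must differ by exactly 1")
        gas.append(1 if step == -1 else 0)
    return gas


def deposit(heights, i):
    """Raise column i by 2 if it is a local minimum; otherwise do nothing."""
    n = len(heights)
    new = list(heights)
    if heights[(i - 1) % n] == heights[i] + 1 and heights[(i + 1) % n] == heights[i] + 1:
        new[i] += 2
    return new


def hop_right(gas, j):
    """Move the particle at site j one site to the right if the target is empty."""
    n = len(gas)
    k = (j + 1) % n
    new = list(gas)
    if gas[j] == 1 and gas[k] == 0:
        new[j], new[k] = 0, 1
    return new


heights = [0, 1, 0, 1, 0, -1]
gas = heights_to_gas(heights)

# Depositing at column 2 corresponds to a particle hop from site 1 to site 2.
after_deposit = heights_to_gas(deposit(heights, 2))
after_hop = hop_right(gas, 1)
```

Under these assumptions, a deposition at column `i` is the same move as a rightward hop of the particle at site `i - 1`, which is the exclusion-process picture the abstract refers to; evaporation would be the reverse hop.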

arXiv.org e-Print Archive

Crossref

ELTE Digital Institutional Repository (EDIT)

Fat vs. thin threading approach on GPUs: application to stochastic simulation of chemical reactions

Author: Erban R.
Giles M. B.
Klingbeil G.
Maini P. K.
Publication date: 01/01/2010

We explore two different threading approaches on a graphics processing unit (GPU), each exploiting a different characteristic of the current GPU architecture. The fat thread approach tries to minimise data access time by relying on shared memory and registers, potentially sacrificing parallelism. The thin thread approach maximises parallelism and tries to hide access latencies. We apply these two approaches to the parallel stochastic simulation of chemical reaction systems using the stochastic simulation algorithm (SSA) by Gillespie (J. Phys. Chem, Vol. 81, p. 2340-2361, 1977). In these cases, the proposed thin thread approach shows comparable performance while eliminating the limitation on the reaction system’s size.
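For readers unfamiliar with the SSA, the following is a minimal single-threaded Python sketch of Gillespie's direct method. It says nothing about the fat or thin GPU threading strategies of the paper. The reaction system (a reversible conversion between two species `A` and `B`), the rate constants, and the function name `gillespie_direct` are illustrative assumptions.

```python
import math
import random


def gillespie_direct(state, reactions, t_end, rng):
    """Simulate until t_end and return the list of (time, state) points.

    state:     dict mapping species name -> molecule count
    reactions: list of (propensity_fn, change) pairs; propensity_fn(state)
               returns a rate >= 0, change maps species -> integer increment
    """
    t = 0.0
    state = dict(state)
    trajectory = [(t, dict(state))]
    while True:
        propensities = [fn(state) for fn, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # The waiting time to the next reaction is exponential with rate `total`.
        t += -math.log(1.0 - rng.random()) / total
        if t > t_end:
            break
        # Choose a reaction with probability proportional to its propensity.
        threshold = rng.random() * total
        cumulative = 0.0
        for a, (_, change) in zip(propensities, reactions):
            cumulative += a
            if threshold < cumulative:
                break
        for species, delta in change.items():
            state[species] += delta
        trajectory.append((t, dict(state)))
    return trajectory


# Hypothetical example: A -> B with rate 1.0 * A, B -> A with rate 0.5 * B.
reactions = [
    (lambda s: 1.0 * s["A"], {"A": -1, "B": +1}),
    (lambda s: 0.5 * s["B"], {"A": +1, "B": -1}),
]
trajectory = gillespie_direct({"A": 100, "B": 0}, reactions,
                              t_end=5.0, rng=random.Random(0))
```

Each independent run of such a loop is one stochastic realisation; parallel approaches typically distribute many such realisations across threads, which is where the memory-versus-parallelism trade-off in the abstract arises.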

Oxford University Research Archive

A Dynamically Configurable Discrete Event Simulation Framework for Many-Core Chip Multiprocessors

Author: Christopher Barnes
Jaehwan Lee
Publication venue: IntechOpen
Publication date: 18/08/2010

IntechOpen

Crossref

Reconfigurable interconnects in DSM systems: a focus on context switch behavior

Author: Artundo I
Dambre Joni
Debaes C
Heirman Wim
Manjarres D
Thienpont H
Van Campenhout Jan
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2006

Recent advances in reconfigurable optical interconnect technologies allow for the fabrication of low-cost, run-time adaptable interconnects in large distributed shared-memory (DSM) multiprocessor machines. This enables adaptable interconnection networks that alleviate the huge bottleneck caused by the gap between the processing speed and the memory access time over the network. In this paper we study the scheduling of tasks by the kernel of the operating system (OS) and its influence on communication between the processing nodes of the system, focusing on the traffic generated just after a context switch. We aim to use these results as a basis to propose a potential reconfiguration of the network that could provide a significant speedup.

Crossref

Ghent University Academic Bibliography

Simulation models of shared-memory multiprocessor systems

Author: Coe Paul
Publication venue: The University of Edinburgh
Publication date: 01/01/2000

Edinburgh Research Archive