2,319 research outputs found

    A program-driven parallel machine simulation environment

    In recent years, discrete-event simulation has become a popular analytical tool for studying the hardware architecture of distributed-memory multicomputers and shared-memory multiprocessors. Once a hardware architecture prototype has been completed, a complete and detailed machine simulation environment can be used to evaluate the architecture's efficiency under real operating systems and application software. In this article, we discuss the development and implementation of program-executable simulations of a Transputer network multicomputer and of 80x86-series multiprocessors, and how they can be operated. In addition, because the simulated computer systems are extremely complex, parallel discrete-event simulation is used to shorten the simulation run time. In practice, the simulator can solve problems over a network connection with many workstations: some of the workstations handle computation while others manage memory, which makes it simpler to establish a parallel machine simulation environment. Besides providing an environment in which programs can execute, the simulator also calculates the time spent running these programs, so that the feasibility of running these application programs on a hardware system can be evaluated. Conference type: International. Conference dates: 14-16 December 1998. Conference location: Tainan, Taiwan.
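As a rough illustration of the discrete-event idea the abstract relies on (events processed in timestamp order, with the simulated clock giving the time a program would take), here is a minimal sequential sketch. It is not the simulator described in the paper; `run_simulation`, the event names, and the delays are all hypothetical.

```python
import heapq

def run_simulation(initial_events, handlers, until=float("inf")):
    """Process (time, name) events in timestamp order.

    Returns the final simulated clock value and a log of processed events.
    """
    queue = []
    seq = 0  # tie-breaker so events with equal timestamps keep insertion order
    for time, name in initial_events:
        heapq.heappush(queue, (time, seq, name))
        seq += 1

    clock = 0.0
    log = []
    while queue:
        time, _, name = heapq.heappop(queue)
        if time > until:
            break
        clock = time  # advance simulated time to the event's timestamp
        log.append((clock, name))
        # A handler returns the follow-up events as (delay, name) pairs.
        handler = handlers.get(name, lambda now: [])
        for delay, new_name in handler(clock):
            heapq.heappush(queue, (clock + delay, seq, new_name))
            seq += 1
    return clock, log

# Hypothetical workload: a send that takes 2.0 time units, then a receive
# that takes 1.5 time units before the program is done.
handlers = {
    "send": lambda now: [(2.0, "recv")],
    "recv": lambda now: [(1.5, "done")],
}
final_time, trace = run_simulation([(0.0, "send")], handlers)
```

The final clock value plays the role of the estimated execution time of the simulated program; a parallel simulator would distribute the event processing across workstations instead of using one queue.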

    CYCLIC: A Locality-Preserving Load-Balancing Algorithm for PDES on Shared Memory Multiprocessors

    This paper presents a new load-balancing algorithm for shared memory multiprocessors that is currently being applied to the parallel simulation of logic circuits, specifically VHDL simulations. The main idea of the algorithm is to exploit the usual characteristics of these simulations, namely cyclicity and predictability, to obtain a good load balance while preserving the locality of references. The algorithm is useful not only for logic circuit simulation but also for any system with a cyclic execution pattern, that is, repetition over time that makes the future behavior of the tasks predictable. An example is Parallel Discrete Event Simulation (PDES), where several tasks are repeatedly executed in response to certain events. A comparison between the proposed algorithm and other load-balancing algorithms found in the literature reveals consistently better execution times, with improvements in both load balancing and locality of references that can be of help on current multicore desktop computers.

    Optimising Simulation Data Structures for the Xeon Phi

    In this paper, we propose a lock-free architecture to accelerate logic gate circuit simulation using SIMD multi-core machines. We evaluate its performance on different test circuits simulated on the Intel Xeon Phi and two other machines. We compare this software/hardware combination with the reported performance of GPU and other multi-core simulation platforms, and we compare the lock-free architecture with a leading commercial simulator running on the same Intel hardware.

    Wait-Free Global Virtual Time Computation in Shared Memory Time-Warp Systems

    Global Virtual Time (GVT) is a powerful abstraction used to discriminate which events belong (and which do not belong) to the past history of a parallel/distributed computation. For high performance simulation systems based on the Time Warp synchronization protocol, where concurrent simulation objects are allowed to process their events speculatively and causal consistency is achieved via rollback/recovery techniques, GVT is used to determine which portion of the simulation can be considered as committed. Hence it is the basis for memory recovery (e.g. of obsolete logs that were taken in order to support state recoverability) and for nonrevocable operations (e.g. I/O). For shared memory implementations of simulation platforms based on the Time Warp protocol, the reference GVT algorithm is the one presented by Fujimoto and Hybinette [1]. However, this algorithm relies on critical sections, which make it non-wait-free and can hamper scalability. In this article we present a wait-free shared memory GVT algorithm that requires no critical section. Rather, correct coordination across the processes while computing the GVT value is achieved via atomic memory operations, namely compare-and-swap. The price paid by our proposal is an increase in the number of GVT computation phases, as opposed to the single phase required by the proposal in [1]. However, as we show via the results of an experimental study, the wait-free nature of the phases carried out in our GVT algorithm pays off in reducing the actual cost incurred by the proposal in [1].
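To make the role of GVT concrete, the sketch below uses the standard definition of GVT as the minimum over the local virtual times of all processes and the timestamps of messages still in transit, and then marks events strictly below that bound as committed. It is sequential and shows only what GVT means; it is not the wait-free compare-and-swap algorithm of the paper, nor the algorithm in [1]. The function names and sample values are hypothetical.

```python
def global_virtual_time(local_virtual_times, in_transit_timestamps):
    """Return a lower bound on the timestamp of any future rollback.

    No process can be rolled back to a time earlier than the smallest
    local clock or the smallest timestamp of a message not yet received.
    """
    candidates = list(local_virtual_times) + list(in_transit_timestamps)
    return min(candidates)

def committed_events(processed_timestamps, gvt):
    """Events strictly older than GVT can never be undone, so their logs
    can be reclaimed and their I/O can be released."""
    return [ts for ts in processed_timestamps if ts < gvt]

# Hypothetical snapshot: three simulation processes and two messages in flight.
lvts = [12.0, 9.5, 15.0]
in_transit = [8.0, 20.0]
gvt = global_virtual_time(lvts, in_transit)
safe = committed_events([3.0, 7.9, 8.0, 11.0], gvt)
```

In a real shared memory Time Warp system the difficulty is taking this minimum while processes keep advancing and sending messages, which is where critical sections or atomic operations come in.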

    Simulation of 1+1 dimensional surface growth and lattices gases using GPUs

    Restricted solid-on-solid surface growth models can be mapped onto binary lattice gases. We show that efficient simulation algorithms can be realized on GPUs either by CUDA or by OpenCL programming. We consider a deposition/evaporation model following Kardar-Parisi-Zhang growth in 1+1 dimensions related to the Asymmetric Simple Exclusion Process and show that, for sizes that fit into the shared memory of GPUs, one can achieve the maximum parallelization speedup (~100x for a Quadro FX 5800 graphics card with respect to a single 2.67 GHz CPU). This permits us to study the effect of quenched columnar disorder, which requires extremely long simulation times. We compare the CUDA realization with an OpenCL implementation designed for processor clusters via MPI. A two-lane traffic model with randomized turning points is also realized and its dynamical behavior has been investigated. Comment: 20 pages, 12 figures, 1 table, to appear in Comp. Phys. Com

    Fat vs. thin threading approach on GPUs: application to stochastic simulation of chemical reactions

    We explore two different threading approaches on a graphics processing unit (GPU), each exploiting a different characteristic of the current GPU architecture. The fat thread approach tries to minimise data access time by relying on shared memory and registers, potentially sacrificing parallelism. The thin thread approach maximises parallelism and tries to hide access latencies. We apply these two approaches to the parallel stochastic simulation of chemical reaction systems using the stochastic simulation algorithm (SSA) by Gillespie (J. Phys. Chem., Vol. 81, pp. 2340-2361, 1977). In these cases, the proposed thin thread approach shows comparable performance while eliminating the limitation of the reaction system’s size.
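For readers unfamiliar with the SSA, the following is a minimal sequential sketch of Gillespie's direct method for a single trajectory. It does not reproduce the fat or thin GPU threading schemes from the paper; on a GPU, many such trajectories would run in parallel. The function name `gillespie_direct`, the decay reaction, and its rate constant are assumptions chosen for illustration.

```python
import random

def gillespie_direct(state, reactions, t_end, rng):
    """Simulate one trajectory with Gillespie's direct method.

    reactions is a list of (propensity_fn, change) pairs, where
    propensity_fn maps the state dict to a rate and change maps
    species names to integer increments.
    """
    t = 0.0
    state = dict(state)  # work on a copy
    while True:
        props = [fn(state) for fn, _ in reactions]
        total = sum(props)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate total.
        t_next = t + rng.expovariate(total)
        if t_next > t_end:
            break
        t = t_next
        # Pick reaction j with probability props[j] / total.
        threshold = rng.random() * total
        acc = 0.0
        for p, (_, change) in zip(props, reactions):
            acc += p
            if threshold < acc:
                for species, delta in change.items():
                    state[species] += delta
                break
    return t, state

# Hypothetical system: irreversible decay A -> B with rate constant 0.5.
decay = (lambda s: 0.5 * s["A"], {"A": -1, "B": 1})
rng = random.Random(0)
t_final, final_state = gillespie_direct({"A": 100, "B": 0}, [decay], 1000.0, rng)
```

Each trajectory needs its own copy of the state and propensities, which is why the amount of fast per-thread memory can limit the size of the reaction system that fits on a GPU.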

    Reconfigurable interconnects in DSM systems: a focus on context switch behavior

    Recent advances in the development of reconfigurable optical interconnect technologies allow for the fabrication of low-cost, run-time adaptable interconnects in large distributed shared-memory (DSM) multiprocessor machines. This can allow the use of adaptable interconnection networks that alleviate the huge bottleneck caused by the gap between the processing speed and the memory access time over the network. In this paper we study the scheduling of tasks by the kernel of the operating system (OS) and its influence on communication between the processing nodes of the system, focusing on the traffic generated just after a context switch. We aim to use these results as a basis to propose a potential reconfiguration of the network that could provide a significant speedup.

    Simulation models of shared-memory multiprocessor systems
