45,045 research outputs found

    Going through Rough Times: from Non-Equilibrium Surface Growth to Algorithmic Scalability

    Full text link
    Efficient and faithful parallel simulation of large asynchronous systems is a challenging computational problem. It requires using the concept of local simulated times and a synchronization scheme. We study the scalability of massively parallel algorithms for discrete-event simulations which employ conservative synchronization to enforce causality. We do this by looking at the simulated time horizon as a complex evolving system, and we identify its universal characteristics. We find that the time horizon for the conservative parallel discrete-event simulation scheme exhibits Kardar-Parisi-Zhang-like kinetic roughening. This implies that the algorithm is asymptotically scalable in the sense that the average progress rate of the simulation approaches a non-zero constant. It also implies, however, that there are diverging memory requirements associated with such schemes.Comment: to appear in the Proceedings of the MRS, Fall 200

    Suppressing Roughness of Virtual Times in Parallel Discrete-Event Simulations

    Full text link
    In a parallel discrete-event simulation (PDES) scheme, tasks are distributed among processing elements (PEs), whose progress is controlled by a synchronization scheme. For lattice systems with short-range interactions, the progress of the conservative PDES scheme is governed by the Kardar-Parisi-Zhang equation from the theory of non-equilibrium surface growth. Although the simulated (virtual) times of the PEs progress at a nonzero rate, their standard deviation (spread) diverges with the number of PEs, hindering efficient data collection. We show that weak random interactions among the PEs can make this spread nondivergent. The PEs then progress at a nonzero, near-uniform rate without requiring global synchronizations

    Conservative parallel simulation of priority class queueing networks

    Get PDF
    A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. For, a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks

    A conservative approach to parallelizing the Sharks World simulation

    Get PDF
    Parallelizing a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The used approach illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes

    Update statistics in conservative parallel discrete event simulations of asynchronous systems

    Full text link
    We model the performance of an ideal closed chain of L processing elements that work in parallel in an asynchronous manner. Their state updates follow a generic conservative algorithm. The conservative update rule determines the growth of a virtual time surface. The physics of this growth is reflected in the utilization (the fraction of working processors) and in the interface width. We show that it is possible to nake an explicit connection between the utilization and the macroscopic structure of the virtual time interface. We exploit this connection to derive the theoretical probability distribution of updates in the system within an approximate model. It follows that the theoretical lower bound for the computational speed-up is s=(L+1)/4 for L>3. Our approach uses simple statistics to count distinct surface configuration classes consistent with the model growth rule. It enables one to compute analytically microscopic properties of an interface, which are unavailable by continuum methods.Comment: 15 pages, 12 figure

    Parallelization of a Dynamic Monte Carlo Algorithm: a Partially Rejection-Free Conservative Approach

    Full text link
    We experiment with a massively parallel implementation of an algorithm for simulating the dynamics of metastable decay in kinetic Ising models. The parallel scheme is directly applicable to a wide range of stochastic cellular automata where the discrete events (updates) are Poisson arrivals. For high performance, we utilize a continuous-time, asynchronous parallel version of the n-fold way rejection-free algorithm. Each processing element carries an lxl block of spins, and we employ the fast SHMEM-library routines on the Cray T3E distributed-memory parallel architecture. Different processing elements have different local simulated times. To ensure causality, the algorithm handles the asynchrony in a conservative fashion. Despite relatively low utilization and an intricate relationship between the average time increment and the size of the spin blocks, we find that for sufficiently large l the algorithm outperforms its corresponding parallel Metropolis (non-rejection-free) counterpart. As an example application, we present results for metastable decay in a model ferromagnetic or ferroelectric film, observed with a probe of area smaller than the total system.Comment: 17 pages, 7 figures, RevTex; submitted to the Journal of Computational Physic

    Synchronization Landscapes in Small-World-Connected Computer Networks

    Full text link
    Motivated by a synchronization problem in distributed computing we studied a simple growth model on regular and small-world networks, embedded in one and two-dimensions. We find that the synchronization landscape (corresponding to the progress of the individual processors) exhibits Kardar-Parisi-Zhang-like kinetic roughening on regular networks with short-range communication links. Although the processors, on average, progress at a nonzero rate, their spread (the width of the synchronization landscape) diverges with the number of nodes (desynchronized state) hindering efficient data management. When random communication links are added on top of the one and two-dimensional regular networks (resulting in a small-world network), large fluctuations in the synchronization landscape are suppressed and the width approaches a finite value in the large system-size limit (synchronized state). In the resulting synchronization scheme, the processors make close-to-uniform progress with a nonzero rate without global intervention. We obtain our results by ``simulating the simulations", based on the exact algorithmic rules, supported by coarse-grained arguments.Comment: 20 pages, 22 figure
    corecore