Going through Rough Times: from Non-Equilibrium Surface Growth to Algorithmic Scalability
Efficient and faithful parallel simulation of large asynchronous systems is a
challenging computational problem. It requires using the concept of local
simulated times and a synchronization scheme. We study the scalability of
massively parallel algorithms for discrete-event simulations which employ
conservative synchronization to enforce causality. We do this by looking at the
simulated time horizon as a complex evolving system, and we identify its
universal characteristics. We find that the time horizon for the conservative
parallel discrete-event simulation scheme exhibits Kardar-Parisi-Zhang-like
kinetic roughening. This implies that the algorithm is asymptotically scalable
in the sense that the average progress rate of the simulation approaches a
non-zero constant. It also implies, however, that there are diverging memory
requirements associated with such schemes. Comment: to appear in the Proceedings of the MRS, Fall 200
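The conservative update rule behind such a time horizon can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes a ring of PEs with one site each, a causality check against the two nearest neighbours only, and exponentially distributed time increments with unit mean. The function name `simulate_time_horizon` and its parameters are hypothetical.

```python
import random


def simulate_time_horizon(num_pes, num_steps, seed=0):
    """Evolve local simulated times on a ring under a conservative rule.

    Assumed rule (illustrative): a PE may advance only if its local time
    does not exceed the local times of its two ring neighbours.
    """
    rng = random.Random(seed)
    tau = [0.0] * num_pes          # local simulated time of each PE
    utilization = []               # fraction of PEs that advanced per step

    for _ in range(num_steps):
        # Decide who may update from the times at the start of the step.
        can_update = [
            tau[i] <= min(tau[i - 1], tau[(i + 1) % num_pes])
            for i in range(num_pes)
        ]
        for i, allowed in enumerate(can_update):
            if allowed:
                # Assumed increment: exponential with mean 1.
                tau[i] += rng.expovariate(1.0)
        utilization.append(sum(can_update) / num_pes)

    # Width (spread) of the time horizon: standard deviation of tau.
    mean_tau = sum(tau) / num_pes
    width = (sum((t - mean_tau) ** 2 for t in tau) / num_pes) ** 0.5
    return tau, utilization, width
```

Under these assumptions, the average of `utilization` plays the role of the progress rate, and tracking `width` for increasing `num_pes` is one way to look at the roughening the abstract describes.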
Suppressing Roughness of Virtual Times in Parallel Discrete-Event Simulations
In a parallel discrete-event simulation (PDES) scheme, tasks are distributed
among processing elements (PEs), whose progress is controlled by a
synchronization scheme. For lattice systems with short-range interactions, the
progress of the conservative PDES scheme is governed by the Kardar-Parisi-Zhang
equation from the theory of non-equilibrium surface growth. Although the
simulated (virtual) times of the PEs progress at a nonzero rate, their standard
deviation (spread) diverges with the number of PEs, hindering efficient data
collection. We show that weak random interactions among the PEs can make this
spread nondivergent. The PEs then progress at a nonzero, near-uniform rate
without requiring global synchronizations.
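One way to picture "weak random interactions" is to let each PE occasionally check one extra, randomly assigned partner in addition to its ring neighbours. The sketch below is an assumption-laden illustration rather than the scheme from the paper: the pairing of PEs, the checking probability `p`, the exponential increments, and the function name `simulate_with_random_links` are all hypothetical choices.

```python
import random


def simulate_with_random_links(num_pes, num_steps, p, seed=0):
    """Conservative ring updates plus an occasional check of a random partner."""
    if num_pes % 2 != 0:
        raise ValueError("num_pes must be even so every PE gets one partner")

    rng = random.Random(seed)

    # Assumed construction: pair the PEs at random, one partner each.
    order = list(range(num_pes))
    rng.shuffle(order)
    partner = [0] * num_pes
    for a, b in zip(order[0::2], order[1::2]):
        partner[a], partner[b] = b, a

    tau = [0.0] * num_pes
    for _ in range(num_steps):
        snapshot = tau[:]  # all decisions use the times at the start of the step
        for i in range(num_pes):
            limit = min(snapshot[i - 1], snapshot[(i + 1) % num_pes])
            # With probability p, also require not being ahead of the partner.
            if rng.random() < p:
                limit = min(limit, snapshot[partner[i]])
            if snapshot[i] <= limit:
                tau[i] += rng.expovariate(1.0)

    mean_tau = sum(tau) / num_pes
    width = (sum((t - mean_tau) ** 2 for t in tau) / num_pes) ** 0.5
    return tau, width
```

Comparing the returned `width` at `p = 0` and at a small positive `p` for growing `num_pes` is a simple way to explore the effect described in the abstract; no particular numerical outcome is claimed here.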
Conservative parallel simulation of priority class queueing networks
A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because it seems to offer little lookahead, that is, little ability to predict the time of the next departure: a low-priority job in service can be preempted at any time by a higher-priority arrival with an arbitrarily small service time. The solution is to skew the event generation activity so that events for higher-priority jobs are generated farther ahead in simulated time than those for lower-priority jobs. Thus, when a lower-priority job enters service for the first time, all the higher-priority jobs that may preempt it are already known, and the job's departure time can be predicted exactly. An analysis of the protocol indicates that good performance can be expected in the simulation of large queueing networks.
A conservative approach to parallelizing the Sharks World simulation
The parallelization of the Sharks World, a benchmark problem for parallel simulation, is described. The solution is conservative in the sense that no state information is saved and no 'rollbacks' occur. The approach illustrates both the principal advantage and the principal disadvantage of conservative parallel simulation. The advantage is that exploiting lookahead led to an approach that dramatically improves the serial execution time and also achieves excellent speedups. The disadvantage is that if the model rules are changed in a way that destroys the lookahead, it is difficult to modify the solution to accommodate the changes.
Update statistics in conservative parallel discrete event simulations of asynchronous systems
We model the performance of an ideal closed chain of L processing elements
that work in parallel in an asynchronous manner. Their state updates follow a
generic conservative algorithm. The conservative update rule determines the
growth of a virtual time surface. The physics of this growth is reflected in
the utilization (the fraction of working processors) and in the interface
width. We show that it is possible to make an explicit connection between the
utilization and the macroscopic structure of the virtual time interface. We
exploit this connection to derive the theoretical probability distribution of
updates in the system within an approximate model. It follows that the
theoretical lower bound for the computational speed-up is s=(L+1)/4 for L>3.
Our approach uses simple statistics to count distinct surface configuration
classes consistent with the model growth rule. It enables one to compute
analytically microscopic properties of an interface, which are unavailable by
continuum methods. Comment: 15 pages, 12 figures
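As a quick arithmetic illustration of the bound quoted in this abstract, the block below evaluates it for two sample chain lengths. The sample values of L are chosen here for illustration, and reading s/L as a per-PE efficiency is an interpretation made here, not a statement from the abstract.

```latex
s(L) = \frac{L+1}{4}, \qquad L > 3
\\[4pt]
s(4) = \frac{5}{4} = 1.25, \qquad s(100) = \frac{101}{4} = 25.25
\\[4pt]
\frac{s(L)}{L} = \frac{L+1}{4L} \;\longrightarrow\; \frac{1}{4}
\quad \text{as } L \to \infty
```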
Parallelization of a Dynamic Monte Carlo Algorithm: a Partially Rejection-Free Conservative Approach
We experiment with a massively parallel implementation of an algorithm for
simulating the dynamics of metastable decay in kinetic Ising models. The
parallel scheme is directly applicable to a wide range of stochastic cellular
automata where the discrete events (updates) are Poisson arrivals. For high
performance, we utilize a continuous-time, asynchronous parallel version of the
n-fold way rejection-free algorithm. Each processing element carries an l×l
block of spins, and we employ the fast SHMEM-library routines on the Cray T3E
distributed-memory parallel architecture. Different processing elements have
different local simulated times. To ensure causality, the algorithm handles the
asynchrony in a conservative fashion. Despite relatively low utilization and an
intricate relationship between the average time increment and the size of the
spin blocks, we find that for sufficiently large l the algorithm outperforms
its corresponding parallel Metropolis (non-rejection-free) counterpart. As an
example application, we present results for metastable decay in a model
ferromagnetic or ferroelectric film, observed with a probe of area smaller than
the total system. Comment: 17 pages, 7 figures, RevTex; submitted to the Journal of
Computational Physics
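The idea of a rejection-free, continuous-time update can be illustrated with a generic serial step: an event is chosen with probability proportional to its rate, and the clock advances by an exponentially distributed increment set by the total rate. This is a textbook-style sketch under those assumptions, not the parallel n-fold way implementation described in the abstract; the function name `rejection_free_step` and the flat list of rates are hypothetical.

```python
import random


def rejection_free_step(rates, rng):
    """Pick one event and a time increment without rejecting any trial.

    rates: non-negative rates, one per candidate event (illustrative input).
    Returns (index of the chosen event, time increment).
    """
    total = sum(rates)
    if total <= 0:
        raise ValueError("at least one rate must be positive")

    # Waiting time until the next event: exponential with rate `total`.
    dt = rng.expovariate(total)

    # Choose the event with probability rate / total.
    threshold = rng.random() * total
    cumulative = 0.0
    last_positive = None
    for index, rate in enumerate(rates):
        cumulative += rate
        if rate > 0:
            last_positive = index
            if threshold < cumulative:
                return index, dt

    # Guard against floating-point round-off in the cumulative sum.
    return last_positive, dt
```

For example, `rejection_free_step([0.0, 2.0, 0.0], random.Random(0))` can only select index 1, since the other events have zero rate.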
Synchronization Landscapes in Small-World-Connected Computer Networks
Motivated by a synchronization problem in distributed computing, we studied a
simple growth model on regular and small-world networks, embedded in one and
two dimensions. We find that the synchronization landscape (corresponding to
the progress of the individual processors) exhibits Kardar-Parisi-Zhang-like
kinetic roughening on regular networks with short-range communication links.
Although the processors, on average, progress at a nonzero rate, their spread
(the width of the synchronization landscape) diverges with the number of nodes
(desynchronized state), hindering efficient data management. When random
communication links are added on top of the one- and two-dimensional regular
networks (resulting in a small-world network), large fluctuations in the
synchronization landscape are suppressed and the width approaches a finite
value in the large system-size limit (synchronized state). In the resulting
synchronization scheme, the processors make close-to-uniform progress with a
nonzero rate without global intervention. We obtain our results by "simulating
the simulations", based on the exact algorithmic rules, supported by
coarse-grained arguments. Comment: 20 pages, 22 figures