In this paper, we discuss the use of parallel discrete event simulation (PDES) algorithms for execution of hardware models written in VHDL. We survey central event queue, conservative distributed and optimistic distributed PDES algorithms, and discuss aspects of the semantics of VHDL and VHDL-92 that affect the use of these algorithms in a VHDL simulator. Next, we describe an experiment performed as part of the Vsim Project at the University of Adelaide, in which a simulation kernel using the central event queue algorithm was developed. We present measurements taken from this kernel simulating some benchmark models. It appears that this technique, which is relatively simple to implement, is suitable for use on small scale multiprocessors (such as current desktop multiprocessor workstations), simulating behavioral and register transfer level models. However, the degree of useful parallelism achievable on gate level models with this technique appears to be limited.
INTRODUCTION
Over the course of the last decade, the VLSI circuits being manufactured have grown steadily in complexity, as a direct result of advances in manufacturing technology. Simulation is now a vital tool for VLSI engineers, both for verifying that a design meets its functional requirements and for generating test vectors against which the behavior of a manufactured circuit can be compared.
In order to simulate a design, its structure and intended behavior must be described. Structure can be described using circuit schematics, but behavior must be described using a hardware description language (HDL). An HDL is essentially a specialized programming language designed for this purpose. In 1987, the IEEE standardized a hardware description language for digital circuits and systems called VHDL [8]. There is a large body of research dealing with general techniques for managing parallel discrete event simulation (PDES); see [6] for a survey of the field.
These techniques model a system as a collection of logical processes, which interact by scheduling and reacting to events that occur at instants in simulation time. This framework is appropriate for simulating VHDL models, since such models consist of processes that represent the behavior of circuit components, and that schedule and react to changes of values of circuit signals.
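To make this framework concrete, the following is a minimal sketch of a sequential kernel with a single global event queue, the baseline that PDES algorithms parallelize. It is not the Vsim kernel: the names `Event`, `Simulator`, `schedule` and the inverter example are hypothetical, and integer time-stamps are assumed.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Event:
    time: int                              # simulation time-stamp
    seq: int                               # tie-breaker: FIFO among equal times
    target: str = field(compare=False)     # receiving logical process
    value: object = field(compare=False)   # payload, e.g. a new signal value


class Simulator:
    """Sequential kernel with one global event queue (illustrative sketch)."""

    def __init__(self):
        self.queue: list[Event] = []
        self.processes: dict[str, Callable] = {}
        self.now = 0
        self._seq = 0
        self.log: list[tuple[int, str, object]] = []

    def add_process(self, name: str, handler: Callable) -> None:
        self.processes[name] = handler

    def schedule(self, time: int, target: str, value: object) -> None:
        assert time >= self.now, "cannot schedule an event in the past"
        heapq.heappush(self.queue, Event(time, self._seq, target, value))
        self._seq += 1

    def run(self, until: int) -> None:
        # Always execute the globally earliest event next.
        while self.queue and self.queue[0].time <= until:
            ev = heapq.heappop(self.queue)
            self.now = ev.time
            self.log.append((ev.time, ev.target, ev.value))
            self.processes[ev.target](self, ev.value)  # may schedule more events


# Hypothetical example: an inverter with a delay of 2 time units drives a sink.
def inverter(sim, value):
    sim.schedule(sim.now + 2, "sink", 1 - value)


def sink(sim, value):
    pass


sim = Simulator()
sim.add_process("inv", inverter)
sim.add_process("sink", sink)
sim.schedule(0, "inv", 0)
sim.schedule(5, "inv", 1)
sim.run(until=100)
print(sim.log)  # [(0, 'inv', 0), (2, 'sink', 1), (5, 'inv', 1), (7, 'sink', 0)]
```

A parallel central queue algorithm would let several processors remove events from this one queue concurrently; that extension is not shown here.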
The experiments being performed in the Vsim Project involve developing VHDL simulators using an appropriate selection of the published techniques, and determining which techniques give the best speedup for different kinds of VHDL models. VHDL simulation is expected to be amenable to speedup using PDES algorithms, but the degree of parallelism implicit in real VHDL models is not yet clear, nor are the patterns of data dependencies relevant to this problem well understood.
As a first step, this paper identifies a number of PDES algorithms used for discrete event simulation in general and evaluates their applicability to executing VHDL simulation models. In Section 2, we survey these algorithms; in Section 3, we discuss the characteristics of VHDL that are relevant to parallel simulation and consider the applicability of each PDES algorithm to the execution of VHDL models. In Section 4, we describe an initial experiment performed using one of the PDES algorithms and present measurements made during execution of some benchmark models.
SURVEY OF PDES ALGORITHMS
The logical process acts as a local kernel for the VHDL process, filtering input messages, and resuming the VHDL process when the wait statement on which it is suspended terminates. The VHDL process uses the services of the local kernel to send output messages as described throughout the rest of this paper.
In addition to the processes described above that define the behavior of a model, VHDL also allows the designer to specify a block of code for resolving the value of a signal driven by a number of sources. Figure 8. Indeed, zero-delay behavioral models, and some low-level models of switched circuit elements, require these semantics to be observed by a simulator.
The potential problems that arise lead simulator implementers to treat simulation time as a pair, consisting of a time value in fs (femtoseconds) and a count of delta cycles at that time step. This presents no great difficulty in a sequential simulator, or in a parallel simulator using the [...] If a conservative algorithm is used, a postponed process may not respond to a transaction until it can guarantee that no other transaction at the same time step will arrive. Figure 10 illustrates a scenario in which the normal conservative algorithm would fail for a postponed process.
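The pair representation of simulation time can be sketched in Python as a named tuple, whose built-in lexicographic comparison gives exactly the required ordering: delta cycles are ordered within a time step, and every delta cycle precedes any later time value. The class `SimTime` and its methods are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class SimTime(NamedTuple):
    """Simulation time as (time value in fs, delta-cycle count)."""
    fs: int
    delta: int = 0

    def next_delta(self) -> "SimTime":
        # A zero-delay assignment takes effect one delta cycle later.
        return SimTime(self.fs, self.delta + 1)

    def after(self, delay_fs: int) -> "SimTime":
        # A nonzero delay moves to a later time step; the delta count restarts.
        if delay_fs == 0:
            return self.next_delta()
        return SimTime(self.fs + delay_fs, 0)


t = SimTime(1000)
assert t.after(0) == SimTime(1000, 1)
assert t.after(0).after(0) > t.after(0)   # delta cycles are ordered...
assert t.after(0).after(0) < t.after(1)   # ...but precede any later fs value
```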
A possible variation of the conservative algorithm of Figure 9 for use by a postponed process is shown in Figure 11. In this case, the inner [...] [15], in which a fixed, statically determined delay is specified for each process, and simple transport delay semantics are used. This means that each signal assignment schedules a transaction on a signal for a later simulation time than transactions from previously executed assignments, so no transactions need be deleted.
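The consequence of a fixed transport delay can be illustrated with a driver whose transaction list is append-only. This is a minimal sketch under two stated assumptions: plain integer time (delta cycles ignored) and at most one assignment to the signal per process activation and time step, so that time-stamps strictly increase. `FixedDelayDriver` is a hypothetical name.

```python
class FixedDelayDriver:
    """Driver for a process with one statically known transport delay.

    Assumes at most one assignment per time step, so each new transaction
    lies strictly after the previous one and none is ever deleted.
    """

    def __init__(self, delay_fs: int):
        self.delay = delay_fs
        self.transactions: list[tuple[int, object]] = []  # (time-stamp, value)

    def assign(self, now_fs: int, value: object) -> None:
        t = now_fs + self.delay
        # Monotonic time plus a constant delay gives monotonic time-stamps.
        assert not self.transactions or t > self.transactions[-1][0]
        self.transactions.append((t, value))


d = FixedDelayDriver(5)
d.assign(0, 1)
d.assign(3, 0)
print(d.transactions)  # [(5, 1), (8, 0)]
```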
For parallel simulators using conservative algorithms, the prime concern in this context is that transactions be sent over message links in non-decreasing time-stamp order. If transport delay is used, the simulator needs to determine a lower bound ($\delta_{\min}$) on the delay value for any assignment to a signal in a process, since transactions later than the current simulation time ($T$) [...] [16]), and thus guarantee that transactions will not be deleted. In these cases, a transaction may be sent before the local clock reaches its time-stamp, thus increasing the parallelism achievable. This is an example of the use of lookahead in a conservative algorithm. The literature on conservative algorithms suggests that the quality of the lookahead information used is important to the performance of these algorithms (see [6]).
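The lookahead test for transport delay can be sketched as follows, assuming a known lower bound `delta_min` on all delays of the process: any later assignment, executed at a time no earlier than the current time `now`, can only delete transactions at or after `now + delta_min`, so pending transactions strictly earlier than that horizon are safe to send. The function name and the list-of-pairs representation are assumptions; `pending` is assumed to be sorted by time-stamp so the safe prefix is sent in non-decreasing order.

```python
def sendable(pending, now, delta_min):
    """Split time-sorted (time-stamp, value) pairs into (safe, held).

    Assumes transport delay with every delay >= delta_min, so no future
    assignment can delete a transaction earlier than now + delta_min.
    """
    horizon = now + delta_min
    safe = [tr for tr in pending if tr[0] < horizon]
    held = [tr for tr in pending if tr[0] >= horizon]
    return safe, held


safe, held = sendable([(12, 1), (14, 0), (20, 1)], now=10, delta_min=5)
print(safe, held)  # [(12, 1), (14, 0)] [(20, 1)]
```

The horizon is exclusive because a transport-delay assignment deletes transactions at or after its own new time-stamp, which may equal `now + delta_min`.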
The proposal for VHDL-92 to include a pulse rejection limit in an inertial delay specification provides an opportunity to improve on the above behavior. The pulse rejection limit is a time interval, counting backwards from the delay value specified in a signal assignment, within which previously scheduled transactions may be deleted. Any transactions between the current time and the beginning of this interval are not deleted. Thus, if the simulator can determine a lower bound on delays ($\delta_{\min}$) and an upper bound on pulse rejection limits ($P_{\max}$), transactions with time-stamps between $T$ and $T + (\delta_{\min} - P_{\max})$ can be sent. However, the difficulty mentioned above also applies to determining $P_{\max}$, namely that, in general, the interval is determined at run time.
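The bound can be checked term by term, assuming the VHDL-92 rejection semantics just described. Consider any assignment executed at a time $T' \ge T$ with delay $\delta \ge \delta_{\min}$ and pulse rejection limit $p \le P_{\max}$. It can delete only transactions whose time-stamps $t$ lie at or after the start of its rejection interval, so

$$
t \;\ge\; T' + \delta - p \;\ge\; T + \delta_{\min} - P_{\max}.
$$

Every pending transaction with $T \le t < T + (\delta_{\min} - P_{\max})$ is therefore safe to send. For example, with illustrative values $\delta_{\min} = 10$ ns and $P_{\max} = 3$ ns, transactions up to, but not including, $T + 7$ ns can be sent. If $P_{\max} \ge \delta_{\min}$, as with plain inertial delay where the rejection interval spans the whole delay, the window is empty.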
While considerable complexity is added to conservative algorithms in handling these semantics, time warp algorithms can deal with transaction deletion much more simply. When a transaction is scheduled in a time warp simulator, the process can send it immediately. If the transaction must subsequently be deleted, the process simply sends an antimessage to cause cancellation of the transaction and rollback of any optimistically executed computation. Note, however, that this does not prevent the application of lookahead techniques to avoid sending a transaction: if the simulator can determine that a transaction will be deleted, it can avoid the cost of sending and subsequently cancelling it.
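The receiving side of this mechanism can be sketched as the input queue of one time warp logical process. An antimessage annihilates its positive message if that message is still unprocessed; if it has already been processed, the process first rolls back to the message's time-stamp. A straggler message triggers the same rollback. This is an illustrative sketch, not the paper's kernel: `TimeWarpInput` and its methods are hypothetical, state saving and the antimessages this process would itself send on rollback are omitted, and an antimessage is assumed never to overtake its positive message.

```python
class TimeWarpInput:
    """Input side of one time warp logical process (illustrative sketch)."""

    def __init__(self):
        self.pending = {}    # msg_id -> time-stamp, not yet processed
        self.processed = {}  # msg_id -> time-stamp, already processed
        self.lvt = 0         # local virtual time: last processed time-stamp
        self.rollbacks = 0

    def receive(self, ts, msg_id):
        self.pending[msg_id] = ts
        if ts < self.lvt:             # straggler: undo optimistic work
            self.rollback(ts)

    def antimessage(self, msg_id):
        # Assumes the positive message with this id has already arrived.
        if msg_id in self.processed:  # already used: roll back first
            self.rollback(self.processed[msg_id])
        del self.pending[msg_id]      # annihilate the message/antimessage pair

    def process_next(self):
        msg_id = min(self.pending, key=self.pending.get)
        self.lvt = self.pending.pop(msg_id)
        self.processed[msg_id] = self.lvt
        return msg_id

    def rollback(self, ts):
        # Return every message processed at or after ts to the pending set.
        for m in [m for m, t in self.processed.items() if t >= ts]:
            self.pending[m] = self.processed.pop(m)
        self.lvt = max(self.processed.values(), default=0)
        self.rollbacks += 1


q = TimeWarpInput()
q.receive(10, "a")
q.receive(20, "b")
q.process_next()      # "a"
q.process_next()      # "b", processed optimistically
q.receive(15, "c")    # straggler: "b" returns to pending
q.antimessage("b")    # "b" was cancelled upstream: annihilated, never redone
print(q.process_next(), q.lvt)  # c 15
```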
Shared Variables
The proposal for shared variables in VHDL-92 creates considerable extra complication for distributed PDES algorithms. At the original time of writing of this paper, the proposal included mechanisms for synchronizing processes' access to shared variables, allowing a designer to manage or exclude nondeterminism in a model. Since then, however, the IEEE balloting body has voted to remove these mechanisms from the formal language definition, allowing uncontrolled access to shared variables. This step has caused much contention, and it is unclear at this stage whether the originally proposed synchronization mechanisms will be adopted as an informal adjunct to the language. We have not yet addressed the issue of handling uncontrolled shared variables in a parallel VHDL simulator, as it is unclear what the expected semantics should be. The following discussion addresses the inclusion of controlled shared variables, as described in the original proposal.
The proposed semantics are that shared variables may be declared in blocks that enclose processes, and the processes may refer to shared variables using access statements. Each access statement names a set of shared variables, and the runtime system is required to guarantee exclusive access to these variables while the sequential statements in the access statement are executed. The mechanism used by the runtime system to provide mutual exclusion between processes is required to be deadlock free. This is made simpler by the fact that processes may not suspend while within an access statement, nor may they dynamically nest access statements.
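Because an access statement names its whole variable set up front and can neither suspend nor nest, one standard deadlock-free technique (resource ordering, not necessarily the one the proposal or our kernel uses) is to acquire the locks for the named variables in a fixed global order, which rules out circular waiting. The sketch below assumes processes run as threads sharing memory, as they might under a central event queue kernel; `SharedVariables` and `access` are hypothetical names.

```python
import threading


class SharedVariables:
    """One lock per shared variable, always acquired in sorted-name order."""

    def __init__(self, **initial):
        self.values = dict(initial)
        self.locks = {name: threading.Lock() for name in initial}

    def access(self, names, body):
        # Acquire every named lock in one global order: no circular wait.
        ordered = sorted(set(names))
        for n in ordered:
            self.locks[n].acquire()
        try:
            return body(self.values)  # statements of the access statement
        finally:
            for n in reversed(ordered):
                self.locks[n].release()


sv = SharedVariables(a=0, b=0)


def bump(vals):
    vals["a"] += 1
    vals["b"] += 1


def worker(names):
    for _ in range(1000):
        sv.access(names, bump)


# The two threads name the variables in opposite orders; sorting avoids deadlock.
threads = [threading.Thread(target=worker, args=(n,)) for n in (["a", "b"], ["b", "a"])]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sv.values)  # {'a': 2000, 'b': 2000}
```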
In a sequential simulator, or a parallel simulator using a central event queue algorithm, shared variables can be managed by the kernel, which can apply any of the well-known resource allocation techniques used by operating systems and described in texts on the subject (for example, [14]). [...] In a simulator using a conservative algorithm, the monitor process treats the request message channels as input links, complete with link clocks, and applies the conservative approach to serialize requests correctly. In a simulator using an optimistic algorithm, on the other hand, a monitor process simply queues requests and grants exclusive access, provided requests arrive in non-decreasing time-stamp order. When a straggler request arrives, the monitor rolls the value of the shared variable back and sends antimessages for the grant messages it has since sent to other processes. These processes, in turn, must roll back and re-execute their access statements.
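The optimistic monitor process described above can be sketched for a single shared variable. It grants requests in arrival order, records the value before each grant, and on a straggler restores the variable and returns the grants that must be cancelled by antimessages. As simplifying assumptions, the body of the access statement is modelled as an `update` function applied at the monitor, cancelled requesters are assumed to re-issue their requests after rolling back, and fossil collection of old history is omitted. `OptimisticMonitor` is a hypothetical name.

```python
class OptimisticMonitor:
    """Optimistic monitor for one shared variable (illustrative sketch)."""

    def __init__(self, value):
        self.value = value
        self.history = []  # (time-stamp, requester, value before grant)

    def request(self, ts, requester, update):
        """Grant access at time ts; return (ts, requester) grants to cancel."""
        antimessages = []
        # Straggler: undo every grant with a later time-stamp, newest first.
        # Requests with equal time-stamps are granted in arrival order.
        while self.history and self.history[-1][0] > ts:
            t, r, before = self.history.pop()
            self.value = before
            antimessages.append((t, r))
        self.history.append((ts, requester, self.value))
        self.value = update(self.value)
        return antimessages


m = OptimisticMonitor(0)
m.request(10, "p1", lambda v: v + 1)
m.request(30, "p2", lambda v: v * 10)
cancel = m.request(20, "p3", lambda v: v + 5)  # straggler request
print(cancel, m.value)  # [(30, 'p2')] 6
m.request(30, "p2", lambda v: v * 10)  # p2 rolls back and re-issues its request
print(m.value)  # 60
```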
A CENTRAL QUEUE PDES IMPLEMENTATION

Results
The results shown here are derived from two benchmark models. The first is a register transfer level model of the non-pipelined DLX processor, described in [7]. This 
