Abstract-Logic simulation is a crucial verification task in processor design. Aiming at a significant acceleration of system simulation, we have parallelized IBM's cycle-based simulator TEXSIM. The resulting parallelTEXSIM has already been employed successfully in simulating S/390 architectures on IBM SP systems. Here we present parallelTEXSIM together with its model partitioning environment.
I. INTRODUCTION
Verification of processor designs is a real challenge due to rapidly growing design complexity. Ensuring correctness of logic designs containing millions of gates requires a large amount of simulation. For system simulation at register transfer level (RTL) and gate level (GL) it has proven good practice to separate timing analysis from functional verification (logic simulation). The application of static timing verifiers and cycle-based simulators as tools targeted to the corresponding task is advantageous [9]. Considering functional verification, cycle-based simulators are significantly faster than event-driven ones, which offer rich functionality at the cost of performance and memory utilization [8]. Hardware accelerators, as alternatives to software simulators, show high performance but are very expensive and difficult to adapt to changing simulation techniques.
Cycle-based simulation (CBS) can be abstractly represented by the ALTER-CLOCK-RETRIEVE pattern (Fig. 1). Our parallelization approach makes use of model-inherent parallelism (replicated worker principle). A parallelTEXSIM cycle simulation appears as a co-operation of several simulator instances which evaluate statically defined parts of the original model on a loosely-coupled processor system. Choosing cone-based model partitioning makes it possible to leave the kernel of the optimized TEXSIM simulation engine unchanged. The performance of the resulting parallel simulations strongly depends on the preceding model partitioning. Because one partitioning is reused for several extremely time-consuming simulations over one and the same model, we allow complex partitioning algorithms that take both component communication and workload aspects into account.
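As an illustration only, the ALTER-CLOCK-RETRIEVE pattern can be pictured with the following Python sketch. It assumes that ALTER sets input values, CLOCK evaluates the model for a number of cycles, and RETRIEVE reads back state. The class ToyCycleSimulator, its method names and the 2-bit counter model are hypothetical and are not part of TEXSIM.

```python
class ToyCycleSimulator:
    """Hypothetical cycle-based simulator for a 2-bit counter with enable."""

    def __init__(self):
        self.inputs = {"enable": 0}   # primary inputs of the toy model
        self.latches = {"count": 0}   # state held between cycles

    def alter(self, name, value):
        # ALTER: set a primary input before the next clock step.
        self.inputs[name] = value

    def clock(self, cycles=1):
        # CLOCK: evaluate the combinational logic once per cycle and
        # update the latch with the computed next-state value.
        for _ in range(cycles):
            next_count = (self.latches["count"] + self.inputs["enable"]) % 4
            self.latches["count"] = next_count

    def retrieve(self, name):
        # RETRIEVE: read back a latch value after clocking.
        return self.latches[name]


def run_demo():
    sim = ToyCycleSimulator()
    sim.alter("enable", 1)
    sim.clock(3)
    after_counting = sim.retrieve("count")
    sim.alter("enable", 0)
    sim.clock(2)
    after_holding = sim.retrieve("count")
    return after_counting, after_holding
```

In this toy model, no timing information is kept within a cycle; only the values at cycle boundaries are observable, which is the property that distinguishes cycle-based from event-driven simulation.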
In Section II, our static model partitioning strategy for parallel cycle-based simulation is outlined. Then, parallelTEXSIM is introduced in Section III, focussing on its general structure and performance-relevant issues. Experimental results related to parallel simulations of a real processor model belonging to the IBM S/390 architecture are presented in Section IV.
II. MODEL PARTITIONING

A. Overview
For parallelTEXSIM simulations, a set of TEXSIM models representing parts of an original design has to be provided together with a cross-reference list and one signal-cut list per model. The cross-reference list contains all elemental design components which are accessible from outside during simulation, together with information about their distribution to models. Signal-cut lists comprise model-related input and output points indicating communication relations to other models. Fig. 2 depicts the process of model-related input generation for parallelTEXSIM, with protos embodying DA-DB objects for the structural representation of RTL / GL designs.
Fig. 2. Model-related parallelTEXSIM input
Partitioning starts from a proto generated by an HDL compilation. The number of resulting protos can either be given in advance or be determined by the partitioning process itself. Every proto delivered is then passed to a model building process realized by texbld (the same as for sequential TEXSIM simulation). The parallel simulator automatically adapts to the number of models generated.
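To make the role of the cross-reference list and the signal-cut lists more concrete, the following Python sketch derives both from a given assignment of latches to models. It is a simplified illustration under stated assumptions: the data layout, the function build_lists and the example names are hypothetical, and the actual file formats used by parallelTEXSIM are not described here.

```python
def build_lists(model_heads, cone_sources):
    """Derive a cross-reference list and per-model signal-cut lists.

    model_heads:  {model name: [latches evaluated by that model]}
    cone_sources: {latch: set of signals read when computing its next value}
    (Both layouts are assumptions made for this sketch.)
    """
    # Which model produces each latch value.
    owner = {h: m for m, heads in model_heads.items() for h in heads}

    # Cross-reference list: element -> models holding it.
    xref = {h: [m] for h, m in owner.items()}

    # Signal-cut lists: a signal is cut when it is produced in one
    # model and consumed in another one.
    cuts = {m: {"inputs": set(), "outputs": set()} for m in model_heads}
    for model, heads in model_heads.items():
        for head in heads:
            for src in cone_sources[head]:
                producer = owner.get(src)  # None for primary inputs
                if producer is not None and producer != model:
                    cuts[model]["inputs"].add(src)
                    cuts[producer]["outputs"].add(src)
    return xref, cuts


# Hypothetical two-model example: latch L2 in model m1 reads latch L1,
# which is evaluated in model m0, so L1 becomes a cut signal.
MODEL_HEADS = {"m0": ["L1"], "m1": ["L2"]}
CONE_SOURCES = {"L1": {"L1", "a", "b"}, "L2": {"L1", "a"}}
XREF, CUTS = build_lists(MODEL_HEADS, CONE_SOURCES)
```

In this example, L1 appears as an output point of m0 and as an input point of m1, which is the kind of communication relation a signal-cut list records.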
B. Partitioning Strategy
As a basis for the development and analysis of proto partitioning algorithms we have introduced a formal Structural Hardware Model (SHM) [5] embodying a directed graph whose vertices are interpreted as wires, latches, elements of combinatorial logic or input/output elements, respectively. For each proto describing a synchronous design a corresponding SHM can be constructed easily. Then, the problem of proto partitioning can be considered as a problem of SHM partitioning. For a given SHM M, the fan-in cones with heads representing latches or outputs form the set Co(M) of basic building blocks for partitions of M. Due to the complexity of models representing complete processor structures we have introduced a hierarchical partitioning strategy [5] allowing combination and competition of partitioning algorithms and a special method of merging their results, called superposition. Within a two-level partitioning scheme, a fast pre-partitioning step (reducing problem complexity) is followed by more expensive algorithms working on hypergraph structures. Specifically, the employment of Evolutionary Algorithms at the second level has proven successful.
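The notion of fan-in cones as building blocks can be illustrated by a small Python sketch. It assumes that a cone is obtained by a backward traversal from its head which stops at latches and primary inputs; the precise SHM definition is given in [5] and may differ in detail. The graph, the vertex kinds and the function names are hypothetical.

```python
def fan_in_cone(head, preds, kind):
    """Return (members, sources) of the fan-in cone of `head`.

    members: the head plus all combinational vertices feeding it
    sources: latches and primary inputs at the boundary of the cone
    """
    members, sources = {head}, set()
    stack = list(preds.get(head, ()))
    while stack:
        v = stack.pop()
        if kind[v] in ("latch", "input"):
            sources.add(v)            # boundary: do not traverse further
        elif v not in members:
            members.add(v)            # combinational logic inside the cone
            stack.extend(preds.get(v, ()))
    return members, sources


def all_cones(preds, kind):
    # Cone heads are the latches and outputs of the design.
    heads = [v for v, k in kind.items() if k in ("latch", "output")]
    return {h: fan_in_cone(h, preds, kind) for h in heads}


# Hypothetical design: predecessor lists of a small synchronous circuit.
PREDS = {"g1": ["a", "L1"], "g2": ["g1", "b"],
         "L1": ["g2"], "L2": ["g1"], "out": ["g2"]}
KIND = {"a": "input", "b": "input", "g1": "comb", "g2": "comb",
        "L1": "latch", "L2": "latch", "out": "output"}
CONES = all_cones(PREDS, KIND)
```

In this example the gate g1 belongs to all three cones. Placing overlapping cones in different partitions therefore replicates shared logic, which is one reason why partitioning has to weigh workload against communication.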
C. Partitioning Tools
A realization of the two-level partitioning scheme mentioned above is given by PAMB (Partitioning And Model Build), which has been developed by R. HAUPT. It comprises a C library (containing partitioning algorithms, functions for proto handling, model build and the creation of hypergraph structures) together with a script-based application framework in the context of the DA-DB. PAMB will be integrated in our partitioning environment parallelMAP (Model Analysis and Partitioning). The latter embodies a client-server architecture which allows the implementation of parallel partitioning algorithms on a message-passing basis.
III. THE PARALLEL SIMULATOR
A. General Structure
We have implemented parallelTEXSIM based on the partitioning framework outlined above. A production release of the simulator is employed for regression runs in IBM S/390 processor development, where it allows significantly larger test cases. From the user's point of view, the parallel program offers the same options and interfaces as the sequential one. Designed to run on loosely-coupled systems, it has a master-slave structure (Fig. 3) with component communication via message-passing. A master simulator instance (mTEXSIM) derived from sequential TEXSIM provides Application Programming Interfaces (APIs) to the environment and controls a set of slave simulator instances (sTEXSIM). These instances contain the original TEXSIM simulation engine encapsulated within a communication shell. parallelTEXSIM permits parallelization of user programs for simulation control by assigning corresponding program instances to simulator instances. The software platform currently supported is IBM AIX/6000 with the IBM Parallel Environment [6]. This allows networks of IBM RS/6000 workstations and IBM RS/6000 SP machines as target hardware.
B. Facility Management
For referencing model elements (signals or arrays) from outside during simulation, TEXSIM provides the concept of facilities, which are represented by bit matrices. Model partitioning for parallel simulation distributes the accessible elements of the original model to different models (as described by the cross-reference list mentioned in Section II), possibly involving replication and splitting. Therefore the facility notion has been extended to a facility hierarchy to ensure efficient element referencing within parallelTEXSIM.
We have introduced global, parallel and local facilities. A local facility is a usual facility handled by a slave simulator instance. Global facilities present the usual facility interface on the master simulator instance, so that elements can be referenced as on a non-parallel simulator. Internally, a global facility is a vector of parallel facilities, each of which is in turn a vector containing references to local facilities.
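The nesting of global, parallel and local facilities can be pictured with the following Python sketch. It is illustrative only: it assumes that each parallel facility corresponds to one row of the facility's bit matrix and that its local facilities hold contiguous bit slices of that row on different slaves. This interpretation, as well as all class and field names, is hypothetical and does not reflect the actual parallelTEXSIM data structures.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalFacility:
    """Slice of a facility held by one slave (assumed layout)."""
    slave_id: int
    bits: List[int]


@dataclass
class ParallelFacility:
    """Vector of references to local facilities."""
    parts: List[LocalFacility]

    def read(self) -> List[int]:
        # Concatenate the slices held by the individual slaves.
        row: List[int] = []
        for part in self.parts:
            row.extend(part.bits)
        return row


@dataclass
class GlobalFacility:
    """Vector of parallel facilities, visible on the master."""
    name: str
    rows: List[ParallelFacility]

    def read(self) -> List[List[int]]:
        # Present the facility as one bit matrix, as a sequential
        # simulator would.
        return [row.read() for row in self.rows]


def build_demo() -> GlobalFacility:
    # A 2 x 4 bit matrix split column-wise across slaves 1 and 2.
    row0 = ParallelFacility([LocalFacility(1, [1, 0]), LocalFacility(2, [1, 1])])
    row1 = ParallelFacility([LocalFacility(1, [0, 0]), LocalFacility(2, [0, 1])])
    return GlobalFacility("reg", [row0, row1])
```

Calling build_demo().read() assembles the complete bit matrix on the master side, hiding the distribution of the element over the slaves from the caller.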
Moreover, we have introduced communication facilities to achieve fast access to data structures which are involved in collective communication operations during parallel cycle simulation. These data structures are related to cut signals which arise from model partitioning. Cut signals are made known to parallelTEXSIM via model-related signal-cut lists (cf. Section II). Within parallelTEXSIM three types of collective message-passing are distinguished based on the components involved: "master and one slave", "master and all slaves" and "all slaves". Examples of processes using communication operations of these types are, respectively, facility referencing, creation of an initial status protocol, and parallel cycle simulation.
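The exchange of cut signals among all slaves during parallel cycle simulation can be sketched as follows. The sketch runs in a single Python process and replaces message-passing by plain dictionary operations; the two-slave design, the signal names and both functions are hypothetical and serve only to show the order of evaluation and exchange.

```python
def exchange_cut_signals(outputs_by_slave, inputs_needed):
    """In-process stand-in for an exchange among all slaves.

    outputs_by_slave: {slave: {signal: value}} values produced locally
    inputs_needed:    {slave: [signals]} cut signals each slave reads
    """
    pool = {}
    for values in outputs_by_slave.values():
        pool.update(values)           # gather all produced values
    return {slave: {sig: pool[sig] for sig in needed}
            for slave, needed in inputs_needed.items()}


def simulate(cycles):
    # Slave s0 holds latch x (next x = NOT y); slave s1 holds latch y
    # (next y = x). Both x and y are therefore cut signals.
    latches = {"s0": {"x": 0}, "s1": {"y": 0}}
    needs = {"s0": ["y"], "s1": ["x"]}
    cut_in = exchange_cut_signals(latches, needs)
    trace = []
    for _ in range(cycles):
        # Each slave evaluates its own partition from last cycle's values.
        new_x = 1 - cut_in["s0"]["y"]
        new_y = cut_in["s1"]["x"]
        latches = {"s0": {"x": new_x}, "s1": {"y": new_y}}
        # After the cycle, the new cut-signal values are exchanged.
        cut_in = exchange_cut_signals(latches, needs)
        trace.append((new_x, new_y))
    return trace
```

One exchange is required per simulated cycle, so the number of cut signals directly affects communication cost. This is consistent with the observation that simulation performance depends strongly on the preceding model partitioning.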
V. CONCLUDING REMARKS
We have presented parallelTEXSIM, a parallel logic simulator running on loosely-coupled systems. It significantly accelerates time-consuming system simulation processes in IBM S/390 processor development. Simulation performance strongly depends on the preceding model partitioning. Our partitioning environment allows consideration of workload and inter-processor communication aspects via parameterized partition valuation functions. In future work we will use our experiences with parallelTEXSIM for the parallelization of the IBM [...] concentrate on problems related to multiply referenced sub-designs, dynamic load balancing in multiuser environments and parallelization of user programs controlling simulation.
