This paper investigates the possibility of using Field-Programmable Gate Arrays (FPGAs) as reconfigurable co-processors for workstations to produce moderate speedups for most tasks in the design process, resulting in a worthwhile overall design process speedup at low cost and allowing algorithm upgrades with no hardware modification. The use of FPGAs as hardware accelerators is reviewed, and achievable speedups are then predicted for logic simulation and VLSI design rule checking tasks for various FPGA co-processor arrangements.
INTRODUCTION
Many special hardware acceleration engines have been proposed or built, with significant speedups over software. However, the VLSI design process is characterized by many computer-intensive stages that are quite different in nature. The use of an accelerator for one stage does not dramatically improve the productivity of the overall design process, and those accelerators in use serve to allow substantially more logic and fault simulation rather than to cut design times. The continual development of algorithms also counts against hardwired accelerators. Virtually all design automation tasks are still performed solely on general-purpose computers, particularly workstations.
The development of static-RAM-based Field-Programmable Gate Arrays means that reconfigurable co-processors can now be realized that could accommodate the diversity and dynamic nature of design automation tasks. The speedup achieved for any one task will be less than that for a complex dedicated hardware accelerator but, applied to speed up tasks throughout the design process, it may provide a substantial benefit for a very low hardware cost. Also, algorithms could be upgraded without hardware modification. This paper investigates this possibility.
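The trade-off between one large speedup on a single stage and moderate speedups on every stage can be illustrated with Amdahl's law. The sketch below is only an illustration: the function name `overall_speedup`, the four equal stage fractions, and the speedup factors of 100 and 5 are hypothetical and are not figures from this paper.

```python
def overall_speedup(fractions, speedups):
    """Overall speedup when stage i takes fractions[i] of the original
    run time and is accelerated by a factor of speedups[i]."""
    if len(fractions) != len(speedups):
        raise ValueError("one speedup factor is needed per stage")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("stage fractions must sum to 1")
    # New total time is the sum of each stage's time divided by its speedup.
    return 1.0 / sum(f / s for f, s in zip(fractions, speedups))


# Hypothetical design flow: four stages, each 25% of the total time.
stages = [0.25, 0.25, 0.25, 0.25]

# Dedicated accelerator: 100x on one stage, the rest unchanged.
dedicated = overall_speedup(stages, [100, 1, 1, 1])

# Reconfigurable co-processor: a moderate 5x on every stage.
reconfigurable = overall_speedup(stages, [5, 5, 5, 5])

print(f"dedicated: {dedicated:.2f}x, reconfigurable: {reconfigurable:.2f}x")
```

Under these assumed numbers the dedicated accelerator gives an overall speedup of only about 1.33, because the three unaccelerated stages dominate, whereas the uniform moderate speedup carries through to the whole flow.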
The [3] [4] [5] [6] [7], each with many FPGAs connected to banks of memory, with speedups of 2 to 3 orders of magnitude over software quoted. However, the programming of even trivial tasks on these machines is complex, and the development of support tools is an important ongoing research subject. Eventually, the aim is to have a high-level description which can be compiled to produce object code for the host and configuration files for the FPGA. This concept has been demonstrated by Athanas
Autonomous: The FPGA implements a customizable processor which can take over from the master to execute whole sections of processing-intensive code. Instructions for this processor are stored in main memory, but the customized instructions will be complex and will typically run for many memory cycles.
Of the many design automation tasks, design rule checking and simulation are considered here, as their algorithms are generally simpler and less specialized than those of the other tasks. Both applications are considered for the various co-processor arrangements, with calculated execution times compared to those for software with no co-processor.
Design Rule Checking
The case considered here employs the 'scan-line' technique [9], in which the polygons are pre-sorted into 'bins' depending on their x-coordinate range and, within each bin, polygons are pre-sorted into ascending order of their lower y-coordinate. Polygons may be angled at 45°. In order to reduce the number of comparisons of polygons to check minimum distances etc., the algorithm scans through bins vertically upwards with a bar (scan-line) of height Ymin. At each position of the scan-line, the algorithm maintains a list of those polygons which overlap with the scan-line, deleting and appending polygons as the scan-line shifts. Only those polygons within the scan-line are compared with one another for checking a minimum distance of Ymin. Primitive operations required include testing whether two polygons overlap, abut, are electrically connected, or are separated by a required minimum distance. The operation considered here for timing comparison is stepping through a scan-line list to detect polygons that overlap with a reference polygon. For the hardware-accelerated case, polygon records are packed into two 32-bit words. A reference polygon is loaded into the FPGA first and then, for each polygon in the scan-line list, the record is loaded and a flag indicating the 'overlap' test result can be read. For more complex co-processor arrangements, the stepping through the list and stopping at an overlapping polygon or the end of the list is handled by the FPGA rather than the master.
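The timed operation for design rule checking, stepping through a scan-line list to find polygons that overlap a reference polygon, can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the record format or test used in this work: each polygon is reduced to an axis-aligned bounding box (45° edges are ignored), the four coordinates are assumed to fit in 16 bits each, and the packing layout and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

MASK16 = 0xFFFF
Record = Tuple[int, int]  # two 32-bit words per polygon record


def pack(x_lo: int, y_lo: int, x_hi: int, y_hi: int) -> Record:
    """Pack a bounding box into two 32-bit words (assumed layout:
    word 0 = x_lo:y_lo, word 1 = x_hi:y_hi, 16 bits per coordinate)."""
    for v in (x_lo, y_lo, x_hi, y_hi):
        if not 0 <= v <= MASK16:
            raise ValueError("coordinate does not fit in 16 bits")
    return (x_lo << 16) | y_lo, (x_hi << 16) | y_hi


def unpack(rec: Record) -> Tuple[int, int, int, int]:
    """Recover (x_lo, y_lo, x_hi, y_hi) from a packed record."""
    w0, w1 = rec
    return w0 >> 16, w0 & MASK16, w1 >> 16, w1 & MASK16


def overlaps(ref: Record, other: Record) -> bool:
    """The 'overlap' flag: true if the two boxes share interior area.
    Strict inequalities mean boxes that merely abut do not overlap."""
    ax0, ay0, ax1, ay1 = unpack(ref)
    bx0, by0, bx1, by1 = unpack(other)
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def first_overlap(ref: Record, scan_list: List[Record],
                  start: int = 0) -> Optional[int]:
    """Step through the scan-line list from 'start' and stop at the
    first overlapping record, or return None at the end of the list."""
    for i in range(start, len(scan_list)):
        if overlaps(ref, scan_list[i]):
            return i
    return None


reference = pack(10, 10, 20, 20)
scan_line_list = [
    pack(0, 0, 5, 5),      # disjoint from the reference
    pack(20, 10, 30, 20),  # abuts the reference along x = 20
    pack(15, 15, 25, 25),  # overlaps the reference
]
hit = first_overlap(reference, scan_line_list)
```

In the simpler co-processor arrangements the loop in `first_overlap` would stay on the master, with only the `overlaps` test performed by the FPGA; in the more complex arrangements the whole loop corresponds to work handled by the FPGA.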
Logic Timing Simulation
The example considered here is an event-driven compiled-code logic simulator [10] which is extended to include timing [11]. To summarise, this technique:
1. only calculates the outputs of blocks which have inputs changing (events) at the current simulation time, rather than updating every subcircuit in the design. If the outputs change, new events will be generated which will propagate the change to all fan-out blocks. These will be updated at a subsequent simulation time;
2. maintains many lists of output events, one for each time step. Output events are appended to event lists depending on the block and net propagation delays;
3. avoids the significant time overhead of retrieving data from complex data structures by compiling code before running the simulation. The data is contained within the compiled code that is executed during simulation.
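The first two points, selective evaluation and per-time-step event lists, can be illustrated with a minimal sketch. It is not the simulator of [10] or [11]: it is table-driven rather than compiled-code, uses two-valued logic with a single fixed delay of at least one time step per gate, and all names (`simulate`, `gates`, `stimuli`) are hypothetical.

```python
from collections import defaultdict


def simulate(gates, values, stimuli, t_end):
    """gates:   name -> (function, input nets, output net, delay >= 1)
    values:  net -> current logic value (updated in place)
    stimuli: list of (time, net, value) events on primary inputs
    Returns a list of (time, net, value) for every net change."""
    fanout = defaultdict(list)            # net -> gates driven by it
    for name, (_, ins, _, _) in gates.items():
        for net in ins:
            fanout[net].append(name)

    event_lists = defaultdict(list)       # one event list per time step
    for t, net, v in stimuli:
        event_lists[t].append((net, v))

    projected = dict(values)              # last value scheduled per net
    trace = []
    for t in range(t_end + 1):
        active = []                       # gates with an input event now
        for net, v in event_lists.pop(t, []):
            if values[net] == v:
                continue                  # no change, nothing to propagate
            values[net] = v
            trace.append((t, net, v))
            for name in fanout[net]:
                if name not in active:
                    active.append(name)
        # Only gates whose inputs changed at time t are evaluated.
        for name in active:
            func, ins, out, delay = gates[name]
            new = func(*(values[n] for n in ins))
            if new != projected[out]:     # output will change: new event
                projected[out] = new
                event_lists[t + delay].append((out, new))
    return trace


# Hypothetical circuit: y = NOT(a AND b), AND delay 2, inverter delay 1.
gates = {
    "g1": (lambda a, b: a & b, ("a", "b"), "n1", 2),
    "g2": (lambda x: 1 - x, ("n1",), "y", 1),
}
values = {"a": 0, "b": 0, "n1": 0, "y": 1}
stimuli = [(0, "a", 1), (1, "b", 1), (5, "a", 0)]
trace = simulate(gates, values, stimuli, t_end=10)
```

The third point is deliberately not shown: in a compiled-code simulator the `gates` table would not be consulted at run time, since the connectivity and delays would be embedded in code generated before the simulation starts.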
The examples assume a rich set of primitive functions, with multiple logic levels and driving strengths and different rising and falling transition times. Execution times considered are for updating a 2-input gate and a full adder, generating a new event if required and inserting it into the correct event list. A block diagram for a DMA co-processor is shown in Figure 1 . Other co-processor arrangements are modifications of this. 
CONCLUSIONS
This paper has discussed the suitability of using FPGAs for accelerating design automation tasks in order to achieve moderate speedups throughout the design process in a cost-effective and flexible manner. This is achieved by reconfiguring FPGAs to be used as co-processors that have been optimized for the task in hand.
Various 
