A test vehicle for a wafer scale FPGA has been developed. It uses a symmetrical RAM-programmable FPGA architecture with lookup-table-based logic blocks and segmented channel routing. In this paper, we focus on the practical problems inherent to wafer scale FPGAs: redundancy, power shorts, clock distribution, and cell and bus testing. A laser-link process is used to interconnect working cells and form a defect free array of FPGA cells. The defect avoidance algorithm is designed to minimize the delay between working cells, an important parameter for FPGA users.
Introduction
Since its introduction in 1985, the Field Programmable Gate Array (FPGA) has grown steadily in use and is now widely employed by designers, especially as a cheap and fast means of implementing new designs. The largest FPGAs have an equivalent gate count of approximately 20,000 gates. However, because routing takes up a large share of an FPGA, usually around 70%-90% of the area [1], it is difficult to increase the cell count and therefore the design complexity. Arrays of FPGA chips on a board are used as prototyping platforms [2], yet the delay between chips remains large compared to the delay within a chip.
The idea of a wafer scale FPGA has already been proposed by others [3]. However, we propose a different approach in which the defect avoidance is invisible to the user. This paper focuses on the interconnection and defect avoidance aspects of wafer scale systems. The FPGA cells employed are simple structures that would be replaced by more complex cells in a full system. Reasonable estimates indicate that a final system in 0.8 µm CMOS technology could implement an FPGA of approximately 1.5 million equivalent gates on a six inch wafer, and close to 3 million on an eight inch wafer, given a cell yield of 75%.
With such a large system, we believe that the general-purpose nature of an FPGA makes it an excellent candidate for wafer scale integration. Its high flexibility as a prototype emulator gives it a potential market much larger than that of other, highly specialized, wafer scale systems.
Design
The design is divided into two parts. The first, Logic Design, gives an overview of the different parts of an FPGA. The second, Physical Wafer Layout, deals with the redundancy and defect avoidance needed to make a wafer scale system.
Logic Design
The main building block is the FPGA cell. Each cell contains the logic block; the routing channels; the connection box, which connects the logic block to the lines in the channel through programmable switches; the switch box, which directs signals from one cell to another; and a memory that stores the state of the cell. The target system is a symmetrical array of identical cells programmed with static RAM bits. The test vehicle uses a simple cell that nevertheless contains all the basic features of the more complex FPGA cells expected in a wafer scale design.
The logic block (Figure 1) is composed of a three-input look-up table (LUT) and one D flip-flop. This configuration was chosen because it is simple and allows both combinational and sequential circuits to be tested. The number of inputs and outputs is kept low to reduce the channel density and ease the design process. The routing channels run both horizontally and vertically (Figure 2), the same architecture used in Xilinx FPGAs [1]. Figure 3 shows the I/O connections and the switch matrix. The inputs and outputs are accessible on one side of the cell. Each channel has twelve tracks, including a redundant clock line and four double-length lines. The double-length lines connect only to every second switch box, which makes long connections faster. The connection box allows each input/output to be connected to six possible lines in the channel. The switch box allows each channel to be routed to three other channels, one in each direction. The double-length channels cross inside the cell so that the same cell can be used everywhere [4].
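The behaviour of this logic block can be sketched in a few lines. The model below is illustrative only: it assumes the LUT contents are stored as an 8-bit truth table indexed by the three inputs, and that a hypothetical output select bit (`registered`) chooses between the combinational LUT output and the flip-flop output. Neither the bit ordering nor the output multiplexer is taken from the test vehicle.

```python
from dataclasses import dataclass


@dataclass
class LogicBlock:
    """Sketch of a 3-input LUT followed by a D flip-flop.

    Assumptions (hypothetical, for illustration): bit i of 'truth_table'
    is the LUT output for index i = a | (b << 1) | (c << 2), and
    'registered' selects the flip-flop output instead of the LUT output.
    """
    truth_table: int
    registered: bool = False
    q: int = 0  # current state of the D flip-flop

    def lut(self, a: int, b: int, c: int) -> int:
        # Look up the configured output for this input combination.
        index = a | (b << 1) | (c << 2)
        return (self.truth_table >> index) & 1

    def output(self, a: int, b: int, c: int) -> int:
        # Sequential mode returns the stored state; combinational mode the LUT value.
        return self.q if self.registered else self.lut(a, b, c)

    def clock(self, a: int, b: int, c: int) -> None:
        # On a clock edge the flip-flop captures the current LUT output.
        self.q = self.lut(a, b, c)


# A 3-input XOR corresponds to the truth table 0b10010110 (0x96).
xor3 = LogicBlock(0x96)
```

Because the same block can be configured as either a combinational function or a registered one, a single set of test vectors can exercise both paths, which is the motivation given above for this simple configuration.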
The state of each cell is held in a series of static RAM bits that form a shift register. The bit pattern is shifted serially into each column. The I/O blocks are multiplexer based and allow the mapper to connect any one of ten lines in the channel to a pad.
Physical Wafer Layout
The defect avoidance and redundancy techniques used to substitute spare blocks are the key to making a working wafer scale FPGA system. Several papers have dealt with redundancy in different kinds of arrays [5]. For an FPGA, the final array must be logically symmetrical, so that designs can be submitted to the same mapper even though the physical defect avoidance differs. Since the delay between two adjacent cells is very important, the structure must allow a defective cell to be replaced by a neighboring one. For these reasons, we use a variation of the Gupta algorithm [5] for a multi-pipeline array. The main idea behind this algorithm is to start in the upper left corner of the array and connect the cells of each row. When a cell is defective, it is replaced by the next working cell in the same column. A defect avoidance bus running between the columns enables us to shift the connections between columns up or down (Figure 4). Using laser links and cuts [6] and a laser switch box, we can connect a cell to the closest working cell in the adjacent column. To keep the design small, the defect avoidance bus is only wide enough to redirect one channel. The consequences of this choice are explained in the Defect Avoidance section.
Figure 4. Physical Design (vertical channels and defect avoidance bus)
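The column replacement rule described above can be sketched as a simple greedy assignment. This is a simplification, not the authors' variation of the Gupta algorithm: each column independently assigns its logical rows, in order, to the next working physical cell, and the vertical offset that each horizontal connection must make on the defect avoidance bus is reported. Delay optimization, pseudo-faults and the single-channel bus limit are not modeled, and the function names are hypothetical.

```python
from typing import List, Optional


def map_columns(working: List[List[bool]], logical_rows: int) -> Optional[List[List[int]]]:
    """Greedy per-column assignment of logical rows to working physical cells.

    working[r][c] is True when the physical cell at row r, column c passed testing.
    Returns assignment[c][i] = physical row used for logical cell (i, c),
    or None if some column has fewer than 'logical_rows' working cells.
    """
    rows, cols = len(working), len(working[0])
    assignment = []
    for c in range(cols):
        # A defective cell is replaced by the next working cell in the same column.
        physical = [r for r in range(rows) if working[r][c]]
        if len(physical) < logical_rows:
            return None  # not enough spare cells in this column
        assignment.append(physical[:logical_rows])
    return assignment


def vertical_offsets(assignment: List[List[int]]) -> List[List[int]]:
    """Vertical shift (in cells) each horizontal connection makes on the
    defect avoidance bus between column c and column c + 1, per logical row."""
    return [[right_row - left_row for left_row, right_row in zip(left, right)]
            for left, right in zip(assignment, assignment[1:])]
```

In a 4 x 2 array where cell (1, 0) and cell (3, 1) are defective, three logical rows map to physical rows [0, 2, 3] in column 0 and [0, 1, 2] in column 1, so the connections of logical rows 1 and 2 must each move up by one cell on the bus.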
The switch box itself must be restructured when a cell is defective. Each vertical and horizontal segment of the channel can be laser-linked so that the cell is transparent to the signal path. The switch box is then no longer programmable and forms a set of longer tracks running across the cell. This also makes it possible to bypass an entire row or column by making all of its cells transparent. The segmented channels raise an interesting problem: for the mapper, the double-length segments must reach the second adjacent cell, but in the reconstructed array this cell may be physically farther away. To solve the problem, we use laser links and cuts to remove the intersection of the double-length segments so that they can be bypassed like any other channel.
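The effect on double-length segments can be made concrete with a small calculation. The sketch below assumes a single row in which defective cells have been made transparent, and computes how many physical cell pitches each double-length segment spans once it must connect logical cell k to logical cell k + 2. The function name is hypothetical.

```python
from typing import List


def double_length_spans(working_row: List[bool]) -> List[int]:
    """Physical span, in cell pitches, of each double-length segment after
    defect avoidance.

    working_row[i] is True if physical cell i of the row is used; defective
    cells are bypassed. For the mapper, a double-length segment always joins
    logical cell k to logical cell k + 2.
    """
    physical = [i for i, ok in enumerate(working_row) if ok]
    return [physical[k + 2] - physical[k] for k in range(len(physical) - 2)]
```

In a defect free row every span is 2; with the third cell of a five-cell row bypassed, both remaining double-length segments span 3 physical cells, which is why the segment intersections must be removable so the segments can pass through the bypassed cell.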
For cells that pass the power short test but are still defective, the clock input of each D flip-flop has a laser link that allows the clock line to be tied to ground. This avoids unnecessary power consumption in defective cells.
Since the programming bits are shifted serially from one cell to the next, a defect in one cell must not be allowed to propagate a wrong bit pattern through the rest of the column. The shift register in each cell can therefore be bypassed with a laser link, so the bit pattern is not affected by a defective shift register.
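The need for the bypass link can be illustrated with a small simulation of one column's configuration chain. This is a sketch under a hypothetical fault model: a defective register's output is stuck at 0, so every cell after it in the chain receives only zeros. A bypassed cell is simply removed from the chain, which also shortens the bitstream that must be shifted in. The function name and the fixed number of bits per cell are assumptions.

```python
from typing import Collection, List


def load_column(bitstream: List[int], n_cells: int, bits_per_cell: int,
                stuck_at_zero: Collection[int] = (),
                bypassed: Collection[int] = ()) -> List[List[int]]:
    """Shift 'bitstream' serially through a column of chained cell registers.

    Hypothetical fault model: a cell in 'stuck_at_zero' whose register is not
    bypassed forwards only zeros to the next cell. A bypassed cell is removed
    from the chain (its laser link connects its input directly to its output).
    Returns the final register contents of every non-bypassed cell, in chain order.
    """
    active = [c for c in range(n_cells) if c not in bypassed]
    regs = {c: [0] * bits_per_cell for c in active}
    for bit in bitstream:
        carry = bit  # bit entering the first active cell on this clock
        for c in active:
            out = regs[c][-1]  # bit leaving cell c on this clock
            regs[c] = [carry] + regs[c][:-1]
            carry = 0 if c in stuck_at_zero else out
    return [regs[c] for c in active]
```

Since the first bit shifted in travels farthest, the bitstream is the concatenated cell configurations in reverse order. With three 2-bit cells and a stuck first register, cells 1 and 2 end up holding only zeros; once that register is bypassed, a shorter bitstream configures cells 1 and 2 correctly.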
Power routing
Previous experimentation has shown that many of the bad cells in a wafer scale design are likely to have a power short. If this problem is not taken into account, a single power short could kill an entire wafer, even before tests can be performed. The best procedure is to have all blocks initially disconnected from the power bus and then add them individually, checking for shorts. This has previously been done using laser link connections and large power transistors. Laser link power connections have low resistance and are permanent, but are best used once testing has established which cells work [6]. Transistors are more flexible, allowing devices to be added dynamically to the power bus [7], but have a higher resistance than a laser link. A compromise is a new kind of laser link called the testable power link (Figure 5). This laser link has a gate between its doped regions over a short portion of the link, forming a transistor at one end of the link. By turning on this transistor, we can emulate the linked state before the link is zapped. By placing such a link between the main ground and the ground of each cell, we can test the power consumption of each cell separately and connect only the cells that show no problem (with an N-type substrate, the substrate is tied to the supply, so the switched connection must be on the ground side). The small transistor has a much higher resistance than the laser link and is therefore used only for the initial test of the cell for shorts. The graph in Figure 6 shows the voltage drop across the transistor and across the laser link for different values of current. Note that the laser link behaves like a 100 Ω resistor. The results of the HSpice simulation and the values measured on a test device in Mitel 1.5 µm technology are very similar, showing that a transistor and a laser link can be combined to form a testable link.
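The voltage drops compared in Figure 6 follow Ohm's law across the cell's ground connection. The sketch below uses only the approximately 100 Ω value quoted above for a zapped laser link; the drop budget passed to `max_cell_current` is a hypothetical design parameter, and no transistor resistance is assumed.

```python
# Approximate resistance of a zapped laser link, from the measurement above.
LASER_LINK_OHMS = 100.0


def voltage_drop(current_a: float, resistance_ohms: float) -> float:
    """Voltage dropped across a cell's ground connection (Ohm's law)."""
    return current_a * resistance_ohms


def max_cell_current(drop_budget_v: float, resistance_ohms: float) -> float:
    """Largest cell current that keeps the ground drop within 'drop_budget_v'."""
    return drop_budget_v / resistance_ohms
```

For example, a 1 mA cell current drops about 0.1 V across the link. Any connection with a higher resistance, such as the small test transistor, allows proportionally less current for the same drop, which is why the transistor is used only for the initial short test.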
Testing

Power
As mentioned earlier, each cell must be tested individually for possible power shorts. This is done by activating each testable link through row/column access circuitry and checking the power consumption of the block. To reduce the pin count, the addresses are generated on the chip. Once the working cells have been identified, their power connections can be laser-linked and functional testing can continue.
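The power screening loop can be sketched as follows. Here `measure_current` is a hypothetical stand-in for the on-chip row/column address generator, the testable link of the addressed cell and an external supply current measurement; the current threshold is likewise an assumed parameter, not a value from the test vehicle.

```python
from typing import Callable, Dict, Tuple

Cell = Tuple[int, int]


def power_screen(rows: int, cols: int,
                 measure_current: Callable[[int, int], float],
                 max_current_a: float) -> Dict[Cell, bool]:
    """Enable each cell's testable power link in turn and flag power shorts.

    measure_current(r, c) is a hypothetical hook returning the supply current
    drawn while only cell (r, c) has its test transistor turned on.
    Returns {(r, c): True} for cells that are safe to laser-link to ground.
    """
    result: Dict[Cell, bool] = {}
    for r in range(rows):          # row address from the on-chip counter
        for c in range(cols):      # column address from the on-chip counter
            result[(r, c)] = measure_current(r, c) < max_current_a
    return result


# Example with a simulated short in cell (1, 0).
def fake_measurement(r: int, c: int) -> float:
    return 0.5 if (r, c) == (1, 0) else 1e-4


screen = power_screen(2, 3, fake_measurement, max_current_a=0.01)
```

Only the cells marked True would then have their ground links zapped before functional testing.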
Buses, Cells and Shift Registers testing
After the power test, we test the bus lines for possible shorts and open circuits. The buses run across the entire wafer, so their connectivity can easily be tested. Extra lines are provided in case a bus line is defective.
Next, the shift registers must be tested. We shift a random pattern into the shift registers and check that the pattern exiting each cell is valid. Row/column access circuitry is again used to select the cell to monitor. If the shift register of a cell fails the test, it is bypassed and the cell is marked as bad.
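The shift register test can be sketched as a comparison between the stream observed at each cell output and the input pattern delayed by the length of the chain up to that cell. Because a fault corrupts everything downstream, the first mismatch locates the defective register. The stuck-at-0 fault used in `simulate_column` and the fixed register length per cell are hypothetical modeling choices.

```python
import random
from typing import Collection, List, Optional, Sequence


def delayed(stream: Sequence[int], delay: int) -> List[int]:
    """Output of a fault-free shift register of length 'delay' cleared to 0."""
    return ([0] * delay + list(stream))[:len(stream)]


def random_pattern(n_bits: int, seed: int = 0) -> List[int]:
    """Reproducible random test pattern."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(n_bits)]


def simulate_column(pattern: Sequence[int], n_cells: int, bits_per_cell: int,
                    stuck_at_zero: Collection[int] = ()) -> List[List[int]]:
    """Stream exiting each cell of a column (hypothetical stuck-at-0 faults)."""
    outputs, stream = [], list(pattern)
    for k in range(n_cells):
        stream = delayed(stream, bits_per_cell)  # pass through cell k's register
        if k in stuck_at_zero:
            stream = [0] * len(stream)
        outputs.append(stream)
    return outputs


def first_bad_cell(outputs: List[List[int]], pattern: Sequence[int],
                   bits_per_cell: int) -> Optional[int]:
    """Index of the first cell whose output differs from the expected stream."""
    for k, observed in enumerate(outputs):
        if observed != delayed(pattern, (k + 1) * bits_per_cell):
            return k
    return None
```

The pattern must be longer than the total chain delay; otherwise the last cells would only ever output the initial zeros and could not be checked.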
Finally, the logic operation of each cell is tested. The same test vectors are applied to every cell, and cells whose outputs differ from those of the majority are marked faulty.
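Majority voting on the responses can be sketched in a few lines. The sketch assumes each cell's response is recorded as a tuple of output bits for the common vector set, and that more than half of the cells are good, since otherwise the majority response itself could be wrong.

```python
from collections import Counter
from typing import Dict, List, Tuple

Cell = Tuple[int, int]


def faulty_by_majority(responses: Dict[Cell, Tuple[int, ...]]) -> List[Cell]:
    """Return the cells whose response differs from the most common response.

    Assumes all cells received the same test vectors and that most cells are good.
    """
    majority, _ = Counter(responses.values()).most_common(1)[0]
    return sorted(cell for cell, resp in responses.items() if resp != majority)


responses = {
    (0, 0): (0, 1, 1, 0),
    (0, 1): (0, 1, 1, 0),
    (1, 0): (0, 0, 0, 0),  # e.g. a cell whose output is stuck at 0
    (1, 1): (0, 1, 1, 0),
}
```

A useful property of this approach is that no golden reference response needs to be stored on the wafer.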
Defect Avoidance
After the testing, we have a final map of the defective cells and lines. We can put this map through an assignment and routing algorithm and make the necessary laser link connections. The algorithm will optimize the connections to avoid long delays between cells.
Since there is only one bus between adjacent columns, we must sometimes use pseudo-faults [5], deliberately avoiding a good cell to reduce the wiring.
In addition to power problems, there are two major areas where defects can occur in a cell: the logic and the channels. The logic is the most vulnerable part; when it is defective, we can simply avoid the cell and use its channels to reroute the signals (Figure 7). If a defect occurs in the routing channels themselves, the cell cannot be used to bypass the signal, and an entire row or column must be bypassed, depending on whether the horizontal or the vertical channel is defective. The same applies if a restructuring bus is inoperative. Bypassing an entire row or column also lets us avoid the rows or columns with the most defects, which reduces the delay and eases the defect avoidance. Therefore, in addition to a number of extra rows, we must also provide a set of extra columns. The number of extra rows and columns is under investigation; it depends on both the defect density and the clustering. We have written a program that generates a random defect map with clustering, according to the model proposed by A. Tyagi and M.A. Bayoumi [8]. The next version will also assign different defect probabilities to more robust parts of the design, such as the buses. This is important in an FPGA because of the large area occupied by the interconnections. This model will enable us to adjust the number of extra cells to achieve a target yield, given the defect density and the clustering coefficient.
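The spare-sizing experiment can be sketched with a Monte Carlo loop. Note that the clustering model below is a substitute, not the Tyagi and Bayoumi model: each square region of cells draws a defect density multiplier from a Gamma distribution with mean 1 (smaller `alpha` means stronger clustering), which gives the familiar negative binomial per-cell yield (1 + λ/α)^(-α). The sketch also considers only spare rows within each column; channel defects, row/column bypass and pseudo-faults are ignored, and all names and parameters are hypothetical.

```python
import math
import random
from typing import List


def clustered_defect_map(rows: int, cols: int, defects_per_cell: float,
                         alpha: float, block: int,
                         rng: random.Random) -> List[List[bool]]:
    """Random map of working cells (True) with gamma-distributed clustering.

    Substitute model: each block x block region gets a density multiplier
    drawn from Gamma(shape=alpha, scale=1/alpha); a cell fails with
    probability 1 - exp(-defects_per_cell * multiplier).
    """
    multipliers = {}
    working = []
    for r in range(rows):
        row = []
        for c in range(cols):
            key = (r // block, c // block)
            if key not in multipliers:
                multipliers[key] = rng.gammavariate(alpha, 1.0 / alpha)
            p_fail = 1.0 - math.exp(-defects_per_cell * multipliers[key])
            row.append(rng.random() >= p_fail)
        working.append(row)
    return working


def repair_probability(logical_rows: int, spare_rows: int, cols: int,
                       defects_per_cell: float, alpha: float,
                       block: int = 4, trials: int = 1000, seed: int = 0) -> float:
    """Fraction of simulated wafers in which every column still has at least
    'logical_rows' working cells when 'spare_rows' extra rows are provided."""
    rng = random.Random(seed)
    rows = logical_rows + spare_rows
    ok = 0
    for _ in range(trials):
        w = clustered_defect_map(rows, cols, defects_per_cell, alpha, block, rng)
        if all(sum(w[r][c] for r in range(rows)) >= logical_rows for c in range(cols)):
            ok += 1
    return ok / trials
```

Sweeping `spare_rows` for a given defect density and clustering coefficient gives the number of extra rows needed for a target wafer yield; a real study would replace the substitute model with the one cited above.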
Delay approximation
The major drawback of an FPGA is its circuit speed. A custom implementation of a given circuit will be much faster than its FPGA implementation because of the large delays in the routing circuitry. Several papers have studied how logic block complexity, cell granularity and architecture affect FPGA speed [9] [10]. In this section, we compare a defect free wafer scale FPGA of the same complexity, without redundancy or defect avoidance, with the architecture we used, taking the defect avoidance overhead into account. Note that a defect free WSI FPGA would have a negligible yield; we consider it only as an idealized comparison target, with a speed similar to that of current standard FPGAs.
In [9], the total delay D_tot of the critical path in a defect free FPGA is approximated as follows:

$$D_{tot} = N_L \left( D_{LB} + D_R \right)$$

where N_L is the number of logic blocks in the critical path, D_LB the delay of a logic block and D_R the delay of the routing between two blocks. The delay of the logic block can easily be calculated or measured. The routing delay is much more difficult to estimate: it depends on many factors, such as the fanout and the length of the connection [9]. We use a calculated value to give an idea of the delay, but the reader should keep in mind that this value can vary considerably, even when the same circuit is simply remapped. We also use some of the values given in [9] for the delays of circuits mapped on defect free FPGAs.
For the wafer scale circuit, new delays must be included in the total delay D_tot because of the extra routing and the physical reconfiguration of the array. There are two extra delays: D_OH, the delay of the reconfiguration overhead circuitry in each cell, and D_RC, the delay of a reconfiguration channel. For a yield of Y, we can write the total delay as follows:
where N_R is the number of reconfiguration channels used for each defective cell. This number is hard to evaluate because it depends on the reconfiguration algorithm. The simplest case is N_R = 2, which occurs when the signal propagates along a line of logic blocks and each defective cell requires one connection down and one connection up. Because N_R can be larger than 2, we give results for different values of N_R. Table 1 gives the delays we calculated for the design using HSpice. As an example, we calculate the delay of a worst-case path of ten cells. The results for three different values of N_R are given in Figure 8. The single point represents the delay of a defect free FPGA.
The graph shows that, for high yield, the difference between the wafer scale FPGA and a defect free FPGA of the same complexity is small. To achieve good performance, the reconfiguration algorithm must keep N_R as low as possible, especially when the yield is low.
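The trend in Figure 8 can be explored with a small delay model. The defect free part follows D_tot = N_L (D_LB + D_R) from [9]. The wafer scale part is an assumed form, not necessarily the equation used in this paper: every cell on the path adds D_OH, and each defective cell skipped along the path adds N_R reconfiguration channels of delay D_RC. With independent per-cell yield Y, reaching N_L working cells skips on average N_L (1 - Y) / Y defective cells (the mean of a negative binomial distribution). The delay values used with these functions would be placeholders, since the values of Table 1 are not reproduced here.

```python
def defect_free_delay(n_blocks: int, d_lb: float, d_r: float) -> float:
    """D_tot = N_L * (D_LB + D_R) for a defect free FPGA [9]."""
    return n_blocks * (d_lb + d_r)


def wafer_scale_delay(n_blocks: int, d_lb: float, d_r: float, d_oh: float,
                      d_rc: float, yield_: float, n_r: int) -> float:
    """Assumed wafer scale extension of the delay model (illustrative only).

    Each of the N_L cells adds the overhead D_OH, and each defective cell
    skipped along the path adds N_R reconfiguration channels of delay D_RC.
    The expected number of skipped cells is N_L * (1 - Y) / Y.
    """
    if not 0.0 < yield_ <= 1.0:
        raise ValueError("yield must be in (0, 1]")
    skipped = n_blocks * (1.0 - yield_) / yield_
    return n_blocks * (d_lb + d_r + d_oh) + skipped * n_r * d_rc
```

Under this assumption the reconfiguration penalty scales with (1 - Y) / Y, so it is small at high yield but grows quickly at low yield, and a larger N_R multiplies it directly; this matches the qualitative conclusion drawn from Figure 8.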
In this example, we measured the delay of a logic block composed of a three-input look-up table. Larger blocks have a larger delay but require less routing. A very large block is impractical in standard VLSI because too few blocks fit on a single chip. On a wafer, however, a larger block may improve performance.
The wafer scale FPGA (WSFPGA) has a lower delay than prototype boards built from arrays of commercial FPGAs, because of the high impedance of the board connections and the extra delay caused by the routing chips [11].
Experimental work
The described cell has been submitted for fabrication (Figure 9). The chip will be used to evaluate the power test on the cells and the laser linking of segments to avoid a defective cell. The test chip is a long array (Figure 10) that includes a row of 12 identical cells plus a restructuring bus. This chip will be used to measure the delay of long connections and to evaluate the delay of worst-case paths of up to 12 cells. Another chip, containing a 2 × 5 array of a simpler cell, has also been submitted.
Conclusion
The design of a test vehicle for a wafer scale FPGA has been presented. The main purpose of the test vehicle is to verify the reconfiguration and testing techniques needed to create a complete wafer scale design. The delay approximation indicates that the speed of a wafer scale FPGA is close to that of a standard FPGA and better than that of a printed circuit board of FPGAs. One application of a wafer scale FPGA would be a platform for testing very large designs. The great flexibility of an FPGA could also be used to implement self-healing circuits, with the testing and repair circuitry on the same FPGA. Fabrication of a real wafer scale system will be possible once all these problems have been solved with the test vehicles.
Further work includes adding embedded memory and different types of logic blocks on the same chip, to increase speed and logic density for more complex designs.
