# FPGA for High-Performance Bit-Serial Pipeline Datapath

Tsuyoshi Isshiki, Takenobu Shimizugashira, Akihisa Ohta, Imanuddin Amril and Hiroaki Kunieda

Tokyo Institute of Technology, Electrical and Electronics Engineering, 2-12-1 Ookayama, Meguroku, Tokyo 152, Japan

Abstract: We present the design of a new FPGA chip for high-performance bit-serial pipeline datapaths, customized in both its logic architecture and its routing architecture. The chip contains 200k transistors on a 3.5 mm square substrate (excluding the IO pad area) in a $0.5\mu$ 2-metal process technology. The estimated clock frequency is 156 MHz.

#### I. INTRODUCTION

Large-scale configurable systems composed of a number of FPGA devices have demonstrated high-performance computing capability in areas such as signal and image processing, database search, pattern recognition, encryption, and embedded real-time systems. FPGA technology has itself evolved significantly over the past few years, with gate capacity now well over 100k. It nevertheless has a fundamental problem: as logic capacity increases, the area overhead of the programmable interconnect network between logic blocks quickly begins to dominate the chip.

In our previous work, we developed a high-performance bit-serial pipeline datapath synthesis system for large-scale configurable systems. We are able to empirically guarantee near-100% logic utilization with 100% routability and consistently high-speed clock operation ($\sim$40 MHz on Xilinx 3100A FPGAs), without any low-level manual intervention. The most significant reason for this capability is the high routability of our bit-serial circuits, which is strongly supported by the observed Rent exponents of these circuits [2].
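For reference, Rent's rule in its usual textbook form relates the number of external terminals $T$ of a circuit partition to the number of logic blocks $g$ it contains. The symbols below are the conventional ones and are not taken from [2]; the exact formulation used there may differ.

$$
T = t \, g^{p}, \qquad 0 \le p \le 1
$$

Here $t$ is the average number of terminals per block and $p$ is the Rent exponent. A smaller $p$ means that the terminal count grows more slowly with partition size, which corresponds to lower demand on inter-block routing.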

Motivated by these results, we have designed our own FPGA architecture, with emphasis on both the logic architecture and the routing architecture.

#### II. BIT-SERIAL FPGA ARCHITECTURE

The main features of our bit-serial FPGA architecture are summarized below (Fig. 1):

**Logic architecture** One of the distinctive characteristics of bit-serial circuits is that, while the connectivity between bit-serial cells such as adder and multiplier cells is sparse, the connectivity inside each cell is dense. Our strategy was therefore to increase the logic capacity of the logic block so that this dense interconnection is absorbed inside the logic block, reducing the inter-block routing resources required. One logic block has four 4-input LUTs, 6 flip-flops, and multiplexers for carry-save operations and 5-input functions.
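The contrast between sparse external and dense internal connectivity can be illustrated with a generic bit-serial adder. The following is a minimal behavioural sketch, not the authors' cell: it assumes LSB-first operands, and the function names `bit_serial_add`, `to_bits` and `from_bits` are hypothetical. Externally the cell exposes only two serial inputs and one serial output, while the carry loops back through a flip-flop inside the cell.

```python
def bit_serial_add(a_bits, b_bits):
    """Add two LSB-first bit streams, consuming one bit pair per clock cycle."""
    carry = 0  # models the carry flip-flop held inside the cell
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Full-adder logic: the kind of function a LUT would implement.
        s = a ^ b ^ carry
        carry = (a & b) | (carry & (a ^ b))  # internal feedback, never leaves the cell
        sum_bits.append(s)
    return sum_bits  # the final carry is dropped, so the sum wraps modulo 2**width


def to_bits(value, width):
    """Convert an integer to an LSB-first list of bits (illustrative helper)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert an LSB-first list of bits back to an integer (illustrative helper)."""
    return sum(bit << i for i, bit in enumerate(bits))


print(from_bits(bit_serial_add(to_bits(100, 16), to_bits(55, 16))))  # 155
```

A 16-bit addition takes 16 loop iterations here, which corresponds to 16 clock cycles on a single cell.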

**Routing architecture** The routing architecture is divided into two levels: *external block routing* and *internal block routing*. The purpose is to give maximum flexibility to routing between logic block pins and to provide the rich local routing resources that bit-serial circuits need. External block routing is implemented with S-blocks, which connect single-length routing segments, and C-blocks, which connect logic blocks to the routing segments.

Our two-level routing architecture has several advantages. First, the intermediate routing resource inside the logic block (hence *two-level routing*) enabled us to insert buffers at the logic block pins, which effectively isolate the routing segments from the drain capacitance of the pass-transistors. As a result, the routing delay through a single-length routing segment is greatly reduced, which also leads to power reduction. Second, every logic block output can be routed back to the same block without consuming any external routing resource. Likewise, connections between adjacent logic blocks, which occur frequently in bit-serial circuits, are implemented via C-blocks without consuming any external routing resource. Third, the connectivity of inputs and outputs is symmetric, with a high degree of flexibility in four directions. This increases routability and also makes an automatic router very easy to develop.

#### III. VLSI IMPLEMENTATION

Fig. 1. Logic block architecture, internal and external routing architecture of bit-serial FPGA.

Fig. 2. Layout of bit-serial FPGA chip.

The bit-serial FPGA architecture was first transformed into a transistor-level description in Verilog (1 month $\times$ 2 persons), and the mask layout was then done in full custom (4 months $\times$ 2 persons). The process technology used in this design was a $0.5\mu$ (gate length = $0.6\mu$) 2-metal process. Due to limited manpower and time, we could not spend enough time on floorplanning and transistor size optimization, so the layout leaves room for further improvement in area and speed. One logic block, one S-block and two C-blocks together occupy a $385\mu \times 407\mu$ area. With further rework on the layout, we would expect this to shrink to within $350\mu \times 350\mu$. Inside the $3.5 \times 3.5$ mm chip area (excluding the IO area), there are $8 \times 8$ logic blocks and $16 \times 4$ IO blocks. Various data are shown in Table I.
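The area figures can be cross-checked with a few lines of arithmetic. This is only a sketch: it assumes that each of the $8 \times 8$ logic blocks sits in one tile (logic block, S-block, two C-blocks) of the quoted size, takes the per-block gate estimate from Table I, and uses hypothetical names such as `array_size_um`. How the remaining chip area is divided is not stated in the text and is not modelled here.

```python
TILE_W_UM, TILE_H_UM = 385, 407   # current tile: logic block + S-block + 2 C-blocks
ROWS = COLS = 8                   # 8 x 8 logic block array
CORE_SIDE_UM = 3500               # 3.5 mm chip side, excluding the IO pad area
GATES_PER_BLOCK = 70              # approximate per-block estimate from Table I


def array_size_um(tile_w, tile_h, cols=COLS, rows=ROWS):
    """Width and height of the logic array, assuming tiles abut with no gaps."""
    return tile_w * cols, tile_h * rows


current = array_size_um(TILE_W_UM, TILE_H_UM)   # with the present layout
target = array_size_um(350, 350)                # with the hoped-for 350 x 350 tile
fits = all(side <= CORE_SIDE_UM for side in current)
max_gates = ROWS * COLS * GATES_PER_BLOCK

print(current, target, fits, max_gates)  # (3080, 3256) (2800, 2800) True 4480
```

Under these assumptions the present tile array fits inside the 3.5 mm square, and 64 blocks at about 70 gates each give 4480 gates, in line with the roughly 4500 gates per chip listed in Table I.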

TABLE I. ESTIMATED PERFORMANCE OF OUR BIT-SERIAL FPGA CHIP

| Item                                              | Value                           |
|---------------------------------------------------|---------------------------------|
| area (LB, SB, CB $\times 2$)                      | $385 \times 407\ \mu^2$         |
| area (total)                                      | $3{,}500 \times 3{,}500\ \mu^2$ |
| transistor count                                  | 200k transistors                |
| max. gates/block                                  | $\sim 70$ gates                 |
| max. gates/chip                                   | $\sim 4500$ gates               |
| clock frequency (assuming Manhattan-distance-4 routing) | 156 MHz                   |
| 16-bit multiplication ($\times 2$)                | 19.5 MOPS                       |
| 8-bit multiplication ($\times 4$)                 | 78 MOPS                         |
| 16-bit addition ($\times 64$)                     | 624.64 MOPS                     |
| 8-bit addition ($\times 64$)                      | 1.25 GOPS                       |
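The throughput rows of Table I appear to follow from a simple model, sketched below under an assumption that is not stated in the text: each bit-serial unit delivers one result every `word_bits` clock cycles, and the units run in parallel. The function name `throughput_mops` is hypothetical.

```python
CLOCK_MHZ = 156  # estimated clock frequency from Table I


def throughput_mops(word_bits, units, clock_mhz=CLOCK_MHZ):
    """Millions of operations per second for `units` parallel bit-serial units,
    assuming one result per unit every `word_bits` clock cycles."""
    return clock_mhz / word_bits * units


print(throughput_mops(16, 2))    # 16-bit multiplication x2  -> 19.5
print(throughput_mops(8, 4))     # 8-bit multiplication x4   -> 78.0
print(throughput_mops(16, 64))   # 16-bit addition x64       -> 624.0
print(throughput_mops(8, 64))    # 8-bit addition x64        -> 1248.0
```

With the rounded 156 MHz clock, this model gives 624 MOPS for 16-bit addition, while Table I lists 624.64 MOPS. The difference would be consistent with an unrounded clock of about 156.16 MHz, but that is an inference rather than something the text states.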

## Acknowledgement

The authors would like to thank the members of the CAD21 Research Body of Tokyo Institute of Technology and the members of Kunieda Laboratory for their suggestions and cooperation.

## References

- [1] T. Isshiki and W. W.-M. Dai, "Bit-Serial Pipeline Synthesis for Multi-FPGA Systems with C++ Design Capture," *Proc. IEEE Symp. FPGAs for Custom Computing Machines*, April 1996.
- [2] T. Isshiki, W. W.-M. Dai and H. Kunieda, "Routability Analysis of Bit-Serial Pipeline Datapaths," *IEICE Trans. Fundamentals*, pp.1861-1870, Oct. 1997.
- [3] W. E. Donath, "Placement and Average Interconnection Lengths of Computer Logic," *IEEE Trans. Circuits and Systems*, pp.272-277, April 1979.