Programmable logic device with an 8-stage cascade of 64K-bit asynchronous SRAMs by Nakamura  Kazuyuki et al.
Authors: Nakamura Kazuyuki, Sasao Tsutomu, Matsuura
Munehiro, Tanaka Katsumasa, Yoshizumi
Kenichi, Qin Hui, Iguchi Yukihiro
Journal or publication title: Proceedings of IEEE Symposium on Low-Power and
High-Speed Chips (Cool Chips VIII)
Year: 2005-04-22
URL: http://hdl.handle.net/10228/00007542
Programmable Logic Device  
with an 8-stage cascade of 64K-bit Asynchronous SRAMs 
Kazuyuki NAKAMURA, Tsutomu SASAO, Munehiro MATSUURA, Katsumasa TANAKA,  
Kenichi YOSHIZUMI,  Hui QIN, and *Yukihiro IGUCHI 
Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka, 820-8502 JAPAN 
*Meiji University, 1-1-1 Higashimita, Kawasaki, Kanagawa, 214-8571 JAPAN  
The first implementation of a new programmable logic
device using an LUT (Look-Up Table) cascade architecture has
been developed in a 0.35 um CMOS logic process. Eight 64Kb
asynchronous SRAMs are simply connected to form an LUT
cascade with a few additional circuits. Benchmark results
show that its performance is competitive with that of FPGAs.
1. Introduction 
RAMs and PLAs (Programmable Logic Arrays) are used as
programmable logic devices that realize multiple-output
combinational logic functions. However, when the number of
inputs and/or outputs of the target function is large, these
devices require excessive amounts of hardware. Thus, FPGAs
(Field Programmable Gate Arrays) are often used instead. In
an FPGA, however, the area and delay of the interconnections
among logic cells are much larger than those of the logic
elements themselves. Therefore, predicting the performance of
an FPGA is difficult without a complete physical design. To
solve these problems,
an LUT cascade architecture that is composed of a serial 
connection of large-scale memories has been developed [1]. 
The LUT cascade uses relatively large LUTs (1 kb to 1 Mb),
and the interconnections between LUTs are limited to
adjacent cells in the cascade. This is quite different from the
two-dimensional structure of FPGAs, which use smaller
(16 b to 64 b) LUTs. The large interconnection area of an FPGA
is absorbed into the larger LUTs of the cascade, so the cascade
is reconfigured only by changing the contents of the LUTs.
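The idea can be illustrated with a minimal Python sketch. It is not the authors' tool flow: the helper names `make_cell` and `eval_cascade`, the 4-bit rail between cells, and the 2-bit external input per cell are all assumptions chosen to keep the tables tiny. Each cell is a flat table addressed by the value from the preceding cell plus its own external inputs, and switching from a ones counter to a parity function replaces only the table contents.

```python
def make_cell(func, rail_bits, ext_bits):
    """Tabulate func(rail, ext) as a flat LUT addressed by (rail << ext_bits) | ext."""
    ext_mask = (1 << ext_bits) - 1
    rail_mask = (1 << rail_bits) - 1
    return [func(addr >> ext_bits, addr & ext_mask) & rail_mask
            for addr in range(1 << (rail_bits + ext_bits))]


def eval_cascade(cells, ext_bits, ext_inputs, rail=0):
    """Evaluate a cascade: each cell sees only the previous rail and its own inputs."""
    for table, ext in zip(cells, ext_inputs):
        rail = table[(rail << ext_bits) | ext]
    return rail


def popcount(value):
    """Number of 1 bits in a non-negative integer."""
    return bin(value).count("1")


RAIL_BITS, EXT_BITS, STAGES = 4, 2, 4  # hypothetical sizes, far smaller than the chip

# Configuration A: count the ones among all 8 external input bits.
count_cells = [make_cell(lambda r, x: r + popcount(x), RAIL_BITS, EXT_BITS)
               for _ in range(STAGES)]
# Configuration B: same wiring, new table contents, now computing parity.
parity_cells = [make_cell(lambda r, x: r ^ (popcount(x) & 1), RAIL_BITS, EXT_BITS)
                for _ in range(STAGES)]

inputs = [0b11, 0b01, 0b00, 0b10]  # four ones in total
print(eval_cascade(count_cells, EXT_BITS, inputs))   # 4
print(eval_cascade(parity_cells, EXT_BITS, inputs))  # 0
```

Both configurations use the same `eval_cascade` wiring, which mirrors the point that only the LUT contents change on reconfiguration.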
2.  Design of LUT Cascade LSI 
An LUT block is mainly composed of an asynchronous
64-kbit SRAM with 13-bit address inputs and 8-bit data I/Os.
An asynchronous SRAM is employed because the memory must
operate as a data path in the LUT cascade architecture. In
our design, each LUT block has 17 inputs: 8 bits are
connected to the outputs of the preceding LUT, and 9 bits come
from the external inputs (X). Crossover switches select 13 of
the 17 inputs to form the actual address inputs of the LUT,
and the 4 unselected signals can be used as
intermediate outputs through the Y terminals. For an 8-LUT
cascade, this configuration extends the number of allowable
input/output signals from 48/8 (without crossover switches) to
76/36 (with crossover switches), which increases the number of
functions realizable with the LUT cascade.
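The block structure and the 48/8 and 76/36 figures can be checked with a short Python sketch. The function names `lut_block` and `io_counts` are hypothetical, and two points are assumptions rather than statements from the chip design: that any 13 of the 17 signals may be selected, and that the first block, having no predecessor, contributes 13 external inputs and no Y outputs. Under that accounting the counts match the figures quoted above.

```python
def lut_block(table, prev_out, x, select):
    """Model one LUT block with a crossover selection in front of the SRAM."""
    # 17 candidate signals: 8 rail bits from the preceding LUT, then 9 external bits.
    signals = [(prev_out >> i) & 1 for i in range(8)]
    signals += [(x >> i) & 1 for i in range(9)]
    if len(select) != 13 or len(set(select)) != 13:
        raise ValueError("select must name 13 distinct signals out of 17")
    # The selected signals form the 13-bit SRAM address.
    addr = sum(signals[s] << k for k, s in enumerate(select))
    # The 4 unselected signals leave the block through the Y terminals.
    y = [signals[i] for i in range(17) if i not in select]
    return table[addr], y


def io_counts(stages=8, addr_bits=13, rail_bits=8, ext_bits=9):
    """Return ((inputs, outputs) without switches, (inputs, outputs) with switches)."""
    # Assumption: the first block has no predecessor, so its 13 address bits
    # are all external and it has no Y outputs.
    plain_in = addr_bits + (stages - 1) * (addr_bits - rail_bits)
    plain = (plain_in, rail_bits)
    y_bits = rail_bits + ext_bits - addr_bits  # 4 unselected signals per block
    switched_in = addr_bits + (stages - 1) * ext_bits
    switched_out = rail_bits + (stages - 1) * y_bits
    return plain, (switched_in, switched_out)


# 8K x 8-bit table whose data is just the low address byte (placeholder contents).
table = [addr & 0xFF for addr in range(1 << 13)]
data, y = lut_block(table, prev_out=0xAB, x=0b101100011, select=list(range(13)))
print(hex(data), y)  # 0xab [1, 1, 0, 1]
print(io_counts())   # ((48, 8), (76, 36))
```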
The LUT cascade LSI is simply realized by a cascade 
connection of LUT blocks. Each LUT block also has 
connections to the common address lines used for 
programming and testing. When the TEST (Program) signal is
High, an LUT is selected by the block select signals (BS) and
all address inputs of the LUT can be controlled directly by the
external address inputs (X and ADDL). In this mode, the
chip is compatible with a conventional memory.
To realize a variety of functions and high-frequency
operation, our design supports both parallel and pipeline
operations. The cascade can be split in two so that the upper 4
and lower 4 LUTs operate in parallel. In this case, the device
can be used as a dual 4-LUT cascade. Because asynchronous
SRAMs are used as the memories, an eight-stage pipeline
operation can be realized simply by adding address registers,
where each LUT corresponds to one pipeline stage.
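The pipeline mode can be pictured with a small behavioral sketch in Python. This is a simplification under stated assumptions, not a model of the actual circuit: `run_pipeline` is a hypothetical helper, the per-stage external inputs (X) are omitted so that each stage is a 256-entry table on the 8-bit rail, and the tables simply add 1 so the result is easy to follow. One address register sits in front of each LUT, so a word needs eight clock cycles to pass through while a new word can enter every cycle.

```python
def run_pipeline(stage_tables, inputs):
    """Clock a register-per-stage pipeline and return outputs in arrival order."""
    n = len(stage_tables)
    regs = [None] * n  # address register in front of each LUT (None = empty)
    outputs = []
    # Feed every input, then clock n more empty cycles to drain the pipeline.
    for word in list(inputs) + [None] * n:
        # The last LUT is combinational: its output follows its address register.
        if regs[-1] is not None:
            outputs.append(stage_tables[-1][regs[-1]])
        # Clock edge: update from back to front so each register captures
        # the output of the LUT in front of it.
        for i in range(n - 1, 0, -1):
            prev = regs[i - 1]
            regs[i] = None if prev is None else stage_tables[i - 1][prev]
        regs[0] = word
    return outputs


# Eight identical placeholder stages: each table maps a to (a + 1) mod 256.
increment = [(a + 1) % 256 for a in range(256)]
stages = [increment] * 8
print(run_pipeline(stages, [0, 10, 250]))  # [8, 18, 2]
```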
3. Measurement Results
The LUT cascade LSI was fabricated in a 0.35 um CMOS
logic process. Although it looks like a conventional large-scale
memory, the layout has point symmetry rather than the
conventional line symmetry. We verified that converting a
simple memory into the LUT cascade requires only small
additional circuits: the area overhead is +0.4% and the
transistor count overhead is +0.2%. The latency of an internal
LUT is 3.8 ns. We experimentally confirmed a total latency of
11.6 ns for a 2-LUT cascade and 34.4 ns for an 8-LUT cascade
in asynchronous operation, and an operating frequency of
200 MHz in 8-stage pipeline operation.
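The measured figures are mutually consistent, as a short Python check suggests. The linear model used here (total latency = number of LUTs x 3.8 ns + a fixed part outside the LUTs) is an assumption for illustration, and `residual_ns` is a hypothetical helper; the text above does not state what the fixed part consists of.

```python
import math

T_LUT_NS = 3.8  # measured latency of one internal LUT


def residual_ns(total_ns, stages):
    """Part of a measured total latency not accounted for by the LUTs themselves."""
    return total_ns - stages * T_LUT_NS


r2 = residual_ns(11.6, 2)  # 2-LUT cascade, asynchronous operation
r8 = residual_ns(34.4, 8)  # 8-LUT cascade, asynchronous operation
cycle_ns = 1e3 / 200       # clock period at 200 MHz

print(round(r2, 1), round(r8, 1), cycle_ns)  # 4.0 4.0 5.0
```

Both measurements leave the same residue of about 4.0 ns, and the 5 ns clock period at 200 MHz exceeds the 3.8 ns latency of a single LUT stage.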
4.  Performance Comparison with FPGA 
To compare the performance (area, delay, power) of LUT
cascades with that of FPGAs, we mapped simple benchmark
functions to the LUT cascade and to a commercial FPGA
(Xilinx XCV50: 0.22 um, 5-layer metal, 2.5 V, 384 CLBs)
[2][3]. We used commercial logic synthesis and layout tools
for the FPGA design. For the LUT cascade design,
we used our newly developed logic synthesis
tool, which converts BDDs (Binary Decision Diagrams) into
LUT cascades [1]. Taking the difference in process
technology into account, we conclude that an LUT cascade
fabricated in the same process technology as the FPGA
would achieve a comparable layout area with shorter delay
and lower power dissipation.
5. Conclusion 
We implemented the LUT cascade in a CMOS process for the
first time, and experimentally confirmed that its performance is
competitive with FPGAs. The appearance of the chip is quite
similar to that of a conventional memory. The design cost is
much lower than that of FPGAs, and the test and redundancy
methodologies developed for memories can be applied. The LUT
cascade LSI is a new and promising reconfigurable logic device
for future sub-100 nm LSIs.
Acknowledgement 
This research was supported by funds from the Japanese Ministry of
Education, Culture, Sports, Science and Technology (MEXT) via the
Kitakyushu innovative cluster project, the Japan Society for the
Promotion of Science (JSPS), and the Takeda Foundation. The chip was
fabricated through the VLSI Design and Education Center (VDEC), the
University of Tokyo, in collaboration with Rohm Corporation and Toppan
Printing Corporation.
References 
[1] Y. Iguchi, T. Sasao, and M. Matsuura, "Realization of multiple-output
functions by reconfigurable cascades", International Conference on Computer
Design (ICCD2001), Sep. 2001, pp. 388-393.
[2] MCNC-benchmark functions: http://www.cbl.ncsu.edu/www/
[3] http://www.xilinx.com/
 
