We present a new concept and its circuit implementation for a high-speed and low-voltage associative co-processor with Hamming distance ordering. A hierarchical search architecture maintains high speed for a large number of inputs. Our circuit implementation allows unlimited data base capacity and achieves low-voltage operation below 1.0 V for SoC applications, both of which are difficult for the conventional analog approaches. The search logic embedded in a memory cell realizes word-parallel Hamming distance ordering for high-speed sorting/routing applications as well as near/nearest-match detection for recognition. Our fabricated 0.18 µm 64-bit 32-word associative co-processor operates at 411.5 MHz and 40.0 MHz at 1.8 V and 0.75 V, respectively. A 64-bit 32-word associative co-processor has been fabricated in a 0.18 µm CMOS process and successfully tested.
Introduction
Some applications, such as data compression, pattern recognition, multimedia, and intelligent processing systems, require a huge number of memory accesses and a long data processing time. To reduce these, many content addressable memories (CAMs) have been developed [1]-[3]. These CAMs can quickly detect completely matched data in a data base. In recent years, advanced applications have required detection of not only completely matched data but also near-match data. CAMs using analog circuit technologies have been proposed for quick nearest-match detection [4]-[8]. Their circuit implementations are generally compact but are difficult to operate in a deep sub-micron (DSM) process and at a low supply voltage. Therefore, they are not suitable for a system-on-chip VLSI in a DSM process.
In this paper, we present a high-speed and low-voltage associative co-processor using a hierarchical search architecture with the capability of word-parallel Hamming distance ordering. It has three advantages: (1) The first advantage is high-speed search in a large data base due to a hierarchical search ar-
The search signal updates the permission signals, and the next mismatched bit becomes maskable. Thus the data of HD = 2 are detected at clock period 2. In this architecture, the critical path consists of the search signal propagation path of one block and the hierarchical bypass line. The search time has characteristics similar to those of a carry-bypass adder, so the architecture is applicable to a large data base.
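The word-parallel behaviour described here, in which each clock period masks one more mismatched bit in every word so that data at Hamming distance k are detected at clock period k, can be illustrated with a small behavioural model. The Python sketch below is not the circuit itself: the function name, the integer encoding of words, and the choice to mask the lowest-order mismatched bit first are assumptions made purely for illustration.

```python
def hamming_distance_order(words, query, width):
    """Behavioural sketch: return [(clock_period, word_index), ...] in detection order.

    words and query are integers holding `width`-bit data (an assumed encoding).
    """
    full_mask = (1 << width) - 1
    # Remaining (not yet masked) mismatched bits of every stored word.
    mismatches = [(w ^ query) & full_mask for w in words]
    pending = set(range(len(words)))
    detected = []
    for clock in range(width + 1):  # clock periods 0 .. width
        # A word is detected once all of its mismatched bits have been masked.
        hits = sorted(i for i in pending if mismatches[i] == 0)
        detected.extend((clock, i) for i in hits)
        pending.difference_update(hits)
        if not pending:
            break
        # Mask exactly one mismatched bit per pending word, for all words at once.
        # Clearing the lowest set bit is an assumption about the masking order.
        for i in pending:
            mismatches[i] &= mismatches[i] - 1
    return detected


stored = [0b1010, 0b1011, 0b0101, 0b1000]
order = hamming_distance_order(stored, query=0b1010, width=4)
print(order)
```

In this model a word with k mismatched bits needs k masking steps, so it appears at clock period k, which matches the statement that data of HD = 2 are detected at clock period 2.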
Circuit Configuration
is masked and the SS restarts at the cell where both the search signal (SS) and the permission signal (PS) are true. Therefore, only one mismatched bit is masked in each word, in word-parallel fashion, and all data can be detected in order of Hamming distance. Fig. 3(a) shows the static circuit implementation. It realizes a high tolerance for device fluctuation and low-voltage operation. Fig. 3(b) shows the compact circuit implementation using dynamic circuits. It reduces the search circuit area for large capacity. Fig. 4(a) shows a detected data selector, which masks one output of the detected data in the same clock period after its address encoding. All detected data with the same HD can then be encoded by the next priority encoder stage. Fig. 4(b) shows a binary-tree priority encoder. It realizes a small area and quick address encoding with O(log M) delay time at M-word capacity.
We have designed and fabricated a 64-bit 32-word associative co-processor using the present architecture and the static circuit implementation in a 0.18 µm CMOS process. Fig. 5 illustrates a block diagram of the fabricated memory module, and Fig. 6 shows its chip microphotograph and components. The associative co-processor has 64 × 32 memory cells with the search circuit, a memory read/write circuit with data buffers, a word decoder, and a 32-input priority encoder with a detected data selector. We have also designed a 64-bit 2-word associative memory using the compact implementation for performance evaluation on the same chip.
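The interplay between the detected data selector and the binary-tree priority encoder of Fig. 4 can be sketched behaviourally. The following Python model is a minimal illustration under stated assumptions, not the fabricated circuit: the function names are hypothetical, the word count is assumed to be a power of two, and the lowest word address is assumed to have the highest priority.

```python
def tree_priority_encode(flags):
    """Return (valid, address) of the lowest-index set flag using a binary tree.

    The tree has log2(M) levels for M flags, mirroring the O(log M) delay.
    """
    m = len(flags)
    assert m > 0 and m & (m - 1) == 0, "sketch assumes a power-of-two word count"
    # Leaf level: each node carries (valid, address within its subtree).
    level = [(bool(f), 0) for f in flags]
    bit = 0  # address bit resolved at the current tree level
    while len(level) > 1:
        merged = []
        for left, right in zip(level[0::2], level[1::2]):
            if left[0]:
                merged.append((True, left[1]))                 # left subtree wins
            elif right[0]:
                merged.append((True, (1 << bit) | right[1]))   # right subtree: set this bit
            else:
                merged.append((False, 0))
        level = merged
        bit += 1
    return level[0]


def encode_all(flags):
    """Encode every detected word in turn, masking each one after it is encoded."""
    flags = list(flags)
    addresses = []
    while True:
        valid, address = tree_priority_encode(flags)
        if not valid:
            return addresses
        addresses.append(address)
        flags[address] = 0  # selector behaviour: mask the output just encoded


print(encode_all([0, 1, 0, 0, 1, 1, 0, 0]))
```

Masking the encoded output before the next pass is what lets several words detected at the same Hamming distance be reported one after another.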
Measurement Results and Discussions

A. Area and Capacity
The designed 64-bit 32-word associative co-processor occupies 475 µm × 1160 µm (0.55 mm²). The area of a memory macro cell with a static search circuit is 9.6 µm × 13.6 µm (130.56 µm²), as shown in Fig. 7(a). In the static circuit implementation using the 0.18 µm process, the cell area is ×6 and Fig. 7(b) shows a layout of the compact implementation using dynamic circuits. It occupies 7.2 µm × 8.8 µm (63.36 µm²). In this case, the cell area is ×3 and ×2 as large as a 6T SRAM cell and a standard complete-match CAM cell, respectively. The number of transistors in our memory cell is larger than in the conventional analog approaches [4]-[8]. It is, however, difficult for the analog approaches to follow device scaling, especially in a DSM process, while keeping their performance and marginal capacity. Our approach can follow device scaling and operate at a low supply voltage because of the synchronous digital search logic embedded in the memory. Besides, it has no limitation on capacity or search distance. Therefore, our associative co-processor has more potential for practical use and large capacity than the conventional designs.
B. Operation Speed
Fig. 8 shows measured waveforms obtained with an electron beam probe at room temperature. It shows the delay time of the critical path from the search circuit clock (CLK) to a search output (SOi). The delay time for Hamming distance search in 64-bit data length is 2.18 ns in the worst case. The operation speed of the fabricated associative co-processor is 411.5 MHz and 40.0 MHz at 1.8 V and 0.75 V power supply, respectively. Fig. 9 shows measurement results of the operation speed over the 0.75 V to 1.8 V power supply range. In the Hamming distance ordering, the search time needs a number of clock periods corresponding to the Hamming distance. It takes 65 clock periods to order all data from 0-bit distance to 64-bit distance. Our fabricated associative co-processor completes the Hamming distance ordering for sorting/routing of all data in 158.0 ns.
It is difficult to implement such a function at high speed with the conventional analog techniques. When the target application requires only
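The 158.0 ns figure quoted for complete Hamming distance ordering follows from the clock count and the measured clock frequency. The short Python check below reproduces that arithmetic. The function and variable names are hypothetical, and the 0.75 V result is a value derived here from the reported 40.0 MHz clock, not a measurement reported in the text.

```python
def ordering_time_ns(clock_hz, data_bits):
    """Time to order all data from 0-bit to data_bits-bit Hamming distance."""
    clock_periods = data_bits + 1      # distances 0 .. data_bits, one period each
    period_ns = 1e9 / clock_hz         # clock period in nanoseconds
    return clock_periods * period_ns


# Reported operating points: 411.5 MHz at 1.8 V and 40.0 MHz at 0.75 V.
t_full_ns = ordering_time_ns(411.5e6, 64)
t_low_voltage_ns = ordering_time_ns(40.0e6, 64)  # derived, not a reported result

print(f"1.8 V:  {t_full_ns:.1f} ns")
print(f"0.75 V: {t_low_voltage_ns:.1f} ns")
```

With 65 clock periods at 411.5 MHz the result rounds to 158.0 ns, in agreement with the measured value.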
C. Power Dissipation
The power dissipation of the associative co-processor is less than 51.3 mW at 1.8 V power supply and 400 MHz operation. In low-power operation, it is 1.18 mW at 0.75 V power supply and 40 MHz operation. The search accuracy of the conventional analog approach is unstable and sometimes meaningless in low-power operation. Our search operations are precise regardless of the power supply voltage. The specifications of the fabricated co-processor are summarized in 
Conclusions
We proposed a new concept and its circuit implementation for a high-speed and low-voltage associative co-processor in a DSM process to solve the problems of the conventional analog techniques. It imposes no limitation on data capacity and maintains high speed for a large data base owing to a hierarchical search architecture and synchronous search logic embedded in a memory cell. Our extended functions, such as Hamming distance ordering, are effectively applied to high-speed sorting/routing applications as well as near/nearest-matching applications. We have designed and fabricated a 64-bit 32-word associative co-processor in a 0.18 µm CMOS process and demonstrated high-speed and low-voltage operation. The operation speed reaches 411.5 MHz and 40.0 MHz at 1.8 V and 0.75 V supply voltage, respectively.
