I. INTRODUCTION

A content addressable memory (CAM) allows access to memory on the basis of the data stored rather than a physical address location. A CAM performs a parallel comparison of stored data with an input argument. These features have earned CAMs widespread usage, including translation look-aside buffers for virtual memory systems, tag directories in fully associative cache organizations [1], a collision detection VLSI processor for intelligent vehicles [2], interconnection network routers [3], [4], a database accelerator [5], a self-testing reconfigurable CAM [6], and applications in artificial intelligence and image processing. Additional applications include logic inference, classifiers [7], [8], pattern matching [9], sorting [10], and applications that require searches in specific address ranges [11]. A CAM must be able to perform three functions: write, read, and match. The write function provides a means of storing data into the memory, while the read function enables data retrieval for refresh purposes. In a match function, a CAM compares the data provided to it (the argument) with the data stored in it and indicates a match if they are the same.

TABLE I. ROUTING TABLE USING TERNARY CAM
The focus in the design of the dynamic circuit is on high performance and compact size. In the dynamic CAMs (DCAMs) presented here, precharged match lines are used to implement the match function. The match function involves the comparison of data presented to the CAM cells with the data stored in them, and the result of a match search is determined by the state of the match lines. A ternary CAM is built from basic cells, each capable of storing a trit, i.e., a logic zero (0), a logic one (1), or a don't care (X). A major advantage of a ternary CAM cell over a binary CAM cell is its ability to store and compare with a don't care (X). A don't care state can be used to mask off some bits in a memory location and also during the match function. Table I shows a ternary CAM application for routing tables [3]. In this application, a hypercube tree routing algorithm is encoded in the CAM using ternary values. The destination address is matched against the CAM entries; the routing table is for a node at a given level, and the labeled bits are obtained from the current node address. It should be pointed out that don't care bits (X) can appear at any location in the CAM.
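The ternary match described above can be illustrated with a small behavioral sketch. It models only the logical outcome of a search, not the circuit. The table entries, the key, and the function names are hypothetical and are not taken from Table I.

```python
from typing import List


def trit_matches(stored: str, bit: str) -> bool:
    """A stored 'X' (don't care) matches any input bit; otherwise bits must be equal."""
    return stored == "X" or stored == bit


def word_matches(stored_word: str, key: str) -> bool:
    """A word matches only if every stored trit matches the corresponding key bit."""
    if len(stored_word) != len(key):
        raise ValueError("stored word and key must have the same length")
    return all(trit_matches(s, b) for s, b in zip(stored_word, key))


def search(table: List[str], key: str) -> List[int]:
    """Return the indices of all matching locations (hardware compares them in parallel)."""
    return [i for i, word in enumerate(table) if word_matches(word, key)]


# Hypothetical 4-bit entries, chosen only to show don't cares at different positions.
table = ["10XX", "0X1X", "1010", "XXXX"]
print(search(table, "1010"))  # [0, 2, 3]
print(search(table, "0110"))  # [1, 3]
```

Several locations may match the same key, which is why a CAM-based system typically needs some way to choose among matching entries; that selection is outside the scope of this sketch.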
The performance of a CAM cell can be studied from the delays involved in the match and read functions. In a match operation, if a match occurs in a location, the precharged match line of that location is not discharged; if the location does not match, the corresponding match line is discharged. The worst-case delay in discharging a match line occurs when only a single cell in a word has to discharge the line. For higher performance, the design of the CAM cell must ensure fast discharge of the match line under this worst-case condition. Reading from the CAM must occur fast enough to enable a CAM-based system to operate at a high clock rate, which is necessary for most applications in computer systems. Due to current trends toward hand-held devices, another design issue is current consumption for both match and read operations.

The organization of this paper is as follows. In Section II, new decoupled dynamic ternary CAM cells are described. Section III presents the performance results derived from simulations performed on these cells and a comparison with existing CAM cells. Two approaches to improve system performance are presented in Section IV. Some concluding remarks are provided in Section V.
II. DECOUPLED CAM CELLS
In this section, five novel decoupled CAM cells are introduced. The cells have been categorized by the number of transistors each has. The CAM cells have six (in an nMOS and a pMOS variant), six-and-a-half, seven-and-a-half, and ten-and-a-half transistors. The last three CAM cells have a transistor that is shared by two adjacent cells in a location. The match lines in these cells are decoupled from the cell transistors, resulting in shorter matching delays. These cells are capable of storing a ternary digit (0, 1, or don't care).
A. 6-T DDCAM Cell
The six-transistor (6-T) decoupled DCAM (DDCAM) cell shown in Fig. 1 consists exclusively of nMOS transistors. This cell has two transistors arranged in an exclusive-OR configuration. These transistors and their gates serve as dynamic data storage elements. The gates are labeled Sb1 and Sb0 and are accessed through two write-access transistors. When the access transistors are conducting, the data on the bit lines (BIT and NBIT) are transferred to the gates Sb1 and Sb0. Turning off the access transistors isolates the dynamic storage elements from the bit lines. A read from the cell that does not alter the stored data is achieved by turning on the read transistor. The gate of the match-line discharge transistor is connected to the exclusive-OR output and is used during the match function. This puts the dynamic storage elements in the critical path. If the discharge transistor is off, the match line is not discharged, indicating a matching condition. If it is conducting, the match line is discharged, indicating a nonmatching condition. The DDCAM cell is designed to store the logic states 0, 1, and don't care (X), set by Sb1 and Sb0.

Three operations can be performed on a DDCAM cell: write, read, and match. Write is performed by controlling the access transistors with a logic 1 on the write line. When the write line is set to logic 1, these transistors conduct, transferring data from the bit lines to the dynamic storage elements. Reading from the cell is performed by setting the read line to logic 1, whereupon either the BIT or the NBIT line is discharged if a logic 1 or a logic 0 is stored in the cell. The read operation requires that the bit lines be precharged to logic 1 just before reading. Precharging facilitates effective reading from the DDCAM cell. Setting the read line also ensures that the gate of the discharge transistor is set to logic 0 when zeros are stored at Sb1 and Sb0. Simultaneous read and write operations are used to refresh the cell contents.
The matching operation involves comparing the data presented on the bit lines with the stored data and evaluating the match lines. Logic 1 on the match line indicates a match, while logic 0 indicates a nonmatching condition. The match operation also requires that the match line be precharged to logic 1 before the stored data are compared with the input. The output of the exclusive-OR formed by the two storage transistors is connected to the gate of the match-line discharge transistor. When there is a mismatch between the stored data and the input data, the output of the exclusive-OR is set to logic 1. Thus, the discharge transistor starts conducting, discharging the match line. If a match is found, the exclusive-OR output is logic 0, which turns off the discharge transistor and prevents it from discharging the match line. If a don't care is stored in the DDCAM cell, any input presented on the bit lines results in a match condition. With zeros stored at Sb1 and Sb0, the storage transistors are off, preventing data on either the BIT or the NBIT line from passing to the gate of the discharge transistor. The gate of this transistor is set to 0 by the read transistor to ensure that the match line does not get discharged when a don't care bit is stored. Matching and nonmatching conditions and the corresponding states of the bit lines are summarized in Table III.
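The match behavior described above can be captured in a minimal logic-level sketch. It is only an illustration under stated assumptions: each stored trit is represented by two flags saying which bit line the exclusive-OR pair passes to the discharge gate, and the particular flag assignment for 0 and 1 is hypothetical, since Table III is not reproduced here. Only the don't care encoding (both storage gates at zero) and the masking of a column by holding BIT and NBIT at 0 come directly from the text.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (passes BIT, passes NBIT) for each stored trit.
# 'X' has both storage gates at zero, so neither bit line reaches the XOR node.
ENCODING = {"0": (True, False), "1": (False, True), "X": (False, False)}


def drive_lines(value: Optional[int]) -> Tuple[bool, bool]:
    """Return (BIT, NBIT) for an input bit; None masks the column (both lines at 0)."""
    if value is None:
        return (False, False)
    return (value == 1, value == 0)


def xor_node(trit: str, bit: bool, nbit: bool) -> bool:
    """Logic level at the gate of the match-line discharge transistor."""
    pass_bit, pass_nbit = ENCODING[trit]
    return (pass_bit and bit) or (pass_nbit and nbit)


def match_line(stored_word: str, key: List[Optional[int]]) -> bool:
    """Precharged match line: stays at 1 unless at least one cell discharges it."""
    line = True  # precharge to logic 1
    for trit, value in zip(stored_word, key):
        bit, nbit = drive_lines(value)
        if xor_node(trit, bit, nbit):
            line = False  # a single mismatching cell is enough to discharge the line
    return line


print(match_line("10X", [1, 0, 1]))     # True: every cell matches
print(match_line("10X", [0, 0, 1]))     # False: first cell mismatches
print(match_line("10X", [None, 0, 0]))  # True: first column is masked
```

The loop makes the worst case visible: the line must still be discharged when exactly one cell in the word mismatches, which is the condition used for the delay measurements.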
B. 6.5-T and 6-T(P) DDCAM Cells
The 6-T DDCAM cell requires that the bit lines BIT and NBIT be held at logic 0 while the match line is precharged. Only once the line is precharged can the input data be presented. This requirement introduces an unnecessary delay in the sequence of events. The 6.5-T DDCAM cell design allows these two events to occur simultaneously. The new DDCAM cell shown in Fig. 2 is very similar to the previous cell (Fig. 1), except for an additional evaluation transistor. This transistor is placed in such a way that it can be shared by two adjacent cells in a row, and it is thus counted as half a transistor. Operations are performed on this cell in the same way as on the 6-T cell. The evaluation transistor serves to evaluate the match line. It isolates the match line from the discharge transistor and from the BIT and NBIT lines. Thus, the gate capacitance of the discharge transistor can be charged while the match line is being precharged, before the evaluation begins.

When the evaluation signal arrives, the discharge transistor is already set (either ON or OFF). This in turn results in faster match evaluation. The introduction of the evaluation transistor increases the area of the design, but this is outweighed by the performance improvement achieved and the ability to overlap operations. The dynamic elements are still in the critical path in this DDCAM cell, which results in degradation of the signal at the gate of the discharge transistor, should a nonmatching condition occur in the cell. The critical path can be improved by changing the dynamic elements from nMOS to pMOS transistors, as shown in the 6-T(P) DDCAM cell in Fig. 3. Since pMOS transistors pass good ones, the discharge transistor is turned on more strongly, and the match line discharge time improves. However, the cell may fail during a matching condition. The voltage at Sb1 or Sb0 for logic 1 is equal to $V_{DD} - V_T$, and a true logic 1 on the bit lines will place the pMOS transistor in the linear region of operation. In such a case, the pMOS is not completely turned off and could pass the logic 1 on the bit lines onto the gate of the discharge transistor. In a matching condition this would compete with the true logic 0 passed by the other pMOS transistor. This could partially turn on the discharge transistor, causing a failure of the matching condition. The CAM's reliability can be increased by ensuring that the logic 1 on the bit lines is approximately equal to the logic 1 voltage at Sb1 or Sb0. This could easily be achieved by precharging the bit lines through an nMOS pass transistor. Table IV shows how the three states 0, 1, and don't care (X) are stored in the 6-T(P) DDCAM cell. Table V shows the matching and nonmatching conditions along with the corresponding status of the bit lines for this CAM cell.
C. 7.5-T DDCAM Cell
In the previous 6-T and 6.5-T DDCAM cells, a write-access transistor drives an exclusive-OR transistor, which in turn drives the match-line discharge transistor. This results in the degradation of the output of the exclusive-OR during the transfer of a logic 1 (for a nonmatching condition). The maximum value that the exclusive-OR output (which is connected to the gate of the discharge transistor) can attain during the transfer of a logic 1 is $V_{DD} - V_{T,a} - V_{T,x}$, where $V_{T,a}$ and $V_{T,x}$ are the threshold voltages of the access transistor and the exclusive-OR transistor, respectively. Due to the body effect, the effective threshold voltages are larger than the nominal values; this in turn makes the exclusive-OR voltage smaller. The same situation is mirrored through the other access and exclusive-OR transistors when transferring a logic 1 from the NBIT line. This was improved by the 6-T(P) implementation, but the read time is worsened due to the presence of the pMOS transistors in the pull-down path for the read operation. The match line discharge time in the 6.5-T cell can be improved, without affecting the read time, by precharging the gate of the discharge transistor before the evaluation of the match function. In case of a nonmatching condition when the evaluation begins, the discharge transistor can quickly discharge the match line since it is already turned on strongly. If a matching condition occurs, the bit line (either BIT or NBIT) is able to discharge the gate of the discharge transistor quickly, turning it off and preventing it from discharging the match line. Discharging this gate is quick since the discharge is through an nMOS transistor. This improvement in match line discharge time is achieved at the expense of an additional transistor. The implementation of this idea is the 7.5-T DDCAM cell shown in Fig. 4. A pMOS transistor is used for precharging the XOR output. The operation of the 7.5-T DDCAM cell is identical to that of the 6.5-T DDCAM cell, except for the precharge of the exclusive-OR output before match function evaluation. This precharge can overlap with the match line precharge.
D. 10.5-T DDCAM Cell
The 6-T(P) and 7.5-T DDCAM cells improve the matching operation, but the read operation still suffers from a large delay. In the case of 6-T(P), the pMOS transistors pass poor zeros, so the discharge of the bit lines during a read operation is slower than that of 6-T and 6.5-T. In 7.5-T, the nMOS transistors, which can pass good zeros, are only weakly turned on, again slowing down the read operation. The fifth decoupled DCAM cell, which appears in Fig. 5, operates in a similar manner to the previous CAM cells, but differs from them in the way it stores data.
In this DDCAM cell, the dynamic circuitry has been removed from the critical path and the output of the exclusive-OR can reach a maximum voltage of $V_{DD} - V_T$ (where $V_T$ is the threshold voltage of either exclusive-OR transistor). The write, read, and match operations are accomplished in the same way as in the previous cells, but the 10.5-T DDCAM cell inverts the stored binary information. If a logic 0 has been stored at either Sb1 or Sb0, then the corresponding exclusive-OR transistor is driven by a true logic 1 (equivalent to the power-supply voltage $V_{DD}$). This configuration results in shorter refresh cycles. Also, a refresh register is not required for this cell since the read operation places the proper signals on the BIT and NBIT lines.
The three states 0, 1, and don't care (X), along with their equivalent Sb1 and Sb0 values, are stored at the inputs of the two inverters in the 10.5-T DDCAM. The resulting table for the stored values is similar to Table IV. Simultaneously storing zeros at Sb1 and Sb0 is not allowed, since this would turn on both exclusive-OR transistors. The matching operation for this DDCAM cell is similar to that of the previous cells. Whenever the data presented matches the data stored, the discharge transistor does not conduct and the match line is not discharged. If the discharge transistor is conducting, the data on the bit lines does not match the stored data and the match line is discharged.
III. EVALUATION OF DCAM CELLS
In this section, the performance of the DDCAM cells discussed above is presented. Hanyu et al. describe a one-transistor multiple-valued CAM suitable for high-density associative memory arrays [2]. The one-transistor CAM requires multiple steps to complete a comparison of an $n$-bit word, while the DCAM cells presented in this paper require only a single step to compare an $n$-bit word to the stored data. Due to this long matching delay, the one-transistor CAM is not considered any further. Another DCAM cell, introduced in [5], [12], is described here and taken as the reference against which the performance of the new cells is compared.
A. Wade-Sodini DCAM
A cross-coupled bit line DCAM cell with five transistors, for use in high-density CAM arrays, was designed by Wade and Sodini [5], [12]. We refer to this cell as the Wade-Sodini (W-S) DCAM after its inventors. This DCAM cell performs all the functions outlined for the proposed DDCAM cells. The W-S CAM cell has two transistors arranged in an exclusive-OR configuration, as shown in Fig. 6. The gate capacitances of these two transistors are the dynamic storage elements, and they are accessed through two access transistors [5]. Setting the write line to logic 1 turns on the access transistors, allowing the data on the bit lines to pass onto the gates of the exclusive-OR transistors. A ternary digit is stored in a similar fashion to that of the decoupled DCAM cells.
Table VI provides a summary of the three states. The match line in this cell is coupled to the cell through a transistor and has two functions: to indicate the status of the match operation and to read from the DCAM during the read operation. Reading from the cell is performed by charging the match line and discharging the bit lines. This results in current flowing from the match line through the match-line transistor and the exclusive-OR transistor whose gate is at logic 1. Matching for this DCAM cell is slightly different from that of the DDCAM cells presented in the previous section [except for 6-T(P)]. To match any state other than the don't care state, the bit lines must carry data similar to the values stored at the gates of the exclusive-OR transistors. In case of a nonmatching condition, the match line is discharged through the match-line transistor and the exclusive-OR transistor whose gate is set to logic 1 [5]. Since the gate of the match-line transistor is connected to the match line, its resistance increases as the match line discharges, which eventually prevents the line from discharging completely to 0 V. The match condition under different inputs and stored data is shown in Table VII.
B. Comparison of DCAM Cells
In this section, a comparison of the new DDCAMs and the W-S DCAM cell is presented. This comparison is with respect to the main circuit differences and the cells' performance in read and match operations. SpectreS simulations were performed using a 0.25-μm CMOS technology (TSMC process available at MOSIS, http://www.mosis.org/). All simulations have been performed using a 10 × 10 array.
1) Read Operation:
The decoupled DCAM cells presented in Section II perform a read operation through the read transistor. This transistor discharges the BIT and NBIT lines during a read operation through the exclusive-OR transistors. The W-S cell relies on the match-line transistor and the exclusive-OR transistors to perform a read operation. In this case charge is transferred from the match line to the BIT or NBIT line. Since the charge transfer is through two nMOS transistors, the bit lines are not charged to $V_{DD}$. This operation has a longer delay than in the DDCAM cells since the gate-to-source voltages of these transistors decrease as the BIT or NBIT line is charged. Fig. 7 shows the simulation results during a read operation. The 6-T, 6.5-T, and 7.5-T DDCAM cells have nearly the same delay during a read operation, as they discharge identical capacitance through a very similar path. The 6-T(P) cell has a long read delay due to the presence of a pMOS transistor in the discharge path, which passes poor zeros. The 10.5-T DDCAM cell performs a fast read, since the nMOS transistors that discharge the bit lines are strongly turned on, resulting in a fast discharge of the BIT or NBIT line. For comparison purposes, the read delay is taken to be the time from the rising edge of the read pulse to the point when the bit line (BIT or NBIT) discharges to 0.8 V (an inverter connected to the bit lines starts switching from logic 0 to logic 1 when its input reaches 0.8 V). In the W-S DCAM cell, neither the BIT nor the NBIT line is able to reach $V_{DD}$ (2.5 V) during the read operation, so a voltage of 1.6 V is taken as the steady-state value to calculate the read delay. The read times for the 6-T, 6.5-T, and 7.5-T DDCAM cells were calculated to be 579 ps, 577.4 ps, and 571.5 ps, respectively, while 6-T(P) has a delay of 1.38 ns. The 10.5-T DDCAM cell has a low read time of 162 ps, while the W-S DCAM cell has a high read time of 2.74 ns. These results show that the 6-T, 6.5-T, and 7.5-T DDCAM cells can perform a read operation on average about 4.76 times faster than the W-S DCAM, while 10.5-T is about 17 times faster.
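As a quick check of the speedup figures, the ratios can be recomputed from the read delays quoted above. This is only arithmetic on the reported numbers; the variable names are illustrative, and the use of the mean of the three similar cells is an assumption that happens to reproduce the quoted factor.

```python
# Read delays in picoseconds, as quoted in the text.
read_delay_ps = {
    "6-T": 579.0,
    "6.5-T": 577.4,
    "7.5-T": 571.5,
    "6-T(P)": 1380.0,
    "10.5-T": 162.0,
    "W-S": 2740.0,
}

# Assumption: the 4.76x figure refers to the mean delay of the three similar cells.
group = ["6-T", "6.5-T", "7.5-T"]
group_mean_ps = sum(read_delay_ps[c] for c in group) / len(group)

speedup_group = read_delay_ps["W-S"] / group_mean_ps
speedup_10_5 = read_delay_ps["W-S"] / read_delay_ps["10.5-T"]

print(f"{speedup_group:.2f}")  # 4.76
print(f"{speedup_10_5:.1f}")   # 16.9
```

Taken individually, the three cells give ratios between roughly 4.73 and 4.79, so the averaged figure is representative of all three.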
2) Match Operation: The match operation is the most important operation for any CAM system, and delays in this operation have a direct impact on the performance of the system. There are differences in how nonmatching conditions are evaluated in the DDCAM cells presented in Section II and in the W-S DCAM cell. The 6-T, 6-T(P), and W-S cells require that the bit lines be discharged during the match line precharge cycle. This prevents the usage of the bit lines for any other operation during this process. The 6-T and 6-T(P) cells experience delays in discharging the match line for a nonmatching condition because the gate capacitance of the discharge transistor has to be charged before the match line can be discharged. Since the 6-T(P) cell has an improved discharge path due to the pMOS transistors (which pass the good ones required at the gate of the discharge transistor), this cell has a shorter match line discharge time than the 6-T DDCAM cell. The 6-T DDCAM cell takes 1.04 ns to discharge the match line, while 6-T(P) takes 113 ps. The W-S DCAM has a delay of 581 ps in discharging the match line. The 6-T(P) DDCAM thus performs a match operation 5.14 times faster than the W-S DCAM. The 6.5-T, 7.5-T, and 10.5-T DDCAM cells allow the comparison of the input and stored data to be done at the same time as their match lines are precharged. This results in a much smaller delay. The measured worst-case match line discharge times for the 6.5-T, 7.5-T, and 10.5-T DDCAM cells are 374, 89.7, and 111 ps, respectively. Comparison plots are shown in Fig. 8. Table VIII summarizes the read and match delays as well as properties of the CAM cells described in this paper. The last row indicates whether the CAM cell requires an N-well to place p-type transistors; this in turn has an impact on the size of the cell. The results show that the 10.5-T DDCAM cell has the fastest read, while the 7.5-T cell has the shortest match line discharge time.
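The match figures quoted above can be cross-checked in the same way. The sketch below only ranks the reported worst-case discharge times and recomputes the 6-T(P) versus W-S ratio; the names are illustrative.

```python
# Worst-case match line discharge times in picoseconds, as quoted in the text.
match_delay_ps = {
    "6-T": 1040.0,
    "6-T(P)": 113.0,
    "W-S": 581.0,
    "6.5-T": 374.0,
    "7.5-T": 89.7,
    "10.5-T": 111.0,
}

# Rank the cells from fastest to slowest match line discharge.
ranking = sorted(match_delay_ps, key=match_delay_ps.get)
print(ranking)  # ['7.5-T', '10.5-T', '6-T(P)', '6.5-T', 'W-S', '6-T']

# Speedup of 6-T(P) over the W-S reference cell.
speedup_6tp = match_delay_ps["W-S"] / match_delay_ps["6-T(P)"]
print(f"{speedup_6tp:.2f}")  # 5.14

# Gap between 6-T(P) and the fastest cell.
gap_ps = match_delay_ps["6-T(P)"] - match_delay_ps["7.5-T"]
print(f"{gap_ps:.1f}")  # 23.3
```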
A CAM cell that has match line discharge and read delays comparable to those of the 10.5-T DDCAM cell is the basic CAM cell described in [13]. The basic CAM cell [13] has a match line discharge time of 105 ps, but its disadvantages are that it lacks an explicit read operation and that it is a binary CAM cell.
C. Current Requirements of DCAM Cells
Power dissipation is another important performance metric used in comparing the DCAM cells presented in this paper. In the simulations performed, the current drawn by a 10 × 10 array of cells was used as the basis for power comparison. The average currents drawn by the cells during the match and read operations were calculated, and these values for the DDCAM cells and the W-S cell are shown in Table IX. These results indicate that, in terms of required current for the match operation, the 6-T, 6-T(P), and W-S cells have low current. The 6-T and 6-T(P) cells do not have the evaluation transistor, which adds capacitance to the match line when it is turned on. For the read operation, the 6.5-T cell requires the least current; its required current is similar to that of other cells such as 6-T and W-S.
Since match delay is considered as the major performance factor for a CAM, the match delay-current product has been plotted and shown in Fig. 9 . This shows that 6-T(P) performs better than the rest of the cells. This cell has the second lowest match delay (23.3 ps worse than 7.5-T cell) and the third lowest current. For the read operation, this cell has a longer delay. 
IV. APPROACHES FOR PERFORMANCE IMPROVEMENT
In this section, two ways to improve the performance of a CAM system based on the cells presented in this paper are outlined: a higher write voltage and sense amplifiers. Both are improvements external to the cells. Since the main purpose of this paper is to present the DDCAM cells, they are offered as potential ways to improve performance, and the treatment of these topics is not complete.
A. Write Voltage
To improve match operation in most CAM cells a higher voltage on the write signal helps. The 6-T DDCAM cell is reprinted in Fig. 10 to help explain how voltage at write signal influences a match operation.
When the write signal is set at logic 1 (i.e., $V_{DD}$) and the BIT (or NBIT) line is set high as well, the voltage at node Sb1 (or Sb0) is

$$V_{Sb} = V_{DD} - V_{T,a} \qquad (1)$$

where $V_{T,a}$ is the threshold voltage of the write-access transistor. As mentioned in Section II-C, the stored voltage at Sb1 (or Sb0) affects the maximum voltage at the XOR node; this voltage is

$$V_{XOR} = V_{Sb} - V_{T,x} = V_{DD} - V_{T,a} - V_{T,x} \qquad (2)$$

where $V_{T,x}$ is the threshold voltage of the exclusive-OR transistor. It should be pointed out that these threshold voltages are affected by the body effect. Thus, $V_{XOR}$ is lower than expected from the nominal threshold voltages. To reduce $V_{DD}$, it is necessary to make sure that $V_{XOR}$ is large enough to turn the match-line discharge transistor on. If the write signal (when set to logic 1) has a voltage higher than $V_{DD}$, a voltage closer to $V_{DD}$ can be stored in Sb1 (or Sb0). If the write voltage is at least $V_{DD} + V_{T,a}$, then (1) and (2) become

$$V_{Sb} = V_{DD}, \qquad V_{XOR} = V_{DD} - V_{T,x}$$

With a higher voltage at the XOR node (which is at the gate of the discharge transistor), the discharge transistor is able to sink a larger current to discharge the match line.
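The effect of a boosted write voltage can be illustrated with a first-order numerical sketch. It assumes the usual pass-transistor approximation, in which an nMOS passes at most its gate voltage minus one threshold voltage. The supply of 2.5 V is taken from the simulations; the threshold voltage of 0.5 V is a hypothetical value, the body effect is ignored, and the function names are illustrative.

```python
def nmos_pass(v_in: float, v_gate: float, v_t: float) -> float:
    """First-order nMOS pass-transistor model: output is limited to v_gate - v_t."""
    return min(v_in, v_gate - v_t)


def xor_node_voltage(v_write: float, v_dd: float, v_t: float) -> float:
    """Voltage at the XOR node when a logic 1 is written and then passed on a mismatch."""
    v_sb = nmos_pass(v_dd, v_write, v_t)  # stored voltage at Sb1 (or Sb0)
    return nmos_pass(v_dd, v_sb, v_t)     # voltage reaching the discharge gate


V_DD = 2.5  # supply voltage used in the simulations
V_T = 0.5   # hypothetical threshold voltage; body effect ignored

normal_xor = xor_node_voltage(V_DD, V_DD, V_T)         # write signal at V_DD
boosted_xor = xor_node_voltage(V_DD + V_T, V_DD, V_T)  # write signal boosted by one V_T

print(normal_xor)   # 1.5
print(boosted_xor)  # 2.0
```

Under these assumptions the boosted write recovers one threshold drop at the XOR node, which increases the gate drive of the discharge transistor. With the body effect included, both values would be lower than shown.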
All the CAM cells, with the exception of the 6-T(P) and 10.5-T cells, would benefit from this feature. It should be pointed out that the exclusive-OR transistors are p-type in the case of the 6-T(P) cell; this in turn prevents voltage degradation when passing a 1. To get a good logic 0 value, the XOR node should be discharged through an nMOS transistor. The threshold voltage plays an important role in determining the minimum $V_{DD}$ that allows the cell to operate. In these cells the voltage at the XOR node is crucial since this voltage turns on the discharge transistor to discharge the match line. This node therefore determines the minimum $V_{DD}$ under which the cell can operate.
B. Sense Amplifiers
Sense amplifiers are commonly used in CMOS digital circuits to provide the following functions: amplification, delay reduction, power reduction, and/or signal restoration. Readers are referred to [14] for information about sense amplifiers. To reduce the match line delay (when this line is discharged), a sense amplifier could be used. From our simulations (shown in Fig. 8 ), W-S, 6-T and 6.5-T DCAM cells are strong candidates, since they have a long match delay. It should be pointed out that a system requires one sense amplifier per row. This in turn may require not only more silicon real estate but also power to drive these sense amplifiers.
Other signals that would benefit from a sense amplifier are BIT and NBIT when a read occurs. From our simulations shown in Fig. 7, it can be observed that the 6-T(P) and W-S DCAM cells would need a sense amplifier per column. When a read occurs, BIT and NBIT can both have a value of 1 (for all new DDCAM cells); this is the case when a don't care is stored. Thus, both the BIT and the NBIT columns need a sense amplifier.
The number of match sense amplifiers depends on the number of words (N) the CAM system has. On the other hand, the number of BIT/NBIT sense amplifiers depends on how the CAM system is organized: it depends on the word length and the number of words per module, since memories such as CAMs are usually organized in small modules. If a CAM system consists of several modules, the match lines need one sense amplifier per word, while the BIT/NBIT lines need one sense amplifier per BIT column and one per NBIT column in each module.
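The sense amplifier counts can be sketched as a small calculation. This is a hedged reading of the text rather than the paper's own formula: it assumes one match sense amplifier per word, one sense amplifier for each BIT and each NBIT column, and a separate set of bit-line amplifiers in every module. The function name, parameter names, and example sizes are hypothetical.

```python
from typing import Tuple


def sense_amp_counts(total_words: int, word_length: int, words_per_module: int) -> Tuple[int, int]:
    """Return (match sense amplifiers, BIT/NBIT sense amplifiers) for a modular CAM.

    Assumptions (not confirmed by the source): every module holds the same number
    of words and has its own BIT and NBIT sense amplifiers for each column.
    """
    if total_words % words_per_module != 0:
        raise ValueError("total_words must be a multiple of words_per_module")
    modules = total_words // words_per_module
    match_amps = total_words                 # one per match line (row)
    bitline_amps = 2 * word_length * modules # BIT and NBIT for each column of each module
    return match_amps, bitline_amps


# Hypothetical system: 256 words of 16 bits, organized as modules of 64 words.
print(sense_amp_counts(256, 16, 64))  # (256, 128)
```

Under these assumptions, smaller modules shorten the bit lines but increase the number of BIT/NBIT sense amplifiers, while the number of match sense amplifiers is fixed by the number of words.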
V. CONCLUDING REMARKS
In this paper, novel DDCAM cells have been presented, their features described and the cells compared to an existing ternary DCAM. In evaluating the cells, two critical operations, match and read, and the current drawn by the cells have been considered. The match line discharge time, read time and the current drawn during these operations were measured and compared to that of W-S DCAM cell and the basic cell in [13] .
Based on simulations we have the following results.
1) Shortest match delay. The 7.5-T DDCAM cell has the shortest match delay (89.7 ps). The next shortest delays correspond to 10.5-T and 6-T(P) with 111 and 113 ps, respectively.
2) Shortest read delay. The 10.5-T DDCAM cell outperforms all the cells with 161.9 ps. The next closest cell is 7.5-T with 571.5 ps. In the 10.5-T cell, the two n-type transistors that discharge the BIT/NBIT lines have gate voltages equal to $V_{DD}$; this in turn allows a fast discharge of the lines.
3) Match current. The 6-T DDCAM cell requires the least amount of current for the worst-case match operation.
4) Read current. The 6.5-T DDCAM cell requires the least amount of current when a read occurs. It is closely followed by the 6-T cell, which has a very similar structure, and the W-S cell.
5) Delay-current product. The 6-T(P) DDCAM cell has the smallest delay-current product for the match operation. The 6-T, 7.5-T, 10.5-T, and W-S cells have a product that is about three times larger than that of 6-T(P).
These results clearly indicate that there is a tradeoff between performance and current. The choice of a ternary DCAM cell depends on the design constraints and required system capabilities. The current versus delay plots in Fig. 11 and Fig. 12 can be useful in making this choice. It should be pointed out that reads may not occur very often in applications where a ternary CAM is used. However, the presented CAM cells use dynamic circuitry that needs to be refreshed; thus, the read operation is still needed.
All CAM cells (including W-S cell) allow users to mask an entire column by setting BIT and NBIT of that column to 0. This feature is used when a partial match is needed. A cell similar to 6.5-T CAM has been implemented as part of a router system [15] . The cell was modified to accommodate simultaneous match and read operations. To achieve this, two additional transistors and separate read and data (BIT/NBIT) buses were included. Architectural and application specific techniques can be used to further improve performance of the proposed cells. Some of these techniques are reported in [16] .
