Abstract: Content-addressable memory (CAM) is a prominent hardware structure for high-speed lookup search, but it consumes considerable power. Traditional NOR and NAND match-line (ML) architectures suffer from a short-circuit current path and from charge sharing, respectively, during precharge. The recently proposed precharge-free CAM suffers from high search delay, and the subsequently proposed self-controlled precharge-free CAM suffers from high power consumption. This paper presents a hybrid self-controlled precharge-free (HSCPF) CAM architecture, which uses a novel charge control circuitry to reduce search delay as well as power consumption. The proposed and existing CAM ML architectures were developed in a 45-nm CMOS technology node with a supply voltage of 1 V. Simulation results show that the proposed HSCPF CAM-type ML design reduces both power consumption and search delay when compared to recent precharge-free CAM-type ML architectural designs.
Introduction
Content-addressable memory (CAM) compares stored lookup table data against search data in parallel within a single clock cycle [1, 2] and returns the address of the matched data through a match-line sense amplifier (MLSA). This parallel search scheme outperforms software-based search algorithms in high-speed applications such as radix trees [3], image processing [4], 5G communication networks [5], mobile devices [6], IP routing [7], gray coding [8], and so on. The parallel hardware activity of CAM delivers high performance but consumes large power. Hence, designing a CAM with reduced power consumption and better performance is a challenging task. CAM cells arranged in a single row form one CAM word [9]. Each CAM word is connected to a single match-line (ML) as shown in Figure 1. Prior to the search operation, all the MLs are precharged to a high voltage [10]. During the search operation, only a single word matches the search word and the corresponding ML needs to hold its charge. All the other mismatched MLs are discharged. This regular precharging and discharging of MLs consumes considerable dynamic power in CAM. Here, we briefly review a few techniques for power reduction of CAM. Many works have been reported that reduce the switching power consumption involved in precharging the MLs. Some researchers worked on segmenting the word into subwords.
The segments are arranged in parallel and hierarchical architectures in [11] and [12], respectively. The word is divided into master and slave paths in [13]. In [14], the CAM word is divided into a NAND-type and a NOR-type search segment, which avoids unnecessary charging and discharging of the second segment when the first segment mismatches. In [15], ML power is reduced by precomputation. The memory organization of precomputation-based CAM (PB-CAM) consists of a parameter extractor (PE), the CAM memory, and a smaller parameter memory (PM). The PE is a circuit that extracts a parameter (the ones count of the input word) from the input data. The search operation is done in two parts. Initially, the extracted parameter is compared with the data in the PM. Only the words in the CAM whose parameters match are activated for further search. As the parameter memory is smaller than the CAM memory, the comparisons in the first part are less costly. Similarly, in the second part, the comparisons are made only for the words whose parameters matched. Hence, PB-CAM exploits the reduction in the number of comparisons and thereby reduces power consumption. The authors in [16] addressed the issue of short circuit (SC) power consumption in conventional NOR CAM and developed a precharge-free (PF) CAM which eliminates the SC current path during the mismatch condition. PF CAM also avoids the charge sharing of the NAND CAM cell, but it suffers from degraded performance due to a series chain of MLs. To improve the performance of the PF CAM design further, a self-controlled precharge-free (SCPF) CAM ML, in which output control is based on the charge at each CAM cell node, was designed in [17]. This design improves the performance, but at the cost of power consumption. In this paper, we aim to overcome the drawbacks of PF CAM and SCPF CAM by designing a new precharge-free CAM ML architecture, the hybrid self-controlled precharge-free (HSCPF) CAM, in which output control is based on the charge at each pair of successive CAM cell nodes.
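The two-part PB-CAM search of [15] reviewed above can be illustrated with a small behavioural sketch. It is only a software analogy under stated assumptions: the function names, the 4-bit example words, and the use of Python strings for words are hypothetical and do not come from [15].

```python
def ones_count(word: str) -> int:
    """Parameter extractor (PE): the number of '1' bits in a binary word."""
    return word.count("1")


def pb_cam_search(stored_words, search_word):
    """Two-part search. Returns (matching addresses, full-word comparisons made)."""
    # Parameter memory (PM): one small parameter per stored word.
    parameter_memory = [ones_count(w) for w in stored_words]
    target = ones_count(search_word)

    # Part 1: compare only the parameters.
    candidates = [i for i, p in enumerate(parameter_memory) if p == target]

    # Part 2: full-word comparison only for the words whose parameter matched.
    matches = [i for i in candidates if stored_words[i] == search_word]
    return matches, len(candidates)


# Hypothetical 5-word table; only 3 of the 5 words reach the full comparison.
stored = ["0000", "0110", "1010", "1111", "0101"]
matches, compared = pb_cam_search(stored, "1010")
print(matches, compared)  # [2] 3
```

In this toy table two of the five words are filtered out by the ones count alone, which is the reduction in comparisons that PB-CAM relies on.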
The rest of the paper is organized as follows: Section 2 explains traditional CAM match-line architectures. Section 3 explains precharge-free CAM match-line architectures. Section 4 proposes the HSCPF CAM. The comparison of power consumption, search delay, and energy metric between the proposed and conventional CAM architectures, obtained from process-corner and Monte Carlo simulations, is presented in Section 5, and Section 6 concludes the paper.
Traditional CAM architecture
CAM architecture is constructed with an array of memory elements along with comparison circuits. Memory elements can either be volatile or nonvolatile [18] . Generally, 6T static random access memory (SRAM) cell is used to build the memory [19] . NAND-type and NOR-type ML architectures are the two basic comparison circuits [20] .
NOR match-line CAM architecture
CAM cells are connected in parallel to form NOR ML architecture. Four NMOS transistors are required to design comparison circuitry in NOR CAM cell. Gates of M 1 and M 2 transistors are connected in series to differential storage bits D and D bar of SRAM cell as shown in Figure 2 . Gates of transistors M 3 and M 4 are connected to differential search bits SL and SL bar . When precharge control signal ctrl is low, ML precharges to a high voltage through transistor P 1 irrespective of the search input and stored data in the memory. When the ctrl is high during evaluation, the ML output depends on the search input and bits stored in the CAM cells. If all the bits in a row are matched with input search word, then no pull-down path exists for the ML and hence, it retains its precharged value. When the word in a row is not matched with input search word even by one bit, the ML attached to that row will discharge through the pull-down path formed by the mismatched CAM cell. Table 1 shows the truth table of NOR CAM cell for match/miss. NOR ML architecture timing waveform for a miss followed by a match case is shown in Figure 3 . Power consumption in NOR match-line architecture for a clock cycle is given by Eq. (1):
where $\alpha_{nor}$ is the switching activity, $C_{MLnor}$ is the ML capacitance, and $V_{DDnor}$ is the supply voltage.
Delay in NOR ML for a clock cycle is given by Eq. (2):
where $D_{nor}$ is the search delay between ML and ctrl, $T_{Dnor}$ is the delay of one transistor, and $t_{RCnor}$ is the ML time constant.
Total time required to complete one clock cycle for NOR ML architecture is given by Eq. (3):
$$T_{NOR} = t_{wr} + T_{pre} + t_{SL} \qquad (3)$$
where $T_{NOR}$ is the total time, $t_{wr}$ is the write time, $T_{pre}$ is the precharge time, and $t_{SL}$ is the evaluation time.
$D_{nor}$ gives the search delay between the ML and the ctrl signal, whereas $T_{NOR}$ gives the time required to complete one operation that indicates a miss or a match on the ML. NOR ML CAM architecture offers higher performance but consumes more power.
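The match/miss behaviour of the NOR ML described in this subsection can be summarized with a purely logical model. This is a minimal sketch that ignores timing, precharge, and analog effects; the function names and the bit-list representation are assumptions made for illustration.

```python
def nor_cell_pulls_down(stored_bit: int, search_bit: int) -> bool:
    """A NOR CAM cell opens a pull-down path on the ML only when its bit mismatches."""
    return stored_bit != search_bit


def nor_match_line(stored_word, search_word) -> int:
    """ML value after evaluation: 1 = precharged value retained (match), 0 = discharged (miss)."""
    # The cells are connected in parallel, so a single pull-down path discharges the ML.
    discharged = any(
        nor_cell_pulls_down(d, s) for d, s in zip(stored_word, search_word)
    )
    return 0 if discharged else 1


print(nor_match_line([1, 0, 1], [1, 0, 1]))  # 1: every bit matches, ML stays high
print(nor_match_line([1, 0, 1], [1, 1, 1]))  # 0: one mismatched bit discharges the ML
```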
NAND match-line CAM architecture
In NAND ML architecture, CAM cells are connected in series. Three NMOS transistors are required to form comparison circuitry in NAND CAM cell. The gates of M 1 and M 2 transistors are connected in series to complementary storage bits D and D bar as shown in Figure 4 . The gate of M 3 transistor is connected to node N. If all the bits in a row are matched with the input search word, then logic 1 is transferred to N nodes of all the CAM cells and the ML attached to that word is connected to ground. If a word in a row is not matched with the input search word, then logic 0 is transferred to node N of all mismatched CAM cells and hence the ML attached to that word starts to charge. In NAND ML architecture, the match indicates low and the miss indicates high. Table 2 shows the truth table of NAND CAM cell for match/miss. NAND ML architecture timing waveform for the match followed by the miss case is shown in Figure 5 .
Power consumption in NAND ML architecture for a clock cycle is given by Eq. (4):
Delay in NAND ML for a clock cycle is given by Eq. (5):
where N = number of transistors.
Total time required to complete one clock cycle for NAND ML architecture is given by Eq. (6):
$$T_{NAND} = t_{wr} + T_{pre} + t_{SL} \qquad (6)$$
where $T_{NAND}$ is the total time, $t_{wr}$ is the write time, $T_{pre}$ is the precharge time, and $t_{SL}$ is the evaluation time.
NAND ML CAM architecture offers low power consumption but degrades the performance; therefore, NOR ML architecture is preferred over NAND ML architecture. 
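The inverted polarity of the NAND ML (match indicated by a low ML, miss by a high ML) is easy to overlook, so a small logical sketch may help. It models only the series chain of M 3 transistors gated by the nodes N; the function name and the bit-list representation are hypothetical, and timing and charge sharing are not modelled.

```python
def nand_match_line(stored_word, search_word) -> int:
    """ML value for a NAND-type word: 0 indicates a match, 1 indicates a miss."""
    # Node N of a cell carries logic 1 when that cell's bit matches.
    n_nodes = [int(d == s) for d, s in zip(stored_word, search_word)]
    # The cells are in series: the path to ground exists only if every N is 1.
    chain_conducts = all(n_nodes)
    return 0 if chain_conducts else 1


print(nand_match_line([1, 0, 1], [1, 0, 1]))  # 0: match, ML connected to ground
print(nand_match_line([1, 0, 1], [0, 0, 1]))  # 1: miss, series chain is broken
```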
Short circuit current in NOR-type CAM
The power consumption of the NOR-type CAM design arises in two phases: the evaluation phase and the precharge phase. It has been identified that the power consumption of NOR CAM is high due to the SC current path in the precharge phase during the mismatch condition [16]. Consider a NOR CAM cell as shown in Figure 2. During the precharge phase, the ctrl signal is low and ML is precharged to V dd through transistor P 1 . In the evaluation phase, the output of ML depends on the search data input. Let us consider that the data stored in the CAM cell is 1 such that D = 1 and D bar = 0. If the input search bit is also 1, the match condition occurs and the ML is isolated from ground, as M 1 and M 4 are in cutoff and no short circuit path exists from ML to ground. However, if the search data input is 0, which is a mismatch condition, the transistors M 2 and M 4 are in saturation and create a short circuit path from ML to ground. As ML drains from V dd to ground, a considerable amount of short circuit current appears in the circuit. Similarly, with a stored 0 in the CAM cell, a short circuit path from ML to ground through the saturated M 1 and M 3 transistors exists during a mismatch.
Estimation of short circuit current for 4 ×3 NOR-type CAM
Consider a 4 × 3 NOR CAM with the stored data shown in Figure 6. During the precharge phase, all four MLs (ML 1 to ML 4 ) charge to V dd . Let us assume that, in the evaluation phase, the search word 101 is passed to the NOR CAM memory array. In this case, the second row matches the input search word. Thus, ML 2 is isolated from ground, while each of the other MLs has at least one mismatch condition, marked with a dark line. These MLs drain from V dd to ground and experience SC current. Table 3 shows the power consumption of the NOR CAM cell during the precharge phase for different match and mismatch conditions. The total contribution of power consumption during match and mismatch is shown in the power chart in Figure 7. It is noted that the power consumption is high in the case of a mismatch during the precharge phase due to the SC current path, whereas in the case of a match, the power consumption is minimal and equal to that in the evaluation phase.
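The 4 × 3 example can be traced with a short script that lists which MLs see at least one mismatched cell and therefore an SC path. The stored rows below are hypothetical placeholders, since the actual contents of Figure 6 are not reproduced in the text; the only property taken from the text is that the second row equals the search word 101.

```python
# Hypothetical stored data; only row 2 ("101") is dictated by the text.
stored_rows = ["100", "101", "011", "110"]
search_word = "101"


def mismatched_cells(stored: str, search: str) -> int:
    """Number of cells in a row whose stored bit differs from the search bit."""
    return sum(a != b for a, b in zip(stored, search))


mismatches = [mismatched_cells(row, search_word) for row in stored_rows]

# MLs are numbered from 1, as in the text (ML 1 to ML 4).
sc_lines = [i + 1 for i, m in enumerate(mismatches) if m > 0]
matched_lines = [i + 1 for i, m in enumerate(mismatches) if m == 0]

print(mismatches)     # [1, 0, 2, 2]
print(sc_lines)       # [1, 3, 4]: these MLs drain to ground
print(matched_lines)  # [2]: ML 2 stays isolated from ground
```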
Precharge-free CAM architectures
All the works reviewed so far concentrate on switching power, whereas PF CAM and SCPF CAM concentrate on short circuit power. It has been identified that precharging of the ML consumes higher power due to the short circuit current in NOR CAM. In [16], the authors proposed a precharge-free CAM to reduce this short circuit current.
PF CAM
The circuit of the PF CAM cell is shown in Figure 8. It consists of a 6T SRAM as a storage cell for writing the data. The NMOS transistors M 8 and M 7 are used as the comparison transistors. PMOS transistor M 10 and NMOS transistor M 9 control the ML output for match/miss based on the control bit (CB) and the charge value on node S. The purpose of CB is to reset the ML segments between two successive searches. This is accomplished by making CB high, which turns on the pull-down transistor M 9 and drains ML irrespective of the search input word and the stored word. During the search operation, CB = 0, which turns the M 10 transistor ON and the pull-down transistor M 9 OFF. If there is a match, the high charge value on node S makes ML charge through the M 10 transistor. In the case of a mismatch, the low charge value on node S makes ML discharge through the M 10 transistor. The PF CAM architecture, formed by a cascaded chain of control bits passing through the CAM cells, is shown in Figure 9. All the pull-up transistors in the ML control circuitry (M 81 to M 8N ) are NMOSFETs, except M 80 . If a given input search word matches all the CAM cells in a specific row, then all the nodes S 1 to S N go high and drive transistors M 80 to M 8N into saturation. This makes the nodes ML 0 to ML N −1 charge, and subsequently ML becomes high. If even one CAM cell in a row mismatches, say the third CAM cell, ML 2 discharges, M 83 remains in cutoff, and the ML discharges to low. This architecture avoids the short circuit path and the charge-sharing problem and minimizes the overall power. However, due to the cascaded chain of control bits passing through the CAM cells, the search operation is delayed significantly. To overcome this problem, the SCPF CAM architecture was proposed.
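The cascaded behaviour of the PF CAM match-line can be sketched as a simple propagation loop in which each ML segment goes high only if the previous segment and the local node S are both high. This is a logical abstraction under assumed names; it does not model the transistor-level circuit of Figure 9, and the zero-based list `segments` is only loosely analogous to the nodes ML 0 onward.

```python
def pf_cam_match_line(node_s):
    """Propagate the match through the cascaded chain.

    node_s: charge values (0 or 1) at the cell nodes, in chain order.
    Returns (final ML value, list of intermediate segment values).
    """
    segment = 1  # the chain is enabled during search (CB = 0)
    segments = []
    for s in node_s:
        # A segment charges only if the preceding segment is high and this cell matches.
        segment = segment & s
        segments.append(segment)
    return segments[-1], segments


print(pf_cam_match_line([1, 1, 1, 1]))  # (1, [1, 1, 1, 1])
print(pf_cam_match_line([1, 1, 0, 1]))  # (0, [1, 1, 0, 0])
```

The loop is inherently sequential: the final ML value depends on every earlier segment, which mirrors why the cascaded structure lengthens the search delay as the word gets longer.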
SCPF CAM
The circuit of the SCPF CAM cell is shown in Figure 10. When the prestored data in a CAM cell does not match the search input, the charge value at node S is low and it is passed to ML through transistor M 9 . In the case of a match, ML is driven to a high voltage through transistor M 10 . All the SCPF CAM cells are connected in parallel to form the ML architecture as shown in Figure 11. Here, if all the prestored bits in a row match the input search word, the charge value at all the nodes is high and the ML attached to that row is also charged to high. When the prestored data in a row differs from the input search word even by one bit, the charge value at the mismatched nodes is low and the ML attached to that row discharges to ground. By controlling the ML output with the charges at the parallel nodes S 1 , S 2 , ..., S N , the SCPF CAM ML structure overcomes the cascaded ML structure of PF CAM and thus improves search speed significantly. However, the SCPF CAM design uses one additional transistor per CAM cell compared to the PF CAM architecture, which increases both power and cell area. We propose a hybrid self-controlled precharge-free (HSCPF) CAM architecture, which overcomes the search delay problem of PF CAM and the power consumption problem of SCPF CAM.
Proposed HSCPF CAM
The architecture of the proposed HSCPF CAM cell consists of two 6T SRAM cells, with four NMOS transistors 17 , and M 18 . We use a hybrid charge control circuitry, which controls two consecutive CAM cells. It consists of the M 9 and M 10 transistors as shown in Figure 12. The gates of the M 9 and M 10 transistors are connected to node S 1 and the source of the M 10 transistor is connected to node S 0 .
• In this charge control circuit, the charge values at nodes S 0 and S 1 determine whether the ML output is high or low. If the prestored data in the two CAM cells matches the search input, the charge value at nodes S 0 and S 1 is high, which in turn passes a high value to ML through transistors M 9 and M 10 ; otherwise, a low value is passed. These two CAM cells are considered a CAM word for further operation.
• All these CAM words are connected in parallel to constitute the ML structure as shown in Figure 13. If the search content matches the prestored data in the first CAM word, the charge values at the nodes S 0 and S 1 are high. This process continues for the remaining CAM words in the ML structure up to nodes S N −1 and S N . If all the prestored bits in a word match the input search word, the ML attached to that word charges to high. If the prestored bits in a word differ from the input search word even by one bit, the ML attached to that word discharges to low. Table 4 shows the truth table of the HSCPF CAM cell for match/miss. The timing waveform of the HSCPF CAM design is shown in Figure 14.
Total time required to complete one clock cycle for HSCPF CAM ML architecture is given by Eq. (7):
$$T_{totprefree} = t_{wr} + t_{SL} \qquad (7)$$
where $t_{wr}$ is the write time and $t_{SL}$ is the evaluation (search) time.
It can be observed from the architecture that HSCPF CAM uses two transistors per two-bit word for the charge control circuitry; hence, the number of transistors used for the charge control circuitry is half of that used in SCPF CAM. Therefore, the HSCPF CAM reduces the area as well as the power consumption. The proposed HSCPF CAM overcomes both the cascaded ML structure of PF CAM and the increased transistor count of SCPF CAM, thereby offering a lower energy metric than SCPF CAM and PF CAM.
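The pairing of cells and the resulting transistor count can be sketched as follows. The match-line function is a logical abstraction only, and the per-bit counts are inferred from the statement above (two charge-control transistors per two-bit word in HSCPF, half of the SCPF count); all names, and the choice to reject odd word lengths, are assumptions made for illustration.

```python
def hscpf_match_line(node_s) -> int:
    """ML value for a row: 1 only if every two-cell CAM word reports a match."""
    if len(node_s) % 2 != 0:
        raise ValueError("this sketch assumes an even number of cells per row")
    # Each charge control circuit watches one pair of successive nodes.
    pair_ok = [node_s[i] & node_s[i + 1] for i in range(0, len(node_s), 2)]
    # The two-cell words drive the ML in parallel; any failing pair pulls it low.
    return int(all(pair_ok))


def charge_control_transistors(bits_per_row: int, scheme: str) -> int:
    """Charge-control transistors per row, using counts inferred from the text."""
    per_bit = {
        "SCPF": 2.0,   # inferred: twice the HSCPF count
        "HSCPF": 1.0,  # two transistors shared by each two-bit word
    }
    return int(bits_per_row * per_bit[scheme])


print(hscpf_match_line([1, 1, 1, 1, 1, 1, 1, 1]))  # 1: all pairs match
print(hscpf_match_line([1, 1, 1, 0, 1, 1, 1, 1]))  # 0: the second pair mismatches
print(charge_control_transistors(8, "SCPF"),
      charge_control_transistors(8, "HSCPF"))      # 16 8
```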
Simulation results
The proposed HSCPF CAM design of size 8 (words) × 8 (bits) is implemented in a 45-nm technology node using the generic process design kit (GPDK), and simulations are performed for validation using the Cadence Virtuoso tool. Along with the proposed design, the basic NAND- and NOR-type ML CAMs and the precharge-free CAMs of [16] and [17], all of size 8 × 8, are also simulated. These designs are compared with the proposed design for power, search delay, and energy metric. A partial layout view of the proposed CAM cell design is shown in Figure 15. The layout is verified in Cadence Virtuoso with DRC and LVS checks. The area of the HSCPF CAM design is smaller than those of the other conventional CAM designs because the charge control circuitry is shared between two successive CAM cells and because the layout uses folding and chaining of transistors in the CAM structure. A 500-run Monte Carlo (MC) simulation is performed with a 3σ Gaussian distribution by varying the design parameters, process corners, and operating temperature. Figures 16 and 17 show the performance metrics averaged across the 500 MC runs. The average search delay and power consumption are 127.28 ps and 0.273 mW, respectively. To estimate the contribution of SC power to the total power, simulations are performed on the conventional NOR CAM and the proposed HSCPF CAM designs. The match-case and mismatch-case power consumption of the conventional NOR CAM are 0.293 mW and 1.932 mW, respectively, and those of the HSCPF CAM are 0.207 mW and 0.218 mW, respectively. From these results, we can observe that the conventional NOR CAM consumes more power in the mismatch case because of SC. As SC is eliminated in the proposed HSCPF CAM, the power it consumes in the mismatch case is much smaller than that of the conventional NOR CAM. Hence, we can claim that the proposed design saves 86.98% of the total power that is attributable to SC power.
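The gap between match-case and mismatch-case power quoted above can be made explicit with a few lines of arithmetic. This is only one way of reading the four reported figures; the helper names are hypothetical, and the script does not attempt to reproduce the 86.98% saving, whose exact baseline is not spelled out in the text.

```python
# Power figures quoted in the text, in mW.
power_mw = {
    "NOR": {"match": 0.293, "mismatch": 1.932},
    "HSCPF": {"match": 0.207, "mismatch": 0.218},
}


def mismatch_penalty(design: str) -> float:
    """Extra power (mW) drawn in the mismatch case relative to the match case."""
    p = power_mw[design]
    return round(p["mismatch"] - p["match"], 3)


def mismatch_ratio(design: str) -> float:
    """Mismatch-case power divided by match-case power."""
    p = power_mw[design]
    return p["mismatch"] / p["match"]


for design in power_mw:
    print(design, mismatch_penalty(design), round(mismatch_ratio(design), 2))
```

For the NOR CAM the mismatch case costs about 1.639 mW more than the match case, whereas for the HSCPF CAM the two cases differ by only about 0.011 mW, which is consistent with the SC path being absent in the proposed design.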
Different process corner simulations, namely SS (slow NMOS and slow PMOS), FF (fast NMOS and fast PMOS), FS (fast NMOS and slow PMOS), and SF (slow NMOS and fast PMOS), are evaluated for the worst case of a match followed by a miss. The best-case search delay and power of the proposed design are 29.4 ps and 0.168 mW, respectively, and its worst-case search delay and power are 302 ps and 0.792 mW, respectively. Figures 18 and 19 show the process corner simulations for search delay and energy metric. The results show that the power consumption and search delay averaged over the process corners are 0.273 mW and 133.48 ps, respectively. To validate the CAM design further, simulations are performed on the proposed design for longer word lengths of up to 128 bits.
Even for a longer word length, the proposed design functions properly in the miss/match cases. The power consumption and energy metric obtained for different word lengths are shown in Table 6. As the size of the CAM array increases, there is a gradual increase in power consumption. However, the energy metric remains almost constant across CAM arrays of different sizes, and hence the results indicate that the proposed design is efficient in terms of area and energy metric. Therefore, this design is suitable for constructing CAMs for low-power, high-speed applications with longer word lengths.
Conclusion
In this paper, a low energy-metric HSCPF CAM design has been presented. The proposed design features a new charge control circuitry that is shared between two successive CAM cells, thereby considerably reducing the number of transistors required for charge control. The proposed ML architecture is simulated across different process corners and with MC simulations in a 45-nm CMOS technology node with a 1 V supply voltage. The proposed design significantly reduces the search delay, the power, and the energy metric when compared to PF CAM and SCPF CAM, as its charge control circuitry uses fewer transistors than that of SCPF CAM.
