Abstract-A protocol processing accelerator based on low-power NAND-type ternary content addressable memories (TCAMs) is presented in this paper. The fundamental constraint of a networked sensor is its energy consumption, and the high microcontroller clock frequency required for receiving protocol data is one of the primary causes of high energy consumption. Hardware-based protocol processing is usually more energy efficient than software, but a hardware accelerator for networked sensors is difficult to design because there are no standard protocols. TCAMs are hardware-based parallel lookup tables with bit masking. Programmable state machine circuits that complete a search operation in one clock cycle can be designed with TCAMs. A novel decoding-logic "dual TCAM" cell is proposed for the design of low-power, high-performance TCAMs. A general-purpose protocol processor is designed as a cascade of such TCAM state machines. It can detect any user-defined start or end symbols in systems using Manchester or NRZ encoding, which can greatly improve the energy efficiency of a networked sensor.
I. INTRODUCTION
A networked sensor is a node in a wireless sensor network. It is a device that integrates communication, power sources, sensors, and actuators with computational elements in a very small physical size.
For a networked sensor, the fundamental constraint is its energy consumption, since it may be impossible to replace its energy source. In a wireless sensor node, the radio consumes the vast majority of the system energy [1]. This power consumption can be reduced by decreasing the radio duty cycle [2]. Increasing the bit rate is the primary way to decrease the radio duty cycle: if the bit rate rises while the amount of transmitted data stays the same, the transmission time decreases and the radio can remain off for as long as possible [3]. However, a high bit rate demands a fast receiver, so the microcontroller (MCU) has to run at a high frequency, which may itself increase the power consumption greatly.
In a system with a fixed protocol, this problem can be solved with an ASIC that handles low-level data reception and transmission, but the wide range of applications of wireless sensor networks makes it difficult to develop a single protocol.
Although there is no standard protocol, a general-purpose protocol processor based on programmable technology is feasible because the various protocols share many common characteristics. A digital circuit can be divided into two parts, the data path and the state machines. If both parts are programmable for typical protocols, the circuit will be general. Our earlier work on programmable data path circuits was discussed in [4]. In this paper, we focus on the design of programmable state machines.
Programmable state machine circuits with short input data can be implemented with RAM-based lookup tables, as in FPGAs. In protocol processing, however, the input data that determines the state transitions is usually very long, so the connections become complicated and usually only part of the hardware resources can be used in an application. Ternary content addressable memories (TCAMs) are hardware-based parallel lookup tables with bit-level masking capability, introduced in recent years [5]. We found that the features of TCAMs are very suitable for the design of the programmable state machines of protocol processors.
II. TCAM FUNDAMENTALS
TCAMs are developed from binary content addressable memories (CAMs) [6]. They have the features of ordinary static memories (SRAMs): their contents can be written and read, and they can be implemented in the same process as SRAMs. However, they are usually used as coprocessors to speed up SEARCH operations. In SEARCH mode, the input of a CAM or TCAM is called the "key word", and the output is the address of the stored word whose content matches the key word. CAMs can only be used for exact-match operations, whereas TCAMs support "don't care" logic and can therefore be used for partial-match operations. An example of the SEARCH operation with bit masking is shown in Figure 1.
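The SEARCH operation with bit masking can be illustrated with a small behavioral sketch. It is not the circuit of Figure 1; the function names, the 4-bit word width, and the table contents are assumptions chosen only for illustration, and the loop over rows stands in for a comparison that the hardware performs on all rows in parallel.

```python
from typing import List


def ternary_match(key: str, pattern: str) -> bool:
    """Return True if every non-'X' position of the stored pattern equals the key bit."""
    if len(key) != len(pattern):
        raise ValueError("key and pattern must have the same width")
    return all(p == "X" or p == k for k, p in zip(key, pattern))


def tcam_search(table: List[str], key: str) -> List[int]:
    """Return the addresses of all stored words that match the key word."""
    return [addr for addr, word in enumerate(table) if ternary_match(key, word)]


# Hypothetical 4-bit table: address 0 is an exact word, the others use masking.
table = ["1010", "10XX", "0XX1", "XXXX"]
print(tcam_search(table, "1011"))  # addresses of the matching rows
```

Because masked words can overlap, a single key word may match several addresses, which is why a TCAM-based design normally needs some rule for choosing among the matching rows.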
TCAM devices can be divided into two groups, the NOR type and the NAND type. NOR-type TCAMs can work at very high speed but suffer from high power consumption. Despite some drawbacks, NAND-type TCAMs are more attractive for networked sensors because of their low power consumption. There are two typical structures of NAND-type TCAM cells, shown in Figure 2 (a) and (b) respectively. SL1 and SL2 are the SEARCH lines; in typical applications SL1 = di and SL2 = /di (the complement of di), where di is one bit of the input key word. A ternary bit is emulated by the combination of two binary bits, so a TCAM cell can hold the value "00", "01", "10", or "11". For proper operation, however, only three of these are used in TCAM applications. For the TCAM cells shown in Figure 2 (a) and (b), the definitions of the TCAM values are shown in Table 1.
Table 1. Definitions of TCAM cell values
Stored bits    Ternary value
00             (Not used)
01             "0"
10             "1"
11             "X" (Don't care)
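The two-bit emulation of a ternary value in Table 1 can be sketched as a pair of conversion helpers. This is a minimal illustration and not part of the design: the function names are hypothetical, and the order of the two stored bits within a pair is taken as written in Table 1.

```python
# Table 1 mapping from ternary value to the stored bit pair; "00" is not used.
ENCODING = {"0": "01", "1": "10", "X": "11"}
DECODING = {bits: value for value, bits in ENCODING.items()}


def encode_word(ternary: str) -> str:
    """Convert a ternary word such as '10X' into the stored bit pairs."""
    return "".join(ENCODING[t] for t in ternary)


def decode_word(bits: str) -> str:
    """Convert stored bit pairs back into a ternary word, rejecting '00'."""
    if len(bits) % 2 != 0:
        raise ValueError("stored bits must come in pairs")
    out = []
    for i in range(0, len(bits), 2):
        pair = bits[i:i + 2]
        if pair not in DECODING:
            raise ValueError(f"pair {pair!r} is not a used TCAM value")
        out.append(DECODING[pair])
    return "".join(out)


print(encode_word("10X"))
print(decode_word("100111"))
```

The sketch makes the storage cost explicit: an n-bit ternary word occupies 2n binary storage bits.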
The comparison logic of a NAND-type TCAM cell can be summarized simply as "match-ON, mismatch-OFF". Here "OFF" means that the cell does not conduct from "ML_I" to "ML_O", and "match" means that the input data on SL1 and SL2 equals the TCAM cell value. A row of TCAM cells in a NAND-type array is shown in Figure 3. The TCAM cells are connected in series to form a "word". Prior to a search operation, the match lines (the inputs of the inverters) are pre-charged, so all outputs (MLSOs) are low. During evaluation, if all cells in a row match the input key word, the match line of that row discharges to the low state and its MLSO goes high to indicate a match. The other rows remain in their pre-charged state because they have no discharge path. In typical cases only a few words match the input key word and need pre-charging before the next operation, so NAND-type TCAMs have low power consumption.
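The pre-charge and evaluate behavior of a NAND-type array can be mimicked with a short behavioral model. It is only a logical sketch under simplifying assumptions: timing, charge sharing, and transistor-level effects are ignored, the helper names are hypothetical, and the number of discharged rows is used merely as a rough proxy for the pre-charge energy of the next cycle.

```python
from typing import List, Tuple


def cell_conducts(stored: str, d: str) -> bool:
    """'match-ON, mismatch-OFF': a cell conducts when its ternary value matches bit d."""
    return stored == "X" or stored == d


def evaluate_row(word: str, key: str) -> bool:
    """Series chain: the match line discharges only if every cell conducts.

    Returns the MLSO level (True = high = match).
    """
    if len(word) != len(key):
        raise ValueError("word and key must have the same width")
    return all(cell_conducts(s, d) for s, d in zip(word, key))


def search(array: List[str], key: str) -> Tuple[List[bool], int]:
    """Evaluate all rows; also count the rows that discharged and need pre-charging again."""
    mlso = [evaluate_row(word, key) for word in array]
    return mlso, sum(mlso)


array = ["10X", "0XX", "111"]  # hypothetical 3-bit words
outputs, rows_to_precharge = search(array, "101")
print(outputs, rows_to_precharge)
```

In this model only the matching rows discharge, which mirrors the argument in the text that few rows need to be pre-charged again after a typical search.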
The architecture of a protocol data receiver is shown in Figure 4. Here we use typical Manchester encoding to illustrate how a protocol data unit (PDU) is received. Manchester code is the most popular Physical layer encoding used in networked sensors. It is a synchronous clock encoding technique that encodes both the clock and the data of a synchronous bit stream. A Manchester receiver should use a 16X clock (the clock frequency is 16 times the signal transport rate) for synchronization, and every bit should be sampled twice to detect the transition at the center of a "bit cell".
The first state machine in Figure 4 is implemented with a TCAM array and a dual-port SRAM (DPRAM) array, and is used to detect the start symbol and other special protocol symbols defined in the Physical layer. The "DLL TCAM" is used for higher-layer protocols. Because "don't care" bits are supported in a TCAM array, more than one output of the match-line sense amplifiers may be active, so a priority control circuit must be used to select a single active output. The outputs of the priority circuit are connected to the word lines of the dual-port RAM, so the RAM content of the matched word can be read out to the registers NR, SR and OR. NR holds the number of input data bits to be matched; when the bit counter equals the value in NR, the state machine is activated and changes its state once. SR is the state register, which holds the present state. The "next" states are stored in the DPRAM. OR is the register of output signals for the present state.
In Manchester encoding, typically "10" represents "1" and "01" represents "0"; "00" and "11" are non-data symbols that are usually used to define special protocol symbols, such as a "start symbol" or an "end symbol". Detecting these special symbols is an important task of the receiver.
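The classification of Manchester sample pairs into data and non-data symbols can be sketched as follows. The sketch assumes the polarity "10" for "1" and "01" for "0" (the opposite convention also exists), assumes the samples are already aligned to bit-cell boundaries, and uses hypothetical function names.

```python
from typing import List


def classify_pair(pair: str) -> str:
    """Map one bit cell (two samples) to '1', '0', or 'ND' (non-data symbol)."""
    if pair == "10":
        return "1"
    if pair == "01":
        return "0"
    if pair in ("00", "11"):
        return "ND"  # no mid-cell transition: usable for start or end symbols
    raise ValueError(f"invalid sample pair {pair!r}")


def decode_manchester(samples: str) -> List[str]:
    """Split a sample string into bit cells and classify each one."""
    if len(samples) % 2 != 0:
        raise ValueError("two samples per bit cell are required")
    return [classify_pair(samples[i:i + 2]) for i in range(0, len(samples), 2)]


print(decode_manchester("10011100"))
```

A software receiver would run this classification for every bit cell, which is the kind of per-bit work the TCAM state machine is meant to take over from the microcontroller.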
Figure 5 shows the state machine of a simple Physical layer protocol, which illustrates the working principles of the TCAM state machine. Here "000" is the reset state; it changes to state "001" after 16 bits of Manchester-encoded data have been received (assuming the start symbol is 16 bits long). "ST" is the start symbol in Manchester encoding; if the input data matches "ST", the state changes to "010". "END" denotes an end symbol, "C" is a clear flag that resets the states of all registers, and "V" is a flag that controls the transitions of the state machine: when "V" is 1, the state changes after every received bit; otherwise the state cannot change until "Bit_counter" equals the value in "NR". "F" is a flag indicating that a frame of data has been received successfully.
The contents that should be written to the TCAM and DPRAM to implement this simple protocol are shown in Figure 6. There, "X" means that all inputs are "don't care", and "ND 1" is "00X…X", which is one of the invalid-data patterns (a "00" or "11" detected in Manchester encoding). Sixteen rows of TCAM cells are required to detect one byte.
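The interplay of TCAM, priority circuit, and DPRAM can be sketched with a small behavioral model. The table below is not the content of Figure 6: the 4-sample "ST" and "END" patterns, the row order, and the transitions out of "001" and "010" are hypothetical simplifications, and the NR register, the bit counter, and the "V" flag are not modelled, so each call to `step` stands for one activation of the state machine.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Row:
    pattern: str     # TCAM word: state bits followed by data bits, 'X' = don't care
    next_state: str  # DPRAM field loaded into SR on a match
    outputs: str     # DPRAM field loaded into OR on a match (flags such as "C", "F")


def matches(key: str, pattern: str) -> bool:
    return len(key) == len(pattern) and all(p in ("X", k) for k, p in zip(key, pattern))


def step(rows: List[Row], state: str, data: str) -> Tuple[str, str]:
    """One activation: search with (state + data); the lowest matching row wins."""
    key = state + data
    for row in rows:  # models the priority circuit
        if matches(key, row.pattern):
            return row.next_state, row.outputs
    return state, ""  # no match: hold the present state


ST, END = "1100", "0011"  # hypothetical short symbols, for illustration only
ROWS = [
    Row("000" + "XXXX", "001", ""),   # reset state: advance once data has arrived
    Row("001" + ST,     "010", ""),   # start symbol found
    Row("001" + "XXXX", "000", "C"),  # anything else: clear and restart
    Row("010" + END,    "000", "F"),  # end symbol: frame received
    Row("010" + "XXXX", "010", ""),   # keep receiving
]


def run(words: List[str]) -> List[Tuple[str, str]]:
    state, trace = "000", []
    for word in words:
        state, out = step(ROWS, state, word)
        trace.append((state, out))
    return trace


print(run(["1010", "1100", "1001", "0011"]))
```

Placing the specific rows before the catch-all "X" rows is what lets the priority rule resolve overlapping matches, and reprogramming the protocol amounts to rewriting the `ROWS` table.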
Besides Physical layer protocols, this programmable protocol processor can also support many Data Link Layer functions without interrupting the microcontroller, for example address matching and PDU type classification. In typical networks, many nodes receive the same message although only one or a few nodes need to process it, and every CPU has to be woken up to check the address in the PDU, which is one of the reasons for high power consumption. Address matching and PDU type classification are both SEARCH operations and can be implemented with TCAM-based state machines. A message that a node does not need to process can be rejected by the protocol processor without interrupting the microprocessor or making it leave its low-power mode, so the power consumption can be greatly reduced.
The function of the TCAM state machine can easily be verified with a behavioral model, but some performance problems arise when it is implemented with traditional NAND-type TCAM cells. First, the speed is too low to meet the requirements of many protocols because of the long charge and discharge path in a row. Second, in order to detect invalid data, 16 rows of TCAM cells have to be used, which costs considerable area and power. In addition, the charge sharing of NAND-type cells must be considered.
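Address matching as a SEARCH operation can be sketched in a few lines. The 8-bit addresses, the filter table, and the labels are hypothetical; the point is only that a PDU whose address matches no stored pattern can be dropped without waking the microcontroller.

```python
from typing import List, Optional, Tuple


def ternary_match(key: str, pattern: str) -> bool:
    return len(key) == len(pattern) and all(p in ("X", k) for k, p in zip(key, pattern))


# Hypothetical filter table of one node: (address pattern, label).
FILTERS: List[Tuple[str, str]] = [
    ("00010111", "unicast"),    # the node's own address
    ("11111111", "broadcast"),  # addressed to every node
    ("0001XXXX", "group"),      # masked bits cover a whole group of nodes
]


def classify_address(address: str) -> Optional[str]:
    """Return the label of the first matching filter, or None to reject the PDU."""
    for pattern, label in FILTERS:
        if ternary_match(address, pattern):
            return label
    return None


def should_wake_mcu(address: str) -> bool:
    """The MCU is interrupted only for PDUs that pass the address filter."""
    return classify_address(address) is not None


for addr in ("00010111", "00011010", "00100000"):
    print(addr, classify_address(addr), should_wake_mcu(addr))
```

PDU type classification follows the same pattern, with the type field of the PDU used as the key word in place of the address.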
III. A DECODING TYPE TCAM CELL
The traditional design method, based on complementary encoding, requires each TCAM cell to process one bit of the input data, so a large number of NAND-type cells have to be connected in series. If one cell can process several bits of the data, the number of cascaded cells is reduced. In this paper, a decoding-logic based dual-TCAM design method is proposed that allows two bits to be processed at the same time at the cost of slightly more chip area, which shortens the serial chain of NAND-type cells. Figure 6 ) to validate the input data.
The charge sharing problem occurs when the search inputs are not static during pre-charging, and it can be avoided by timing control. In serial data receiving systems, the clock frequency is higher than the signal transport rate, so some clock cycles elapse before the next bit arrives. The data can therefore be loaded from the shift register into the input register one clock cycle before the state machine starts, so that the search lines are static during pre-charge and evaluation and charge sharing is avoided.
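One plausible behavioral reading of a decoding-logic cell that handles two bits at once is sketched below. The cell circuit is not described in this text, so everything here is an assumption: the two input bits are decoded into one of four lines, and each cell stores a hypothetical 4-bit mask saying which of the four combinations it accepts.

```python
from typing import List


def decode2(pair: str) -> int:
    """2-to-4 decode: index of the active line for the pair '00', '01', '10' or '11'."""
    if len(pair) != 2 or any(c not in "01" for c in pair):
        raise ValueError(f"invalid bit pair {pair!r}")
    return int(pair, 2)


def dual_cell_match(mask: str, pair: str) -> bool:
    """mask[i] == '1' means the cell accepts the pair whose decoded index is i."""
    return mask[decode2(pair)] == "1"


# Hypothetical masks over the decoded lines (00, 01, 10, 11).
MANCHESTER_0 = "0100"  # accepts only "01"
MANCHESTER_1 = "0010"  # accepts only "10"
VALID_DATA = "0110"    # accepts "01" or "10"
NON_DATA = "1001"      # accepts "00" or "11"


def row_match(masks: List[str], samples: str) -> bool:
    """Series chain of dual cells: every cell must accept its pair of samples."""
    pairs = [samples[i:i + 2] for i in range(0, len(samples), 2)]
    return len(pairs) == len(masks) and all(
        dual_cell_match(m, p) for m, p in zip(masks, pairs)
    )


# One row of 8 dual cells checks that a whole byte (16 samples) is valid data.
valid_byte_row = [VALID_DATA] * 8
print(row_match(valid_byte_row, "1001100110011001"))
print(row_match(valid_byte_row, "1001100110011100"))
```

Under this reading, 16 samples pass through 8 cells in series instead of 16, and a condition such as "every bit cell holds valid data" fits in a single row because one cell can accept an arbitrary subset of the four pair values, which a single ternary cell per bit cannot express.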
IV. SIMULATION RESULTS
A state machine circuit with a 16 × 20 TCAM array and a 16 × 8 DPRAM, implemented in the TSMC 0.18 µm process, is used to evaluate the performance. The main HSPICE simulation results are listed in Table 2. All transistors in the TCAM and DPRAM arrays have the smallest size allowed by the MOSIS design rules (W/L = 0.27 µm/0.18 µm). The second state machine has a similar structure, but the sizes of its TCAM and DPRAM are different. The average power consumption will be very small in typical applications, since in most cases the state machine is started only when a byte of the PDU has been received.
V. CONCLUSIONS
A low-power protocol processor based on NAND-type TCAMs is presented in this paper. This protocol processor can be used in networked sensors to handle PDU receiving tasks with much less energy than a typical microcontroller. A novel decoding-logic "dual-TCAM" cell is proposed, which increases the speed of NAND-type TCAMs and reduces the number of TCAM cells required when Manchester encoding is used. Because the state machine of the protocol processor is programmable, it can be used in a system on chip to process application-specific protocols, allowing the microcontroller to stay in its idle state longer and thereby reduce power consumption.
