# NEMsCAM: A Novel CAM Cell based on Nano-Electro-Mechanical Switch and CMOS for Energy Efficient TLBs

Azam Seyedi<sup>1,2</sup>, Vasileios Karakostas<sup>1,2</sup>, Stefan Cosemans<sup>3</sup>, Adrian Cristal<sup>1,2,5</sup>, Mario Nemirovsky<sup>1,4</sup>, Osman Unsal<sup>1</sup> <sup>1</sup>Barcelona Supercomputing Center <sup>2</sup>Universitat Politecnica de Catalunya <sup>3</sup>IMEC <sup>4</sup>ICREA <sup>5</sup>IIIA-CSIC {azam.seyedi, vasilis.karakostas, adrian.cristal, mario.nemirovsky, osman.unsal}@bsc.es, cosemans@imec.be

Abstract—In this paper we propose a novel Content Addressable Memory (CAM) cell, NEMsCAM, based on both nano-electro-mechanical (NEM) switches and CMOS technology. The memory component of the proposed CAM cell is built from two complementary non-volatile NEM switches located on top of the CMOS-based comparison component. As a use case for the NEMsCAM cell, we design first-level data and instruction Translation Lookaside Buffers (TLBs) in 16nm CMOS technology at 2GHz. Simulations show that, compared to a CMOS-only TLB, the NEMsCAM TLB reduces the energy per search operation by 27%, per write operation by 41.9%, and in standby mode by 53.9%, as well as the area by 40.5%, with minimal performance overhead.

## I. INTRODUCTION

Nano-electro-mechanical (NEM) switches have been suggested as a promising candidate for replacing CMOS technology [6]. NEM switches provide some unique characteristics that are not available in conventional MOS devices, such as near-zero leakage current and an infinite subthreshold slope. These characteristics make NEMs ideal for designing highly energy-efficient structures. However, NEMs have a relatively long mechanical switching delay [6] compared to the intrinsic delay of CMOS devices, and to date they suffer from low endurance [15]. To get the best of both worlds, researchers have combined NEMs and CMOS to build low-power and high-performance circuits for critical components [9], [10].

On the other hand, Content Addressable Memories (CAMs) have been widely adopted in applications that depend on fully-associative, high-speed search operations, such as translation lookaside buffers (TLBs) and network routers [25]. Since the search operation requires fast, fully-parallel comparisons, CAMs incur high energy consumption and area overhead. Previous works have explored the design of CAMs with emerging memory technologies to mitigate these issues [14], [23], [24], [27], [31]. However, those CAMs suffer mainly from increased search latency due to the employed technology, which prevents their use in performance-critical structures such as TLBs.

In this paper we propose a novel CAM cell based on NEMs and CMOS, called NEMsCAM. The memory part of the cell is constructed with two vertical complementary nonvolatile NEM switches, while the comparison circuits are designed with CMOS transistors to allow fast search operation.

Since NEMs can be fully integrated with CMOS devices [9], we place them on top of the CMOS device layer, considerably reducing the layout area and the energy consumption. To our knowledge, we are the first to design such a CAM cell.

As a use case, we leverage the NEMsCAM cell to build fully-associative TLBs. The TLB has been identified as a critical component for both energy and performance in modern processors [28]. We design first-level TLBs in 16nm technology at 2GHz for both data and instruction accesses, completing each search operation in one clock cycle. Our analysis shows that, compared to a CMOS-only TLB, the NEMsCAM DTLB reduces the energy per search operation, per write operation, and in standby mode by 27%, 41.9%, and 53.9%, respectively, and the area by 40.5%. Furthermore, the increased write latency of the NEMs introduces minimal performance overhead (0.27% on average). The main contributions of this paper are:

- We design the NEMsCAM cell based on complementary non-volatile NEM switches and CMOS transistors.
- We design highly energy efficient first-level TLBs for data and instruction accesses based on the NEMsCAM cell.
- We evaluate the proposed designs at both circuit and system level, and compare with CMOS-only TLBs.

## II. BACKGROUND

In this section we first present NEM switches and the prior art in their use as memory. Then we describe CMOS-only CAM cells and fully-associative TLB structures.

**NEM switches.** A NEM switch is a device in which a mechanically moving part makes or breaks a conductive path, typically in response to a voltage difference applied across its terminals. There are many different implementations of NEM switches [18], [22], [26]. Figure 1a illustrates the geometry of the five-terminal (5T) vertical NEM switch that we consider in this work [26]. A suspended beam is anchored at the source. Two gate terminals (Gate1 and Gate2) are positioned in close proximity to the beam, which can connect to one of two output nodes (Drain1 and Drain2). A voltage difference between Gate1 (or Gate2) and the beam causes the beam to move towards Gate1 (or Gate2) because of electrostatic attraction (Figure 1b). The beam then connects to Drain1 (or Drain2), creating a conductive path between the source and that drain. An advantage of using two gates is that electrostatic force can be used both to pull the beam in and to pull it out. Hence, the device does not have to rely solely on the elastic restoring force of the beam for release, which significantly improves the operational margins and the scalability of the device. Note that the mechanical movement of the beam is relatively slow, with the exact delay depending on the device technology. Consequently, write operations in NEM switches take multiple clock cycles [11].
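To make the pull-in/pull-out behavior concrete, the following is a minimal, quasi-static Python sketch of an idealized single-gate relay with hysteresis, i.e., the case that relies only on the elastic restoring force for release. The pull-in and pull-out voltages are the $V_{pi}$ and $V_{po}$ values used in Section V-A; the class name `HystereticRelay` is hypothetical, and mechanical delay, contact resistance, and the second gate are deliberately ignored.

```python
from dataclasses import dataclass

# Pull-in / pull-out voltages, taken from the NEM model parameters of Section V-A.
V_PI = 0.8  # the beam collapses onto the drain once |V_GS| reaches this value
V_PO = 0.2  # the beam releases once |V_GS| falls to this value or below


@dataclass
class HystereticRelay:
    """Idealized single-gate NEM relay: a switch with pull-in/pull-out hysteresis."""

    closed: bool = False

    def apply(self, v_gs: float) -> bool:
        """Update the contact state for a gate-to-source voltage and return it."""
        magnitude = abs(v_gs)  # electrostatic force grows with V^2, so polarity is irrelevant
        if not self.closed and magnitude >= V_PI:
            self.closed = True  # pull-in
        elif self.closed and magnitude <= V_PO:
            self.closed = False  # pull-out, driven only by the elastic restoring force
        # Between V_PO and V_PI the relay keeps its previous state (hysteresis window).
        return self.closed


relay = HystereticRelay()
trace = [relay.apply(v) for v in (0.5, 0.8, 0.5, 0.3, 0.2, 0.5)]
print(trace)  # [False, True, True, True, False, False]
```

The same intermediate voltage (0.5V) yields an open or closed switch depending on history; a second gate lets the release be forced electrostatically instead of waiting for $|V_{GS}|$ to drop below $V_{po}$.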

**Non-volatile NEM switches.** For our application, the NEM switches must exhibit non-volatile behavior: once they are connected to a drain, they must stay in that position until the beam is pulled out by electrostatic forces from the opposite gate. To achieve this we use the NEM switch described in [12]. Figure 1c shows the two stable states of this NEM switch. As long as both gates are at the same potential (wordline WL=0), the beam will never suffer a net disturbing electrostatic force. Figure 1d illustrates the write operation for this switch.
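The hold and write conditions can be sketched with a simple force comparison. This is an illustrative logic-level model, not the device physics of [12]: we assume that the attraction toward each gate grows with the square of the gate-to-beam voltage, that the beam moves toward the stronger pull (pull-in thresholds are not modeled), and that equal pulls leave the beam in place. The function name `next_position` is hypothetical.

```python
def next_position(position: int, v_gate1: float, v_gate2: float, v_beam: float) -> int:
    """Return the beam position after biasing: 1 = on Drain1, 2 = on Drain2.

    Illustrative assumption: the attraction toward each gate grows with the
    square of the gate-to-beam voltage, and the beam moves toward the stronger
    pull. Pull-in thresholds and mechanical delay are not modeled.
    """
    pull_gate1 = (v_gate1 - v_beam) ** 2
    pull_gate2 = (v_gate2 - v_beam) ** 2
    if pull_gate1 > pull_gate2:
        return 1  # net force toward Gate1: the beam lands on Drain1
    if pull_gate2 > pull_gate1:
        return 2  # net force toward Gate2: the beam lands on Drain2
    return position  # balanced pulls: no net force, so the state is retained


# Hold: both gates at the same potential (WL=0), whatever the beam voltage.
assert next_position(1, 0.0, 0.0, 1.0) == 1
# Write: Gate1 raised to 1, Gate2 grounded; the beam voltage picks the drain.
print(next_position(1, 1.0, 0.0, 1.0), next_position(2, 1.0, 0.0, 0.0))  # 2 1
```

With both gates at 0V the two pulls are equal for any beam voltage, which is the non-volatile hold condition described above; raising only one gate turns the beam potential into the value being written.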

**Memory arrays based on NEM switches.** Previous studies have explored using NEMs for memory applications [7], [11], [16], [29], [30], [21], [32]. Chong et al. [11] replace the two pull-down transistors of a 6T SRAM cell with NEM switches to reduce leakage and area. Some of these studies also discuss non-volatile memory arrays [16], [21], [29], [30]. The memory array structure disclosed in [30] is of particular interest to our proposal, as we explain in Section III.

**Configuration elements based on NEM switches.** NEM switches have also been proposed for configuration tasks. Dong et al. [13] used three-terminal (3T) NEM switches as configuration memory elements in FPGAs, replacing a routing switch with one NEM switch, or a LUT cell with two NEM switches. That structure could potentially be used to design a CAM cell; however, it exhibits several shortcomings: it suffers from half-select conditions, it relies only on the elastic restoring force for pull-out, it is volatile, and it provides only the Out signal, not the complementary OutB. The storage part of NEMsCAM, which we present in Section III, addresses these shortcomings.

**Content Addressable Memory.** A CAM compares the input search data against all of its stored entries in parallel (fully-associative mode) and returns the address of the matching location [25]. A CMOS-only CAM cell incorporates an SRAM cell to store the data bit and additional XOR circuitry to compare the stored bit with the search data [25]. CAMs are responsible for high energy consumption, but they are also critical for applications that require high search speeds.
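The search semantics can be illustrated with a short behavioral model of a binary CAM. This is a sketch only: the function `cam_search` is hypothetical, entries are modeled as integers of `width` bits, and the loop stands in for the parallel comparison that the hardware performs on all rows at once.

```python
from typing import List, Sequence


def cam_search(entries: Sequence[int], key: int, width: int) -> List[int]:
    """Return the indices of all stored words equal to `key`.

    Each entry models one CAM row: every cell XORs its stored bit with the
    search bit, and any mismatching cell discharges the row's matchline.
    """
    mask = (1 << width) - 1
    matches = []
    for index, stored in enumerate(entries):
        mismatch_bits = (stored ^ key) & mask  # one set bit per mismatching cell
        if mismatch_bits == 0:  # no cell pulled the matchline down
            matches.append(index)
    return matches


print(cam_search([0b1010, 0b0110, 0b1010], key=0b1010, width=4))  # [0, 2]
```

A real CAM usually feeds these match signals to a priority encoder when a single address must be returned; in a TLB, at most one entry is expected to match.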

**Translation Lookaside Buffer.** A common employment of CAMs is in the TLB that holds recently used virtual-to-physical translations [19]. The processor searches the TLB on every memory operation. In case of a hit, the TLB returns the physical address, and the memory operation proceeds. In case of a miss, the operation stalls until the translation is retrieved from the memory which might take tens of cycles [20].
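A TLB lookup built on such a fully-associative search can be sketched as follows. The class `FullyAssociativeTLB` is hypothetical; it assumes 4KB pages (as in the evaluation of Section V-A) and LRU replacement, which the text does not specify. The `fill` method corresponds to the TLB write operation that follows a miss.

```python
from collections import OrderedDict
from typing import Optional

PAGE_SHIFT = 12  # 4KB pages, as in the evaluation setup of Section V-A


class FullyAssociativeTLB:
    """Toy fully-associative TLB mapping virtual page numbers (VPNs) to
    physical page numbers (PPNs); LRU replacement is an illustrative assumption."""

    def __init__(self, entries: int) -> None:
        self.entries = entries
        self.map: "OrderedDict[int, int]" = OrderedDict()

    def lookup(self, vaddr: int) -> Optional[int]:
        """Return the physical address on a hit, or None on a miss."""
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.map:
            return None  # miss: the translation must come from the L2 TLB or a page walk
        self.map.move_to_end(vpn)  # mark as most recently used
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        return (self.map[vpn] << PAGE_SHIFT) | offset

    def fill(self, vpn: int, ppn: int) -> None:
        """Install a translation after a miss (the TLB write operation)."""
        if vpn in self.map:
            self.map.move_to_end(vpn)
        elif len(self.map) >= self.entries:
            self.map.popitem(last=False)  # evict the least recently used entry
        self.map[vpn] = ppn


tlb = FullyAssociativeTLB(entries=64)
assert tlb.lookup(0x12345) is None  # cold miss
tlb.fill(0x12, 0x7)
print(hex(tlb.lookup(0x12345)))  # 0x7345
```

Only `fill` writes the TLB, so in this model a slower write affects the miss path alone, while every memory operation exercises `lookup`.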

## III. DESIGN OF NEMSCAM CELL

In this section we present the circuit details of our proposed NEMsCAM cell. We use the memory structure proposed in [30] to implement the storage part of NEMsCAM. That memory structure provides non-volatility and full-select behavior, which are essential for designing a CAM cell; it also uses electrostatic pull-out and pull-in and does not require a cell selector device in the write path. The non-volatile memory design is based on the NEM switch described in [12], in which the beam experiences no net disturbing electrostatic force as long as both gates are at the same potential.

**Fig. 1:** (a) Schematic of a 5T NEM switch. (b) $V_{GS1} = V_{Gate1} - V_{Source}$ is 0 and $V_{GS2}$ is 1, so the beam collapses to Drain<sub>2</sub>. (c) The non-volatile NEM switch assumed in this work has two stable states. (d) Biasing scheme used for writing the non-volatile NEM switch. WL selects the cells for writing, while BL determines the value that will be written.

Figure 2a shows the schematic of the NEMsCAM cell. The outputs, Out and OutB, are connected to the transistors of the comparison circuit. We implement the comparison circuit in CMOS to avoid the long mechanical delay of the NEM beams, which would otherwise slow down the search operation.

### A. Circuit Operations

**Fig. 2:** (a) Schematic of the proposed NEMsCAM cell. (b) NEMsCAM storage array organization (two vertical 5T NEMs) and writing scheme. (c) When the NEMsCAM cell content matches the value being searched, there is no discharge path through the cell. (d) In case of a mismatch, the cell tries to discharge the matchline ML.

**Fig. 3:** Three-dimensional view of two adjacent NEMsCAM cells in a CAM array.

**Write biasing scheme.** Figure 2b shows how the storage part of the NEMsCAM cell is written. Once the wordline (WL) is activated, all beams on that row are selected. For the columns whose cells are to be programmed to 0, the bitlines are set high (BL=BLB=1), while for the columns whose cells are to be programmed to 1, BL=BLB=0 is applied. No cell suffers from half-select disturbance, and there is no risk of short-circuit current through the switches, as BL and BLB are always at the same potential during beam switching. This is important because high currents through the contact between the beam and the drain can cause device failure. During normal operation, BL is set to 1 and BLB to 0; cells whose beams are in state 0 hence have Out=0 and OutB=1. Note that this NEM memory design needs no separate read operation: the stored state is continuously available on Out and OutB, so there is no mechanical switching latency in the read path.
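The write biasing scheme can be checked with a logic-level model of a small storage array. This sketch follows the wiring described in Section III-B (Gate1 driven by WL, Gate2 grounded) and assumes that the beam moves toward the gate with the larger squared gate-to-beam voltage and stays put when the two are equal. It also assumes that the crossed drains route BLB to Out and BL to OutB when the cell stores 0, which reproduces the Out=0, OutB=1 behavior stated above. The functions `write_row` and `outputs` are hypothetical.

```python
from typing import List, Tuple


def write_row(array: List[List[int]], wl: List[int], bl: List[int]) -> None:
    """Apply one write cycle to a NEMsCAM storage array (logic levels only).

    array[r][c] is the bit stored in row r, column c. Gate1 of every cell in
    row r follows wl[r], Gate2 is grounded, and both beams of column c sit at
    BL = BLB = bl[c]. Assumption: the beam moves toward the gate with the larger
    squared gate-to-beam voltage and stays put when the two are equal.
    """
    for r, row in enumerate(array):
        for c in range(len(row)):
            pull_gate1 = (wl[r] - bl[c]) ** 2
            pull_gate2 = (0 - bl[c]) ** 2
            if pull_gate1 > pull_gate2:
                row[c] = 1  # WL=1 with BL=BLB=0 programs a 1
            elif pull_gate2 > pull_gate1:
                row[c] = 0  # WL=1 with BL=BLB=1 programs a 0
            # WL=0: both gates at 0, equal pulls, so unselected rows are undisturbed


def outputs(bit: int, bl: int = 1, blb: int = 0) -> Tuple[int, int]:
    """(Out, OutB) during normal operation (BL=1, BLB=0).

    Assumption: the crossed drains route BLB to Out and BL to OutB for a stored 0,
    and BL to Out and BLB to OutB for a stored 1.
    """
    return (bl, blb) if bit else (blb, bl)


array = [[0, 0, 0], [1, 1, 1]]
write_row(array, wl=[1, 0], bl=[0, 1, 0])  # program row 0 with 1, 0, 1
print(array)  # [[1, 0, 1], [1, 1, 1]]
print(outputs(0))  # (0, 1)
```

Under these assumptions the unselected row keeps its contents for every bitline value, which is the absence of half-select disturbance that the scheme relies on.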

**Search operation.** Figures 2c and 2d illustrate a cell that matches and mismatches the search data, respectively. If any cell connected to a wordline mismatches, that cell discharges the entire ML associated with that wordline, indicating an overall mismatch for the row.
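The per-cell comparison can be sketched at the logic level. This sketch assumes the common NOR-type comparison structure surveyed in [25], with one series pull-down stack gated by SL and OutB and another gated by SLB and Out; the exact transistor arrangement in Figure 2a may differ, and the function names are hypothetical.

```python
from typing import Sequence


def cell_pulls_down(out: int, outb: int, sl: int, slb: int) -> bool:
    """Return True if the cell opens a path from ML to ground (a mismatch).

    Assumed NOR-type structure: one series NMOS stack gated by SL and OutB,
    another gated by SLB and Out.
    """
    return bool((sl and outb) or (slb and out))


def row_matches(stored: Sequence[int], search: Sequence[int]) -> bool:
    """A row matches only if none of its cells discharges the shared ML."""
    for bit, key in zip(stored, search):
        out, outb = bit, 1 - bit  # complementary storage outputs
        sl, slb = key, 1 - key  # complementary searchlines
        if cell_pulls_down(out, outb, sl, slb):
            return False  # one mismatching cell is enough to discharge the ML
    return True


print(row_matches([1, 0, 1], [1, 0, 1]), row_matches([1, 0, 1], [1, 1, 1]))  # True False
```

Because each stack needs both of its gates high, a cell conducts exactly when the stored bit and the search bit differ, i.e., it implements the XOR comparison of a conventional CAM cell.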



### B. Cell Architecture

Figure 3 presents the three-dimensional view of two adjacent NEMsCAM cells located in the same column of the CAM array. Since NEMs have the potential to be fully integrated with CMOS devices [9], we place them on top of the CMOS layer, substantially reducing the layout area. The SL wires run parallel to the BL wires, while the matchlines (ML) and wordlines (WL) are orthogonal to the BLs. Because we employ vertical NEM switches [26], the long beam they require has little impact on the layout area, as it is out-of-plane. Two Gate1s are aligned and connected to their corresponding WL, while the two Gate2s are connected to 0. The drains are connected from opposite sides and form a cross shape. Note that the BL and WL wires can be merged with the actual device terminals, resulting in a compact layout. Finally, vias connect Out and OutB to the CMOS layer located below the NEMs. Thanks to this organization, the NEMsCAM cell shortens the wire lengths, which, together with the near-zero leakage of NEMs, considerably reduces the energy consumption.

## IV. A USE CASE FOR NEMSCAM: TLB

Due to the criticality of the TLB for system performance, processor vendors employ a two-level TLB organization [5]. The first-level TLB is small, fully-associative, and features a very fast search operation, while the second-level TLB is larger and aims at holding more translations. To boost system performance further, processors provide separate TLBs for data and instructions [5].

The TLB hierarchy accounts for a significant fraction of the energy spent on chip [1], [3]. Intel recently reported that 13% of the total core power comes from the TLBs for memory-intensive workloads [28]. Using our evaluation infrastructure (Section V-A), we find that the first-level TLBs overwhelmingly dominate the TLB energy, as they receive the vast majority of accesses across the TLB hierarchy (Figure 5). Moreover, breaking down the energy consumption of the first-level TLBs, we find that the CAM part contributes 94% of it. To reduce this source of energy consumption without affecting performance, we leverage the NEMsCAM cell to design a highly energy-efficient first-level TLB.

### A. Design

We design the NEMsCAM TLB with our proposed CAM cell and with typical SRAM circuits (Figure 4). The CAM part (Figure 4a) consists of the NEMsCAM cells and the necessary peripheral circuitry, optimized for both search and write operations. The SRAM cells (Figure 4b) and their associated circuits are designed entirely in CMOS technology. The control signal unit consists of the inverter chains that generate the signals controlling the TLB circuits, so that the search and write operations are performed correctly. The address decoder, the write circuits, and the data-in drivers are used only for the write operation, while the remaining circuits are also used during the search operation. BL and BLB are driven with predefined signals according to the operation. The control circuit unit is added to generate the necessary Gate1 and Gate2 signals during the search and write operations.

**Fig. 4:** The circuit details of (a) the proposed NEM-CMOS CAM and (b) the typical SRAM architecture in the proposed TLB structure.

**Search Operation.** As shown in Figure 4a, among the various matchline sensing techniques we choose the current-race scheme because of its simplicity and its low average ML energy consumption [25]. During the search operation, $WL_{en3}$ is high and the ML lines are connected to the WLM lines. At the beginning of each search operation, all MLs are temporarily put in the precharged state, as in a CMOS-only TLB: the search cycle starts when the precharge signal ($ML_{pre}$) goes high, driving the ML low. Concurrently, the SLs are driven to the search data values. Then, the ENB signal goes low and connects the current source to the ML.

During the evaluation phase, the stored bits of the CAM cells are compared against the data provided on the corresponding SLs. In case of a match (TLB hit), the current source enabled by ENB pulls the ML up to the high state. Conversely, in case of a mismatch (TLB miss), the mismatching cell(s) counteract the current source and keep the ML close to ground. Finally, the ML sense amplifiers feed the wordline buffers, mapping the match location to its corresponding encoded address stored in the SRAM cells (Figure 4b). Figure 8 summarizes the signal behavior for the matching case of a cell in the NEMsCAM TLB.
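The current-race evaluation can be illustrated with a first-order numerical model of a single matchline. All electrical values below (matchline capacitance, source current, per-cell pull-down conductance, supply, sensing threshold, and evaluation window) are hypothetical placeholders chosen for illustration, not values from our netlists. Each mismatching cell is modeled as a linear conductance to ground, and the ML voltage is integrated with forward Euler.

```python
def current_race(mismatches: int, t_eval: float = 0.25e-9, steps: int = 1000) -> float:
    """Return the ML voltage at the end of the evaluation window (current-race sensing).

    All electrical values are hypothetical placeholders for illustration.
    """
    c_ml = 10e-15  # matchline capacitance: 10 fF
    i_src = 50e-6  # current source enabled when ENB goes low: 50 uA
    g_cell = 5e-4  # pull-down conductance of one mismatching cell: 0.5 mS
    vdd = 0.8  # supply voltage

    dt = t_eval / steps
    v = 0.0  # the ML_pre phase leaves the matchline at ground
    for _ in range(steps):
        i_net = i_src - mismatches * g_cell * v  # source charges, mismatching cells sink
        v = min(vdd, v + i_net * dt / c_ml)  # forward-Euler step, clamped at the supply
    return v


def sense(v_ml: float, threshold: float = 0.4) -> bool:
    """Idealized ML sense amplifier: a high ML means the row matched."""
    return v_ml > threshold


for n in (0, 1, 2):
    v = current_race(n)
    print(n, round(v, 3), sense(v))  # matches charge to the rail; mismatches settle low
```

With these placeholder values a matching row charges to the rail well within the window, whereas a single mismatching cell holds the ML near $I_{src}/g_{cell} = 0.1$V, below the sensing threshold; more mismatches pull it even lower.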

**Write Operation.** During the write operation, the $WL_{en1}$ and $WL_{en2}$ signals are high, the WL address is routed to the CAM and SRAM parts, and the data is written into the corresponding cells.

| Component | Configuration |
|---|---|
| Processor | 2GHz, 4-way issue, 128-entry ROB, Pentium-M branch predictor, 8-cycle penalty |
| L1-D cache | 32KB, 8-way, 4-cycle latency |
| L1-I cache | 32KB, 4-way, 4-cycle latency |
| L2 cache | 256KB, 8-way, 8-cycle latency |
| L3 cache | 8MB, 16-way, 30-cycle latency |
| DTLB (L1) | 64 entries, fully-associative, 1 cycle per search operation, 2 / 6 cycles per write operation for CMOS / NEMsCAM |
| ITLB (L1) | 48 entries, fully-associative, 1 cycle per search operation, 2 / 6 cycles per write operation for CMOS / NEMsCAM |
| L2-DTLB | 1024 entries, 4-way, 7-cycle latency |
| L2-ITLB | 512 entries, 4-way, 7-cycle latency |

**TABLE I:** Simulated system characteristics.

**Fig. 6:** (a) Schematic of the NEM memory cell while biased for programming 1 (the beam position depicts the situation after programming is complete). (b) Its simple switch model. (c) The NEM Verilog-A model between the BL (source), Out (drain), and WL nodes, which we employ in our circuit analysis [11].

## V. EXPERIMENTAL EVALUATION

In this section we first describe our methodology to evaluate the NEMsCAM TLB and then we present the results.

| Structure | Metric | CMOS | NEMs | Benefit (%) |
|---|---|---|---|---|
| DTLB (L1), 64 entries | Search operation (pJ) | 4.529 | 3.308 | 27.0 |
| | Write operation (pJ) | 0.148 | 0.086 | 41.9 |
| | Standby mode (pJ) | 0.141 | 0.065 | 53.9 |
| | Area (normalized) | 1.000 | 0.595 | 40.5 |
| ITLB (L1), 48 entries | Search operation (pJ) | 3.658 | 2.805 | 23.3 |
| | Write operation (pJ) | 0.187 | 0.107 | 42.8 |
| | Standby mode (pJ) | 0.106 | 0.046 | 56.6 |
| | Area (normalized) | 1.000 | 0.661 | 33.9 |

**TABLE II:** Energy consumption for search, write operations and standby mode, and normalized area footprint.
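The Benefit column is the relative reduction with respect to the CMOS-only baseline. The following Python snippet recomputes it from the table values as a consistency check; the dictionary layout and the function name `benefit` are ours.

```python
# Energy (pJ) and normalized area copied from Table II, as (CMOS, NEMs) pairs.
TABLE_II = {
    "DTLB": {"search": (4.529, 3.308), "write": (0.148, 0.086),
             "standby": (0.141, 0.065), "area": (1.000, 0.595)},
    "ITLB": {"search": (3.658, 2.805), "write": (0.187, 0.107),
             "standby": (0.106, 0.046), "area": (1.000, 0.661)},
}


def benefit(cmos: float, nems: float) -> float:
    """Relative reduction of the NEMs value with respect to the CMOS baseline, in %."""
    return 100.0 * (cmos - nems) / cmos


for tlb, metrics in TABLE_II.items():
    print(tlb, {m: round(benefit(*v), 1) for m, v in metrics.items()})
# DTLB {'search': 27.0, 'write': 41.9, 'standby': 53.9, 'area': 40.5}
# ITLB {'search': 23.3, 'write': 42.8, 'standby': 56.6, 'area': 33.9}
```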

### A. Methodology

We design NEMsCAM-based TLBs for data (DTLB) and instruction (ITLB) accesses based on [25], using the TLB organization of a modern AMD processor [5] (Table I). For both the NEMsCAM and the CMOS-only TLB, we construct transistor-level netlists with all the necessary circuitry and the equivalent wire capacitances and resistances. We simulate and optimize both TLB structures with Cadence Spectre using the 16nm Predictive Technology Model [2] at T=25°C, targeting a 2GHz processor frequency. For the NEM switches we use the simple Verilog-A model explained in Figure 6 with the following parameters: $V_{pi}=0.8$V, $V_{po}=0.2$V, $C_{gs,off}=15$aF, $C_{gs,on}=20$aF, and $t_{mech}=3$ns [11]. We optimize our circuits to minimize energy consumption and verify that the search and write operations execute correctly while meeting the timing requirements. We also calculate the energy consumption per search and write operation, and in standby mode. Furthermore, we draw the layouts [4], measure the wire lengths, and optimize the wire capacitances in the netlists. Finally, to estimate the impact of the NEMsCAM TLBs on system performance, we use the Sniper simulator [8] with the configuration of Table I and 4KB pages, and run the TLB-intensive workloads from SPEC CPU2006 with the reference input set for one billion instructions.

### B. Results

**Energy & Area.** Table II shows the simulation results for both the NEMsCAM and the CMOS-only TLBs. First, we observe that the area of the DTLB is reduced by 40.5% (Figure 7). This improvement stems from the structure of the NEMsCAM cell, which places the NEM switches on top of the CMOS comparison circuitry. Second, we observe that the energy per search operation, per write operation, and in standby mode is reduced by 27%, 41.9%, and 53.9%, respectively, for the DTLB. This is due to the smaller circuit dimensions, which lead to lower parasitic wire capacitances and resistances on the searchlines and matchlines, which in turn require fewer driving buffers. Moreover, the energy consumption is further reduced by the near-zero leakage current of NEMs. Similar results hold for the ITLB.

**Fig. 7:** Layout of the DTLB implemented with CMOS-only CAM cells (top), and NEM-based CAM cells (bottom).

**Fig. 8:** Simulation waveform of matching state for the first cell at the last row of the NEMsCAM DTLB.

**Latency.** Figure 8 shows the simulation waveform of the matching state for a cell of the NEMsCAM DTLB during the search operation. The waveform verifies that our design meets the target of one clock cycle per search operation. On the other hand, the write operation takes 6 cycles in the NEMsCAM TLB (based on [11]), while it takes 2 cycles in the CMOS-only TLB. This slowdown is due to the long mechanical delay of the NEM switches. However, this latency barely affects processor performance, as shown next.
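The 6-cycle NEMsCAM write latency is consistent with the $t_{mech}=3$ns mechanical delay of the NEM model (Section V-A) at the 2GHz target clock. The following back-of-the-envelope Python check is our own arithmetic, not the derivation in [11]; it ignores any driver or setup overhead and uses integer picoseconds so that the ceiling is not affected by floating-point rounding. The function name `write_cycles` is hypothetical.

```python
T_MECH_PS = 3000  # t_mech = 3 ns (NEM model parameter, Section V-A)
F_CLK_MHZ = 2000  # 2 GHz target clock


def write_cycles(t_mech_ps: int, f_clk_mhz: int) -> int:
    """Clock cycles needed to cover the mechanical delay, rounded up."""
    period_ps = 1_000_000 // f_clk_mhz  # 500 ps at 2 GHz
    return -(-t_mech_ps // period_ps)  # integer ceiling division


print(write_cycles(T_MECH_PS, F_CLK_MHZ))  # 6
```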

**System.** Figure 9 shows the energy reduction in the first-level TLBs due to the use of NEMsCAM for various workloads. We find that the search operation dominates the energy breakdown for both the DTLB and the ITLB, and that the NEMsCAM TLBs reduce the energy spent by 28.7% on average. Given that up to 13% of the total core power may come from the TLBs [28], the NEMsCAM cell can significantly help improve the energy efficiency of the chip.
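As a rough, back-of-the-envelope estimate (our own, assuming that the 13% figure of [28] applies and that the average 28.7% first-level TLB reduction carries over to the whole TLB share, which slightly overstates the benefit since the second-level TLBs are unchanged), the fraction of core power saved is approximately

$$\frac{\Delta P_{\text{core}}}{P_{\text{core}}} \approx 0.13 \times 0.287 \approx 0.037,$$

i.e., on the order of a 3.7% reduction, treating energy and power reductions as equivalent given the negligible change in execution time.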

Figure 10 shows the estimated execution-time overhead due to the NEMsCAM TLBs. This overhead comes from the increased latency of the write operation in the NEMsCAM cell. However, the write operation (i) takes place only after TLB misses, which are rare compared to TLB hits, and (ii) adds latency to an already slow operation, i.e., the L2-TLB access (~7 cycles [17]) plus, potentially, the penalty of an L2-TLB miss (several tens of cycles [20]). Consequently, the NEMsCAM TLBs have a negligible impact on the execution time for most workloads (0.32% on average) while significantly reducing the energy spent in the TLB hierarchy.

## VI. RELATED WORK

In this section we compare NEMs with other state-of-the-art technologies for designing CAM cells for high-performance components such as first-level TLBs. Eshraghian et al. proposed a CAM design based on memristors [14]. Memristors offer high density; however, compared with NEMs, they suffer from increased search latency and active energy consumption because they require a higher voltage for search and write operations. Rajendran et al. designed a CAM cell with PCRAM [27]. PCRAM exhibits relatively high area density but, similar to memristors, at the cost of higher search latency, higher write energy consumption, and lower endurance. CAM cells based on magnetic tunnel junction (MTJ) devices have also been proposed [23], [24], [31]. These cells provide high density thanks to the ability of MTJs to be stacked over the MOS devices, similar to our NEMsCAM cell. However, MTJs suffer from high write power consumption and scalability difficulties compared to NEMs. In contrast to these proposals, NEMsCAM achieves low search latency, small area, and low off-state leakage current thanks to the characteristics of NEMs, enabling CAM cells for performance-critical structures.

**Fig. 9:** Energy consumption of the CMOS-only and the NEMsCAM DTLB and ITLB.

**Fig. 10:** Execution time overhead due to NEMsCAM TLBs.

## VII. CONCLUSIONS & FUTURE WORK

In this paper we presented the NEMsCAM cell and designed the NEMsCAM TLB. Our approach exhibits significant benefits in terms of energy consumption and area compared to a CMOS-only TLB. However, the limited write endurance of current NEMs may delay their adoption until technology improves. As future work, we will investigate techniques to improve the lifetime under near-future technology constraints.

## ACKNOWLEDGEMENTS

We thank all anonymous reviewers for their insightful comments. This work is supported in part by the European Union (FEDER funds) under contract TIN2012-34557, and the European Union's Seventh Framework Programme (FP7/2007-2013) under the ParaDIME project (GA no. 318693).

## REFERENCES

- [1] "Intel StrongARM Processor," www.intel.com/design/pca/applicationsprocessors/1110\_brf.htm.
- [2] "Predictive Technology Model," http://ptm.asu.edu/.
- [3] "Sh-3 RISC Processor family," http://www.hitachieu.com/hel/ecg/products/micro/32bit/sh\_3.html.
- [4] "The Electric VLSI Design System," http://www.staticfreesoft.com.
- [5] Advanced Micro Devices, Software Optimization Guide for AMD Family 15h Processors, 2014, no. 47414.
- [6] K. Akarvardar *et al.*, "Design Considerations for Complementary Nano-Electro-Mechanical Logic Gates," in *IEDM*, 2007.
- [7] K. Akarvardar *et al.*, "Ultralow Voltage Crossbar Nonvolatile Memory Based on Energy-Reversible NEM Switches," *Electron Device Letters*, vol. 30, no. 6, 2009.
- [8] T. E. Carlson *et al.*, "Sniper: Exploring the Level of Abstraction for Scalable and Accurate Parallel Multi-core Simulation," in SC, 2011.
- [9] C. Chen *et al.*, "Efficient FPGAs Using Nanoelectromechanical Relays," in *FPGA*, 2010.
- [10] F. Chen *et al.*, "Integrated Circuit Design with NEM Relays," in *ICCAD*, 2008.
- [11] S. Chong *et al.*, "Nanoelectromechanical (NEM) Relays Integrated with CMOS SRAM for Improved Stability and Low leakage," in *ICCAD*, 2009.
- [12] S. Cosemans, "Data Storage Cell and Memory Arrangement," 2013, European Patent App. EP13198871.9.
- [13] C. Dong *et al.*, "Architecture and Performance Evaluation of 3D CMOS-NEM FPGA," in *SLIP*, 2011.
- [14] K. Eshraghian *et al.*, "Memristor MOS Content Addressable Memory (MCAM): Hybrid Architecture for Future High Performance Search Engines," *TVLSI*, vol. 19, no. 8, 2011.
- [15] R. Gaddi *et al.*, "Reliability and Performance Characterization of a MEMS-based Non-volatile Switch," in *IRPS*, 2011.
- [16] J. Gopal *et al.*, "A Cantilever-Based NEM Nonvolatile Memory Utilizing Electrostatic Actuation and Vibrational Deactuation for High-Temperature Operation," *Trans. on Electron Devices*, vol. 61, 2014.
- [17] Intel Corporation, Intel<sup>®</sup> 64 and IA-32 Architectures Optimization Reference Manual, April 2012, no. 248966-026.
- [18] A. Ionescu *et al.*, "Modeling and Design of a Low-Voltage SOI Suspended-Gate MOSFET (SG-MOSFET) with a Metal-Over-Gate Architecture," in *ISQED*, 2002.
- [19] B. L. Jacob *et al.*, "A Look at Several Memory Management Units, TLB-Refill Mechanisms, and Page Table Organizations," in *ASPLOS*, 1998.
- [20] V. Karakostas *et al.*, "Performance Analysis of the Memory Management Unit under Scale-out Workloads," in *IISWC*, 2014.
- [21] M.-S. Kim *et al.*, "NEMS Switch with 30 nm Thick Beam and 20 nm High Air Gap for High Density Non-Volatile Memory Applications," in *ISDRS*, 2007.
- [22] J. Kinaret *et al.*, "A Carbon Nanotube Based Nanorelay," *Appl. Phys. Lett.*, 2003.
- [23] S. Matsunaga *et al.*, "Quaternary 1T-2MTJ Cell Circuit for a High-Density and a High-Throughput Nonvolatile Bit-Serial CAM," in *ISMVL*, 2012.
- [24] R. Nebashi *et al.*, "A Content Addressable Memory Using Magnetic Domain Wall Motion Cells," in *VLSIC*, 2011.
- [25] K. Pagiamtzis *et al.*, "Content-Addressable Memory (CAM) Circuits and Architectures: A Tutorial and Survey," *JSSC*, vol. 41, no. 3, 2006.
- [26] R. Parsa *et al.*, "Laterally Actuated Platinum-Coated Polysilicon NEM Relays," *JMEMS*, vol. 22, no. 3, 2013.
- [27] B. Rajendran *et al.*, "Demonstration of CAM and TCAM Using Phase Change Devices," in *IMW*, 2011.
- [28] A. Sodani, "Race to Exascale: Opportunities and Challenges," in *MICRO Keynote*, 2011.
- [29] R. Vaddi *et al.*, "Design and Scalability of a Memory Array Utilizing Anchor-Free Nanoelectromechanical Nonvolatile Memory Device," *Electron Device Letters, IEEE*, vol. 33, no. 9, 2012.
- [30] R. Van Kampen, "Four-terminal Multiple-time Programmable Memory Bitcell and Array Architecture," 2009, US Patent App. 12/433,027.
- [31] W. Wang, "Magnetic Content Addressable Memory Design for Wide Array Structure," *Trans. on Magnetics*, vol. 47, no. 10, 2011.
- [32] W. Young Choi *et al.*, "Compact Nano-Electro-Mechanical Non-Volatile Memory (NEMory) for 3D Integration," in *IEDM*, 2007.