I. INTRODUCTION
THE future Internet of Things (IoT) will require embedded electronics to perform real-time computation on data with high energy efficiency. In a conventional von Neumann computer architecture, significant energy and time are spent not only on computation but also on data transfer between the central processing unit (CPU) and off-chip memory [1]. Therefore, a novel "in-memory computing" (IMC) architecture that uses a resistive memory (ReRAM) array to implement a look-up table (LUT) was recently proposed to address these challenges [2], [3].
Nanometer-scale electro-mechanical memory (NEMory) cells, which leverage contact adhesive forces or trapped charges to achieve bistable operation, are ideally suited for IMC applications because they have near-zero leakage, owing to a nearly infinite resistance ratio between the high-resistance (non-contacting) state and the low-resistance (contacting) state [4]-[8], and also because they can be programmed with much lower energy than other non-volatile memory (NVM) devices [5]. It should be noted that a NEMory cell is essentially a reconfigurable interconnection; indeed, the back-end-of-line (BEOL) air-gapped metal wiring layers available in an advanced CMOS process can be used to implement NEMory cells with a compact footprint [9]. In this letter, a NEMory-based reconfigurable logic LUT architecture and operating scheme are described and benchmarked against a conventional CMOS LUT architecture (Fig. 1).
II. LUT ARCHITECTURE AND OPERATING SCHEME
The NEMory-based LUT comprises a cross-point memory array with an input (Address) portion and an output (Result) portion, as illustrated in Fig. 2 for a 5-input/2-output LUT; the bit lines in the input portion (IBLs) are connected to those in the output portion (OBLs) via gated CMOS buffers. The number of columns in the NEMory array is N + M, where N is the number of input bits and M is the number of output bits. Each memory cell comprises a vertically oriented movable electrode that is physically anchored at the bottom to the bit line, and actuation electrodes (PL0 and PL1) on either side of the movable electrode implemented in intermediate metal layers; contacting electrodes (I/O0 and I/O1) on either side of the movable electrode are implemented in a top input/output metal layer. The PL0 and PL1 electrodes (not shown in Fig. 2, for clarity) are shared across the cells within a single column, and are used to set the state of each NEMory cell via electrostatic actuation, bringing the movable electrode into physical contact with either an I/O0 electrode or an I/O1 electrode. A non-linear device is assumed to be integrated into each NEMory cell, either at the bottom (Schottky contact) or at the top (metal-insulator-metal contact), to prevent sneak leakage paths in the cross-point array.
The number of rows in the NEMory array corresponds to the number of possible input bit combinations (up to 2^N); each input bit combination and its corresponding answer are programmed in the input portion and output portion, respectively, as follows: one IBL/OBL is grounded at a time so that the cells are programmed one row at a time; a programming voltage (V_prog) is applied to the PL0 actuation electrode in each column in which the cell is to be set to the "0" state; then V_prog is applied to the PL1 actuation electrode in the other columns to set the remaining cells to the "1" state. Note that the input/output electrodes are electrically floating during a programming operation, so that no direct current flows; this "cold-switching" operation provides not only ultra-low-energy operation but also improved endurance [10].
A lookup operation involves 3 steps as follows: (1) with the read enable line (RE) grounded, the input lines (I0 and I1) and IBLs are all pre-discharged low (to GND) and the output lines (O0 and O1) and OBLs are all pre-charged high (to V_DD) through a PMOSFET in the gated CMOS buffer; (2) the input lines are driven, causing all but one bit line (the one corresponding to the input bit combination) to be charged high, as indicated by the arrows in Fig. 2(a); (3) the gated CMOS buffers are enabled (RE = V_DD), causing one of each pair of output lines to discharge toward GND according to the states of the cells connected to the one bit line that remained low, as indicated by the dotted arrows in Fig. 2(a), so that the result can be detected by the follow-on logic gates. It should be noted that, in principle, the electrical connections in the top electrode layer can be hardwired (vs. programmed) in the input portion of the array if there is no need for customization or reconfigurability. Furthermore, the number of output bits can be increased simply by adding output column(s) to the array, with no impact on computational throughput.
Fig. 3(a) shows a partial view of the three-dimensional (3-D) NEMory cell implemented using multiple BEOL metal interconnect layers. The electrode features in the intermediate metal layers are assumed to have width and spacing (actuation gap size) equal to the minimum lithographically defined feature size (F). The as-fabricated contact gap size in the top metal layer is assumed to be F/2, formed using a double-patterning technique such as that described in [11], to ensure that contact is made only in the top layer, i.e. to avoid catastrophic pull-in of the structure to the actuation electrode. It should be noted that the CMOS tri-state gates (cf. Fig. 2) are fabricated in underlying layers and therefore do not require much extra layout area.
0741-3106 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
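The programming and three-step lookup scheme of Section II can be illustrated with a minimal logic-level sketch. It is not a circuit simulation: the class name NemoryLut, its methods, and the full-adder example are hypothetical and introduced only for illustration. Two conventions are assumptions rather than statements from the letter: a bit line is taken to charge high whenever any of its address cells mismatches the applied input bit, and the output bit is taken to be the index (0 or 1) of the output line that discharges.

```python
from itertools import product


class NemoryLut:
    """Logic-level model of the LUT array; electrical effects are not modeled."""

    def __init__(self, n_inputs, n_outputs):
        self.n_inputs = n_inputs      # N address columns
        self.n_outputs = n_outputs    # M result columns
        self.rows = []                # one (address cells, result cells) pair per bit line

    def program_row(self, address_bits, result_bits):
        """Set each cell of one row to contact its I/O0 (0) or I/O1 (1) electrode."""
        if len(address_bits) != self.n_inputs or len(result_bits) != self.n_outputs:
            raise ValueError("bit count does not match the array columns")
        if len(self.rows) >= 2 ** self.n_inputs:
            raise ValueError("array already holds 2^N rows")
        self.rows.append((tuple(address_bits), tuple(result_bits)))

    def lookup(self, input_bits):
        if len(input_bits) != self.n_inputs:
            raise ValueError("expected one input bit per address column")
        # Step 1: bit lines pre-discharged low, both lines of each output pair pre-charged high.
        bit_lines = [0] * len(self.rows)
        out_pairs = [[1, 1] for _ in range(self.n_outputs)]
        # Step 2 (assumed convention): any mismatching address cell charges its bit line high.
        for r, (address, _) in enumerate(self.rows):
            if any(cell != bit for cell, bit in zip(address, input_bits)):
                bit_lines[r] = 1
        # Step 3: buffers enabled; a bit line that stayed low discharges one line per output pair.
        for r, (_, result) in enumerate(self.rows):
            if bit_lines[r] == 0:
                for col, state in enumerate(result):
                    out_pairs[col][state] = 0
        # Assumed decoding: the index of the single discharged line is the output bit.
        return [pair.index(0) if pair.count(0) == 1 else None for pair in out_pairs]


# Hypothetical usage: a 3-input/2-output full adder (sum, carry).
adder = NemoryLut(3, 2)
for a, b, c in product((0, 1), repeat=3):
    total = a + b + c
    adder.program_row((a, b, c), (total % 2, total // 2))
print(adder.lookup((1, 0, 1)))
```

In this sketch an input combination with no programmed row leaves both lines of every output pair high, which is reported as None; how the follow-on logic would treat that case is not specified in the letter.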
III. NEMORY CELL DESIGN AND SIMULATION
3-D device simulations using Coventor MEMS+ [12] indicate that the spring restoring force (F_spring) of the movable electrode is a relatively insensitive function of the electrode width (W_beam), as shown in Fig. 3(b). This is because the vias are the more compliant structural components and therefore have the greatest influence on F_spring. Note that the contact adhesive force (F_adh) must be greater than F_spring so that contact is maintained with no actuation voltage applied, i.e. for non-volatile operation.
The voltage required to reprogram a NEMory cell is investigated herein assuming F = 20 nm, a cross-sectional aspect ratio (height/width of a layer feature) equal to 2, and copper (Young's modulus = 110 GPa) metal layers and vias. Fig. 3(c) plots the minimum reprogramming voltage, i.e. the voltage at which the electrostatic force (F_elec) plus F_spring is equal to F_adh. For nanometer-scale contact area, with F_adh in the range of a few nN [13], [14], the cell can be reprogrammed with less than 10 V. The catastrophic pull-in voltage is found to be approximately 4 times larger than the programming voltage.
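The force balance above (F_elec + F_spring = F_adh at the minimum reprogramming voltage, with F_adh > F_spring required for non-volatile operation) can be illustrated with a rough sketch. The letter obtains its values from 3-D simulation; the parallel-plate expression for F_elec used here is a simplifying assumption, and the function names and all numerical values in the example (forces, electrode area, gap) are hypothetical placeholders rather than data from the letter.

```python
import math

EPSILON_0 = 8.854e-12  # vacuum permittivity, F/m


def is_nonvolatile(f_adh, f_spring):
    """Contact is retained with no actuation voltage only if F_adh exceeds F_spring."""
    return f_adh > f_spring


def electrostatic_force(voltage, area, gap):
    """Parallel-plate approximation (an assumption): F_elec = eps0 * A * V^2 / (2 * g^2)."""
    return EPSILON_0 * area * voltage ** 2 / (2.0 * gap ** 2)


def min_reprogram_voltage(f_adh, f_spring, area, gap):
    """Voltage at which F_elec + F_spring equals F_adh under the parallel-plate assumption."""
    if not is_nonvolatile(f_adh, f_spring):
        return 0.0  # the spring alone would already release the contact
    return gap * math.sqrt(2.0 * (f_adh - f_spring) / (EPSILON_0 * area))


# Hypothetical example values, chosen only to exercise the functions.
f_adh = 3e-9          # adhesive force, N (a few nN)
f_spring = 1e-9       # spring restoring force, N
area = 40e-9 * 200e-9 # actuation electrode overlap area, m^2
gap = 20e-9           # actuation gap, m

v_min = min_reprogram_voltage(f_adh, f_spring, area, gap)
print(f"non-volatile: {is_nonvolatile(f_adh, f_spring)}, V_min = {v_min:.2f} V")
```

The parallel-plate form ignores fringing fields and the change in gap as the electrode deflects, so it should be read only as an order-of-magnitude check on the force balance.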
IV. PERFORMANCE BENCHMARKING
Fig. 4(a) shows the tradeoff between energy and delay for programming a NEMory cell. It can be seen that miniaturization (smaller actuation and contact gap sizes) is beneficial for reducing both the programming energy and delay. Fig. 4(b) shows the impact of increasing the number of output bits on the energy consumed during a readout operation, and the impact of higher contact resistance (R_cont) on the readout delay. (R_cont limits the rate at which a bit line is charged and the rate at which an output line is discharged.) The stored answers can be looked up in less than 1 ns (regardless of the number of output bits) using much less than 1 pJ of energy for 4 output bits. The performance characteristics of the NEMory-based LUT for 1 output bit are benchmarked against those of ReRAM-based and CMOS-based LUTs in Fig. 5. For the ReRAM array, a circuit layout based on [3], device performance based on [16], and a readout time of 20 ns are assumed. Much lower readout energy and delay, as well as zero standby power consumption, are notable advantages of the NEMory-based LUT. Practical challenges for implementation include precise control of contacting surface properties (roughness, adhesion energy, etc.) and potential reliability issues, which remain to be experimentally investigated.
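The role of R_cont in the readout delay can be illustrated with a first-order lumped RC sketch. This is an assumption-laden simplification, not the model used for Fig. 4(b): each line is treated as a single capacitance charged or discharged through one contact resistance, buffer and driver resistances are ignored, and the bit-line charging and output-line discharging steps are taken to occur in sequence. The function names and the example resistance and capacitance values are hypothetical.

```python
import math


def rc_settle_time(resistance, capacitance, fraction=0.9):
    """Time for a single RC node to complete `fraction` of its voltage swing."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return resistance * capacitance * math.log(1.0 / (1.0 - fraction))


def readout_delay(r_cont, c_bit_line, c_out_line, fraction=0.9):
    """Bit-line charging (step 2) followed by output-line discharging (step 3)."""
    return (rc_settle_time(r_cont, c_bit_line, fraction)
            + rc_settle_time(r_cont, c_out_line, fraction))


# Hypothetical example values, chosen only to exercise the functions.
r_cont = 10e3        # contact resistance, ohm
c_bit_line = 1e-15   # bit-line capacitance, F
c_out_line = 1e-15   # output-line capacitance, F

delay = readout_delay(r_cont, c_bit_line, c_out_line)
print(f"estimated readout delay: {delay * 1e12:.1f} ps")
```

In this simplified picture the delay scales linearly with R_cont and does not depend on the number of output bits, since the output line pairs discharge in parallel.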
V. SUMMARY
A reconfigurable LUT architecture utilizing an array of re-programmable non-volatile NEM memory (NEMory) cells and a novel readout scheme is presented. Compared with ReRAM-based and CMOS-based LUTs, the NEMory-based LUT is projected to be 10× faster and 100× more energy-efficient, while achieving a density similar to that of ReRAM, making it a compelling approach to energy-efficient computing in the era of the Internet of Things.
