Abstract-In recent years, nano-electromechanical (NEM) relays have been proposed as promising candidates to complement or replace CMOS technology in ultra-low power applications, owing to their zero off-state leakage and abrupt turn-on/off behavior. The development of air gap technology enables the implementation of vertical relays that are compatible with Back-End-of-Line (BEOL) CMOS fabrication processes. In this work, we present the design, implementation, and analysis of integrated sequential logic blocks built with BEOL NEM relays, using custom and commercial modeling and simulation tools. While relay circuits are inevitably slower than their transistor counterparts because of the mechanical nature of their operation, we show that the proposed circuits reduce energy per operation by more than one order of magnitude (>14x) and area by roughly one order of magnitude (>9x). This is particularly attractive for Internet of Things (IoT) applications, where the requirements for ultra-low power consumption are significantly stricter than those for computation speed.
I. INTRODUCTION
CMOS technology has dominated the chip industry for the past few decades. Consistent scaling of transistors, following Moore's law, has resulted in chips with higher performance, lower power consumption, and smaller area. However, this scaling is approaching its end because of fundamental physical limitations and challenges related to process technology and photolithography. Increased sub-threshold leakage further renders power density scaling impossible. The NEM relay, with its zero-leakage property and abrupt turn-on/off behavior, has become a promising alternative technology for Very Large Scale Integration (VLSI) applications. In recent years, many applications such as logic gates, arithmetic units, interface circuits, memory cells, and power management units have been demonstrated [1] [2] [3] [4]. While there have been some initial studies on the design of latches [2, 5], the full implementation of sequential logic blocks, which are critical elements in any VLSI application, remains to be explored further.
With the introduction of air gap technology in the BEOL [6], the concept of a NEM relay implemented in a CMOS-compatible BEOL process was presented in [7]. In this work, we use a similar concept with an optimized device, shown in Fig. 1, and apply relay-specific circuit design techniques to implement low-power sequential circuits. For modeling, we use our custom Verilog-A model, which is validated against a commercial finite element analysis (FEA) tool, Coventor MEMS+ [8]. For circuit simulations, we use Cadence Virtuoso Spectre [9].
The remainder of the paper is organized as follows. Section II explains the device structure and operating principles, followed by the modeling. Section III introduces relay-based logic and its optimal implementation. Section IV presents widely used sequential circuits built with relays, including the D and JK flip-flops, and compares them with their CMOS counterparts. Finally, Section V discusses the results and presents the conclusion.
II. DEVICE STRUCTURE, OPERATION, AND MODELLING
The four terminals of the relay in Fig. 1 are named after the terminals of the CMOS transistor: gate (G), source (S), drain (D) and body (B). The channel and gate are separated by an inter-metal dielectric (IMD). The relay utilizes BEOL metal (aluminum) layers and Vertical Interconnect Accesses (VIAs). The technological parameters are listed in Table I. The relay is actuated electrostatically by the voltage difference between the gate and body (VGB). When VGB exceeds an inherent threshold value called the pull-in voltage (VPI), the beam "pulls in" and the source and drain are consequently connected by the channel, as shown in Fig. 2(a). When VGB is subsequently reduced, the beam releases only at the pull-out voltage (VPO), which lies 0.38 V below VPI, as shown in Fig. 2(b). This hysteresis is due to the surface adhesion forces and the relatively strong electrostatic force present when the device is pulled in [10].
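The hysteretic switching described above can be illustrated with a minimal quasi-static sketch. The 0.38 V hysteresis is taken from the text; the pull-in voltage of 1.0 V and the class name `HystereticRelay` are hypothetical placeholders, since the actual VPI depends on the device parameters in Table I. The magnitude of VGB is used because the electrostatic force depends on the square of the voltage.

```python
class HystereticRelay:
    """Quasi-static on/off model of a relay with pull-in/pull-out hysteresis."""

    def __init__(self, v_pi, hysteresis=0.38):
        self.v_pi = v_pi               # pull-in voltage (hypothetical value supplied by caller)
        self.v_po = v_pi - hysteresis  # pull-out voltage, 0.38 V below VPI per the text
        self.closed = False            # beam starts released (source and drain disconnected)

    def update(self, v_gb):
        """Apply a gate-to-body voltage and return True if the channel is closed."""
        v = abs(v_gb)                  # force scales with V^2, so only the magnitude matters
        if v > self.v_pi:
            self.closed = True         # beam pulls in
        elif v < self.v_po:
            self.closed = False        # beam releases
        # Between VPO and VPI the previous state is retained (hysteresis window).
        return self.closed


relay = HystereticRelay(v_pi=1.0)      # 1.0 V is an assumed, illustrative pull-in voltage
for v in (0.8, 1.1, 0.8, 0.5):
    print(v, relay.update(v))
```

Sweeping the voltage up and then down shows that the relay stays closed inside the hysteresis window on the way down, which is the behavior plotted in Fig. 2(b).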
The serpentine-shaped beam is anchored at M1 and ends at M5 to obtain the maximum spring length and thus the minimum spring constant. This helps reduce the pull-in voltage (VPI) to its lowest possible value. The fixed body terminal contains three metal layers (M2-M4) and acts as a second gate, or "back gate", as it is called in MOSFETs. The distances between the body electrodes and the corresponding gate layers gradually increase from 1.5F to 1.7F on the M2-M4 layers.
Fig. 1. BEOL NEM relay 3D schematic and its symbol. The relay consists of the fixed anchor; the movable beam, which forms the gate and channel separated by an inter-metal dielectric; and the fixed source, drain, and body.
MEMS+ simulates the channel displacement as the gate voltage is swept while the body is kept connected to ground. The inset of Fig. 2(b) shows the steep switching behavior on the I-V curve obtained from the MEMS+ output simulated in Cadence Virtuoso Spectre. In the simulation settings, the beam uses the Bernoulli model, with the adhesion force per unit area set to 100 kPa.
The relay buffer, illustrated in Fig. 3(a), is used for transient analysis of the relay. A piecewise linear (PWL) pulse is applied to the gate, and the channel movement is monitored. The relay shows a mechanical delay (tM) of ~13 ns before the channel reaches the source and drain. The same mechanical delay appears at the buffer output voltage after the gate input is applied, as shown in Fig. 3(b). This delay therefore limits the maximum operating frequency of relay-based applications.
While the FEA tool accurately captures the device behavior, it demands massive computational resources, which leads to long computing times. This makes the simulation of complex circuits virtually impossible. Thus, a fast and compact model that closely captures the device behavior is needed for complex circuit simulations.
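As a rough worked estimate of this limit, assume that each clock phase must be at least one mechanical delay long and ignore electrical delay and timing margins. Both simplifications are assumptions made here for illustration, not results from the simulations:

$$ f_{\max} \lesssim \frac{1}{2\,t_M} = \frac{1}{2 \times 13\ \mathrm{ns}} \approx 38\ \mathrm{MHz} $$

Under these assumptions the achievable clock rate is in the tens of MHz, which is consistent with targeting applications where moderate speed is acceptable.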
The relay is generally modeled as a damped mass-spring system with a parallel-plate capacitor. When a voltage difference is applied between the gate and body, the principal dynamic behavior can be described with Newton's second law of motion:
$$ m\ddot{y} + b\dot{y} + ky = F_{e} $$
where m is the effective mass of the beam [11], b is the damping coefficient of the free-moving structure [12], k is the effective spring constant of the serpentine gate [13], and y is the displacement of the gate, which is constrained to the y-axis only. The electrostatic force $F_{e}$ depends on the voltage bias and the position of the beam and is described by
$$ F_{e} = \frac{\varepsilon_0 A V_{GB}^{2}}{2\,(g_0 - y)^{2}} $$
where $\varepsilon_0$ is the permittivity of air, $A$ is the overlap area between the gate and body, and $g_0$ is the original gap between the channel and the source/drain electrodes.
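For an ideal parallel-plate actuator with a linear spring, balancing the spring force against the electrostatic force gives the well-known static pull-in condition at one third of the gap, with pull-in voltage sqrt(8 k g0^3 / (27 eps0 A)). The sketch below evaluates this textbook expression; it is not the paper's Verilog-A model, it neglects the adhesion and dispersion forces, and the numerical values of `k`, `area` and `g0` are hypothetical placeholders rather than the device parameters of Table I.

```python
import math

EPS0 = 8.854e-12  # vacuum permittivity in F/m (air is nearly identical)


def electrostatic_force(v_gb, y, area, g0):
    """Ideal parallel-plate force for displacement y and initial gap g0."""
    return EPS0 * area * v_gb**2 / (2.0 * (g0 - y) ** 2)


def pull_in_voltage(k, area, g0):
    """Static pull-in voltage of an ideal parallel-plate actuator with spring constant k."""
    return math.sqrt(8.0 * k * g0**3 / (27.0 * EPS0 * area))


# Hypothetical, illustrative parameters (not taken from the paper).
k = 1.0        # spring constant in N/m
area = 1e-12   # overlap area in m^2 (1 um^2)
g0 = 50e-9     # initial gap in m

v_pi = pull_in_voltage(k, area, g0)
# At pull-in, the spring force and electrostatic force balance at y = g0 / 3.
print(v_pi, electrostatic_force(v_pi, g0 / 3.0, area, g0), k * g0 / 3.0)
```

The expression makes the design trade-off explicit: a softer spring (smaller k), a larger overlap area, or a smaller gap each lower the pull-in voltage.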
The dispersion forces, including the Casimir force and the van der Waals force, predominantly affect the pull-in behavior in NEM devices [14, 15]. Furthermore, when the separation between the channel and the contact electrodes is less than a few tens of nanometers, the contact force, $F_{c}$, is dominated by the van der Waals force [15]:
where H is the Hamaker constant [16].
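For reference, a commonly used closed form for the non-retarded van der Waals attraction between two parallel flat surfaces is

$$ F_{\mathrm{vdW}} = \frac{H A_c}{6\pi d^{3}} $$

where $A_c$ is the contact area and $d$ is the separation between the surfaces. The symbols $A_c$ and $d$ are introduced here for illustration only; the exact geometry factor used in the model may differ.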
Using the governing equations above, we developed a model in the Verilog-A language. The model implements these equations in state-space form for the Spectre simulator. To validate the results, critical properties are compared in Table II, which indicates that the Verilog-A model accurately captures the device behavior while reducing the computation time needed. The simulations were conducted on a Lenovo P9000 workstation, with moderate accuracy in the Spectre settings and the Euler integration algorithm for MEMS+.
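The state-space idea can be sketched in a few lines of Python: the state is the pair (y, v) of displacement and velocity, and each time step advances it using the damped mass-spring equation with an ideal parallel-plate force. This is only an illustrative stand-in for the Verilog-A model. All parameter values, the function name `mechanical_delay`, and the `contact_fraction` travel limit (used to declare contact before the force singularity at y = g0) are assumptions, and the adhesion and van der Waals terms are omitted.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m


def mechanical_delay(v_gb, m=1e-15, b=3e-8, k=1.0, area=1e-12, g0=50e-9,
                     contact_fraction=0.8, dt=1e-10, t_max=2e-6):
    """Integrate the state (y, v) after a voltage step applied at t = 0.

    Returns the time at which the beam reaches the assumed contact position,
    or None if it settles without pulling in before t_max.
    All default parameters are hypothetical, not the device values of the paper.
    """
    y, v, t = 0.0, 0.0, 0.0
    y_contact = contact_fraction * g0
    while t < t_max:
        f_e = EPS0 * area * v_gb**2 / (2.0 * (g0 - y) ** 2)  # electrostatic force
        accel = (f_e - b * v - k * y) / m                    # Newton's second law
        v += accel * dt   # semi-implicit Euler: update velocity first,
        y += v * dt       # then position with the new velocity
        t += dt
        if y >= y_contact:
            return t
    return None


for volts in (1.0, 3.0, 4.0):
    print(volts, mechanical_delay(volts))
```

With these placeholder values the static pull-in voltage is about 2 V, so the 1.0 V step settles without contact, while larger steps pull in, and a higher overdrive shortens the delay.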
III. RELAY LOGIC
Relays serve a function similar to that of transistors in digital circuits, namely that of an on/off switch. However, their switching characteristics are significantly different. In a long series stack of logic gates (and/or transistors), the Elmore delay determines the operating speed of the circuit [17, 18]. In relays, on the other hand, the speed is primarily determined by the mechanical delay [19]. Therefore, if a relay circuit can be constructed such that all the gate voltages are applied at the same time, the operation can be completed within only one mechanical delay. This is the primary design constraint that makes relay-based circuit design different from CMOS design. As a result, the pass-transistor logic style is the preferred choice for most relay logic applications.
Following this methodology, the 2-input and 3-input versions of widely used logic gates are built, as presented in Fig. 4. It should be noted that, unlike a MOSFET, a relay can operate as either an N-relay (body connected to GND) or a P-relay (body connected to VDD). All relay logic gates are designed such that they incur only one mechanical delay in the worst case, and one electrical delay (source-drain pass-through), tE, if the signal passes directly to the output. Additionally, the number of devices used in this way is lower than in the conventional CMOS method. As an example, the 3-input AND gate illustrated in Fig. 5(a) is built with four relays, whereas the CMOS approach needs at least eight transistors. The simulated input combinations and output waveforms are shown in Fig. 5(b).
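A switch-level sketch can make the single-mechanical-delay idea concrete. The arrangement below is one hypothetical way to build a 3-input AND from four relays in pass-logic style; it is not necessarily the topology of Fig. 5(a). Inputs A and B drive relay gates and are applied simultaneously, so all beams move in parallel, while input C only passes through closed channels. The helper name `relay_pair` is introduced here for illustration.

```python
from itertools import product


def relay_pair(gate, when_on, when_off):
    """Complementary relay pair sharing one gate signal.

    The N-relay (body at GND) conducts when gate = 1 and passes `when_on`;
    the P-relay (body at VDD) conducts when gate = 0 and passes `when_off`.
    """
    return when_on if gate else when_off


def and3(a, b, c):
    """Hypothetical four-relay AND3: A and B actuate beams, C passes through."""
    inner = relay_pair(b, c, 0)      # relays 1-2: pass C if B = 1, else GND
    return relay_pair(a, inner, 0)   # relays 3-4: pass inner node if A = 1, else GND


for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, and3(a, b, c))
```

Because no relay gate is driven by the output of another relay, the worst case costs one mechanical delay, and a change on C alone costs only the electrical pass-through delay.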
IV. RELAY-BASED SEQUENTIAL CIRCUITS
The relay logic gates can be used in combinational circuits, in which the output is a direct function of the current inputs. Another important class of logic in digital circuits and systems is sequential logic, in which the current state is a function of not only the inputs but also the previous states [20]. In this section, we present the main components of sequential circuits, including the D flip-flop and the JK flip-flop.
A. D Flip-Flop
Fig. 6(a) depicts a negative-edge-triggered relay D flip-flop that incorporates the master-slave concept. It operates in the following two phases. When CLK = "1", the first latch is on and the second one is off, and the inverted input D is sampled at node X. The first latch therefore acts as the master stage, while the second one is the slave stage. During the period when CLK = "1", the master latch is in the evaluation mode and the slave latch is in the hold mode. When CLK makes a transition from "1" to "0", the mode reverses in both stages, propagating the preserved input D to the output Q. The corresponding waveforms are shown in Fig. 6(b). The state of the output is determined one mechanical delay after CLK makes the "1" to "0" transition.
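The two-phase master-slave operation of the D flip-flop can be captured by a small behavioral sketch. It models only the logic levels (the master samples the inverted D at node X while CLK = 1, and the slave passes it to Q while CLK = 0) and ignores mechanical and electrical delays. The class name and the initial state are assumptions made for illustration.

```python
class MasterSlaveDFF:
    """Behavioral model of a negative-edge-triggered master-slave D flip-flop."""

    def __init__(self):
        self.x = 1  # master storage node, holds the inverted D (assumed initial state)
        self.q = 0  # slave output

    def step(self, clk, d):
        """Apply one set of input levels and return the output Q."""
        if clk:
            self.x = 1 - d       # master evaluates: inverted D sampled at X, slave holds
        else:
            self.q = 1 - self.x  # master holds, slave passes the stored value to Q
        return self.q


ff = MasterSlaveDFF()
for clk, d in [(1, 1), (0, 0), (0, 1), (1, 0), (0, 1)]:
    print(clk, d, ff.step(clk, d))
```

Q changes only after CLK falls, and changes on D while CLK = 0 are ignored, which is the negative-edge-triggered behavior shown in Fig. 6(b).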

B. JK Flip-Flop
Based on the combination of its two inputs, the JK flip-flop sets (JK=10), resets (JK=01), holds (JK=00) or toggles (JK=11) its state. Moreover, the JK flip-flop can be configured as a D flip-flop by setting K to $\overline{J}$, or as a T flip-flop by connecting K and J together. As a result, the JK flip-flop can be considered a universal flip-flop.
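The four modes follow from the standard JK characteristic equation, in which the next state equals (J AND NOT Q) OR (NOT K AND Q). The short sketch below evaluates it and checks the D and T configurations mentioned above; the function name `jk_next` is introduced here for illustration.

```python
def jk_next(j, k, q):
    """Next state of a JK flip-flop from inputs J, K and current state Q (all 0 or 1)."""
    return (j & (1 - q)) | ((1 - k) & q)


for q in (0, 1):
    print("hold", jk_next(0, 0, q), "set", jk_next(1, 0, q),
          "reset", jk_next(0, 1, q), "toggle", jk_next(1, 1, q))
```

Setting K to the complement of J makes the next state equal to J (D flip-flop), and tying K to J makes the state toggle whenever the shared input is 1 (T flip-flop).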
The proposed JK flip-flop uses relay logic to achieve an optimized structure. It cascades two JK latches to avoid the race condition. The two latches serve as the master (first) and slave (second) stages. Each latch is built with cross-coupled gates, as shown in Fig. 7. The key design decision is to place the pass-through input of the NAND gates, $\overline{B}$, in the critical path of both latches, and to connect the storage nodes to the "slow" input of the NAND gates (A). For the same reason, CLK is connected to the C input of the 3-input AND gates and $\overline{\mathrm{CLK}}$ is connected to the B input of the 2-input AND gates. This way, any additional mechanical delays are avoided.
As shown in Fig. 8, the flip-flop is triggered by the negative edge of the clock signal. On the rising edge of CLK, the new state of the storage nodes (X and $\overline{X}$) is decided based on their previous state and the combination of inputs. Meanwhile, the second stage is in the hold mode. When CLK makes the falling transition, the state of the output nodes is updated by a similar procedure. Note that building the JK flip-flop with the relay-optimized logic design method described above leads to only one mechanical delay at the outputs in the worst case. In cases where the ON or OFF state of the relays is already decided by the storage or output nodes in the circuit, no mechanical delay is incurred and there is only one electrical (pass-through) delay.
Table III compares the area, energy consumption per operation (E/op), device count and delay of the relay and CMOS implementations of the master-slave JK flip-flop. The delay is measured from the clock edge until the output Q becomes valid (tpcq). To make the assessment fair and close to the state of the art, we investigated several CMOS technologies, including the commercial TSMC 65nm and TSMC 40nm processes and the generic GPDK 45nm design kit. Although the mechanical delay limits the maximum operating frequency of the relay-based logic blocks, the relay circuit consumes dramatically lower energy per operation (>14x reduction) and has a significantly smaller circuit footprint (>9x smaller) than all of the CMOS counterparts. Moreover, the relay master-slave JK flip-flop uses 20 devices, compared to 36 for the CMOS implementations.
V. CONCLUSION
This paper proposes a design methodology for the implementation of ultra-low power, relay-based sequential logic blocks. We have developed an abstract model of the BEOL NEM relay, which is validated against an FEA tool, Coventor MEMS+. The model accurately captures the device behavior while reducing the simulation time. The proposed relay logic exploits the unique switching characteristics of the relay to minimize the propagation delay as well as the number of devices required. The major sequential circuits in digital systems, namely the D and JK flip-flops, are implemented, simulated and compared with their CMOS counterparts for different clock frequencies. The proposed approach reduces energy per operation by more than one order of magnitude (>14x) and area by roughly one order of magnitude (>9x), which makes it a strong candidate for IoT applications, where ultra-low power consumption is a critical requirement and moderate computation speed is acceptable.
