INTRODUCTION
SRAM has been the industry workhorse for on-chip embedded memory due to its high performance. In the past, on-chip caches have been steadily increasing in density to accommodate the growing demands for high-performance computing. In order to maintain this historical growth in memory density, SRAM bit cells have been aggressively scaled down physically for every generation along the semiconductor technology roadmap. However, there has been a slowdown in SRAM area scaling from 50% to 30% reduction per generation [Smith et al. 2012] due to several challenges such as increased leakage and variability at nanoscale [Itoh 2011; Qazi et al. 2011] . This calls for new concepts and technological improvements to meet growing performance demands.
One such concept is to use memory cells which have more than two stable states, as shown in Figure 1 . This provides a new dimension for scaling and can potentially overcome the challenges associated with physical downscaling at nanoscale. In addition, it may provide power benefits per bit, since the power cost associated with each physical cell is amortized over multiple bits. Emerging nanoscale materials like graphene, as well as unique material interactions between novel device structures, can enable the implementation of such unconventional circuits. They can potentially lead to low-power ultra-dense nanoscale memories which cannot be achieved by relying on physical scaling alone.
Graphene is a two-dimensional layer of carbon atoms and is considered a potential candidate for post-Si nanoscale computing systems [ITRS 2015]. It exhibits extraordinary electrical and thermal properties, featuring Dirac fermions [Novoselov et al. 2005] with very high conductivity [Ando 2007] and extreme scalability. Its planar structure also potentially makes it compatible with current CMOS fabrication processes [de Heer et al. 2007]. While graphene-based transistors have been proposed [Iannaccone 2009a, 2009b; Banerjee et al. 2009], challenges still exist which preclude their use in digital systems [Schwierz 2010]. Novel device structures with unique characteristics have been recently explored, such as the bi-layer graphene nanoribbon crossbar tunneling device (xGNR) [Lake 2011, 2012; Habib et al. 2013], which exhibits negative differential resistance (NDR). This xGNR device has potential applications in multistate logic and memory circuits.
Multistate circuits using NDR-based resonant tunneling diodes (RTDs) have been extensively researched in the past [Wei and Lin 1991; van der Wagt 1999; Lin 1994] . However, RTDs were implemented using non-lithographic processes which were expensive and incompatible with those for Si, which prohibited their integration with conventional technology [Jha and Chen 2001] . On the other hand, graphene-based devices like xGNR can potentially overcome such integration challenges and may be used in mainstream applications.
Our previous work explored a binary memory circuit using this xGNR device [Khasanvis et al. 2011], which could also function as a ternary memory [Khasanvis et al. 2012] but did not scale further. In this article, we present a scaling approach that is different from physical scaling, where the number of bits stored in a single cell can be increased. We show a new quaternary memory circuit as a baseline using the xGNR device, called quaternary graphene nanoribbon tunneling random access memory (GNTRAM). A heterogeneous graphene-CMOS circuit implementation is used for access and control. Extensive benchmarking against state-of-the-art 16nm CMOS SRAM and 3T DRAM memory cells is also presented. Our evaluations show that the quaternary GNTRAM has up to 2.27× density-per-bit benefit against CMOS SRAMs and 1.8× benefit against 3T DRAM at the 16nm technology node. It is also up to 818× more power efficient per bit than the high-performance CMOS designs in idle periods, while having comparable performance. Even further improvements may be possible by using graphene more extensively instead of silicon MOSFETs, as advances are made in graphene technology.
The rest of the article is organized as follows. Section 2 presents an overview of the xGNR device and latch configuration. The proposed scaling approach is presented in Section 3. Section 4 details the quaternary GNTRAM design and its operation. Section 5 discusses the leakage analysis and mitigation in GNTRAM, followed by physical implementation description in Section 6. Methodology and benchmarking are presented in Section 7 and conclusion is in Section 8.
BACKGROUND AND PREVIOUS WORK

Graphene Nanoribbon Crossbar (xGNR)
The graphene nanoribbon crossbar (Figure 2(a)) is a two-terminal device. It consists of two semi-infinite, H-passivated armchair-type GNRs (AGNRs) stacked orthogonally to each other with a vertical separation of 3.35Å between them [Lake 2011, 2012; Habib et al. 2013]. Each of these AGNRs has a truncated end with a zigzag edge. The overlap region of the xGNR is a misoriented or twisted bi-layer graphene. Since we are interested in current switching in the absence of a bandgap, the GNRs are chosen to be 14-C atomic layers [(3n + 2) ∼1.8nm] wide to minimize the bandgap resulting from the finite width. A voltage bias is applied to the top GNR with respect to the bottom one. Assuming the majority of the potential drop occurs between the two nanoribbons, the potential difference between the GNRs is equal to the applied bias.
The current-voltage (I-V) response of the xGNR is calculated using first-principles atomistic calculations. The simulated I-V characteristics (Figure 2(b)) exhibit negative differential resistance (NDR) with multiple peak and valley currents. This makes it suitable for RTD-based applications [Mazumder et al. 1998]. The oscillatory current-voltage response is attributed to the quantum interference between the standing electronic waves inside the twisted bi-layer region of the xGNR, as explained shortly. An electron in a semi-infinite AGNR behaves as a standing wave due to the reflection occurring at the truncated end. The wavelength of such a standing wave is a function of the total energy of the electron. Thus, by creating a potential energy difference between the top and bottom layers of the xGNR, one can control the phase difference between the standing waves of the individual nanoribbons. These standing waves interfere inside the overlap region of the xGNR. Depending on the phase difference (and hence the potential energy difference), the interference can be either constructive or destructive. Constructive interference occurs when the potential difference is $V = (2m + 1)\,\pi\hbar\upsilon/(2qL)$, where $m \in \{0, 1, 2, \ldots\}$, $\upsilon$ is the speed of the electron in graphene, $L$ is the length of the truncated end of the AGNR, $\hbar$ is the reduced Planck's constant, and $q$ is the charge of an electron. Similarly, the standing waves interfere destructively when $V = n\,\pi\hbar\upsilon/(qL)$, where $n \in \{0, 1, 2, \ldots\}$ [Habib et al. 2013].
The interlayer tunneling current becomes maximum (or minimum) when the interference is constructive (or destructive). Thus, an external voltage bias applied across the layers of xGNR results in multiple constructive and destructive interferences, which leads to oscillatory current-voltage response with multiple NDR regions.
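The spacing of the current peaks and valleys can be sketched numerically. The snippet below is a minimal illustration, not the atomistic model used in this work: it takes the destructive-interference condition V = nπħυ/(qL) and assumes each current maximum lies midway between adjacent minima. The carrier speed `V_CARRIER` and the truncated-end length `L_END` are hypothetical values chosen only for illustration.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s
Q_E = 1.602176634e-19   # elementary charge, C

def destructive_bias(n, v, length):
    """Bias (volts) of the n-th current minimum: V = n*pi*hbar*v / (q*L)."""
    return n * math.pi * HBAR * v / (Q_E * length)

def constructive_bias(m, v, length):
    """Bias (volts) of the m-th current maximum, assumed midway between minima."""
    return (2 * m + 1) * math.pi * HBAR * v / (2 * Q_E * length)

V_CARRIER = 1.0e6  # m/s, assumed carrier speed in graphene
L_END = 5.0e-9     # m, hypothetical truncated-end length

for k in range(3):
    print(f"minimum {k}: {destructive_bias(k, V_CARRIER, L_END):.3f} V, "
          f"maximum {k}: {constructive_bias(k, V_CARRIER, L_END):.3f} V")
```

Under this model the peak spacing scales as 1/L, so a longer truncated end would place more current peaks within a given voltage range.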
Application of xGNR Device in a Multistate Memory Element
A memory element can be built leveraging the NDR characteristics by using two xGNRs in a series configuration (Figure 3(a) ), similar to a Goto pair [Goto et al. 1960] . The circuit schematic of this configuration is shown in Figure 3 (b). The xGNR latch consists of a pull-up leg and a pull-down leg. One of the devices (xGNR1) is connected to supply voltage (V dd ) and acts as the pull-up device. The other device (xGNR2) is connected to ground terminal acting as the pull-down device. The common terminal between these devices is the state node (SN). Data is encoded in the voltage at this state node. DC load-line analysis of this configuration exhibits three stable states A, B, and C under applied voltage bias, as shown in Figure 3 (c). Thus it can be used as a binary latch or ternary latch, depending on the choice of data representation (see Table I ).
The latching mechanism is illustrated in Figure 4 [Khasanvis et al. 2011]. The following terms will be used in the discussion: I p1 and V p1, the first peak current and corresponding voltage; I p2 and V p2, the second peak current and corresponding voltage; I v1 and V v1, the first valley current and corresponding voltage; I v2 and V v2, the second valley current and corresponding voltage. In Figure 4, the y-axis represents current and the x-axis represents the voltage at the state node (V SN). The solid line shows the pull-down current and the dashed line represents the pull-up current. Assuming the state node is initially at 0V, when the voltage V dd is gradually increased, the operating point (shown by the dot "X" in Figure 4(a)) is given by the intersection between the pull-up and pull-down currents (satisfying Kirchhoff's Current Law). Figure 4(a) shows the situation when the first pull-down current peak is encountered, which is a decision point. As long as the pull-up current (I in + I xGNR1) is greater than the pull-down current (I xGNR2), the state node continues to shift from operating point X (Figure 4(a)) to point Y (Figure 4(b)). Finally, it shifts to point C (Figure 4(c)) when V dd reaches its maximum value. When the input current (I in) is switched off, the state node is latched to state C. Hence, to be able to latch state C, the following condition should be met.
(1)
Figure 4 also shows the process of latching logic LOW onto the state node (state A in Figure 3(c)). This represents logic 0 in both binary and ternary representation. Consider the case where the state node is initially at 0V and the input is logic low. In this case, an input pull-down current (I ex) is applied at the state node. The analysis proceeds along the same lines as before. As long as the net pull-up current (I xGNR1) is lower than the pull-down currents (I ex + I xGNR2), the state node voltage (V SN) will not rise beyond V p1 (Figure 4(e)-(f)). After input I ex is switched off, the state node remains at state A. Thus, to be able to latch logic 0, the following condition has to be satisfied.
Similarly, when used as a ternary latch, the state node can be latched to the stable point B (in Figure 3) if the following condition is satisfied.
The retention of data in the latch is discussed next. As mentioned earlier, the states A, B, and C (in Figure 3 (c)) are stable. When the state node is at one of these stable points, any external noise that causes the state voltage to increase or decrease would be countered by strong restoring currents (see Figure 3 (c)). For example, when the state node is at voltage corresponding to state C, a constant static current flows through the devices. Any external perturbation (noise) results in a noise current (I noise ) that may cause the state node voltage to decrease (or increase). This is countered by a net current that pulls up (or pulls down) the state node. The magnitude of the restoring current is given by the difference between the pull-up and pull-down currents (|I xGNR1 − I xGNR2 |). As long as the noise current is smaller than this restoring current, the operating point does not move beyond the decision points and data is retained.
States denoted by P and Q (in Figure 3(c) ) are unstable and hence the corresponding voltages are transition voltages. Consider state Q; due to lack of a restoring current, external noise would cause the state node voltage to transition to one of the surrounding states, depending on the direction of the perturbation. Thus, for correct latch operation, the noise currents should be less than the restoring currents to ensure that states P and Q are not reached during latch retention.
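The load-line reasoning above can be sketched in a few lines of code. The example below is an illustration under stated assumptions, not the simulated xGNR characteristics: `IV_POINTS` is a hypothetical two-peak piecewise-linear I-V curve, both legs use the same device, and `vdd` is a made-up supply value. An operating point is where the pull-up and pull-down currents are equal, and it is treated as stable when the net current into the state node is positive just below it and negative just above it (a restoring current exists on both sides).

```python
from bisect import bisect_right

# Hypothetical two-peak NDR I-V samples: (volts, arbitrary current units).
IV_POINTS = [(0.0, 0.0), (0.1, 1.0), (0.3, 0.2), (0.5, 1.0), (0.7, 0.2), (0.9, 1.2)]
VS = [v for v, _ in IV_POINTS]

def device_current(v):
    """Piecewise-linear interpolation of the toy I-V curve (clamped at the ends)."""
    v = min(max(v, VS[0]), VS[-1])
    i = min(bisect_right(VS, v), len(VS) - 1)
    (v0, i0), (v1, i1) = IV_POINTS[i - 1], IV_POINTS[i]
    return i0 + (i1 - i0) * (v - v0) / (v1 - v0)

def net_current(v_sn, vdd):
    """Pull-up current minus pull-down current at the state node."""
    return device_current(vdd - v_sn) - device_current(v_sn)

def operating_points(vdd, steps=2000):
    """Return (voltage, is_stable) for every zero crossing of the net current."""
    points = []
    grid = [vdd * k / steps for k in range(steps + 1)]
    for lo, hi in zip(grid, grid[1:]):
        pos_lo = net_current(lo, vdd) > 0
        pos_hi = net_current(hi, vdd) > 0
        if pos_lo == pos_hi:
            continue
        a, b = lo, hi
        for _ in range(50):  # bisection refinement of the crossing
            mid = 0.5 * (a + b)
            if (net_current(mid, vdd) > 0) == pos_lo:
                a = mid
            else:
                b = mid
        # Stable if net current is positive below the point and negative above it.
        points.append((0.5 * (a + b), pos_lo))
    return points

for voltage, stable in operating_points(vdd=0.8):
    print(f"{voltage:.4f} V  {'stable' if stable else 'unstable'}")
```

For this toy curve the scan finds three stable points separated by two unstable ones, mirroring the roles of states A, B, C and transition points P, Q in Figure 3(c).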
Our previous work explored binary random access memory circuits using an xGNR latch as the memory core, as well as access transistors for writing and reading data [Khasanvis et al. 2011 ]. This could also be used as a ternary memory cell [Khasanvis et al. 2012] . However, these circuits still required physical down-sizing of transistors to scale further. In the following section, we present an approach for scaling that is alternative to physical scaling, where the number of bits in a single cell can be further increased. This can potentially overcome the limitations of down-sizing CMOS transistors, providing an alternative pathway for scaling.
PROPOSED SCALING APPROACH
The key requirement for scaling is to increase the number of stable states of the xGNR latch, which would allow storing more bits in a single cell. This can be achieved by increasing the number of current peaks in the pull-up and pull-down legs of the xGNR latch. When multiple xGNR devices are used in each leg, the I-V characteristics of such a configuration will exhibit more current peaks than if a single device is used in each leg [Kao et al. 1992] .
As shown in Figure 5 (a)-(b), a series combination of two xGNRs leads to four current peaks. Similarly, three xGNRs in series leads to six current peaks ( Figure 5(c)-(d) ). In general, a series configuration of "N" xGNR devices exhibits "2N" current peaks, since each xGNR device has two current peaks. However, every additional xGNR in the stack would require a higher operating voltage in order to reach all the current peaks. Thus, the operating voltage limitation determines the maximum number of current peaks (and hence the number of stable states) that can be achieved with such a multipeak xGNR circuit.
For the xGNR latch shown in the preceding section, each device in the pull-up and pull-down legs exhibited two current peaks in its I-V characteristics, which led to three stable states. In general, a latch configuration with devices having "P" current peaks in each leg would exhibit "P + 1" stable states. Thus, a configuration of two series xGNRs in each leg of the xGNR latch (Figure 6(a)-(b)) would lead to five stable states at the state node, since both pull-up and pull-down legs have four current peaks. We use four of these states (as shown in Figure 6(c)) to build a quaternary memory cell, as discussed next.
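The counting argument in this section can be written out as a short sketch. It only restates the rules given in the text (2N current peaks for N series devices, P + 1 stable states for P peaks per leg); the function names are illustrative, and the number of usable bits is taken as the largest power of two that fits in the available states.

```python
PEAKS_PER_XGNR = 2  # each xGNR device contributes two current peaks (from the text)

def current_peaks(devices_per_leg):
    """Number of current peaks for N xGNR devices in series: 2N."""
    return PEAKS_PER_XGNR * devices_per_leg

def stable_states(devices_per_leg):
    """A leg with P current peaks yields P + 1 stable states at the state node."""
    return current_peaks(devices_per_leg) + 1

def usable_bits(devices_per_leg):
    """Whole bits per cell: floor(log2(stable states)), computed with integers."""
    return stable_states(devices_per_leg).bit_length() - 1

for n in (1, 2, 3, 4):
    print(n, current_peaks(n), stable_states(n), usable_bits(n))
```

With one device per leg there are three stable states (binary or ternary use); with two devices per leg there are five, of which four are used for the 2-bit quaternary cell.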
QUATERNARY GRAPHENE NANORIBBON TUNNELING RANDOM ACCESS MEMORY
An xGNR latch configuration with two series xGNR devices in each leg can realize a quaternary latch, and this is used to build quaternary graphene nanoribbon tunneling random access memory (GNTRAM). Such a design will enable storing 2 bits in a single memory cell, resulting in a higher memory density than CMOS SRAMs that store 1 bit per cell.
A dynamic memory cell implementation is adopted for low-leakage, low-power quaternary GNTRAM, as shown in Figure 7 (a). This design uses the quaternary xGNR latch as the state holding element, thus exhibiting four stable states (see Figure 7(b) ). Access to the state node is achieved with write-FET and read-FETs. To mitigate static power dissipation, the xGNR latch is switched OFF during idle periods using a sleep-FET and a Schottky diode. A capacitor (C SN ) is then required at the state node to retain the state written into the cell. The Schottky diode provides current rectification, mitigating charge leakage through reverse current paths when voltage is switched OFF during idle periods. This implementation is a multithreshold circuit design, where transistors with high threshold voltage (V t ) are used in leakage-critical paths, and low-V t transistors are used in other paths. The operation of quaternary GNTRAM is described next.
Write Operation
The write operation involves charging or discharging the state capacitance to the required voltage through the write-FET. The gate terminal of the write-FET is connected to the write-line and the drain terminal is connected to the input data-line, with the source terminal acting as the state node. During a write operation, the memory cell is selected by activating the corresponding write-line, and then the restore signal is switched ON. Data is written by applying the required input voltage onto the data-line, which either charges or discharges the state capacitance depending on the previous state. Here, the input voltages used are in quaternary representation (0V for logic 0, 0.75V for logic 1, 1.1V for logic 2, and 1.5V for logic 3). These voltage values are chosen based on the voltages at which stable states A, B, C, and D occur in the xGNR latch characteristics, respectively (see Figure 7(b)). After the data has been written, the input and write signals are switched OFF while the restore signal is still ON. This results in restoring currents through the xGNR latch that prevent FET switching noise transients from affecting the state node. Once the write-FET is completely switched OFF, the restore signal can be de-asserted and the data is held on the state capacitor. Figure 8(a) shows the simulated write operation (using HSPICE) for all possible state transitions in the quaternary GNTRAM cell.
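As an illustration of the quaternary data representation, the sketch below maps pairs of bits to the four write voltages listed above. The voltage levels come from the text; the binary weighting of the bit pair (most significant bit first) and the function names are assumptions made for this example only.

```python
# Write voltages per quaternary logic level, as listed in the text (volts).
WRITE_LEVELS_V = {0: 0.0, 1: 0.75, 2: 1.1, 3: 1.5}

def bits_to_level(msb, lsb):
    """Assumed mapping: the bit pair is read as a 2-bit binary number."""
    if msb not in (0, 1) or lsb not in (0, 1):
        raise ValueError("bits must be 0 or 1")
    return 2 * msb + lsb

def write_voltage(msb, lsb):
    """Data-line voltage to apply for a given bit pair."""
    return WRITE_LEVELS_V[bits_to_level(msb, lsb)]

def encode(bits):
    """Encode an even-length bit sequence into one write voltage per cell."""
    if len(bits) % 2:
        raise ValueError("need an even number of bits (2 bits per cell)")
    return [write_voltage(bits[i], bits[i + 1]) for i in range(0, len(bits), 2)]

print(encode([1, 1, 0, 1, 1, 0, 0, 0]))
```

Eight bits therefore occupy four cells instead of the eight that a 1-bit-per-cell memory would need.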
Read Operation
A predischarge-and-evaluate scheme is used to read the stored information in the memory cell. The series stack of read-FETs acts as the evaluation path during read operation (see Figure 7(a) ). The output data-line is connected to the source of read-FET2. The state node is used to gate read-FET1 and hence isolated from the output data-line. This scheme ensures that the read operation is nondestructive.
To initiate the read operation, the data-line is discharged first to 0V and then the read-line signal is switched ON (see Figure 8(b)-(d) ). This starts to pull up the voltage on the output data-line. The voltage to which the output can be pulled up is limited by the state node voltage at the gate of read-FET1. This is due to the intrinsic threshold voltage drop in the nMOS transistor. Thus the final output voltage at the end of read operation is specific to the stored state, which enables the detection of multiple voltage levels at the data output. When logic 0 is stored, read-FET1 is completely switched OFF and the data-line remains at low voltage. For all other stored logic states, read-FET1 is switched ON and the output is pulled up to the corresponding voltage level (see Figure 8(b)-(d) ). To successfully distinguish between different stored states, low-V t transistors are used in the read path.
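The level-dependent read-out can be approximated with a first-order model. The sketch below assumes the output settles one threshold voltage below the state node voltage (clamped at 0V when read-FET1 is OFF) and decodes by picking the nearest expected level. The threshold value `V_T_READ` is hypothetical, and the model ignores body effect, subthreshold conduction, and timing, so it is only a rough picture of the scheme.

```python
# State node voltages for logic 0..3, taken from the write levels in the text.
STATE_NODE_LEVELS_V = (0.0, 0.75, 1.1, 1.5)
V_T_READ = 0.2  # hypothetical low-V_t threshold of read-FET1, in volts

def read_output(v_sn, v_t=V_T_READ):
    """First-order output level: state node voltage minus one threshold drop."""
    return max(0.0, v_sn - v_t)

def decode(v_out, v_t=V_T_READ):
    """Return the logic level whose expected output is closest to v_out."""
    expected = [read_output(v, v_t) for v in STATE_NODE_LEVELS_V]
    return min(range(len(expected)), key=lambda k: abs(expected[k] - v_out))

for level, v_sn in enumerate(STATE_NODE_LEVELS_V):
    v_out = read_output(v_sn)
    print(level, round(v_out, 2), decode(v_out))
```

In this model a larger threshold voltage pushes the logic 1 output toward 0V and shrinks the gap between logic 0 and logic 1, which is consistent with the use of low-V t transistors in the read path.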
Restore Operation
During idle periods, the xGNR latch is switched OFF by turning OFF the restore signal. Data is then stored on the state capacitance. However, the stored charge on the state capacitance starts to leak and needs to be replenished. This is done by simply switching ON the restore signal periodically within a stipulated time interval. For example, consider logic 3 (state D in Figure 7(b)) being stored in the memory. During idle periods, the stored voltage gradually reduces due to leakage. When the restore signal is switched ON, a net pull-up current in the xGNR latch charges the capacitor back to logic 3. As long as the voltage has not dropped below the transition state between logic 2 and logic 3 (see Figure 7(b)), it can be restored.
The time for which a written state can be maintained before it has to be restored is called retention time, and it is desirable to maximize the retention time. Two factors contribute to this: (i) total capacitance at the state node, and (ii) total leakage current.
The value of the state node capacitance (C SN) is determined by: (i) the total value of the parasitic capacitances of the diode and the sleep-FET, and (ii) the worst-case voltage margin. Due to the parasitic capacitances, the charge written onto the state node is immediately redistributed after the write operation (when the write and restore signals are deactivated), and the cell goes into idle mode. This degrades the written voltage value (V W) to a quiescent voltage level (V Q). For example, consider the case when logic 3 was written into the memory cell, shown in Figure 9. If V Q falls below the transition voltage (V tran in Figure 9) after the write operation, the subsequent restore operation will cause a state transition to logic 2 instead of restoring logic 3 at the state node. Thus the value of V Q should be high enough to ensure that the state information is not lost immediately after the write operation. In addition, the quiescent voltage level (V Q) should also ensure that sufficient voltage margin (VM in Figure 9) is maintained for dynamic data retention. By choosing an appropriate V Q, the retention time can be optimized. Based on these requirements, the minimum value of the total capacitance at the state node for a particular V Q can be derived using the following relation.
In (5), C SN is the total capacitance at the state node. This includes the explicit capacitance to be formed at the state node, diffusion capacitance of the write-FET, gate capacitance of read-FET1, and the capacitance due to routing lines. C PT is the total parasitic capacitance, which includes the diffusion capacitance of the sleep-FET and the capacitance of the Schottky diode. V W is the voltage to which the state node is charged during write operation. The available voltage margin for retention is given by the difference between V Q and V tran .
A higher state capacitance leads to a higher voltage margin and thus lengthens the retention time. However, a large state capacitance is not desirable as it slows down the write operation. The other option is to reduce the magnitude of leakage currents. To minimize charge leakage, the critical leakage paths need to be identified. In this design, the write-FET and sleep-FET form leakage-critical paths since the transistors are directly connected to the state node. Hence, they are implemented with high-threshold voltage (V t ) transistors which are typically optimized to have very low OFF-state current and minimize leakage during standby.
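The capacitance and retention trade-off can be explored with a simple charge-sharing sketch. This is a first-order model under explicit assumptions and is not necessarily the relation used in this work: the parasitic capacitance C PT is assumed to be fully discharged before redistribution, so that V Q = V W · C SN / (C SN + C PT), and the leakage current is assumed constant while the node droops from V Q to V tran. All numeric values in the example are hypothetical.

```python
def quiescent_voltage(v_w, c_sn, c_pt):
    """Voltage after the written charge is shared with an initially empty C_PT."""
    return v_w * c_sn / (c_sn + c_pt)

def min_state_capacitance(v_w, v_q, c_pt):
    """Smallest C_SN that still yields quiescent voltage v_q (charge-sharing model)."""
    if not 0 < v_q < v_w:
        raise ValueError("need 0 < v_q < v_w")
    return c_pt * v_q / (v_w - v_q)

def retention_time(c_total, v_q, v_tran, i_leak):
    """Time for a constant leakage current to use up the margin v_q - v_tran."""
    return c_total * (v_q - v_tran) / i_leak

# Hypothetical example values (farads, volts, amperes).
V_W, V_Q, V_TRAN = 1.5, 1.38, 1.30
C_PT = 20e-18
I_LEAK = 1e-10

c_min = min_state_capacitance(V_W, V_Q, C_PT)
print(f"minimum C_SN: {c_min * 1e18:.0f} aF")
print(f"retention: {retention_time(c_min + C_PT, V_Q, V_TRAN, I_LEAK) * 1e9:.1f} ns")
```

In this model, raising the target V Q toward V W increases the required capacitance sharply, while retention time grows linearly with capacitance and inversely with leakage, which matches the qualitative trade-off described above.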
However, even with the use of high-V t transistors, the retention time for the quaternary GNTRAM was found to be low (in the order of a few nanoseconds). This is due to exacerbated leakage at the relatively higher operating voltage when storing logic 3. This necessitates leakage mitigation techniques to improve the retention time, which is discussed next.
LEAKAGE ANALYSIS AND MITIGATION
Leakage current in MOS transistors is exacerbated at high voltages. Hence, during idle periods, leakage currents are the highest when the memory cell stores logic state 3 (1.38V). Analysis of the leakage paths (denoted by LP1 through LP4 in Figure 10 (a)) shows that the write-FET and sleep-FET form critical paths (LP1 and LP2), since they are connected through low-impedance paths to the state node. For both devices, the sources of leakage are: gate tunneling current (I 1 ), reverse-bias junction leakage (I 2 ), and subthreshold channel leakage (I 3 ). It was found that for the 16nm LP PTM devices used, leakage current was dominated by subthreshold channel leakage (I 3 ).
One of the frequently used circuit techniques in literature to reduce the OFF-state subthreshold channel leakage is source/gate biasing [Itoh 2007 ]. This scheme is most effective compared to other techniques such as body biasing or V DS reduction [Itoh 2007 ]. The subthreshold analysis of the devices (see Figure 10 (b)) shows that, when the source is offset by 0.1V during idle periods, the leakage current can be reduced by almost 10× when storing logic state 3. Thus the data-line and the source terminal of the sleep-FET are maintained at 0.1V during idle mode. This can be achieved either by using a self-biasing scheme with a shared carefully sized nMOS transistor in series [Itoh 2007] or by selecting a separate voltage source [Elakkumanan et al. 2003 ].
The remaining leakage sources are the gate-oxide tunneling current through read-FET1 (LP3 in Figure 10(a) ) and the reverse-bias leakage of the Schottky diode (LP4 in Figure 10(a) ). The gate-oxide leakage can be reduced by increasing the oxide thickness (for the 16nm PTM transistor used here, V th0 was recalculated using the equation for retrograde doping CMOS [Morshed et al. 2011] ). Thus, the read-FET1 will need to be engineered to minimize the gate-oxide tunneling current while still maintaining low-enough V t to be able to read the stored states. The reverse-bias leakage through the Schottky diode is assumed constant at 10pA. These techniques enhanced the data retention period to 0.5μs, as shown in Figure 10 (a).
PHYSICAL IMPLEMENTATION
A cross-technology heterogeneous implementation is used between CMOS and graphene [Khasanvis et al. 2011], as shown in Figure 11. A lithography-friendly grid-based layout is used with minimum-sized nMOS transistors for high density and ease of fabrication (Figure 11(a)). The MOS transistors are created first on the Si substrate. The xGNR devices are implemented in a graphene layer on top of the MOS layer. Interfacing between these layers is done with the help of metal vias. GNRs can form either Ohmic or Schottky contacts with metals, depending on whether they are metallic or semiconducting [Mao et al. 2010; Guan et al. 2008]. This feature is leveraged to realize the Schottky diode with the help of a Schottky contact between a narrow semiconducting armchair GNR and metal, as shown in Figure 11(b). The rest of the graphene-metal contacts are Ohmic to ensure proper operation, achieved by using wide GNRs [Unluer et al. 2011]. Both the Schottky diode and the sleep-FET receive the same restore signal. Hence the layout is arranged so that the restore signal reaches both devices almost simultaneously. The data line is multiplexed between read and write operations, since only one of these operations is performed on a memory cell at a given time. Interconnections are implemented with conventional CMOS routing layers (Figure 11(c)). The state capacitor can be implemented either as a trench or as a stacked capacitor over the state node routing area shown in Figure 11(a).
METHODOLOGY AND BENCHMARKING
The HSPICE circuit simulator was used to verify the GNTRAM operation and for power and performance analysis. The xGNR device was modeled as an HSPICE subcircuit [van der Wagt 1999] using the structure shown in Figure 12. The DC I-V characteristics derived from the atomistic simulations (mentioned in Section 2.1) were modeled using a voltage-controlled current source (VCCS) with a piecewise linear approximation between each I-V data point. The geometric capacitance at the GNR crossbar was modeled as a capacitor in parallel to take reactive currents into account in addition to the DC response. A generic integrated circuit Schottky diode model was used for a first-order analysis and 16nm CMOS PTM models [Predictive Technology Models 2015] were used to simulate the read-, write-, and sleep-FETs. The value of the state capacitance was chosen to be 200aF for the required circuit behavior. PTM interconnect RC models based on scaled interconnect dimensions were used in conjunction with the PTM transistor models for power and performance evaluation of GNTRAM using HSPICE. For physical layout design and area evaluation of GNTRAM, 1D gridded design rules [Bencher et al. 2009] were used, as shown in Table II. For benchmarking against CMOS, a 16nm gridded 8T SRAM cell [Greenway et al. 2008] was used, since this SRAM design utilizes the same grid-based design used in GNTRAM. A regular 6T CMOS SRAM scaled to the 16nm technology node was also used for benchmarking. Area scaling was done based on a wide range of design rules published by the industry for both high-performance and low-power 6T SRAM designs at the 65nm, 45nm, and 32nm technology nodes. This method is detailed in . Using this data, scaling factors were derived based on cell area, poly, metal1, metal2, and via scaling trends. These were used to calculate 16nm 6T SRAM design rules and cell area.
Table II. 1D Gridded Design Rules for the 16nm Node [Bencher et al. 2009]: M1, M2 interconnect, 40∼60nm; poly pitch, 60∼80nm.
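The device model described above (a piecewise-linear voltage-controlled current source in parallel with a capacitor) can be mimicked outside a circuit simulator with a short sketch. The I-V table and the capacitance below are hypothetical placeholders, not the atomistic data or the extracted geometric capacitance, and the function names are illustrative.

```python
from bisect import bisect_right

# Hypothetical I-V samples: (bias in volts, current in amperes).
IV_TABLE = [(0.0, 0.0), (0.1, 1.0e-6), (0.3, 0.2e-6), (0.5, 1.0e-6), (0.7, 0.2e-6)]
BIASES = [v for v, _ in IV_TABLE]
C_GEOM = 1.0e-18  # hypothetical crossbar capacitance, farads

def dc_current(v):
    """Piecewise-linear interpolation between tabulated I-V points."""
    if not BIASES[0] <= v <= BIASES[-1]:
        raise ValueError("bias outside tabulated range")
    i = min(bisect_right(BIASES, v), len(BIASES) - 1)
    (v0, i0), (v1, i1) = IV_TABLE[i - 1], IV_TABLE[i]
    return i0 + (i1 - i0) * (v - v0) / (v1 - v0)

def total_current(v, dv_dt):
    """DC current plus the displacement current through the parallel capacitor."""
    return dc_current(v) + C_GEOM * dv_dt

print(dc_current(0.2))          # on the falling (NDR) segment between 0.1V and 0.3V
print(total_current(0.2, 1e9))  # same bias while the voltage ramps at 1 V/ns
```

A finer I-V table reduces the interpolation error near the peaks and valleys, where the slope of the characteristic changes sign.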
The aforementioned 16nm predictive technology models (PTM) transistors and RC interconnect models were used for power and performance evaluation of CMOS 6T SRAM and gridded 8T SRAM using HSPICE. Both low-power and high-performance 6T and 8T SRAM cell designs were considered for comparison since quaternary GNTRAM uses a multi-V t cell design. 3T DRAM was also investigated for benchmarking since it is a potential candidate for on-chip caches in advanced technology nodes [Itoh 2007; Chun et al. 2011] . The 3T DRAM cell (shown in Figure 13 ) was designed using 16nm PTM transistor models, and the physical layout was done along the same lines as the GNTRAM. It was simulated using HSPICE for power and performance evaluation. Area evaluation was done using grid-based design rules (see Table II ). Table III shows the comparison results. 
Area Evaluation
The GNTRAM physical cell area was estimated for the layout in Figure 11 (a) based on the design rules shown in Table II . Since this is a grid-based design, the area was calculated by counting the number of metal and poly pitches along each dimension. This area accounts for spacing required between adjacent GNTRAM cells as well.
Quaternary GNTRAM showed significant density advantage compared to the other 16nm CMOS RAMs. Although the physical cell area is comparable to those of the SRAMs and the 3T DRAM, quaternary GNTRAM's density benefit comes from the fact that it stores 2 bits per cell. In particular, GNTRAM showed a density-per-bit benefit of up to 2.27× versus CMOS SRAM and 1.8× versus the 3T DRAM in 16nm technology node.
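The per-bit density comparison is simple arithmetic and can be sketched as follows. The function names are illustrative and the areas are normalized, hypothetical numbers; the only values taken from the text are the 2 bits per GNTRAM cell and the reported 2.27× and 1.8× benefits.

```python
def density_benefit(ref_area, ref_bits, cell_area, cell_bits):
    """Ratio of reference area-per-bit to the compared cell's area-per-bit."""
    return (ref_area / ref_bits) / (cell_area / cell_bits)

def implied_area_ratio(benefit, ref_bits=1, cell_bits=2):
    """Reference-to-cell physical area ratio implied by a per-bit benefit."""
    return benefit * ref_bits / cell_bits

# A 2.27x per-bit benefit at 2 bits per cell implies comparable physical cell areas.
print(implied_area_ratio(2.27))  # SRAM cell area relative to the GNTRAM cell
print(implied_area_ratio(1.8))   # 3T DRAM cell area relative to the GNTRAM cell
```

The implied ratios are close to one, which is consistent with the statement that the physical cell areas are comparable and that the density benefit comes from storing 2 bits per cell.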
Considering the current SRAM scaling trend, CMOS SRAM, when advanced by two technology generations beyond the 16nm node, would have about the same area as the 16nm quaternary GNTRAM. Thus GNTRAM provides an alternative to physical scaling.
As graphene technology matures, the availability of graphene transistors would enable a monolithic graphene fabric with potentially ultra-dense nanoscale multistate memories.
Power Evaluation
For power evaluation, GNTRAM power dissipation was measured using HSPICE simulations during both active (read/write) and idle periods. For active power, the power dissipation was measured for all possible state transitions during the write operation, as well as during the read operation. The worst-case power was then considered and is reported here. The same method was followed for evaluation of CMOS SRAM and 3T DRAM cells. Quaternary GNTRAM showed up to 1.32× lower active power per bit against CMOS high-power SRAM designs. It also showed up to 1.17× lower active power per bit against the 3T DRAM in the 16nm node.
Standby power dissipation was measured with HSPICE simulations during idle periods (no switching activity), when GNTRAM is storing data on the state capacitance. Quaternary GNTRAM was 818× more power efficient during idle periods than the high-performance CMOS SRAM, and 6.53× more power efficient than the low-power CMOS SRAM in the 16nm node. These benefits arise for two reasons: (i) GNTRAM is dynamic and hence no static paths exist to contribute to idle power; and (ii) GNTRAM stores 2 bits per cell, thus amortizing leakage costs. The 3T DRAM exhibited lower standby power than GNTRAM since it has fewer leakage paths.
Performance Evaluation
GNTRAM performance was evaluated by measuring the time taken to write data onto the state node using HSPICE simulations. All state transitions during write operation were measured and the worst-case write time is reported here. Similarly, time taken to read various stored states was measured and worst-case read time was considered. CMOS SRAM and 3T DRAM performance measurements were performed using the same method.
Quaternary GNTRAM was comparable in read performance to high-performance CMOS SRAMs, even though it uses a higher capacitance at the state node. This is because GNTRAM uses low-V t transistors in its read path, which are typically optimized to have high ON current. The asymmetric cell design (multi-V t transistors) thus enables high performance while retaining the benefits of low-power operation. An asymmetric approach was necessary in GNTRAM because the read-FETs need to have a low V t to successfully differentiate between the stored states. The write performance of GNTRAM was slower because of the increased voltage swing associated with storing logic 3, which requires a longer time to charge the state capacitance. The 3T DRAM performed better than GNTRAM during the write operation because the state node capacitance is lower in the 3T DRAM (it is just the gate capacitance of a read-FET).
CONCLUSION
A low-power multistate memory concept was introduced in this article enabled by unique graphene nanoribbon crossbar devices (xGNRs). This presented a new direction for scaling where the number of bits stored in a single cell can be increased as an alternative to physical down-sizing of transistors. This may potentially overcome the challenges associated with transistor scaling. A quaternary graphene nanoribbon crossbar tunneling random access memory (GNTRAM) cell was presented as a baseline and implemented with a heterogeneous integration between CMOS and graphene. Benchmarking against state-of-the-art 16nm CMOS RAM designs showed that quaternary GNTRAM exhibited significant benefits which stem from storing 2 bits per cell.
This work takes the initial step towards exploring the potential of multistate memories for on-chip memory applications enabled by graphene. While operating voltage may limit the maximum number of bits that can be stored, the xGNR device itself can possibly be engineered to have more current peaks within a smaller operating voltage. As progress is made in graphene technology, further benefits may be expected by replacing Si MOSFETs with graphene transistors, thus resulting in ultra-dense nanoscale memories.
