Abstract-We have demonstrated a subthreshold FPGA system using monolithically integrated graphene wires. The graphene wires replace double-length lines in the interconnect fabric of a custom FPGA implemented in 0.18-μm CMOS. The four-layer graphene wires have lower capacitance than the CMOS aluminum wires, resulting in up to 2.11× faster speeds and 1.54× lower interconnect energy when driven by a low-swing voltage of 0.4 V. This paper presents the first graphene-based system application and experimentally demonstrates the potential of using low-capacitance graphene wires for ultralow-power electronics.
material properties are expected to become more important in highly scaled technologies.
These topics are addressed by demonstrating a field-programmable gate array (FPGA) using monolithically integrated four-layer graphene wires as part of the interconnect fabric. This paper presents the first experimental demonstration of the following: 1) a complete system application using multiple graphene devices and 2) graphene interconnects being used for subthreshold circuits. An FPGA has a highly interconnect-centric architecture, making it an ideal test vehicle for graphene integration. The interconnect fabric is often the main bottleneck when designing low-power FPGAs [17]-[19].

This paper is organized as follows. Section II discusses the impact of graphene wires in subthreshold circuits. Section III explains the graphene integration process and chip architecture. Section IV discusses the measured system results. Finally, Section V concludes this paper.
II. GRAPHENE INTERCONNECTS FOR SUBTHRESHOLD CIRCUITS
In recent years, many circuit designs have achieved very high energy efficiencies by operating in the subthreshold regime [20]-[23], where the supply voltage is smaller than the transistor threshold voltage. Decreasing the supply voltage has a large impact on the energy efficiency since the system energy is proportional to $C_{eff} V^2$, where $C_{eff}$ is the effective switching capacitance and $V$ is the supply voltage. Subthreshold circuits are inherently slow due to the decreased current levels and can tolerate more resistive wires. The use of low-capacitance wires can improve both the delay and energy in such systems.
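As a rough illustration of the quadratic dependence on supply voltage noted above, the following sketch compares the switching energy at a nominal and a low supply voltage. The capacitance value and the 1.8-V nominal supply are assumptions chosen for illustration, not values taken from this work.

```python
def switching_energy(c_eff: float, v_supply: float) -> float:
    """Dynamic switching energy in joules, modeled as C_eff * V^2."""
    return c_eff * v_supply ** 2


C_EFF = 50e-15     # assumed effective switching capacitance (50 fF)
V_NOMINAL = 1.8    # assumed nominal supply for a 0.18-um process (V)
V_LOW = 0.4        # low supply voltage used for comparison (V)

e_nominal = switching_energy(C_EFF, V_NOMINAL)
e_low = switching_energy(C_EFF, V_LOW)
energy_ratio = e_nominal / e_low

print(f"E at {V_NOMINAL} V: {e_nominal * 1e15:.1f} fJ")
print(f"E at {V_LOW} V: {e_low * 1e15:.1f} fJ")
print(f"Reduction from voltage scaling alone: {energy_ratio:.2f}x")
```

In this simple model, halving the capacitance halves the energy at any supply voltage, which is why a low-capacitance wire is expected to help independently of voltage scaling.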
When a CMOS inverter is driving a distributed RC wire (Fig. 1), the RC time constant is approximated as follows [24]:

$$\tau \approx R_{on} C_w + \frac{R_w C_w}{2}$$

where $R_{on}$ is the equivalent on-resistance of the transistor and $R_w$ and $C_w$ are the distributed resistance and capacitance of the wire, respectively. At the nominal supply voltage, $R_{on}$ is typically a few kΩ, and the wire resistance dominates. However, as the voltage is scaled below the threshold voltage of the transistors, $R_{on}$ increases exponentially and becomes the dominant resistance term as long as the wire resistance does not degrade significantly.

[Figure caption, partially recovered: ... [26]. The wire pitch is 30 nm, and the length is 1 mm. The Cu wire assumes an aspect ratio of 2. The four-layer graphene wire assumes a reasonable sheet resistance of 100 Ω/sq.]

Fig. 2 shows the performance of a long graphene wire and a long Cu wire. Rather than optimizing the Cu thickness, a nominally sized (thick) Cu wire with an aspect ratio of 2 is used in this paper. The purpose of this analysis is to understand the impact of using few-layer graphene instead of the existing metal stack for subthreshold applications. The significance of this is explained below. Both wires are sized for an intermediate wire at the 16-nm node and simulated using parameters and circuit models found in [25], [26]. A simple buffer drives the signal on these wires, which are modeled using a π3 distributed RC model.
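The shift from wire-dominated to driver-dominated delay can be sketched numerically. The example below assumes the common Elmore-style estimate for a lumped driver and a distributed wire, τ ≈ R_on·C_w + R_w·C_w/2. All resistance and capacitance values are hypothetical and were picked only to show the trend; they are not the simulated values behind Fig. 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wire:
    name: str
    r_w: float  # total distributed wire resistance (ohms)
    c_w: float  # total distributed wire capacitance (farads)


def rc_time_constant(r_on: float, wire: Wire) -> float:
    """Elmore-style estimate: lumped driver plus distributed RC wire."""
    return r_on * wire.c_w + 0.5 * wire.r_w * wire.c_w


# Illustrative values only (assumptions, not measured or simulated data).
CU = Wire("Cu", r_w=100e3, c_w=150e-15)
GRAPHENE = Wire("graphene", r_w=6.7e6, c_w=75e-15)

R_ON_NOMINAL = 5e3    # assumed driver resistance at nominal supply (ohms)
R_ON_SUBVT = 100e6    # assumed driver resistance in subthreshold (ohms)

for r_on in (R_ON_NOMINAL, R_ON_SUBVT):
    tau_cu = rc_time_constant(r_on, CU)
    tau_gr = rc_time_constant(r_on, GRAPHENE)
    print(f"R_on = {r_on:.0e} ohm: tau_Cu = {tau_cu:.3e} s, "
          f"tau_graphene = {tau_gr:.3e} s, Cu/graphene = {tau_cu / tau_gr:.2f}")
```

With these assumed numbers, the resistive graphene wire is slower when the driver resistance is small, while at a large driver resistance the delay ratio approaches the capacitance ratio of the two wires.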
For long interconnects, the energy ($\tfrac{1}{2} C_w V^2$) is largely determined by the capacitance of the wire and favors graphene. However, the energy savings diminish at high supply voltages as the short-circuit current increases for the slow graphene wires. At high supply voltages, the high resistance of the graphene wire negatively affects the RC delay. However, at low supply voltages, the large resistance of the transistors begins to dominate and the wire capacitance becomes important, resulting in a 2× lower delay for the graphene wire. Although we demonstrate this at the 16-nm node, the qualitative results hold regardless of the underlying CMOS technology because the transistor resistance increases exponentially in subthreshold.
Jamal and Naeemi have previously suggested changing the aspect ratio of the Cu wires or using carbon-based devices for subthreshold circuits [8]. Since energy is determined by the wire capacitance, Cu wires can achieve similar energy savings only if their thickness is comparable to that of few-layer graphene. However, this is not only impractical; the resulting Cu wire resistance would also be significantly degraded.
Nonetheless, resizing the Cu wire to a more realistic aspect ratio may be a viable alternative. Fig. 3 shows the performance of a thin Cu wire. We use parameters similar to those above, except that the Cu wire has a smaller aspect ratio of 0.5. This results in up to 1.5× lower delay and 1.7× lower energy than a nominally sized Cu wire. In subthreshold operation, the performance difference between the thin (AR = 0.5) Cu wire and four-layer graphene is roughly 30%. This is a result of the reduced Cu capacitance, which is shown in Fig. 4.

[Figure caption, partially recovered: ... [26]. The wire pitch is 30 nm and the length is 1 mm. The capacitance values are obtained using a commercial field solver.]
Nevertheless, to maximize the energy savings, few-layer graphene is preferred over thin Cu wires. Graphene is an ideal candidate for low-capacitance wires because it is intrinsically very thin. Graphene sheets are one atom thick, and the ability to chemically synthesize single crystalline sheets with atomic precision presents a significant manufacturing advantage over bulk materials. The back-end process flow in a CMOS technology typically includes physical deposition of metallic sheets, which results in a polycrystalline film. At extreme thicknesses of a few nanometers, Cu wires are more resistive than graphene wires. Graphene is more reliable and robust and thus a better candidate for extremely thin low-capacitance wires.
This paper explores the use of graphene as low-capacitance wires. In Section IV, we experimentally demonstrate that few-layer graphene wires outperform nominally sized metal wires in subthreshold operation. Using several layers of graphene reduces resistance and is also more stable during the integration process. However, some groups have suggested that the transition from graphene to graphite occurs around seven to eight layers of graphene [27], [28]. In graphite, the increased intersheet electron interactions lower the conductivity per (graphene) layer [29]. In addition, obtaining high-quality thick multilayer graphene sheets that are suitable for large-scale manufacturing is very difficult. More importantly, Fig. 4 indicates that roughly ten or fewer layers are desirable for graphene to maintain a smaller capacitance than Cu. Throughout this paper, we use four layers of graphene to balance wire performance and ease of integration.
III. SYSTEM DESIGN

A. Graphene/CMOS Integration
An FPGA was implemented in a 0.18-μm CMOS process. The underlying circuit was designed with missing wires, onto which graphene wires were subsequently integrated. The process flow is shown in Fig. 5 . First, via holes are formed by etching through the CMOS passivation layer and exposing the top metal layer (aluminum). After Ti/Au is deposited to fill the via holes, the graphene sheets are transferred onto the CMOS chip.
To fabricate the graphene sheets, we use a separate Cu foil in a low-pressure chemical vapor deposition (CVD) process to first grow the monolayer graphene sheets [30]. The growth is carried out at 1000 °C and 10 mT, with 20- and 10-sccm flows of CH₄ and H₂, respectively. After the CVD process, poly(methyl methacrylate) (PMMA) is spin-coated on the graphene/Cu film. The PMMA/graphene film is released as the underlying Cu film is etched away in Cu etchant and diluted HCl. The PMMA/graphene film is placed on the CMOS test chip, and the PMMA layer is removed by acetone vapor. This transfer process is repeated four times to achieve a thicker graphene stack and lower sheet resistance. For clarity, the transfer process is also shown in Fig. 5. The resulting sheet resistance of the four-layer stack is roughly 270 Ω/sq.
After all four layers have been transferred, the graphene sheet is patterned into wires using electron-beam lithography and an Ar/O₂ plasma etch step. Finally, Ti/Au contacts are deposited on top of the wires/vias. A similar integration approach was demonstrated in [13] and [14]. Fig. 6 shows images of the test chip and the integrated graphene wires. Roughly 42% of the integrated graphene wires worked. Visual inspection of the chip shows that the device yield is largely impacted by the transfer process. Parts of the graphene sheet get wrinkled or torn apart, resulting in either holes or nonuniform multilayer regions. Achieving nearly 100% coverage is critical for interconnect applications. Unlike recent demonstrations of thin graphene sheets as touch-screen panels [31], integrated circuit applications have much more stringent manufacturing requirements. Improving or eliminating the transfer process remains future work.
B. Chip Implementation
FPGAs are a common form of reconfigurable logic that is often used for rapid prototyping or in systems that require flexibility. With the rising cost of designing custom circuits, using a flexible and energy-efficient FPGA has become a more attractive solution in many IC systems. Because of the extra routing structure, FPGAs are typically larger and slower, and consume more power, than a dedicated hardware unit [32]. Various approaches have been taken to reduce this overhead. In [15], a nanoelectromechanical relay replaced the NMOS pass transistors in a switch matrix. More conventional approaches have tried to optimize the global interconnect architecture since its performance dominates the delay and energy of the FPGA [17]-[19].

Our test chip implements an FPGA with a 5 × 5 configurable logic array (Fig. 7). Each logic block includes a cluster of two four-input LUTs and two flip-flops, for a total of 50 LUTs in the chip. The content of the LUTs can be programmed to implement an arbitrary logic function. A 10-bit unidirectional bus runs throughout the logic array, and the programmable switch matrices (SW) provide connection points at each intersection. Each switch matrix implements a Wilton switch box [33], which has better routability than the conventional switch matrix using NMOS pass transistors.
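To make the LUT description above concrete, the following sketch models a four-input LUT as a 16-bit configuration word indexed by its inputs. The bit ordering and the helper names (`build_lut4_config`, `lut4`) are assumptions made for illustration; the configuration format of the test chip is not described here.

```python
from itertools import product
from typing import Callable


def build_lut4_config(func: Callable[[int, int, int, int], int]) -> int:
    """Pack the truth table of a 4-input function into a 16-bit word."""
    config = 0
    for d, c, b, a in product((0, 1), repeat=4):
        index = a | (b << 1) | (c << 2) | (d << 3)  # assumed bit ordering
        if func(a, b, c, d):
            config |= 1 << index
    return config


def lut4(config: int, a: int, b: int, c: int, d: int) -> int:
    """Evaluate the LUT: the inputs select one bit of the configuration."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (config >> index) & 1


and4 = build_lut4_config(lambda a, b, c, d: a & b & c & d)
parity4 = build_lut4_config(lambda a, b, c, d: a ^ b ^ c ^ d)

# LUT count implied by the array description: 5 x 5 blocks, 2 LUTs each.
ROWS, COLS, LUTS_PER_BLOCK = 5, 5, 2
total_luts = ROWS * COLS * LUTS_PER_BLOCK

print(f"AND4 config: {and4:#06x}, parity config: {parity4:#06x}")
print(f"Total LUTs: {total_luts}")
```

Any four-input Boolean function maps to one such 16-bit word, which is what allows the LUT contents to be set from a bitstream.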
Many advanced FPGAs include variable-length lines that span varying numbers of switch matrices [34]. Due to limited resources, only single lines and double (L = 2) lines are implemented in this paper. A total of 16 wire segments exist for L = 2 wires, which connect every other switch matrix. Fig. 8 shows a diagram of the interface between the graphene wires and the switch matrices. At each L = 2 segment, in addition to a reference (M5) wire, three redundant graphene wires are added for increased reliability. Only one wire is active at any given time.

A low-swing topology is used for energy reduction. Several techniques have been explored to design a suitable subthreshold level converter with a large dynamic range and low delay [35]-[37]. A level shifter using diode-connected PMOS devices is implemented at the receiver [37]. This topology has low delay and a large operating range and [...]

In addition, a local tester is added at each wire segment to monitor the quality of each graphene wire. Graphene is not yet a mature technology, and the integration process results in variations in the performance of graphene wires. The tester uses a low-overhead 3-bit time-to-digital converter to measure the delay of the graphene data link (between A and B in Fig. 8). This measure of reliability is necessary for the test chip to function properly. The measured delay of each L = 2 wire indicates not only how fast the data link is but also whether graphene is connected or not.
Most of the FPGA core blocks were synthesized using a standard cell library. To mitigate the increased sensitivity to variations in subthreshold operation, strict design rules were applied [18], [20]-[23], [38]. The synthesis was optimized for subthreshold operation by the following: 1) limiting the fan-in to three; 2) using static CMOS logic only; 3) using logic gates with short stacks of fewer than four stacked transistors; 4) remapping large multiplexers into 2:1 multiplexers; and 5) upsizing critical clock and data buffers to ensure rail-to-rail outputs.
IV. SYSTEM TESTING
A. Measured Results
FPGAs serve as design platforms rather than as isolated systems. The FPGA test chip needs to be configured to perform a specific task, for example, an 8-bit multiplier. The bitstream contains all the configuration data for programming the LUTs, switch matrices, and L = 2 wires. Before the FPGA is configured to run a benchmark, the test chip is programmed to assess the quality (delay) of each L = 2 wire. A commercial Xilinx FPGA is connected to the test chip (Fig. 9) and serves as a master device to perform the following: 1) scan the tester results; 2) load the bitstream; and 3) apply the appropriate benchmark inputs to the test chip. Fig. 10 shows a screen capture of running two different benchmark applications.
Before the FPGA is configured, we run the graphene testers and retrieve the delay results for all L = 2 wires as shown in Fig. 11 . The material nonuniformity contributes to this large variation in device performance. By using the testers, we can assess this variation and carefully select each L = 2 wire to improve the overall system reliability.
After each graphene wire is characterized, the test chip can be programmed to route the signals on one of the four wires for each L = 2 segment. Two sets of bitstreams are generated that activate either the graphene or M5 wires. Due to limited device yield, M5 wires are enabled if a particular segment did not have any working graphene devices.
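The per-segment selection described above can be sketched as follows. The encoding is hypothetical: it assumes that a larger 3-bit tester code means a longer delay and that a saturated code (7) marks a wire with no working connection. Neither convention, nor the wire labels, is specified in this paper.

```python
from typing import Sequence

NO_CONNECT = 7  # assumption: a saturated 3-bit code means the wire is open


def select_wire(graphene_codes: Sequence[int]) -> str:
    """Pick the fastest working graphene wire, or fall back to M5.

    graphene_codes holds one 3-bit delay code (0..7) for each of the
    three redundant graphene wires of a single L = 2 segment.
    """
    working = [(code, idx) for idx, code in enumerate(graphene_codes)
               if 0 <= code < NO_CONNECT]
    if not working:
        return "M5"            # no usable graphene wire in this segment
    _, best = min(working)     # lowest code first, then lowest index
    return f"G{best}"


segments = {
    "seg0": [7, 3, 5],   # wire 0 open, wire 1 is the fastest
    "seg1": [7, 7, 7],   # no working graphene wire
    "seg2": [2, 2, 7],   # tie, resolved toward the lower index
}

for name, codes in segments.items():
    print(f"{name}: codes={codes} -> route on {select_wire(codes)}")
```

A mapping of this kind would be evaluated once per segment after the tester scan, and its result would determine which wire the bitstream enables.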
Figs. 12 and 13 show the maximum operating frequency and (L = 2 wire) energy, respectively, while running a representative benchmark (three-stage pipelined multiplier). Rather than plotting the total system energy, we only plot the energy dissipated from the secondary supply $V_{REF}$, which is the only source that directly affects the graphene or M5 wire. Furthermore, the total system energy is dominated by all other logic blocks; thus, comparing the L = 2 wire energy gives us a more accurate picture of the performance difference between graphene and M5 wires. The delay of the L = 2 wire, however, affects the critical path delay and has a direct impact on the frequency of the system.

[...] is limited by the level converters. When $V_{REF}$ = 0.4 V, the system performs up to 2.11× faster and consumes 1.54× less (L = 2 wire) energy when using the graphene wires over the M5 wires. The optimal operating point is around $V_{REF}$ = 0.4 V and $V_{DD}$ = 0.5-0.6 V. In this example, only half of the L = 2 segments had working graphene devices. If all L = 2 segments had working graphene wires, the estimated energy reduction is approximately 3×, which roughly corresponds to the difference in the wire capacitance.
As described in Section II, thin Cu wires are difficult to fabricate and result in lower performance than graphene wires. This paper compares graphene and M5 wires to understand the impact of using graphene instead of nominal (thick) metal wires for subthreshold applications. Few-layer graphene wires are extremely thin and thus have low capacitance but very high resistances compared to similarly spaced metal wires. The estimated capacitances of the graphene and M5 wires are 18.7 and 52.5 fF, respectively. One limitation of this work is that graphene is placed on top of the passivation layer. Nonetheless, if graphene is placed on the same plane as the M5 wire, the estimated graphene capacitance increases but is still roughly 2× smaller than that of the M5 wire. Assuming a sheet resistance of 270 Ω/sq for our four-layer graphene wires, the resistance of graphene wires is approximately three orders of magnitude higher than that of M5 wires. As discussed earlier, when graphene wires are driven by a lower voltage swing, the delay and energy can be improved because the wire performance is dominated by the large resistances of the transistors and the wire capacitance [8] . This experimentally demonstrates the potential of using few-layer graphene wires to increase the performance in subthreshold circuits. 
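The capacitance figures quoted above can be turned into a quick estimate of the wire-only switching energy. The sketch below uses the two estimated capacitances and the 0.4-V swing from this paper, and assumes a simple ½·C·V² model that ignores the driver, the level converter, and leakage.

```python
C_GRAPHENE = 18.7e-15   # estimated graphene wire capacitance (F)
C_M5 = 52.5e-15         # estimated M5 wire capacitance (F)
V_SWING = 0.4           # low-swing drive voltage (V)


def wire_energy(c_wire: float, v_swing: float) -> float:
    """Energy to charge the wire capacitance once, modeled as 1/2 * C * V^2."""
    return 0.5 * c_wire * v_swing ** 2


cap_ratio = C_M5 / C_GRAPHENE
e_graphene = wire_energy(C_GRAPHENE, V_SWING)
e_m5 = wire_energy(C_M5, V_SWING)

print(f"Capacitance ratio (M5 / graphene): {cap_ratio:.2f}x")
print(f"Wire energy, graphene: {e_graphene * 1e15:.2f} fJ")
print(f"Wire energy, M5: {e_m5 * 1e15:.2f} fJ")
```

Under this simplified model, the energy ratio equals the capacitance ratio of about 2.8×, which is consistent with the roughly 3× reduction projected for the case in which every L = 2 segment uses a working graphene wire.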
B. Limitations and Discussions
A summary of the measured results is shown in Table I. In subthreshold operation, graphene wires have many advantages over metal wires. However, this work serves as a technology demonstrator, and the overall impact of using graphene is somewhat limited in this paper. Fig. 14(a) shows the total energy of the system. The test chip operates down to $V_{DD}$ = 0.3 V, which is well below the threshold voltage. The system achieves minimum energy operation of 8.7 pJ/cycle around $V_{DD}$ = 0.45 V. When $V_{DD}$ < 0.45 V, the leakage energy begins to dominate and increases the total energy. When the system is actively running a benchmark, Fig. 14(b) shows that slightly more than half of the power is used for interconnects, including the clock network, single lines, and double lines. The L = 2 wires comprise only a small portion (<3%) of the total energy. Therefore, although the graphene wires may have lower energy, their overall impact on the total energy is limited in this work.
The energy improvement is expected to increase as larger systems are designed that use longer graphene wires and use such wires more extensively. The physical length of the L = 2 wires was limited to 300 μm due to resource limitations. Fig. 15 shows the estimated delay and energy of a minimum-pitch global wire as a function of wire length. For short wires, graphene does not have any advantage over the nominal metal wires since performance is dominated by the parasitics of the transistors. Depending on the sheet resistance of graphene, the lowest delay is obtained when the graphene wire length is between 300 μm and 3 mm. At longer lengths, or when the sheet resistance is too high, the wire resistance becomes larger than the transistor resistance and increases the graphene wire delay. On the other hand, the sheet resistance has very little impact on the wire energy, and larger energy savings are achieved for graphene wires at longer lengths. Finding an optimal length is important in maximizing the performance. In this paper, despite the relatively high sheet resistance of roughly 270 Ω/sq for four-layer graphene, we have demonstrated both delay and energy improvements over similarly sized metal wires in subthreshold operation. Sheet resistances as low as 30 Ω/sq from four layers have been reported [31], suggesting the possible use of graphene as long global wires. More importantly, graphene wires need to be used more extensively throughout the FPGA to have a large system-wide impact. In larger and more complex FPGAs, the global interconnects have been shown to dominate the delay and energy of the system [17]-[19]. Replacing the clock mesh or global wires with graphene presents another opportunity for future designs.
V. CONCLUSION
We have demonstrated signal routing on four-layer graphene wires integrated on a custom-designed FPGA. Graphene sheets are synthesized using Cu foils and then integrated on top of the CMOS chip. The FPGA test chip implements a 5 × 5 logic array and includes a local tester at each L = 2 wire segment to monitor the delay of each graphene wire. The graphene wires have 2.8× lower capacitance than the metal wires, resulting in up to 2.11× faster speeds and 1.54× lower (L = 2 wire) energy when driven by a low-swing voltage of 0.4 V. We project a roughly 3× reduction in (L = 2 wire) energy if all M5 wires are replaced by graphene. Nonetheless, L = 2 energy comprises a small portion of the total energy due to the small size of the FPGA presented in this paper. The total energy improvement is expected to increase as larger systems are designed that use longer graphene wires and use such wires more extensively. This paper has demonstrated the first complete graphene-based system application and has shown the potential of using low-capacitance graphene wires for ultralow-power electronics.
