I. INTRODUCTION
In modern semiconductor technologies, Schottky contacts are often used to reduce the access resistances at the source and drain interfaces of Metal-Oxide-Semiconductor Field Effect Transistors (MOSFETs). However, Schottky barriers can also lead to ambipolarity, i.e., the conduction of both holes and electrons in the same device, which limits device performance. Nevertheless, it is possible to exploit this phenomenon to enrich the logic capabilities of elementary transistors by creating double-independent-gate structures. More precisely, the additional gate electrode can be used to control the device polarity dynamically. Recently demonstrated using Silicon Nanowire FETs (SiNWFETs) [1], this transistor property defines a new class of emerging devices that inherently implement a two-input comparator rather than a simple switch. Controllable-polarity devices can be realized in diverse technologies, such as silicon nanowires [1, 2], carbon nanotubes [3], graphene [4] and nanorelays [5].
The intrinsic device configurability enables many innovations from a design perspective, ranging from arithmetic logic [6] down to compact reconfigurable logic [7]. Compact reconfigurable logic cells make it appealing to rethink standard reconfigurable architectures, such as the traditional Field Programmable Gate Array (FPGA) architectural scheme.
In this paper, we present a novel logic block architecture that exploits polarity control at the device level. The logic blocks use ultra-fine grain logic gates grouped in 2×2 matrices in order to realize more efficient combinational logic. The new logic blocks then replace the LUTs in the standard FPGA structure. System-level performance evaluations show that this approach leads to an average saving of 64% in the area×delay×power figure-of-merit as compared to its standard counterpart architecture at the 22-nm node.
The remainder of the paper is organized as follows. In Section II, we discuss the opportunities brought by controllable-polarity devices to realize compact reconfigurable cells and novel FPGA structures. In Section III, we briefly report on the performance improvements given by the approach. In Section IV, we conclude the paper.
II. LEVERAGING DEVICE POLARITY CONTROL IN RECONFIGURABLE SYSTEMS
In this section, we discuss the opportunities given by controllable-polarity devices to realize efficient reconfigurable logic blocks.
A. Ultra-Fine Grain Reconfigurable Logic
We first review the behavior of controllable-polarity transistors, then we report on their use in compact logic cell design.
1) Transistors with Controllable Polarity
Controllable-polarity transistors typically control the height of the Schottky barriers formed at the source and drain contacts. This control is enabled by double-independent-gate structures. In such devices, one gate electrode, called the Control Gate (CG), acts conventionally by turning the device on and off. The other electrode, called the Polarity Gate (PG), acts on the side regions of the device, in proximity to the Source/Drain (S/D) Schottky junctions, and switches the device polarity dynamically between n- and p-type, as illustrated in Fig. 1. The input and output voltage levels are compatible, resulting in directly cascadable logic gates [1]. For a complete review of the design opportunities brought by these transistors, we refer the reader to [6].
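The comparator-like behavior of such a device can be illustrated with a minimal switch-level sketch in Python. It assumes an idealized model in which PG at logic 1 selects n-type operation and PG at logic 0 selects p-type operation. This mapping and the function name `conducts` are illustrative assumptions and are not taken from the device characterization in [1].

```python
def conducts(cg: int, pg: int) -> bool:
    """Idealized switch-level model of a controllable-polarity transistor.

    Assumption (illustrative, not from the paper): PG = 1 configures the
    device as n-type, PG = 0 configures it as p-type.
    """
    if pg == 1:
        # n-type behavior: the channel is on when the control gate is high.
        return cg == 1
    # p-type behavior: the channel is on when the control gate is low.
    return cg == 0


# Enumerate the four gate combinations. Under the assumption above, the
# channel conducts exactly when CG and PG agree, so the device behaves as
# a two-input comparator rather than a single-input switch.
for pg in (0, 1):
    for cg in (0, 1):
        state = "on" if conducts(cg, pg) else "off"
        print(f"PG={pg} CG={cg} -> {state}")
```

Under this idealized model, the on-condition reduces to an equality test between the two gate inputs, which is one way to read the "two-input comparator" description given in the introduction.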
2) Ultra-Fine Grain Reconfigurable Logic Gates
The property of in-field reconfigurability has been used in [7] to build a compact reconfigurable cell. The cell, reported in Fig. 2
978-1-4799-5810-8/14/$31.00 ©2014 IEEE
The 2 and EV 2 are respectively the global precharge and evaluation signals of the two stages. The reconfiguration of the cell depends on the signals applied to the polarity gates V_BA, V_BB and V_BC. Each of these signals is biased at either V_DD or Gnd. This configures the related transistors as either n- or p-type, thereby customizing the internal circuit of the gate. For a detailed description of the circuit operation, we refer the reader to [7].
B. Matrix Cluster Logic Blocks
In this work, we replace the traditional LUTs by ultra-fine grain logic gates. However, a one-to-one replacement would result in a large overhead in terms of programmable connections and would worsen the already significant imbalance between routing and logic resources in FPGAs. Indeed, in conventional FPGA systems, less than 15% of the area is dedicated to logic computation, while the remaining resources are used for the reconfigurability of the structure [8]. Therefore, to increase the logic coverage of the structure, we group the logic gates in layered 2×2 matrices. This arrangement is called a Matrix Cluster (MCluster). For the intra-matrix interconnect, we use a butterfly pattern between the two layers of logic cells. MClusters perform combinational logic functions and replace the original LUTs. Fig. 3 shows the organization of a logic block. Each logic block consists of a collection of N MCluster-based Basic Logic Elements (BLEs).
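The 2×2 MCluster organization can be sketched functionally as follows. This is a minimal illustration under stated assumptions: each cell is modeled as a generic two-input gate, the butterfly interconnect is modeled by feeding both first-layer outputs to both second-layer cells, and the gate set (`NAND`, `NOR`, `XOR`) is hypothetical. The function set of the actual cell is described in [7].

```python
from typing import Callable, Tuple

Gate = Callable[[int, int], int]

# Hypothetical two-input cell configurations used only for illustration.
NAND: Gate = lambda a, b: 1 - (a & b)
NOR: Gate = lambda a, b: 1 - (a | b)
XOR: Gate = lambda a, b: a ^ b


def mcluster(
    cells: Tuple[Tuple[Gate, Gate], Tuple[Gate, Gate]],
    inputs: Tuple[int, int, int, int],
) -> Tuple[int, int]:
    """Evaluate a 2x2 matrix of two-input cells (assumed wiring).

    cells[0] holds the two first-layer cells, cells[1] the two
    second-layer cells.
    """
    (g00, g01), (g10, g11) = cells
    a, b, c, d = inputs
    # Layer 1: each cell consumes two of the four primary inputs.
    x = g00(a, b)
    y = g01(c, d)
    # Butterfly pattern (assumed): every layer-2 cell sees both layer-1 outputs.
    return g10(x, y), g11(x, y)


# One configuration of the matrix: 4 inputs in, 2 outputs out.
config = ((NAND, NOR), (XOR, NAND))
print(mcluster(config, (1, 1, 0, 0)))
```

The sketch makes the trade-off visible: the matrix produces two outputs from four inputs, but only functions expressible as a two-level composition of the available cell functions can be realized, unlike a LUT, which can implement any 4-input function.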
III. EXPERIMENTAL RESULTS
The impact of replacing standard LUTs by MCluster-based structures is evaluated through system-level benchmarking of a set of logic circuits taken from MCNC and ISCAS'89. Our reference is a fully homogeneous CMOS FPGA architecture that uses 4-input non-fracturable LUTs arranged in logic blocks of N=10 BLEs and I=22 external inputs. This architecture is optimal for homogeneous FPGAs [10]. Our novel structure follows the organization depicted in Fig. 3 with N=10 BLEs of 2×2 MClusters. The evaluations are performed using the VTR benchmarking flow [9]. To handle MCluster-based architectures, we use a specific packer, called MPack [12]. The physical parameters of the different architectures are extracted for a 22-nm technology node, while the electrical performance figures, i.e., the delay and power consumption, of the elementary MCluster and LUT are characterized using HSPICE.
The architectural evaluation considers, as metrics, the area, the critical path delay, the dynamic power consumption and the leakage power. These metrics are computed during the place and route iterations of the flow. The area corresponds to the sum of the logic area, i.e., the area of the used CLBs, and the routing area, i.e., the area of the used routing resources. The critical path delay corresponds to the most constrained delay through the implemented structures. Finally, the power consumption figures include both the contribution of the logic blocks and the contribution of the routing structures. All the metrics are normalized with respect to the most constrained CMOS design.
Fig. 4 depicts the area×delay×power estimation for the MCluster-based FPGA and compares it to its CMOS counterpart. The benchmarks show an improvement of 64% on average. This can be attributed (i) to the performance of logic-gate-based computation (as compared to the LUT approach) and (ii) to the low area impact of ultra-fine grain logic cells, compared to the larger area required by a CMOS LUT. At the same 22-nm technology node, a 2×2 cluster is more than 2× smaller than a 4-input LUT (2.22 µm² vs. 5.45 µm², respectively) and can deliver more functionality. A 4-input LUT computes a single output signal depending on 4 inputs. While MClusters can only realize a subset of the functions reachable by a LUT, they are capable of producing up to 2 outputs that are functions of the same 4 inputs. Thus, MClusters can potentially output 2× more results for roughly 2× less area. Combined with the efficiency of the packing tool for matrix clustering, this demonstrates a clear advantage of our proposal as compared to the CMOS approach.
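The arithmetic behind these figures can be checked with a short Python sketch. Only the two areas, the output counts and the 64% average saving come from the text. The helper names `figure_of_merit` and `saving`, and the "uniform factor" interpretation, are illustrative assumptions and do not describe how the benchmarking flow computes its results.

```python
# Areas quoted in the text for the 22-nm node (square micrometres).
LUT4_AREA_UM2 = 5.45
MCLUSTER_AREA_UM2 = 2.22

# A 4-input LUT drives one output; a 2x2 MCluster drives up to two.
LUT4_OUTPUTS = 1
MCLUSTER_OUTPUTS = 2

# Raw area ratio, and area ratio per produced output (best case for MCluster).
area_ratio = LUT4_AREA_UM2 / MCLUSTER_AREA_UM2
area_per_output_ratio = (LUT4_AREA_UM2 / LUT4_OUTPUTS) / (
    MCLUSTER_AREA_UM2 / MCLUSTER_OUTPUTS
)


def figure_of_merit(area: float, delay: float, power: float) -> float:
    """Area x delay x power product of normalized metrics (lower is better)."""
    return area * delay * power


def saving(fom_new: float, fom_ref: float) -> float:
    """Relative saving of a new design with respect to a reference design."""
    return 1.0 - fom_new / fom_ref


# A 64% average saving corresponds to a normalized product of 0.36.
normalized_fom = 1.0 - 0.64
# Illustration only: if the gain were spread evenly over the three metrics,
# each metric would be scaled by the cube root of the normalized product.
uniform_factor = normalized_fom ** (1.0 / 3.0)

print(f"LUT / MCluster area ratio: {area_ratio:.2f}")
print(f"Area-per-output ratio:     {area_per_output_ratio:.2f}")
print(f"Uniform per-metric factor: {uniform_factor:.3f}")
```

The raw area ratio comes out closer to 2.5× than to 2×, so the "roughly 2×" statement in the text is a conservative rounding. The per-output ratio assumes both MCluster outputs are actually used, which depends on the packing result.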
IV. CONCLUSION
In this paper, we report on a novel FPGA logic block architecture that leverages the additional degree of freedom given by controllable-polarity transistors. In particular, we use controllable-polarity transistors to create compact reconfigurable logic cells. These cells, once arranged in 2×2 clusters, replace the traditional LUTs to perform basic combinational operations. Thanks to the increased logic capabilities of these cells, it is possible to improve the performance of FPGA structures, with an average reduction of 64% in the area×delay×power product as compared to their standard counterpart at the 22-nm technology node.
