Systematic DC/AC Performance Benchmarking of Sub-7-nm Node FinFETs and Nanosheet FETs
In this paper, we systematically evaluate the dc/ac performance of sub-7-nm node fin field-effect transistors (FinFETs) and nanosheet FETs (NSFETs) using fully calibrated 3-D TCAD. The stress effects in all the devices were carefully considered in terms of the carrier mobility and velocity averaged within the active regions. For detailed ac analysis, the parasitic capacitances were extracted and decomposed into several components using a TCAD RF simulation platform. Decreasing the fin width to 5 nm improved the gate electrostatics of the FinFETs, but adjusting the fin height did not improve the RC delay because of the trade-off between on-state current and gate capacitance. The NSFETs have higher on-state currents than the FinFETs because of their larger effective width (W-eff) within the same device area. In particular, p-type NSFETs have larger compressive stress within the active regions, induced by the metal gate surrounding the channels on all sides, which greatly improves carrier mobility and velocity. On the other hand, the NSFETs have larger gate capacitances because their larger W-eff increases the gate-to-source/drain overlap and outer-fringing capacitances. In spite of this, sub-7-nm node NSFETs achieve better RC delay than both sub-7-nm node and 10-nm node FinFETs for standard and high-performance applications, showing better prospects for scaling to the sub-7-nm node and beyond.
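As a first-order illustration of the fin-height trade-off, consider the common CV/I intrinsic-delay metric (assumed here, not taken from the paper), with C_gg the total gate capacitance, V_DD the supply voltage, and I_on the on-state current. If a taller fin scales both I_on and C_gg by the same factor k, the delay does not change:

```latex
\tau = \frac{C_{gg}\,V_{DD}}{I_{on}}, \qquad
C_{gg} \to k\,C_{gg},\;\; I_{on} \to k\,I_{on}
\;\;\Rightarrow\;\;
\tau' = \frac{k\,C_{gg}\,V_{DD}}{k\,I_{on}} = \tau
```

Under this metric a structure improves the delay only if its current gain outpaces its capacitance increase, which is what the abstract reports for the NSFETs despite their larger gate capacitance.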
Bottom Oxide Bulk FinFETs Without Punch-Through-Stopper for Extending Toward 5-nm Node
Structural advancements of 5-nm node bulk fin-shaped field-effect transistors (FinFETs) without a punch-through-stopper (PTS) were introduced for the first time using fully calibrated TCAD. Scaling conventional bulk FinFETs down to the 5-nm technology node is challenging because of the increase in sub-fin leakage. In contrast, depositing a bottom oxide after the anisotropic etching for source/drain (S/D) epi formation effectively suppresses the sub-fin leakage even without PTS doping, thus achieving better gate-to-channel controllability. Bottom oxide FinFETs also have smaller gate capacitances than conventional FinFETs, because the smaller S/D epi, separated from the bottom Si layer, reduces the junction and outer-fringing parasitic capacitances. However, the smaller S/D epi decreases the stress along the channel direction, and the bottom oxide layer reduces the effective width by blocking the current paths at the bottom of the fin channels. Furthermore, the growing interconnect resistance and capacitance parasitics at the 5-nm node erode the total-delay improvement, especially as the interconnect wire length increases. In spite of these drawbacks, 5-nm node bottom oxide FinFETs achieve smaller total delays than 7-nm node conventional FinFETs, especially for low-power applications, making them promising for extending the scalability of bulk FinFETs with a simple and reliable process that avoids the PTS step.
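Why a lower device capacitance helps less as wires get longer can be illustrated with the textbook Elmore-style delay of a driver with resistance R_d driving a distributed RC wire (resistance r and capacitance c per unit length, length L) into a load C_L: t = 0.69 R_d (cL + C_L) + 0.38 (rL)(cL) + 0.69 (rL) C_L. This is a toy sketch under stated assumptions. Every parameter value below is hypothetical, not from the paper, and only the load capacitance is reduced (by 10%). The effective-width and stress penalties that the abstract also mentions are ignored.

```python
# Toy Elmore-style delay model; all parameter values are hypothetical.
def elmore_delay(r_drv, c_load, r_per_um, c_per_um, length_um):
    """Delay (s) of a driver with resistance r_drv (ohm) driving a distributed
    RC wire of length_um (um) that ends in a load capacitance c_load (F)."""
    r_wire = r_per_um * length_um
    c_wire = c_per_um * length_um
    return (0.69 * r_drv * (c_wire + c_load)
            + 0.38 * r_wire * c_wire
            + 0.69 * r_wire * c_load)

def relative_improvement(length_um, r_drv=5e3, c_base=1.0e-15, c_reduced=0.9e-15,
                         r_per_um=100.0, c_per_um=0.2e-15):
    """Fractional delay reduction from shrinking the device load capacitance
    from c_base to c_reduced, at a given wire length."""
    t_base = elmore_delay(r_drv, c_base, r_per_um, c_per_um, length_um)
    t_new = elmore_delay(r_drv, c_reduced, r_per_um, c_per_um, length_um)
    return (t_base - t_new) / t_base

for L in (0, 1, 10, 100):
    print(f"L = {L:>3} um: delay improvement = {relative_improvement(L):.1%}")
```

The capacitance saving enters only linearly in L, while the wire's own r·c term grows with L², so the relative gain shrinks as the wire lengthens.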
Source/Drain Patterning FinFETs as Solution for Physical Area Scaling Toward 5-nm Node
A novel and feasible process scheme to downsize the source/drain (S/D) epitaxy of 5-nm node bulk fin-shaped field-effect transistors (FinFETs) was introduced for the first time using fully calibrated TCAD. The S/D epitaxy formed by selective epitaxial growth is diamond-shaped and occupies a large proportion of the device area regardless of the active channel area. This problem was solved by patterning the low-k regions prior to S/D formation to prevent lateral overgrowth of the S/D epitaxy, a scheme called S/D patterning (SDP). The smaller S/D epitaxy of SDP decreased the average longitudinal channel stress and the drive current of NFETs. For PFETs, however, the smaller diffusion of boron dopants into the channel regions improved the short-channel effects and alleviated the drive-current reduction. Gate capacitances decreased greatly owing to reduced outer-fringing capacitances between the metal-gate stack and the S/D regions. The operating frequencies and dynamic power of 15-stage ring oscillators were studied through SPICE simulation based on the virtual source model. SDP FinFETs show better circuit performance than conventional and bottom oxide bulk FinFETs while occupying smaller active areas, making them promising for further area scaling through a simple and reliable S/D process.
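The two ring-oscillator metrics mentioned above follow from standard first-order relations. An N-stage ring with per-stage delay t_p oscillates at f = 1/(2 N t_p). Because every stage output charges and discharges once per period, the dynamic power is P = N C V_DD² f. The following is a minimal sketch with hypothetical values for t_p, C, and V_DD, not taken from the paper:

```python
def ring_oscillator(n_stages, t_stage_s, c_stage_f, vdd_v):
    """Return (frequency in Hz, dynamic power in W) of an n-stage ring oscillator.

    t_stage_s: average propagation delay per inverter stage (s)
    c_stage_f: switched capacitance per stage (F)
    vdd_v:     supply voltage (V)
    """
    if n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverting stages")
    freq = 1.0 / (2 * n_stages * t_stage_s)
    # Each stage output charges and discharges once per period: energy C*V^2 per stage.
    power = n_stages * c_stage_f * vdd_v ** 2 * freq
    return freq, power

# Hypothetical 15-stage example: 10 ps per stage, 1 fF per stage, 0.7 V supply.
f, p = ring_oscillator(15, 10e-12, 1e-15, 0.7)
print(f"{f / 1e9:.2f} GHz, {p * 1e6:.1f} uW")
```

In this first-order model N cancels in P = C V_DD² / (2 t_p). A smaller per-stage capacitance therefore cuts the energy per cycle directly, and it raises the frequency only by shortening t_p.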
Gate-All-Around FETs: Nanowire and Nanosheet Structure
The DC/AC performance of 3-nm-node gate-all-around (GAA) FETs with different channel widths and numbers of channels (Nch), from 1 to 5, was investigated thoroughly using fully calibrated TCAD. There are two types of GAAFETs: nanowire (NW) FETs, whose channel width (WNW) equals the channel thickness, and nanosheet (NS) FETs, whose channels are wide (WNS) but have a fixed thickness of 5 nm. Compared with FinFETs, GAAFETs maintain good short-channel characteristics as long as WNW is smaller than 9 nm, irrespective of WNS. The DC performance of the GAAFETs improves as Nch increases, but at a decreasing rate because of the parasitic resistance of the source/drain epi. On the other hand, the gate capacitance of the GAAFETs increases steadily as Nch increases. Therefore, the GAAFETs reach minimum RC delay at an Nch of about 3. For low-power applications, NWFETs outperform FinFETs and NSFETs because of their excellent short-channel characteristics arising from 2-D structural confinement. For standard and high-performance applications, NSFETs outperform FinFETs and NWFETs, showing superior DC performance owing to their larger effective width per footprint. Overall, GAAFETs are strong candidates to replace FinFETs at the 3-nm technology node for all these applications.
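A minimum RC delay at an intermediate Nch can be reproduced qualitatively with a toy model. This is an assumption for illustration, not the paper's TCAD setup. The Nch channels conduct in parallel through a shared S/D series resistance R_sd, so the on-resistance is R_sd + R_ch/Nch and the current saturates. Meanwhile the gate capacitance grows linearly as C_par + Nch·C_ch. The delay τ = C·V/I = (C_par + Nch·C_ch)(R_sd + R_ch/Nch) then has a continuous minimum at Nch* = sqrt(C_par·R_ch / (C_ch·R_sd)). All parameter values below are hypothetical.

```python
import math

# Hypothetical device parameters (invented for illustration, not from the paper).
R_CH = 2000.0      # resistance of one channel (ohm)
R_SD = 400.0       # shared source/drain epi series resistance (ohm)
C_CH = 0.2e-15     # gate capacitance added per channel (F)
C_PAR = 0.4e-15    # fixed parasitic gate capacitance (F)

def gaa_delay(n_ch):
    """Toy CV/I delay (s): tau = C_gate * V / I_on = C_gate * R_on."""
    r_on = R_SD + R_CH / n_ch       # channels in parallel, S/D resistance in series
    c_gate = C_PAR + n_ch * C_CH    # capacitance grows linearly with the channel count
    return c_gate * r_on

def continuous_optimum():
    """Setting d(tau)/dN = 0 gives N* = sqrt(C_PAR * R_CH / (C_CH * R_SD))."""
    return math.sqrt(C_PAR * R_CH / (C_CH * R_SD))

delays = {n: gaa_delay(n) for n in range(1, 6)}
best = min(delays, key=delays.get)
for n, tau in delays.items():
    print(f"Nch = {n}: tau = {tau * 1e12:.3f} ps")
print(f"integer optimum: {best}, continuous optimum: {continuous_optimum():.2f}")
```

With these invented values the integer optimum happens to be Nch = 3. Other parameters move it, so the sketch shows only the shape of the trade-off, not the paper's result.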
An Overview of Energy-Efficient Hardware Accelerators for On-Device Deep-Neural-Network Training
Deep neural networks (DNNs) have been widely used in various artificial intelligence (AI) applications due to their outstanding performance. Moreover, several recently reported algorithms require on-device training to deliver higher performance in real-world environments and to protect users' personal data. However, edge/mobile devices have only limited computational capability and run on battery power, so an energy-efficient DNN training processor is necessary to realize on-device training. Although there are many surveys on energy-efficient DNN inference hardware, training differs considerably from inference, so analysis and optimization techniques targeting DNN training are required. This article provides an overview of energy-efficient DNN processing that enables on-device training. Specifically, it presents hardware optimization techniques that overcome the design challenges of distinct dataflow, external memory access, and computation. In addition, it summarizes the key schemes of recent energy-efficient DNN training ASICs. Finally, it presents a design example of a DNN training ASIC with energy-efficient optimization techniques.
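One reason training differs from inference is a textbook fact, not something specific to this article. For each layer, a training step runs three computations instead of one: the forward pass (FP), the backward pass (BP), and the weight-gradient step (WG). The sketch below shows them for a single fully connected layer in plain Python. BP reads the weights in transposed order, and WG needs the activations saved from FP. Both properties affect dataflow and memory traffic. The function names are hypothetical.

```python
def forward(w, x):
    """Forward pass (FP): y = W x, with W stored as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]

def backward(w, dy):
    """Backward pass (BP): dx = W^T dy. W is read column by column,
    i.e. in the transposed order of the forward pass."""
    n_out, n_in = len(w), len(w[0])
    return [sum(w[i][j] * dy[i] for i in range(n_out)) for j in range(n_in)]

def weight_gradient(dy, x):
    """Weight-gradient step (WG): dW = dy x^T. It needs the activation x
    from the forward pass, so x must be kept in memory until this step."""
    return [[d * x_j for x_j in x] for d in dy]

def macs_per_step(n_out, n_in, training):
    """FP, BP and WG each cost n_out * n_in MACs for this layer."""
    return (3 if training else 1) * n_out * n_in

W = [[1, 2], [3, 4], [5, 6]]  # 3 outputs, 2 inputs
x = [1, 1]
y = forward(W, x)
dy = [1, 0, -1]               # hypothetical gradient arriving from the next layer
print(y, backward(W, dy), weight_gradient(dy, x))
```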
Scalable Coverage Maintenance for Dense Wireless Sensor Networks
Owing to numerous potential applications, wireless sensor networks have recently attracted significant research effort. A critical challenge that wireless sensor networks often face is sustaining long-term operation on limited battery energy. Coverage maintenance schemes can effectively prolong network lifetime by selecting and activating a subset of the sensors in the network that provides sufficient sensing coverage over a target region. We envision future wireless sensor networks composed of a vast number of miniaturized sensors deployed at exceedingly high density. Therefore, the key issue of coverage maintenance for future sensor networks is scalability with respect to sensor deployment density. In this paper, we propose a novel coverage maintenance scheme, scalable coverage maintenance (SCOM), which scales with sensor deployment density in terms of both communication overhead (i.e., the number of transmitted and received beacons) and computational complexity (i.e., time and space complexity). In addition, SCOM achieves high energy efficiency and load balancing across sensors. We have validated our claims through both analysis and simulations.
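The idea of activating only a subset of sensors that still covers the target region can be sketched with the classic greedy set-cover heuristic. This technique is swapped in for illustration. It is not SCOM, whose algorithm the abstract does not describe. In the sketch, the region is approximated by hypothetical target points, and each sensor covers the points within its sensing radius. The loop repeatedly activates the sensor that covers the most still-uncovered points.

```python
import math

def greedy_cover(sensors, targets, radius):
    """Return indices of sensors whose sensing disks of `radius` cover every
    coverable target point, chosen with the greedy set-cover heuristic."""
    covers = [
        {t for t, (tx, ty) in enumerate(targets)
         if math.hypot(sx - tx, sy - ty) <= radius}
        for sx, sy in sensors
    ]
    uncovered = set().union(*covers)  # targets no sensor can reach are ignored
    active = []
    while uncovered:
        # Activate the sensor that adds the most newly covered targets.
        best = max(range(len(sensors)), key=lambda i: len(covers[i] & uncovered))
        active.append(best)
        uncovered -= covers[best]
    return active

# Hypothetical 5x5 grid of target points and five sensors (the last is redundant).
targets = [(x, y) for x in range(5) for y in range(5)]
sensors = [(1, 1), (1, 3), (3, 1), (3, 3), (1, 1)]
active = greedy_cover(sensors, targets, radius=1.5)
print("active sensors:", sorted(active))
```

Sensors that are not returned could sleep to save energy. This centralized loop scans every sensor in every round, so its cost grows with deployment density. The abstract claims SCOM addresses exactly this kind of scaling.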
A 9.02mW CNN-stereo-based real-time 3D hand-gesture recognition processor for smart mobile devices
Recently, 3D hand-gesture recognition (HGR) has become an important feature in smart mobile devices, such as head-mounted displays (HMDs) and smartphones, for AR/VR applications. A 3D HGR system (Fig. 13.4.1) enables users to interact with virtual 3D objects using depth sensing and hand tracking. However, previous 3D HGR systems, such as HoloLens [1], used a power-hungry time-of-flight (ToF) depth sensor (>2 W), limiting 3D HGR operation to less than 3 hours. Although stereo matching was used instead of ToF for low-power depth sensing [2], it could not provide interaction with virtual 3D objects because the depth information was used only for hand segmentation. An HGR-based UI system in smart mobile devices such as HMDs must consume little power (<10 mW) while maintaining real-time operation (<33.3 ms). A convolutional neural network (CNN) can be adopted to enhance the accuracy of low-power stereo matching. The CNN-based HGR system comprises two 6-layer CNNs (stereo) without any pooling layers, to preserve geometrical information, and iterative-closest-point/particle-swarm-optimization-based (ICP-PSO) hand tracking to acquire the 3D coordinates of the user's fingertips and palm from the hand depth. The CNN learns skin color and texture to detect the hand accurately, with accuracy comparable to ToF, in the low-power stereo matching system irrespective of variations in external conditions [3]. However, it requires >1000× more MAC operations than previous feature-based stereo depth sensing, which is difficult to run in real time on a mobile CPU. Therefore, a dedicated low-power CNN-based stereo matching SoC is required.
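The cost of a CNN without pooling layers can be estimated by counting multiply-accumulate (MAC) operations. A stride-1 k×k convolution with C_in input and C_out output channels on an H×W map costs H·W·k²·C_in·C_out MACs, and without pooling H×W never shrinks from layer to layer. The layer sizes, the image resolution, and the assumption of one CNN per stereo view below are all hypothetical. They are chosen only to show how the per-frame workload translates into the throughput needed for the 33.3 ms real-time target.

```python
def conv_macs(h, w, k, c_in, c_out):
    """MACs of a stride-1, 'same'-padded k x k convolution on an h x w feature map."""
    return h * w * k * k * c_in * c_out

def network_macs(h, w, k, channels):
    """Total MACs of a plain convolution stack with no pooling layers.

    Without pooling the spatial size never shrinks, so every layer
    operates on the full h x w map.
    """
    return sum(conv_macs(h, w, k, c_in, c_out)
               for c_in, c_out in zip(channels, channels[1:]))

# Hypothetical configuration (not from the paper): six 3x3 conv layers on a
# 160x120 input, 3 input channels and 32 feature maps per layer.
channels = [3, 32, 32, 32, 32, 32, 32]
per_view = network_macs(120, 160, 3, channels)
per_frame = 2 * per_view          # assumption: one CNN per stereo view
frame_budget_s = 33.3e-3          # real-time target quoted in the text
required_macs_per_s = per_frame / frame_budget_s
print(f"{per_frame / 1e9:.2f} GMAC/frame, {required_macs_per_s / 1e9:.1f} GMAC/s")
```

A single 2×2 pooling layer would cut the cost of every later layer by 4×. The design gives up that saving to preserve geometric information.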
A 31.2pJ/disparity·pixel stereo matching processor with stereo SRAM for mobile UI application
An energy-efficient, high-speed stereo matching processor is proposed for smart mobile devices, featuring a stereo SRAM (S-SRAM) and independent regional integral cost (IRIC). The S-SRAM reduces the power consumption of the cost generation unit (CGU) by 63.2%. IRIC enables the cost aggregation unit (CAU) to achieve a 6.4× speedup and a 12.3% power reduction with a pipelined integral cost generator (PICG). The proposed stereo matching processor, implemented in a 65nm CMOS process, achieves 82fps, and an energy efficiency of 31.2pJ/disparity·pixel at 30fps. This energy efficiency is 77.6% better than the state of the art.
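The two stages named above, cost generation and cost aggregation, can be illustrated with a generic block-matching sketch. It computes per-pixel absolute-difference costs for each candidate disparity, aggregates them over a window using an integral image (summed-area table) so each window sum takes constant time, and picks the disparity by winner-take-all. This standard software formulation is assumed for illustration. It is not the paper's S-SRAM or IRIC hardware, whose internals the abstract does not describe. The fill cost of 255 for pixels without a match is an arbitrary assumption.

```python
def integral_image(a):
    """Summed-area table with an extra zero row/column: s[y][x] = sum of a[:y][:x]."""
    h, w = len(a), len(a[0])
    s = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += a[y][x]
            s[y + 1][x + 1] = s[y][x + 1] + row_sum
    return s

def box_sum(s, y0, x0, y1, x1):
    """Sum of a[y0:y1][x0:x1] in O(1) from the summed-area table s."""
    return s[y1][x1] - s[y0][x1] - s[y1][x0] + s[y0][x0]

def disparity_map(left, right, max_disp, radius):
    """Winner-take-all disparity from window-aggregated absolute-difference costs."""
    h, w = len(left), len(left[0])
    best_cost = [[float("inf")] * w for _ in range(h)]
    best_disp = [[0] * w for _ in range(h)]
    for d in range(max_disp + 1):
        # Cost generation: |L(x) - R(x - d)|; columns without a match get cost 255.
        cost = [[abs(left[y][x] - right[y][x - d]) if x >= d else 255
                 for x in range(w)] for y in range(h)]
        s = integral_image(cost)
        # Cost aggregation: box sum over a (2*radius+1)^2 window clipped to the image.
        for y in range(h):
            for x in range(w):
                y0, y1 = max(0, y - radius), min(h, y + radius + 1)
                x0, x1 = max(0, x - radius), min(w, x + radius + 1)
                c = box_sum(s, y0, x0, y1, x1)
                if c < best_cost[y][x]:
                    best_cost[y][x], best_disp[y][x] = c, d
    return best_disp

# Synthetic textured pair: the left image is the right image shifted by 2 pixels.
right = [[(37 * x + 11 * y) % 97 for x in range(12)] for y in range(6)]
left = [[right[y][x - 2] if x >= 2 else 0 for x in range(12)] for y in range(6)]
disp = disparity_map(left, right, max_disp=4, radius=1)
print(disp[3])
```

Pixels whose aggregation window lies entirely inside the overlapping region should report a disparity of 2. Pixels near the left border may not, because part of their window has no valid match.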
- …