352 research outputs found

    A device-level characterization approach to quantify the impacts of different random variation sources in FinFET technology

    Get PDF
    A simple device-level characterization approach to quantitatively evaluate the impacts of different random variation sources in FinFETs is proposed. The impact of random dopant fluctuation is negligible for FinFETs with a lightly doped channel, leaving metal gate granularity and line-edge roughness as the two major random variation sources. The Vth variations induced by these two major categories are theoretically decomposed based on the distinction in their physical mechanisms and their influences on different electrical characteristics. The effectiveness of the proposed method is confirmed through both TCAD simulations and experimental results. This letter provides helpful guidelines for variation-aware technology development.
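
    As a minimal illustration of the decomposition idea (not the letter's exact extraction procedure), the total Vth variation can be budgeted under the common assumption that metal gate granularity (MGG) and line-edge roughness (LER) are statistically independent, so their variances add. The hypothetical Python sketch below backs out one component once the other has been estimated from a separate electrical signature; the numbers are illustrative only.

        import math

        def ler_sigma_from_total(sigma_vth_total_mV, sigma_mgg_mV):
            """Back out the LER-induced Vth sigma from the total and the MGG part,
            assuming the two sources are independent so that variances add."""
            var_ler = sigma_vth_total_mV**2 - sigma_mgg_mV**2
            if var_ler < 0:
                raise ValueError("MGG component exceeds the total variation")
            return math.sqrt(var_ler)

        # Illustrative numbers, not taken from the letter:
        print(ler_sigma_from_total(sigma_vth_total_mV=30.0, sigma_mgg_mV=22.0))  # ~20.4 mV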

    HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference

    Full text link
    Secure two-party computation with homomorphic encryption (HE) protects data privacy with a formal security guarantee but suffers from high communication overhead. While previous works, e.g., Cheetah and Iron, have proposed efficient HE-based protocols for different neural network (NN) operations, they still assume high precision, e.g., 37-bit fixed point, for the NN operations and ignore NNs' native robustness against quantization error. In this paper, we propose HEQuant, which features low-precision-quantization-aware optimization for HE-based protocols. We observe that the benefit of naively combining quantization and HE quickly saturates as bit precision goes down. Hence, to further improve communication efficiency, we propose a series of optimizations, including an intra-coefficient packing algorithm and a quantization-aware tiling algorithm, to simultaneously reduce the number and precision of the transferred data. Compared with prior-art HE-based protocols, e.g., CrypTFlow2, Cheetah, and Iron, HEQuant achieves 3.5~23.4× communication reduction and 3.0~9.3× latency reduction. Meanwhile, compared with prior-art network optimization frameworks, e.g., SENet and SNL, HEQuant also achieves 3.1~3.6× communication reduction.
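
    To illustrate the intra-coefficient packing idea outside of any HE library: once values are quantized to a few bits, several of them can be concatenated into one larger plaintext coefficient so that fewer (and narrower) coefficients need to be transferred. The plain-Python sketch below shows only this bit-level packing; it is not HEQuant's protocol, and the function names and the 4-bit width are hypothetical.

        def pack_coefficients(values, bits):
            """Pack several unsigned `bits`-wide integers into one larger
            integer coefficient (low-order value first)."""
            coeff = 0
            for i, v in enumerate(values):
                assert 0 <= v < (1 << bits), "value exceeds the chosen precision"
                coeff |= v << (i * bits)
            return coeff

        def unpack_coefficients(coeff, bits, count):
            mask = (1 << bits) - 1
            return [(coeff >> (i * bits)) & mask for i in range(count)]

        vals = [3, 12, 7, 1]                  # 4-bit quantized values (illustrative)
        packed = pack_coefficients(vals, bits=4)
        assert unpack_coefficients(packed, bits=4, count=len(vals)) == vals

    In an actual HE protocol each packed slot also needs headroom bits so that sums of products computed homomorphically do not overflow into the neighbouring slot, which is part of what a quantization-aware packing and tiling scheme has to budget for.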

    GaSb Inversion-Mode PMOSFETs With Atomic-Layer-Deposited Al2O3 as Gate Dielectric

    Get PDF
    GaSb inversion-mode PMOSFETs with atomic-layer-deposited (ALD) Al2O3 as gate dielectric are demonstrated. A 0.75-μm-gate-length device has a maximum drain current of 70 mA/mm, a transconductance of 26 mS/mm, and a hole inversion mobility of 200 cm²/V·s. The OFF-state performance is improved by reducing the ALD growth temperature from 300 °C to 200 °C. The measured interface trap distribution shows a low interface trap density of 2 × 10¹²/cm²·eV near the valence band edge. However, it increases to 1-4 × 10¹³/cm²·eV near the conduction band edge, leading to a drain current on-off ratio of 265 and a subthreshold swing of ~600 mV/decade. GaSb, similar to Ge, is a promising channel material for PMOSFETs due to its high bulk hole mobility, high density of states at the valence band edge, and, most importantly, its unique interface trap distribution and trap neutral level alignment.
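
    The link reported above between the interface trap density and the degraded subthreshold swing follows the standard relation SS = (kT/q)·ln(10)·(1 + (C_dep + q·D_it)/C_ox). The Python sketch below evaluates this textbook formula with illustrative capacitance values that are not taken from the letter; it only shows that D_it in the 10¹³/cm²·eV range drives SS into the hundreds of mV/decade.

        import math

        KT_OVER_Q = 0.0259        # thermal voltage at 300 K, in volts
        Q = 1.602e-19             # elementary charge, in coulombs

        def subthreshold_swing(c_ox, c_dep, d_it):
            """Textbook subthreshold swing in mV/decade.
            c_ox and c_dep in F/cm^2; d_it in traps/(cm^2*eV)."""
            c_it = Q * d_it
            return 1000 * KT_OVER_Q * math.log(10) * (1 + (c_dep + c_it) / c_ox)

        # Illustrative values: ~1 uF/cm^2 oxide capacitance, small depletion capacitance.
        print(subthreshold_swing(c_ox=1e-6, c_dep=1e-7, d_it=3e13))  # ~352 mV/decade here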

    EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization

    Full text link
    Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead, especially from convolution layers. In this paper, we propose EQO, a quantized 2PC inference framework that jointly optimizes the CNNs and 2PC protocols. EQO features a novel 2PC protocol that combines the Winograd transformation with quantization for efficient convolution computation. However, we observe that naively combining quantization and Winograd convolution is sub-optimal: Winograd transformations introduce extensive local additions and weight outliers that increase the quantization bit widths and require frequent bit-width conversions with non-negligible communication overhead. Therefore, at the protocol level, we propose a series of optimizations for the 2PC inference graph to minimize the communication. At the network level, we develop a sensitivity-based mixed-precision quantization algorithm to optimize network accuracy given communication constraints. We further propose a 2PC-friendly bit re-weighting algorithm to accommodate weight outliers without increasing bit widths. With extensive experiments, EQO demonstrates 11.7×, 3.6×, and 6.3× communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared to the state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
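
    The range growth that motivates EQO's bit re-weighting can be seen directly in the standard Winograd F(2,3) matrices: the filter transform G mixes taps with ±1/2 coefficients and the input transform B^T adds and subtracts neighbouring activations, so transformed tiles span a wider range than the original tensors. The NumPy sketch below implements only the textbook F(2,3) algorithm, not EQO's 2PC protocol.

        import numpy as np

        # Standard 1-D Winograd F(2,3) transform matrices.
        B_T = np.array([[1, 0, -1, 0],
                        [0, 1,  1, 0],
                        [0, -1, 1, 0],
                        [0, 1,  0, -1]], dtype=float)
        G = np.array([[1.0,  0.0, 0.0],
                      [0.5,  0.5, 0.5],
                      [0.5, -0.5, 0.5],
                      [0.0,  0.0, 1.0]])
        A_T = np.array([[1, 1,  1,  0],
                        [0, 1, -1, -1]], dtype=float)

        def winograd_f23(d, g):
            """Two outputs of a 3-tap correlation from a 4-sample input tile."""
            return A_T @ ((G @ g) * (B_T @ d))

        d = np.array([1.0, 2.0, 3.0, 4.0])    # input tile
        g = np.array([1.0, 0.0, -1.0])        # 3-tap filter
        direct = np.array([d[0:3] @ g, d[1:4] @ g])
        assert np.allclose(winograd_f23(d, g), direct)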

    HybridNet: Dual-Branch Fusion of Geometrical and Topological Views for VLSI Congestion Prediction

    Full text link
    Accurate early congestion prediction can prevent unpleasant surprises at the routing stage, playing a crucial role in helping designers iterate faster in VLSI design cycles. In this paper, we introduce a novel strategy to fully incorporate topological and geometrical features of circuits by making several key design choices in our network architecture. To be more specific, we construct two individual graphs (geometry-graph, topology-graph) with distinct edge construction schemes according to their unique properties. We then propose a dual-branch network with different encoder layers in each pathway and aggregate representations with a sophisticated fusion strategy. Our network, named HybridNet, not only provides a simple yet effective way to capture the geometric interactions of cells, but also preserves the original topological relationships in the netlist. Experimental results on the ISPD2015 benchmarks show that we achieve an improvement of 10.9% compared to previous methods.
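
    As a hypothetical sketch of the general dual-branch pattern described above (the concrete encoder layers and fusion strategy used by HybridNet are not reproduced here), two encoders can each run message passing on their own graph view of the same cells, after which the per-cell representations are concatenated and fed to a small regression head. The PyTorch module and names below are illustrative assumptions.

        import torch
        import torch.nn as nn

        class DualBranchCongestionNet(nn.Module):
            """Hypothetical dual-branch sketch: one encoder per graph view,
            fused per cell for congestion regression."""
            def __init__(self, in_dim, hid_dim):
                super().__init__()
                self.geo_enc = nn.Linear(in_dim, hid_dim)    # geometry-graph branch
                self.topo_enc = nn.Linear(in_dim, hid_dim)   # topology-graph branch
                self.fuse = nn.Sequential(nn.Linear(2 * hid_dim, hid_dim),
                                          nn.ReLU(),
                                          nn.Linear(hid_dim, 1))

            def forward(self, x, adj_geo, adj_topo):
                # One round of message passing on each view via its adjacency matrix.
                h_geo = torch.relu(adj_geo @ self.geo_enc(x))
                h_topo = torch.relu(adj_topo @ self.topo_enc(x))
                return self.fuse(torch.cat([h_geo, h_topo], dim=-1)).squeeze(-1)

        n_cells, n_feats = 8, 16
        x = torch.randn(n_cells, n_feats)
        adj = torch.eye(n_cells)              # placeholder normalized adjacencies
        model = DualBranchCongestionNet(n_feats, 32)
        print(model(x, adj, adj).shape)       # torch.Size([8]): one prediction per cell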

    ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer

    Full text link
    Stochastic computing (SC) has emerged as a promising computing paradigm for neural acceleration. However, how to accelerate the state-of-the-art Vision Transformer (ViT) with SC remains unclear. Unlike convolutional neural networks, ViTs introduce notable compatibility and efficiency challenges because of their nonlinear functions, e.g., softmax and Gaussian Error Linear Units (GELU). In this paper, for the first time, a ViT accelerator based on end-to-end SC, dubbed ASCEND, is proposed. ASCEND co-designs the SC circuits and ViT networks to enable accurate yet efficient acceleration. To overcome the compatibility challenges, ASCEND proposes a novel deterministic SC block for GELU and leverages an SC-friendly iterative approximate algorithm to design an accurate and efficient softmax circuit. To improve inference efficiency, ASCEND develops a two-stage training pipeline to produce accurate low-precision ViTs. With extensive experiments, we show the proposed GELU and softmax blocks achieve 56.3% and 22.6% error reduction compared to existing SC designs, respectively, and reduce the area-delay product (ADP) by 5.29× and 12.6×, respectively. Moreover, compared to the baseline low-precision ViTs, ASCEND also achieves significant accuracy improvements on CIFAR10 and CIFAR100.
    Comment: Accepted in DATE 202
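
    For readers unfamiliar with stochastic computing, the underlying primitive is to represent a value in [0, 1] as the probability of observing a 1 in a bitstream, so that multiplication reduces to a bitwise AND of two streams. The plain-Python sketch below shows only this basic unipolar multiplication; it does not reproduce ASCEND's deterministic GELU block or its iterative softmax circuit.

        import random

        def to_bitstream(p, length, rng):
            """Encode a value p in [0, 1] as a bitstream with P(bit = 1) = p."""
            return [1 if rng.random() < p else 0 for _ in range(length)]

        def sc_multiply(a_bits, b_bits):
            """Unipolar SC multiplication: bitwise AND, then decode by counting 1s."""
            product = [x & y for x, y in zip(a_bits, b_bits)]
            return sum(product) / len(product)

        rng = random.Random(0)
        a, b, n = 0.6, 0.5, 4096
        estimate = sc_multiply(to_bitstream(a, n, rng), to_bitstream(b, n, rng))
        print(abs(estimate - a * b) < 0.05)   # True: the stream estimate approximates 0.30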