352 research outputs found
A device-level characterization approach to quantify the impacts of different random variation sources in FinFET technology
A simple device-level characterization approach to quantitatively evaluate the impacts of different random variation sources in FinFETs is proposed. The impact of random dopant fluctuation is negligible for FinFETs with a lightly doped channel, leaving metal gate granularity and line-edge roughness as the two major random variation sources. The Vth variations induced by these two sources are theoretically decomposed based on the distinction in their physical mechanisms and their influences on different electrical characteristics. The effectiveness of the proposed method is confirmed through both TCAD simulations and experimental results. This letter can provide helpful guidelines for variation-aware technology development.
HEQuant: Marrying Homomorphic Encryption and Quantization for Communication-Efficient Private Inference
Secure two-party computation with homomorphic encryption (HE) protects data
privacy with a formal security guarantee but suffers from high communication
overhead. While previous works, e.g., Cheetah and Iron, have proposed
efficient HE-based protocols for different neural network (NN) operations, they
still assume high precision, e.g., fixed point 37 bit, for the NN operations
and ignore NNs' native robustness against quantization error. In this paper, we
propose HEQuant, which features low-precision-quantization-aware optimization
for the HE-based protocols. We observe the benefit of a naive combination of
quantization and HE quickly saturates as bit precision goes down. Hence, to
further improve communication efficiency, we propose a series of optimizations,
including an intra-coefficient packing algorithm and a quantization-aware
tiling algorithm, to simultaneously reduce the number and precision of the
transferred data. Compared with prior-art HE-based protocols, e.g., CrypTFlow2,
Cheetah, and Iron, HEQuant achieves communication
reduction and latency reduction. Meanwhile, when compared
with prior-art network optimization frameworks, e.g., SENet and SNL, HEQuant
also achieves communication reduction.
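To illustrate why low precision can reduce the amount of transferred data, here is a minimal sketch of packing several low-bit values into one wide integer coefficient. It is a generic bit-packing illustration, not the intra-coefficient packing algorithm of HEQuant, which the abstract does not describe in detail. The function names, the 4-bit operand widths, and the use of 37 bits as the coefficient width (borrowed from the abstract's high-precision figure) are all assumptions made for the example.

```python
def pack(values, slot_bits):
    """Pack non-negative integers into one wide integer, one slot per value."""
    packed = 0
    for i, v in enumerate(values):
        assert 0 <= v < (1 << slot_bits), "value does not fit in its slot"
        packed |= v << (i * slot_bits)
    return packed


def unpack(packed, count, slot_bits):
    """Recover `count` slot values from a packed integer."""
    mask = (1 << slot_bits) - 1
    return [(packed >> (i * slot_bits)) & mask for i in range(count)]


def slots_per_coefficient(coeff_bits, act_bits, weight_bits):
    """Each slot must hold an activation-weight product without overflowing
    into the neighbouring slot, so it needs act_bits + weight_bits bits."""
    slot_bits = act_bits + weight_bits
    return coeff_bits // slot_bits, slot_bits


# Hypothetical setting: unsigned 4-bit activations and weight, 37-bit coefficient.
count, slot_bits = slots_per_coefficient(37, 4, 4)
acts = [3, 15, 0, 9]
weight = 13

packed = pack(acts, slot_bits)
# One wide multiplication scales every packed activation at once.
result = unpack(packed * weight, count, slot_bits)
print(count, result)
```

Under these assumptions, one coefficient carries four products instead of one, which is the intuition behind sending fewer, narrower values.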
GaSb Inversion-Mode PMOSFETs With Atomic-Layer-Deposited Al2O3 as Gate Dielectric
GaSb inversion-mode PMOSFETs with atomic-layer-deposited (ALD) Al2O3 as gate dielectric are demonstrated. A 0.75-μm-gate-length device has a maximum drain current of 70 mA/mm, a transconductance of 26 mS/mm, and a hole inversion mobility of 200 cm²/V·s. The OFF-state performance is improved by reducing the ALD growth temperature from 300 °C to 200 °C. The measured interface trap distribution shows a low interface trap density of 2 × 10¹² /cm²·eV near the valence band edge. However, it increases to 1-4 × 10¹³ /cm²·eV near the conduction band edge, leading to a drain current on-off ratio of 265 and a subthreshold swing of ~600 mV/decade. GaSb, similar to Ge, is a promising channel material for PMOSFETs due to its high bulk hole mobility, high density of states at the valence band edge, and, most importantly, its unique interface trap distribution and trap neutral level alignment.
EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization
Private convolutional neural network (CNN) inference based on secure
two-party computation (2PC) suffers from high communication and latency
overhead, especially from convolution layers. In this paper, we propose EQO, a
quantized 2PC inference framework that jointly optimizes the CNNs and 2PC
protocols. EQO features a novel 2PC protocol that combines Winograd
transformation with quantization for efficient convolution computation.
However, we observe naively combining quantization and Winograd convolution is
sub-optimal: Winograd transformations introduce extensive local additions and
weight outliers that increase the quantization bit widths and require frequent
bit width conversions with non-negligible communication overhead. Therefore, at
the protocol level, we propose a series of optimizations for the 2PC inference
graph to minimize the communication. At the network level, we develop a
sensitivity-based mixed-precision quantization algorithm to optimize network
accuracy given communication constraints. We further propose a 2PC-friendly bit
re-weighting algorithm to accommodate weight outliers without increasing bit
widths. With extensive experiments, EQO demonstrates 11.7x, 3.6x, and 6.3x
communication reduction with 1.29%, 1.16%, and 1.29% higher accuracy compared
to state-of-the-art frameworks SiRNN, COINN, and CoPriv, respectively.
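The interaction between Winograd transformation and quantization bit widths can be seen in a small sketch. It uses the standard one-dimensional Winograd F(2,3) algorithm, which computes two outputs of a 3-tap filter with four multiplications instead of six. This is not EQO's protocol: the function names and the 4-bit signed input range are assumptions chosen for illustration.

```python
from fractions import Fraction


def direct_conv(d, g):
    """Two outputs of a 3-tap sliding dot product over a 4-element tile."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """Standard Winograd F(2,3): 4 multiplications instead of 6."""
    # Weight transform (can be precomputed offline); exact via Fraction.
    u = [Fraction(g[0]),
         Fraction(g[0] + g[1] + g[2], 2),
         Fraction(g[0] - g[1] + g[2], 2),
         Fraction(g[2])]
    # Input transform: only local additions and subtractions.
    v = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    m = [ui * vi for ui, vi in zip(u, v)]
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def bits_signed(lo, hi):
    """Smallest two's-complement width holding every integer in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


d = [7, -8, 5, 3]   # assumed 4-bit signed activations
g = [2, -1, 4]      # assumed small integer weights
y_direct = direct_conv(d, g)
y_wino = winograd_f23(d, g)

# A difference of two 4-bit values spans [-15, 15], which needs one extra bit.
in_bits = bits_signed(-8, 7)
transformed_bits = bits_signed(-15, 15)
print(y_direct, y_wino, in_bits, transformed_bits)
```

The extra bit after the input transform is the kind of bit-width growth the abstract attributes to the local additions. The halved weight sums also show why transformed weights no longer sit on the original integer grid.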
HybridNet: Dual-Branch Fusion of Geometrical and Topological Views for VLSI Congestion Prediction
Accurate early congestion prediction can prevent unpleasant surprises at the
routing stage, playing a crucial role in assisting designers to iterate
faster in VLSI design cycles. In this paper, we introduce a novel strategy to
fully incorporate topological and geometrical features of circuits by making
several key designs in our network architecture. To be more specific, we
construct two individual graphs (geometry-graph, topology-graph) with distinct
edge construction schemes according to their unique properties. We then propose
a dual-branch network with different encoder layers in each pathway and
aggregate representations with a sophisticated fusion strategy. Our network,
named HybridNet, not only provides a simple yet effective way to capture the
geometric interactions of cells, but also preserves the original topological
relationships in the netlist. Experimental results on the ISPD2015 benchmarks
show that we achieve an improvement of 10.9% compared to previous methods.
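The idea of building two graphs over the same cells with different edge rules can be sketched as follows. The abstract does not give HybridNet's actual edge construction schemes, so both rules here are assumptions: topology edges connect cells that share a net (clique expansion), and geometry edges connect cells within a hypothetical distance `radius`. The cell names, coordinates, and nets are made up for the example.

```python
from itertools import combinations
from math import hypot


def topology_edges(nets):
    """Connect every pair of cells that share a net (clique expansion)."""
    edges = set()
    for cells in nets.values():
        for a, b in combinations(sorted(cells), 2):
            edges.add((a, b))
    return edges


def geometry_edges(positions, radius):
    """Connect every pair of cells whose placement distance is <= radius."""
    edges = set()
    for a, b in combinations(sorted(positions), 2):
        (xa, ya), (xb, yb) = positions[a], positions[b]
        if hypot(xa - xb, ya - yb) <= radius:
            edges.add((a, b))
    return edges


# Hypothetical placement and netlist.
positions = {"c0": (0.0, 0.0), "c1": (1.0, 0.0),
             "c2": (10.0, 0.0), "c3": (10.0, 1.0)}
nets = {"n0": ["c0", "c2"], "n1": ["c1", "c2", "c3"]}

topo = topology_edges(nets)
geom = geometry_edges(positions, radius=1.5)
print(sorted(topo))
print(sorted(geom))
```

In this toy case the two edge sets overlap only on the pair `("c2", "c3")`: cells that are neighbours on the die need not share a net, and connected cells can be far apart, which is why the two views carry different information.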
ASCEND: Accurate yet Efficient End-to-End Stochastic Computing Acceleration of Vision Transformer
Stochastic computing (SC) has emerged as a promising computing paradigm for
neural acceleration. However, how to accelerate the state-of-the-art Vision
Transformer (ViT) with SC remains unclear. Unlike convolutional neural
networks, ViTs introduce notable compatibility and efficiency challenges
because of their nonlinear functions, e.g., softmax and Gaussian Error Linear
Units (GELU). In this paper, for the first time, a ViT accelerator based on
end-to-end SC, dubbed ASCEND, is proposed. ASCEND co-designs the SC circuits
and ViT networks to enable accurate yet efficient acceleration. To overcome the
compatibility challenges, ASCEND proposes a novel deterministic SC block for
GELU and leverages an SC-friendly iterative approximate algorithm to design an
accurate and efficient softmax circuit. To improve inference efficiency, ASCEND
develops a two-stage training pipeline to produce accurate low-precision ViTs.
With extensive experiments, we show the proposed GELU and softmax blocks
achieve 56.3% and 22.6% error reduction compared to existing SC designs,
respectively, and reduce the area-delay product (ADP) by 5.29x and 12.6x,
respectively. Moreover, compared to the baseline low-precision ViTs, ASCEND
also achieves significant accuracy improvements on CIFAR10 and CIFAR100.
Comment: Accepted in DATE 202
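For readers unfamiliar with stochastic computing, here is a minimal sketch of its basic primitive: in the unipolar encoding, a value in [0, 1] is the probability of a 1 in a random bitstream, and a bitwise AND of two independent streams estimates the product. This is the textbook SC multiplier, not ASCEND's GELU or softmax circuits. The function names, stream length, and seed are assumptions for the example.

```python
import random


def to_bitstream(p, length, rng):
    """Encode probability p as a random bitstream with P(bit = 1) = p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a, b, length=4096, seed=0):
    """Estimate a * b by ANDing two independent unipolar bitstreams."""
    rng = random.Random(seed)
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / length


estimate = sc_multiply(0.6, 0.5)
print(estimate)  # close to 0.3, with error shrinking as the stream grows
```

The estimate is noisy, and the noise shrinks only with longer streams. Nonlinear functions such as softmax and GELU have no single-gate equivalent, which is consistent with the accuracy and compatibility challenges the abstract describes.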