NTK-approximating MLP Fusion for Efficient Language Model Fine-tuning
Fine-tuning a pre-trained language model (PLM) has emerged as the predominant
strategy in many natural language processing applications. However, both
fine-tuning PLMs and running inference on them are expensive, especially on
edge devices with low computing power. Some general approaches (e.g.,
quantization and distillation) have been widely studied to reduce the
compute/memory cost of PLM fine-tuning, while very few one-shot compression
techniques have been explored. In this paper, we investigate the neural
tangent kernel (NTK), which reveals the gradient descent dynamics of neural
networks, of the multilayer perceptron (MLP) modules in a PLM, and propose to
construct a lightweight PLM through NTK-approximating MLP fusion. To achieve
this, we view the MLP as a bundle of sub-MLPs and cluster them into a given
number of centroids, which can then be restored as a compressed MLP that,
perhaps surprisingly, approximates the NTK of the original PLM well. Extensive
experiments on PLM fine-tuning for both natural language understanding (NLU)
and generation (NLG) tasks verify the effectiveness of the proposed MLP
fusion. Our code is available at https://github.com/weitianxin/MLP_Fusion.
Comment: ICML 2023
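To make the clustering idea concrete, here is a minimal sketch (not the authors' implementation) of fusing a two-layer feed-forward network, assuming each hidden unit is treated as one sub-MLP described by its incoming weights, bias, and outgoing weights; sub-MLPs are clustered with k-means and a smaller MLP is rebuilt from the centroids, with the outgoing centroid weights rescaled by cluster size so the summed contribution is roughly preserved. All function and variable names are illustrative assumptions.

```python
# Sketch of sub-MLP clustering for an FFN y = W2 @ act(W1 @ x + b1) + b2.
import numpy as np
from sklearn.cluster import KMeans

def fuse_ffn(W1, b1, W2, k, seed=0):
    """Compress an FFN with h hidden units down to k fused hidden units."""
    d_in = W1.shape[1]
    # Each sub-MLP i is (row W1[i], bias b1[i], column W2[:, i]).
    sub_mlps = np.concatenate([W1, b1[:, None], W2.T], axis=1)  # (h, d_in + 1 + d_out)
    km = KMeans(n_clusters=k, n_init=4, random_state=seed).fit(sub_mlps)
    sizes = np.bincount(km.labels_, minlength=k).astype(float)  # units per cluster
    centers = km.cluster_centers_
    W1_c = centers[:, :d_in]                            # fused incoming weights (k, d_in)
    b1_c = centers[:, d_in]                             # fused biases (k,)
    W2_c = (centers[:, d_in + 1:] * sizes[:, None]).T   # rescaled outgoing weights (d_out, k)
    return W1_c, b1_c, W2_c

# Toy example with small dimensions: compress 256 hidden units to 64.
rng = np.random.default_rng(0)
W1, b1, W2 = rng.normal(size=(256, 64)), rng.normal(size=256), rng.normal(size=(64, 256))
W1_c, b1_c, W2_c = fuse_ffn(W1, b1, W2, k=64)
```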
Distributionally Robust Circuit Design Optimization under Variation Shifts
Due to significant process variations, designers have to optimize the
statistical performance distribution of nano-scale IC designs in most cases.
This problem has been investigated for decades under the formulation of
stochastic optimization, which minimizes the expected value of a performance
metric while assuming that the distribution of process variations is exactly
given. This paper rethinks variation-aware circuit design optimization from
a new perspective. First, we discuss the variation shift problem: the actual
density function of process variations almost always differs from the given
model and is often unknown. Consequently, we propose to formulate
variation-aware circuit design optimization as a distributionally robust
optimization problem, which does not require the exact distribution of
process variations. By selecting an appropriate uncertainty set for the
probability density function of process variations, we solve the shift-aware
circuit optimization problem using distributionally robust Bayesian
optimization. This method is validated on both a photonic IC and an
electronic IC. The optimized circuits show excellent robustness against
variation shifts, maintaining strong performance under many possible
distributions of process variations that differ from the given statistical
model. This work has the potential to open a new research direction and
inspire subsequent research at different levels of the EDA flow under the
variation-shift setting.
Comment: accepted by ICCAD 2023, 8 pages
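As a toy illustration of the shift-aware objective, the sketch below assumes the uncertainty set is a finite family of candidate variation distributions around the nominal model and scores a design by its worst-case expected performance over that family; a simple random-search outer loop stands in for the Bayesian optimizer used in the paper. The circuit model, names, and parameters are all illustrative assumptions.

```python
import numpy as np

def circuit_metric(design, variation):
    """Placeholder performance metric (lower is better), e.g. delay or loss."""
    return np.sum((design + variation - 1.0) ** 2, axis=-1)

def worst_case_expected_metric(design, candidate_dists, n_samples=2000, seed=0):
    """Worst-case Monte Carlo estimate of E[metric] over a finite uncertainty set."""
    rng = np.random.default_rng(seed)
    worst = -np.inf
    for mean, std in candidate_dists:  # each (mean, std) defines one shifted Gaussian
        v = rng.normal(mean, std, size=(n_samples, design.size))
        worst = max(worst, circuit_metric(design, v).mean())
    return worst

# Uncertainty set: nominal N(0, 0.05) plus shifted and widened variants.
candidate_dists = [(0.0, 0.05), (0.02, 0.05), (-0.02, 0.05), (0.0, 0.10)]

# Robust search over a 4-parameter design space (random search as a stand-in for BO).
rng = np.random.default_rng(1)
best_x, best_val = None, np.inf
for _ in range(200):
    x = rng.uniform(0.5, 1.5, size=4)
    val = worst_case_expected_metric(x, candidate_dists)
    if val < best_val:
        best_x, best_val = x, val
print(best_x, best_val)
```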
Dense Vision Transformer Compression with Few Samples
Few-shot model compression aims to compress a large model into a more compact
one with only a tiny training set (even without labels). Block-level pruning
has recently emerged as a leading technique for achieving high accuracy and
low latency in few-shot CNN compression. However, few-shot compression for
Vision Transformers (ViT) remains largely unexplored and presents new
challenges. In particular, traditional few-shot CNN methods suffer from
sparse compression: they can only produce a very small number of compressed
models of different sizes. This paper proposes a novel framework for few-shot
ViT compression named DC-ViT. Instead of dropping an entire block, DC-ViT
selectively eliminates the attention module while retaining and reusing
portions of the MLP module. DC-ViT enables dense compression, producing
numerous compressed models that densely populate the range of model
complexity. DC-ViT outperforms state-of-the-art few-shot compression methods
by a significant margin of 10 percentage points, along with lower latency, in
the compression of ViT and its variants.
Comment: Accepted to CVPR 2024. Note: Jianxin Wu is a contributing author for
the arXiv version of this paper but is not listed as an author in the CVPR
version due to his role as Program Chair.
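The PyTorch sketch below illustrates the kind of block modification the abstract describes, under the assumption that "eliminating the attention module while reusing part of the MLP" amounts to keeping only a residual MLP branch whose hidden width is a tunable fraction of the original; sweeping that fraction across blocks is what would yield many model sizes. The class and method names are hypothetical, not the DC-ViT implementation.

```python
import torch
import torch.nn as nn

class CompressedViTBlock(nn.Module):
    """A ViT block with the attention branch removed and a slimmed MLP retained."""
    def __init__(self, dim=768, mlp_hidden=3072, keep_ratio=0.5):
        super().__init__()
        kept = max(1, int(mlp_hidden * keep_ratio))  # portion of the MLP reused
        self.norm = nn.LayerNorm(dim)
        self.fc1 = nn.Linear(dim, kept)
        self.act = nn.GELU()
        self.fc2 = nn.Linear(kept, dim)

    @torch.no_grad()
    def init_from_block(self, fc1_weight, fc1_bias, fc2_weight, fc2_bias):
        """Reuse the first `kept` hidden units of the original block's MLP weights."""
        kept = self.fc1.out_features
        self.fc1.weight.copy_(fc1_weight[:kept])
        self.fc1.bias.copy_(fc1_bias[:kept])
        self.fc2.weight.copy_(fc2_weight[:, :kept])
        self.fc2.bias.copy_(fc2_bias)

    def forward(self, x):
        # Only the MLP residual branch remains; the attention branch is dropped.
        return x + self.fc2(self.act(self.fc1(self.norm(x))))

# Varying keep_ratio per block produces many compressed models (dense compression).
x = torch.randn(2, 197, 768)
y = CompressedViTBlock(keep_ratio=0.25)(x)
```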