Knowledge distillation is one of the primary methods of transferring
knowledge from large to small models. However, it requires massive
task-specific data, which may not be available in many real-world applications.
Data augmentation methods such as representation interpolation, token
replacement, or augmentation with models have been applied to tackle this problem.
However, these data augmentation methods either potentially cause shifts in
decision boundaries (representation interpolation), are not expressive enough
(token replacement), or introduce too much computational overhead (augmentation
with models). To this end, we propose AugPro (Augmentation with Projection), an
effective and efficient data augmentation method for distillation. Our method
builds on top of representation interpolation augmentation methods to maintain
the diversity of expressions and converts the augmented data to tokens to avoid
shifting decision boundaries. It uses simple operations that come with little
computational overhead. Results on multiple GLUE tasks show that our method improves distillation performance by a large margin at low time cost. Code is available at
https://github.com/google-research/google-research/tree/master/augpro.
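
To make the projection idea above concrete, here is a minimal sketch, not the released implementation: it mixup-interpolates the token embeddings of two examples and then projects each interpolated vector to its nearest vocabulary token, so the augmented input stays in token space rather than shifting decision boundaries in the continuous embedding space. The function name, the `alpha` mixing weight, and the nearest-neighbour projection rule are illustrative assumptions; see the repository above for the actual method.

```python
import torch

def augpro_sketch(emb_a, emb_b, embedding_matrix, alpha=0.3):
    """Illustrative sketch (not the official AugPro code): interpolate two
    token-embedding sequences, then project each mixed vector back to the
    nearest vocabulary token so the augmented input remains discrete tokens.

    emb_a, emb_b:     [seq_len, dim] token embeddings of two training examples
    embedding_matrix: [vocab_size, dim] model embedding table
    alpha:            interpolation weight (assumed hyperparameter)
    """
    mixed = alpha * emb_a + (1.0 - alpha) * emb_b   # representation interpolation
    dists = torch.cdist(mixed, embedding_matrix)    # [seq_len, vocab_size] distances
    token_ids = dists.argmin(dim=-1)                # project each vector to nearest token
    return token_ids                                # discrete augmented token sequence
```

Because the projection only adds a nearest-neighbour lookup on top of the interpolation, the extra cost is small compared with augmentation methods that query a separate generative model.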