Search CORE

110 research outputs found

Knowledge distillation via softmax regression representation learning

Author: Bulat A
International Conference on Learning Representations (ICLR)
Martinez B
Tzimiropoulos G
Yang J
Publication venue: International Conference on Learning Representations (ICLR)
Publication date: 04/05/2021
Field of study

Graph Relation Distillation for Efficient Biomedical Instance Segmentation

Author: Hu Bo
Huang Wei
Liu Xiaoyu
Sun Xiaoyan
Wu Feng
Xiong Zhiwei
Zhang Yueyi
Publication venue
Publication date: 11/01/2024
Field of study

Instance-aware embeddings predicted by deep neural networks have revolutionized biomedical instance segmentation, but its resource requirements are substantial. Knowledge distillation offers a solution by transferring distilled knowledge from heavy teacher networks to lightweight yet high-performance student networks. However, existing knowledge distillation methods struggle to extract knowledge for distinguishing instances and overlook global relation information. To address these challenges, we propose a graph relation distillation approach for efficient biomedical instance segmentation, which considers three essential types of knowledge: instance-level features, instance relations, and pixel-level boundaries. We introduce two graph distillation schemes deployed at both the intra-image level and the inter-image level: instance graph distillation (IGD) and affinity graph distillation (AGD). IGD constructs a graph representing instance features and relations, transferring these two types of knowledge by enforcing instance graph consistency. AGD constructs an affinity graph representing pixel relations to capture structured knowledge of instance boundaries, transferring boundary-related knowledge by ensuring pixel affinity consistency. Experimental results on a number of biomedical datasets validate the effectiveness of our approach, enabling student models with less than

1\%

parameters and less than

10\%

inference time while achieving promising performance compared to teacher models

arXiv.org e-Print Archive

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement

Author: Ai Yang
Ling Zhen-Hua
Zheng Rui-Chen
Publication venue
Publication date: 19/09/2023
Field of study

Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement. This paper proposes the incorporation of ultrasound tongue images to improve the performance of lip-based AV-SE systems further. To address the challenge of acquiring ultrasound tongue images during inference, we first propose to employ knowledge distillation during training to investigate the feasibility of leveraging tongue-related information without directly inputting ultrasound tongue images. Specifically, we guide an audio-lip speech enhancement student model to learn from a pre-trained audio-lip-tongue speech enhancement teacher model, thus transferring tongue-related knowledge. To better model the alignment between the lip and tongue modalities, we further propose the introduction of a lip-tongue key-value memory network into the AV-SE model. This network enables the retrieval of tongue features based on readily available lip features, thereby assisting the subsequent speech enhancement task. Experimental results demonstrate that both methods significantly improve the quality and intelligibility of the enhanced speech compared to traditional lip-based AV-SE baselines. Moreover, both proposed methods exhibit strong generalization performance on unseen speakers and in the presence of unseen noises. Furthermore, phone error rate (PER) analysis of automatic speech recognition (ASR) reveals that while all phonemes benefit from introducing ultrasound tongue images, palatal and velar consonants benefit most.Comment: Submmited to IEEE/ACM Transactions on Audio, Speech and Language Processing. arXiv admin note: text overlap with arXiv:2305.1493

arXiv.org e-Print Archive

ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse

Author: Ren Lu
Wang Qi-Wei
Ye Han-Jia
Yi Chao
Zhan De-Chuan
Zhang Yi-Kai
Publication venue
Publication date: 17/08/2023
Field of study

The rapid expansion of foundation pre-trained models and their fine-tuned counterparts has significantly contributed to the advancement of machine learning. Leveraging pre-trained models to extract knowledge and expedite learning in real-world tasks, known as "Model Reuse", has become crucial in various applications. Previous research focuses on reusing models within a certain aspect, including reusing model weights, structures, and hypothesis spaces. This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend. ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with PTM, tuning target model with PTM, and PTM-based inference. This empowers deep learning practitioners to explore downstream tasks and identify the complementary advantages among different methods. ZhiJian is readily accessible at https://github.com/zhangyikaii/lamda-zhijian facilitating seamless utilization of pre-trained models and streamlining the model reuse process for researchers and developers

arXiv.org e-Print Archive

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation

Author: Ai Yang
Ling Zhen-Hua
Zheng Rui-Chen
Publication venue
Publication date: 24/05/2023
Field of study

arXiv.org e-Print Archive