Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering
Despite the empirical success and practical significance of (relational)
knowledge distillation that matches (the relations of) features between teacher
and student models, the corresponding theoretical interpretations remain
limited for various knowledge distillation paradigms. In this work, we take an
initial step toward a theoretical understanding of relational knowledge
distillation (RKD), with a focus on semi-supervised classification problems. We
start by casting RKD as spectral clustering on a population-induced graph
unveiled by a teacher model. Via a notion of clustering error that quantifies
the discrepancy between the predicted and ground truth clusterings, we
illustrate that RKD over the population provably leads to low clustering error.
Moreover, we provide a sample complexity bound for RKD with limited unlabeled
samples. For semi-supervised learning, we further demonstrate the label
efficiency of RKD through a general framework of cluster-aware semi-supervised
learning that assumes low clustering errors. Finally, by unifying data
augmentation consistency regularization into this cluster-aware framework, we
show that despite the common effect of learning accurate clusterings, RKD
facilitates a "global" perspective through spectral clustering, whereas
consistency regularization focuses on a "local" perspective via expansion
Multimodal Transformer Distillation for Audio-Visual Synchronization
Audio-visual synchronization aims to determine whether the mouth movements
and speech in the video are synchronized. VocaLiST reaches state-of-the-art
performance by incorporating multimodal Transformers to model audio-visual
interaction information. However, it requires high computing resources, making it
impractical for real-world applications. This paper proposes an MTDVocaLiST
model, which is trained by our proposed multimodal Transformer distillation
(MTD) loss. MTD loss enables the MTDVocaLiST model to deeply mimic the
cross-attention distribution and value-relation in the Transformer of VocaLiST.
Our proposed method is effective in two aspects: From the distillation method
perspective, MTD loss outperforms other strong distillation baselines. From the
distilled model's performance perspective: 1) MTDVocaLiST outperforms
similar-size SOTA models, SyncNet, and PM models by 15.69% and 3.39%; 2)
MTDVocaLiST reduces the model size of VocaLiST by 83.52% while still maintaining
similar performance.
Comment: Submitted to ICASSP 202
Efficient speech detection in environmental audio using acoustic recognition and knowledge distillation
The ongoing biodiversity crisis, driven by factors such as land-use change
and global warming, emphasizes the need for effective ecological monitoring
methods. Acoustic monitoring of biodiversity has emerged as an important
monitoring tool. Detecting human voices in soundscape monitoring projects is
useful both for analysing human disturbance and for privacy filtering. Despite
significant strides in deep learning in recent years, the deployment of large
neural networks on compact devices poses challenges due to memory and latency
constraints. Our approach focuses on leveraging knowledge distillation
techniques to design efficient, lightweight student models for speech detection
in bioacoustics. In particular, we employed the MobileNetV3-Small-Pi model to
create compact yet effective student architectures to compare against the
larger EcoVAD teacher model, a well-regarded voice detection architecture in
eco-acoustic monitoring. The comparative analysis included examining various
configurations of the MobileNetV3-Small-Pi derived student models to identify
optimal performance. Additionally, a thorough evaluation of different
distillation techniques was conducted to ascertain the most effective method
for model selection. Our findings revealed that the distilled models exhibited
comparable performance to the EcoVAD teacher model, indicating a promising
approach to overcoming computational barriers for real-time ecological
monitoring
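The abstract above does not say which distillation objective performed best, so the following is only a minimal sketch of the classic soft-target distillation loss, which is one common choice and an assumption here. It is not the authors' method. The function names, the two-class (non-speech / speech) setup, and the `temperature` and `alpha` defaults are hypothetical and chosen for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    soft term: KL(teacher || student) on temperature-softened outputs,
               scaled by temperature**2 (the usual gradient-scale correction)
    hard term: cross-entropy of the student against the true label
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))
    cross_entropy = -log_softmax(student_logits)[label]
    return alpha * temperature ** 2 * kl + (1.0 - alpha) * cross_entropy


# Hypothetical logits for one audio frame: index 0 = non-speech, 1 = speech.
teacher = [0.2, 3.1]
student = [0.9, 1.4]
loss = distillation_loss(student, teacher, label=1)
```

With `alpha=1.0` the student is trained only to match the teacher's softened outputs, and with `alpha=0.0` the loss reduces to ordinary cross-entropy on the labels.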