SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization
Automatic speech recognition (ASR) models are frequently exposed to data
distribution shifts in many real-world scenarios, leading to erroneous
predictions. To tackle this issue, a test-time adaptation (TTA) method was
recently proposed to adapt the pre-trained ASR model on unlabeled test
instances without source data. Despite its decent performance gain, this work
relies solely on naive greedy decoding and performs adaptation across
timesteps at a frame level, which may not be optimal given the sequential
nature of the model output. Motivated by this, we propose a novel TTA
framework, dubbed SGEM, for general ASR models. To handle the sequential output,
SGEM first exploits beam search to explore candidate output logits and selects
the most plausible one. Then, it utilizes generalized entropy minimization and
negative sampling as unsupervised objectives to adapt the model. SGEM achieves
state-of-the-art performance for three mainstream ASR models under various
domain shifts.
Comment: Accepted to INTERSPEECH 2023
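Since the abstract describes the adaptation objective only at a high level, here is a minimal sketch of what a generalized-entropy-plus-negative-sampling loss could look like, assuming Rényi entropy as the generalized entropy; the order `alpha`, the threshold, the weight `lam`, and the exact negative-sampling form are illustrative assumptions, not the paper's specification:

```python
import torch

def renyi_entropy(probs: torch.Tensor, alpha: float = 0.3) -> torch.Tensor:
    """Renyi entropy of order alpha, H_a(p) = log(sum_c p_c^a) / (1 - a),
    averaged over frames; alpha -> 1 recovers Shannon entropy."""
    return (torch.log(probs.pow(alpha).sum(dim=-1)) / (1.0 - alpha)).mean()

def adaptation_loss(logits: torch.Tensor, alpha: float = 0.3,
                    ns_threshold: float = 0.05, lam: float = 1.0) -> torch.Tensor:
    """Unsupervised objective on the logits of the beam-search candidate the
    model deems most plausible: generalized (Renyi) entropy minimization plus
    a schematic negative-sampling term that pushes residual probability mass
    off classes the model already considers unlikely.

    logits: (T, C) per-frame class logits of the selected candidate.
    """
    probs = logits.softmax(dim=-1)                     # (T, C)
    em_loss = renyi_entropy(probs, alpha)
    neg_mask = (probs < ns_threshold).float()          # low-confidence classes
    ns_loss = -(neg_mask * torch.log(1.0 - probs + 1e-8)).sum(-1).mean()
    return em_loss + lam * ns_loss
```

In a full TTA loop, one would typically backpropagate this loss per test utterance and update only a small, stable subset of the model's parameters (e.g., normalization layers), as is common in entropy-minimization TTA.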
One-bit Supervision for Image Classification: Problem, Solution, and Beyond
This paper presents one-bit supervision, a novel setting of learning with
fewer labels, for image classification. Instead of training the model with the
accurate label of each sample, our setting requires the model to interact with
the system by predicting the class label of each sample and learn from the
answer whether the guess is correct, which provides one bit (yes or no) of
information. An intriguing property of the setting is that the annotation
burden is largely alleviated compared with providing the accurate label.
There are two keys to one-bit supervision: (i) improving the guess
accuracy and (ii) making good use of incorrect guesses. To achieve these
goals, we propose a multi-stage training paradigm and incorporate negative
label suppression into an off-the-shelf semi-supervised learning algorithm.
Theoretical analysis shows that one-bit annotation is more efficient than
full-bit annotation in most cases and gives the conditions for combining our
approach with active learning. Inspired by this, we further integrate the
one-bit supervision framework into a self-supervised learning algorithm, which
yields an even more efficient training schedule. Different from training from
scratch, when self-supervised learning is used for initialization, both hard
example mining and class balance are verified to be effective in boosting the
learning performance. However, these two frameworks still need full-bit labels
in the initial stage. To cast off this burden, we utilize unsupervised domain
adaptation to train the initial model and conduct pure one-bit annotations on
the target dataset. On multiple benchmarks, the learning efficiency of the
proposed approach surpasses that of full-bit, semi-supervised supervision.
Comment: ACM TOMM. arXiv admin note: text overlap with arXiv:2009.06168
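To make the interaction concrete, here is a minimal sketch of one annotation round under this setting, with negative label suppression applied to rejected guesses. Function and variable names are hypothetical, the ground-truth labels serve only to simulate the annotator's yes/no answer, and the surrounding multi-stage training is omitted:

```python
import numpy as np

def one_bit_round(probs: np.ndarray, true_labels: np.ndarray):
    """One round of one-bit annotation: for each unlabeled sample the model
    guesses its top class and the annotator answers yes or no (one bit).

    probs: (N, C) predicted class distributions over N unlabeled samples.
    true_labels: (N,) ground truth, used only to simulate the yes/no answer.
    """
    guesses = probs.argmax(axis=1)
    correct = guesses == true_labels               # the one bit per sample

    # a "yes" turns the guess into a full label for supervised training
    confirmed = {int(i): int(guesses[i]) for i in np.flatnonzero(correct)}

    # a "no" still carries information: negative label suppression zeroes
    # the rejected class in the pseudo-label distribution and renormalizes,
    # so semi-supervised training will not re-assign that class
    suppressed = probs.copy()
    wrong = np.flatnonzero(~correct)
    suppressed[wrong, guesses[wrong]] = 0.0
    suppressed[wrong] /= suppressed[wrong].sum(axis=1, keepdims=True) + 1e-12
    return confirmed, suppressed
```

Iterating such rounds is what the multi-stage paradigm exploits: as the model improves, more guesses are confirmed, so each additional bit of annotation yields more fully labeled samples.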