Accelerating Transducers through Adjacent Token Merging
Recent end-to-end automatic speech recognition (ASR) systems often use a
Transformer-based acoustic encoder that generates embeddings at a high frame
rate. However, this design is inefficient, particularly for long speech signals,
because the cost of self-attention grows quadratically with sequence length. To
address this, we propose a new method, Adjacent Token Merging (A-ToMe), which
gradually combines adjacent tokens whose key values have high similarity
scores. In this way, the total number of time steps is reduced, and inference
of both the encoder and the joint network is accelerated. Experiments on
LibriSpeech show that our method can reduce the number of tokens by 57% and
improve inference speed on GPU by 70% without any notable loss of accuracy.
Additionally, we demonstrate that A-ToMe is also an effective way to reduce
tokens in long-form ASR, where the input speech consists of multiple utterances.
Comment: Interspeech 202
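The merging step described above can be pictured with a small sketch. This is a hypothetical illustration, not the authors' implementation: it assumes plain Python lists for token and key vectors, cosine similarity between adjacent keys, a fixed similarity `threshold`, merging by simple averaging, and a greedy left-to-right pass in which each token joins at most one pair per call. Applying such a step once per encoder layer would mimic the gradual reduction the abstract mentions.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_adjacent(tokens: List[Vector], keys: List[Vector], threshold: float) -> List[Vector]:
    """Hypothetical A-ToMe-style step: average adjacent token pairs whose
    key vectors have cosine similarity above `threshold`.

    Pairs never overlap: after a merge the scan skips the partner token,
    so a single call can at most halve the sequence length.
    """
    merged: List[Vector] = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and cosine(keys[i], keys[i + 1]) > threshold:
            merged.append([(x + y) / 2 for x, y in zip(tokens[i], tokens[i + 1])])
            i += 2  # both tokens are consumed by the merge
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Toy usage: the tokens double as their own keys (an assumption for brevity).
frames = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.0]]
reduced = merge_adjacent(frames, frames, threshold=0.9)
print(len(frames), "->", len(reduced))  # prints: 4 -> 2
```

Plain averaging treats every merged token as having equal weight; a size-weighted average would be one possible refinement, but whether the paper uses one is not stated here.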
Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition
The integration of language models (LMs) has proven to be an effective way to
address domain shifts in speech recognition. However, these approaches usually
require a significant amount of target-domain text data to train the LMs.
Unlike these methods, in this work we propose two zero-shot ASR domain
adaptation methods that require only a domain-specific text prompt and use
LLaMA, a 7-billion-parameter large language model (LLM). The LLM is used in
two ways: 1) second-pass rescoring: reranking the N-best hypotheses of a given
ASR system with LLaMA; 2) deep LLM-fusion: incorporating the LLM into the
decoder of an encoder-decoder based ASR system. Experiments show that, with
only one domain prompt, both methods can effectively reduce word error rates
(WER) on the out-of-domain TedLium-2 and SPGISpeech datasets. In particular,
deep LLM-fusion has the advantage of better recall of entities and
out-of-vocabulary words.
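Second-pass rescoring can be illustrated with the generic log-linear interpolation commonly used for N-best reranking. This is a hedged sketch, not the paper's scoring recipe: `lm_logprob` is a hypothetical stand-in for the LLM's log-probability of a hypothesis (in the setup described above it would come from LLaMA conditioned on the domain prompt), and both `lm_weight` and the toy scores are assumptions.

```python
from typing import Callable, List, Tuple

Hypothesis = Tuple[str, float]  # (text, ASR log-score)


def rescore_nbest(
    nbest: List[Hypothesis],
    lm_logprob: Callable[[str], float],
    lm_weight: float = 0.5,
) -> List[Hypothesis]:
    """Rerank N-best hypotheses by ASR score + lm_weight * LM log-probability.

    Returns (text, combined_score) pairs sorted best first.
    """
    rescored = [(text, asr + lm_weight * lm_logprob(text)) for text, asr in nbest]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in for a prompted LLM: fixed log-probabilities per hypothesis.
toy_lm = {"the cat sat": -2.0, "the cat sad": -9.0}
nbest = [("the cat sad", -1.0), ("the cat sat", -1.2)]
ranked = rescore_nbest(nbest, lambda text: toy_lm[text], lm_weight=0.5)
print(ranked[0][0])  # prints: the cat sat
```

Here the first-pass ASR prefers "the cat sad", but the LM term (-9.0 vs -2.0) outweighs the small acoustic margin, so the reranked list puts "the cat sat" first. Setting `lm_weight=0.0` recovers the original ASR ranking.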
Evidence-Efficient Affinity Propagation Scheme for Virtual Machine Placement in Data Center
In a cloud data center, without efficient virtual machine placement, overload of any type of resource on physical machines (PMs) can easily waste other types of resources and trigger frequent, costly virtual machine (VM) migrations, which further degrade quality of service (QoS). To address this problem, in this paper we propose an evidence-efficient affinity propagation scheme for VM placement (EEAP-VMP), which is capable of balancing the workload across various types of resources on the running PMs. Our approach models the search for desirable destination hosts for live VM migration as the propagation of responsibility and availability. The sum of responsibility and availability represents the accumulated evidence for selecting candidate destination hosts for the VMs to be migrated. This evidence is further combined with the presented selection criteria for destination hosts. Extensive experiments are conducted to compare EEAP-VMP with previous VM placement methods. The experimental results demonstrate that EEAP-VMP is highly effective in reducing VM migrations and data center energy consumption and in balancing the workload of PMs.
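Responsibility and availability are the two messages of standard affinity propagation clustering (Frey and Dueck), and their sum a(i,k) + r(i,k) is the quantity each point maximizes when choosing an exemplar. The sketch below implements that standard algorithm on a generic similarity matrix to show how the evidence accumulates. It is not EEAP-VMP itself: the host-selection criteria and the VM/PM resource model are not reproduced, and the damping factor, iteration count, preference value, and toy 1-D points are all assumptions.

```python
from typing import List

Matrix = List[List[float]]


def affinity_propagation(s: Matrix, iterations: int = 200, damping: float = 0.5) -> List[int]:
    """Standard affinity propagation; returns the exemplar index chosen by each point.

    s[i][k] is the similarity of point i to candidate exemplar k; the diagonal
    s[k][k] is the "preference" that controls how many exemplars emerge.
    """
    n = len(s)
    if n < 2:
        return list(range(n))
    r = [[0.0] * n for _ in range(n)]  # responsibility r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availability a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')], damped
        for i in range(n):
            for k in range(n):
                competitor = max(a[i][j] + s[i][j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - competitor)
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k
        for i in range(n):
            for k in range(n):
                support = sum(max(0.0, r[j][k]) for j in range(n) if j != i and j != k)
                new_a = support if i == k else min(0.0, r[k][k] + support)
                a[i][k] = damping * a[i][k] + (1 - damping) * new_a
    # Accumulated evidence a(i,k) + r(i,k): each point picks the k that maximizes it.
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]


# Toy usage: two well-separated 1-D groups; similarity is negative squared distance.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
sim = [[-(p - q) ** 2 for q in points] for p in points]
for k in range(len(points)):
    sim[k][k] = -10.0  # hypothetical preference value
labels = affinity_propagation(sim)
print(labels)
```

In a placement setting, the candidates k would play the role of destination hosts and the similarities would encode some suitability measure; how EEAP-VMP defines those quantities is not given in the abstract.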