260 research outputs found

    Accelerating Transducers through Adjacent Token Merging

    Recent end-to-end automatic speech recognition (ASR) systems often use a Transformer-based acoustic encoder that generates embeddings at a high frame rate. However, this design is inefficient, particularly for long speech signals, because of the quadratic cost of self-attention. To address this, we propose a new method, Adjacent Token Merging (A-ToMe), which gradually combines adjacent tokens whose key values have high similarity scores. In this way, the total number of time steps is reduced, and inference of both the encoder and the joint network is accelerated. Experiments on LibriSpeech show that our method reduces the number of tokens by 57% and improves inference speed on GPU by 70% without any notable loss of accuracy. Additionally, we demonstrate that A-ToMe is also an effective solution for reducing tokens in long-form ASR, where the input speech consists of multiple utterances. Comment: Interspeech 202
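    The abstract describes merging adjacent encoder tokens whose key vectors are similar. Below is a minimal sketch of that idea, assuming cosine similarity between neighbouring keys and a fixed merge threshold; the gradual, layer-wise merging schedule of A-ToMe is not reproduced, and `merge_adjacent_tokens` and `threshold` are illustrative names, not the paper's API.

```python
# Sketch: merge adjacent frames whose key vectors are highly similar,
# shortening the time axis and hence the encoder/joint-network cost.
import torch
import torch.nn.functional as F

def merge_adjacent_tokens(tokens: torch.Tensor,
                          keys: torch.Tensor,
                          threshold: float = 0.9) -> torch.Tensor:
    """tokens, keys: (T, D). Average each adjacent pair whose key
    cosine similarity exceeds `threshold`; keep other tokens as-is."""
    sim = F.cosine_similarity(keys[:-1], keys[1:], dim=-1)  # (T-1,)
    merged, t = [], 0
    while t < tokens.size(0):
        if t + 1 < tokens.size(0) and sim[t] > threshold:
            merged.append(0.5 * (tokens[t] + tokens[t + 1]))  # merge the pair
            t += 2  # skip the neighbour that was absorbed
        else:
            merged.append(tokens[t])
            t += 1
    return torch.stack(merged)  # (T', D) with T' <= T
```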

    Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

    The integration of language models (LMs) has proven to be an effective way to address domain shifts in speech recognition. However, these approaches usually require a significant amount of target-domain text data for LM training. In contrast, in this work we propose two zero-shot ASR domain adaptation methods that require only a domain-specific text prompt, using LLaMA, a 7-billion-parameter large language model (LLM). The LLM is used in two ways: 1) second-pass rescoring: reranking the N-best hypotheses of a given ASR system with LLaMA; 2) deep LLM-fusion: incorporating the LLM into the decoder of an encoder-decoder based ASR system. Experiments show that, with only one domain prompt, both methods can effectively reduce the word error rate (WER) on the out-of-domain TedLium-2 and SPGISpeech datasets. In particular, deep LLM-fusion has the advantage of better recall of entity and out-of-vocabulary words.
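    The first method, second-pass rescoring, amounts to combining each hypothesis's ASR score with its prompt-conditioned log-likelihood under the LLM. A minimal sketch, assuming the Hugging Face `transformers` API; the checkpoint name, the interpolation weight `lm_weight`, and the helper functions are illustrative assumptions, not the paper's implementation.

```python
# Sketch: rerank N-best ASR hypotheses with a causal LLM conditioned
# on a domain-specific text prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-7b"  # assumed checkpoint; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
lm = AutoModelForCausalLM.from_pretrained(model_name).eval()

def lm_logprob(prompt: str, hypothesis: str) -> float:
    """Log-probability of `hypothesis` given the domain prompt."""
    prompt_ids = tok(prompt, return_tensors="pt").input_ids
    full_ids = tok(prompt + " " + hypothesis, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(full_ids).logits.log_softmax(dim=-1)
    # Score only the hypothesis tokens, each predicted from its prefix.
    hyp_ids = full_ids[0, prompt_ids.size(1):]
    hyp_logits = logits[0, prompt_ids.size(1) - 1:-1]
    return hyp_logits.gather(1, hyp_ids.unsqueeze(1)).sum().item()

def rescore(nbest, prompt, lm_weight=0.5):
    """nbest: list of (hypothesis, asr_score) pairs. Returns the best text."""
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(prompt, h[0]))[0]
```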

    Evidence-Efficient Affinity Propagation Scheme for Virtual Machine Placement in Data Center

    In a cloud data center without efficient virtual machine placement, overload of any type of resource on a physical machine (PM) can easily cause other resource types to be wasted and lead to frequent, costly virtual machine (VM) migrations, which further degrade quality of service (QoS). To address this problem, in this paper we propose an evidence-efficient affinity propagation scheme for VM placement (EEAP-VMP), which is capable of balancing the workload across various types of resources on the running PMs. Our approach models the search for desirable destination hosts for live VM migration as the propagation of responsibility and availability: the sum of responsibility and availability represents the accumulated evidence for selecting a candidate destination host for each VM to be migrated, and destination hosts are then chosen in combination with the presented selection criteria. Extensive experiments are conducted to compare EEAP-VMP with previous VM placement methods. The results demonstrate that EEAP-VMP is highly effective in reducing VM migrations and the energy consumption of data centers, and in balancing the workload of PMs.
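    The responsibility and availability messages referred to here are the standard affinity propagation updates. A minimal sketch of that message passing, assuming a generic similarity matrix S (e.g., negative resource-imbalance cost, with self-preferences on the diagonal) and a damping factor; the paper's specific EEAP-VMP selection criteria are not reproduced.

```python
# Sketch: classic affinity propagation message passing, whose accumulated
# evidence a(i,k) + r(i,k) selects an exemplar (here: a destination host).
import numpy as np

def affinity_propagation(S: np.ndarray, iters: int = 100, damping: float = 0.5):
    """S: (N, N) similarities, preferences on the diagonal.
    Returns each point's exemplar index after `iters` update rounds."""
    N = S.shape[0]
    R = np.zeros((N, N))  # responsibility: how well k suits as exemplar for i
    A = np.zeros((N, N))  # availability: how appropriate it is for i to pick k
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        idx = np.argmax(AS, axis=1)
        first_max = AS[np.arange(N), idx]
        AS[np.arange(N), idx] = -np.inf
        second_max = AS.max(axis=1)
        R_new = S - first_max[:, None]
        R_new[np.arange(N), idx] = S[np.arange(N), idx] - second_max
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        np.fill_diagonal(Rp, np.diag(R))
        A_new = Rp.sum(axis=0)[None, :] - Rp
        dA = np.diag(A_new).copy()  # self-availability is not clipped
        A_new = np.minimum(A_new, 0)
        np.fill_diagonal(A_new, dA)
        A = damping * A + (1 - damping) * A_new
    # The sum of availability and responsibility is the accumulated evidence.
    return np.argmax(A + R, axis=1)
```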