
    Application of Butterfly Clos-Network in Network-on-Chip

    This paper studies the topology of NoC (Network-on-Chip). By combining the characteristics of the Clos network and the butterfly network, it proposes a new topology named BFC (Butterfly Clos-network). This topology integrates several modules, which belong to the same layer but different dimensions, into a new module. In the BFC network, a bidirectional link is used to complete information exchange, replacing the exchange between different layers in the original network. During routing, non-destination nodes can serve as intermediate stages that forward data packets, giving the topology its multistage character. Simulation analyses show that BFC inherits the rich path diversity of the Clos network and outperforms the butterfly network in throughput and delay under a heavily congested traffic pattern.
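    For context on the baseline being compared against, here is a minimal sketch of destination-tag routing in a plain radix-2 butterfly, which BFC extends with Clos-style path diversity; the (stage, row) node labelling and the butterfly_route helper are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (hypothetical helper): destination-tag routing in a
# radix-2 butterfly with n_stages stages and 2**n_stages rows. At stage i
# the switch either keeps the row or flips bit (n_stages - 1 - i), so the
# packet's row converges to the destination one bit per stage.

def butterfly_route(src_row: int, dst_row: int, n_stages: int):
    """Return the (stage, row) hops taken from src_row to dst_row."""
    path = [(0, src_row)]
    row = src_row
    for stage in range(n_stages):
        bit = n_stages - 1 - stage
        # Force this bit of the current row to match the destination's bit.
        if (row >> bit) & 1 != (dst_row >> bit) & 1:
            row ^= 1 << bit
        path.append((stage + 1, row))
    return path

# 8-terminal butterfly (n_stages = 3): route row 5 (0b101) to row 2 (0b010).
print(butterfly_route(5, 2, 3))  # [(0, 5), (1, 1), (2, 3), (3, 2)]
```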

    FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition

    Error correction techniques have been used to refine the output sentences from automatic speech recognition (ASR) models and achieve a lower word error rate (WER) than the original ASR outputs. Previous works usually use a sequence-to-sequence model to correct an ASR output sentence autoregressively, which incurs large latency and cannot be deployed in online ASR services. A straightforward way to reduce latency, inspired by non-autoregressive (NAR) neural machine translation, is to use an NAR sequence generation model for ASR error correction, which, however, comes at the cost of a significantly increased ASR error rate. In this paper, observing distinctive error patterns and correction operations (i.e., insertion, deletion, and substitution) in ASR, we propose FastCorrect, a novel NAR error correction model based on edit alignment. In training, FastCorrect aligns each source token from an ASR output sentence to the target tokens from the corresponding ground-truth sentence based on the edit distance between the source and target sentences, and extracts the number of target tokens corresponding to each source token during editing/correction, which is then used to train a length predictor and to adjust the source tokens to match the length of the target sentence for parallel generation. In inference, the token number predicted by the length predictor is used to adjust the source tokens for target sequence generation. Experiments on the public AISHELL-1 dataset and an internal industrial-scale ASR dataset show the effectiveness of FastCorrect for ASR error correction: 1) it speeds up inference by 6-9 times while maintaining accuracy (8-14% WER reduction) compared with the autoregressive correction model; and 2) it outperforms the popular NAR models adopted in neural machine translation and text editing by a large margin. Comment: NeurIPS 2021. Code URL: https://github.com/microsoft/NeuralSpeech
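    The core of the training recipe is the edit alignment that yields labels for the length predictor. The following minimal sketch is an assumption-level simplification: it derives per-source-token target counts from a plain Levenshtein backtrace, breaks ties arbitrarily, and omits the paper's pronunciation-aware refinements.

```python
# Assumption-level sketch of FastCorrect's edit alignment: align an ASR
# hypothesis to the reference by edit distance, then count how many target
# tokens each source token maps to (0 = delete, 1 = keep/substitute,
# >1 = insert). These counts are the length-predictor training labels.

def edit_alignment(src, tgt):
    m, n = len(src), len(tgt)
    # Standard Levenshtein dynamic-programming table.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete source token
                           dp[i][j - 1] + 1,         # insert target token
                           dp[i - 1][j - 1] + cost)  # match / substitute
    # Backtrace, attributing every target token to one source token.
    counts = [0] * m
    i, j = m, n
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and src[i - 1] == tgt[j - 1] else 1
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + cost:
            counts[i - 1] += 1
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            counts[max(i - 1, 0)] += 1  # inserted token, credited nearby
            j -= 1
        else:
            i -= 1                      # deleted source token keeps count 0
    return counts

# ASR output vs. reference: "to" is deleted, "eats" maps to "eating".
print(edit_alignment("I like to eats".split(), "I like eating".split()))
# [1, 1, 0, 1] -- the counts sum to the target length
```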

    Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

    We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference. Our method incorporates a gating mechanism and an LID loss, enabling transformer experts to learn language-specific information. By combining gated transformer experts with shared transformer layers, we construct multilingual transformer blocks and utilize linear experts to effectively regularize the joint network. The curriculum training scheme leverages LID to guide the gated experts in improving their respective language performance. Experimental results on a bilingual task involving English and Spanish demonstrate significant improvements, with average relative word error rate reductions of 12.5% and 7.3% compared to the baseline bilingual and monolingual models, respectively. Notably, our method achieves performance comparable to the upper-bound model trained and inferred with oracle LID. Extending our approach to trilingual, quadrilingual, and pentalingual models reveals advantages similar to those observed in the bilingual models, highlighting its ease of extension to multiple languages.
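    As a rough illustration of the gating idea (an assumption-level sketch, not the paper's code), the block below mixes per-language expert layers on top of a shared layer with a learned gate; the gate's output can additionally be supervised with an LID loss during curriculum training.

```python
# Assumption-level sketch of one gated language-expert block, not the
# paper's code: a shared layer feeds per-language experts, and a learned
# gate mixes their outputs. Training can add a cross-entropy LID loss on
# the gate so each expert specialises; no oracle LID is needed at inference.
import torch
import torch.nn as nn

class GatedLanguageExperts(nn.Module):
    def __init__(self, d_model: int, n_langs: int):
        super().__init__()
        self.shared = nn.Linear(d_model, d_model)   # stand-in for shared layers
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_langs)]
        )                                           # one expert per language
        self.gate = nn.Linear(d_model, n_langs)     # predicts language weights

    def forward(self, x):                           # x: (batch, time, d_model)
        h = torch.relu(self.shared(x))
        gates = torch.softmax(self.gate(h), dim=-1)             # (B, T, L)
        expert_out = torch.stack([e(h) for e in self.experts], dim=-1)
        out = (expert_out * gates.unsqueeze(-2)).sum(dim=-1)    # gated mix
        return out, gates  # gates can be trained against frame-level LID labels

# Bilingual toy example (English/Spanish would be n_langs = 2).
block = GatedLanguageExperts(d_model=16, n_langs=2)
y, g = block(torch.randn(2, 5, 16))
print(y.shape, g.shape)  # torch.Size([2, 5, 16]) torch.Size([2, 5, 2])
```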

    On decoder-only architecture for speech-to-text and large language model integration

    Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has not been well explored, and the "decoder-only" architecture has also not been well studied for speech processing tasks. In this research, we introduce Speech-LLaMA, a novel approach that effectively incorporates acoustic information into text-based large language models. Our method leverages Connectionist Temporal Classification and a simple audio encoder to map the compressed acoustic features to the continuous semantic space of the LLM. In addition, we further probe the decoder-only architecture for speech-to-text tasks by training a smaller-scale, randomly initialized Speech-LLaMA model from speech-text paired data alone. We conduct experiments on multilingual speech-to-text translation tasks and demonstrate a significant improvement over strong baselines, highlighting the potential advantages of decoder-only models for speech-to-text conversion.
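    One plausible reading of the CTC-based compression step is sketched below: greedy CTC labels mark most encoder frames as blank or repeats, so collapsing runs shortens the acoustic sequence before it is projected into the LLM's embedding space. The mean-pooling and the ctc_compress helper are assumptions for illustration; the exact reduction used in Speech-LLaMA may differ.

```python
# Assumption-level sketch of CTC-based compression: greedy CTC labels mark
# most frames blank or repeated, so each run of identical non-blank labels
# is collapsed (here by mean pooling) before the projection into the LLM's
# embedding space. The exact pooling in Speech-LLaMA may differ.
import torch

def ctc_compress(frames, ctc_logits, blank: int = 0):
    """frames: (T, d) encoder outputs; ctc_logits: (T, vocab) CTC scores.
    Keeps one averaged frame per run of identical non-blank CTC labels."""
    labels = ctc_logits.argmax(dim=-1)      # greedy CTC path, shape (T,)
    kept, run, prev = [], [], None
    for t, lab in enumerate(labels.tolist()):
        if lab != prev and run:             # label changed: flush current run
            kept.append(torch.stack(run).mean(dim=0))
            run = []
        if lab != blank:                    # blanks contribute no frames
            run.append(frames[t])
        prev = lab
    if run:
        kept.append(torch.stack(run).mean(dim=0))
    return torch.stack(kept) if kept else frames[:0]

T, d, vocab = 12, 8, 6
short = ctc_compress(torch.randn(T, d), torch.randn(T, vocab))
print(short.shape)  # (number of non-blank runs, 8), usually much less than T
```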

    FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition

    Error correction is widely used in automatic speech recognition (ASR) to post-process the generated sentence and can further reduce the word error rate (WER). Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens. In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. FastCorrect 2 adopts non-autoregressive generation for fast inference; it consists of an encoder that processes multiple source sentences and a decoder that generates the target sentence in parallel from the adjusted source sentence, where the adjustment is based on the predicted duration of each source token. However, handling multiple source sentences raises some issues. First, it is non-trivial to leverage the voting effect from multiple source sentences since they usually vary in length. Thus, we propose a novel alignment algorithm that maximizes the degree of token alignment among multiple sentences in terms of token and pronunciation similarity. Second, the decoder can only take one adjusted source sentence as input, while there are multiple source sentences. Thus, we develop a candidate predictor to detect the most suitable candidate for the decoder. Experiments on our in-house dataset and AISHELL-1 show that FastCorrect 2 can further reduce the WER over the previous correction model with a single candidate by 3.2% and 2.6%, demonstrating the effectiveness of leveraging multiple candidates in ASR error correction. FastCorrect 2 achieves better performance than the cascaded re-scoring and correction pipeline and can serve as a unified post-processing module for ASR. Comment: Findings of EMNLP 2021
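    The voting effect is easiest to see on already-aligned candidates. The toy sketch below (an assumption for illustration; the paper's alignment additionally uses pronunciation similarity and a learned candidate predictor) takes pre-aligned beam candidates, with "-" marking alignment gaps, and flags columns with low agreement as likely error sites.

```python
# Toy illustration (assumption, not the paper's algorithm) of the voting
# effect over pre-aligned beam candidates: columns where candidates agree
# are trusted; low-agreement columns are likely ASR error sites. "-" marks
# an alignment gap inserted to equalise candidate lengths.
from collections import Counter

def vote_columns(aligned):
    """aligned: equal-length token lists. Returns per-column
    (majority_token, agreement_ratio)."""
    n = len(aligned)
    result = []
    for col in zip(*aligned):
        tok, cnt = Counter(col).most_common(1)[0]
        result.append((tok, cnt / n))
    return result

candidates = [
    ["i", "like", "eating", "-"],
    ["i", "like", "eat",    "in"],
    ["i", "like", "eating", "-"],
]
for tok, agree in vote_columns(candidates):
    flag = "  <- low agreement, likely error site" if agree < 1.0 else ""
    print(f"{tok:8s}{agree:.2f}{flag}")
```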

    Quantum Image Encryption Scheme Using Arnold Transform and S-box Scrambling

    The paper proposes a lossless quantum image encryption scheme based on substitution-table (S-box) scrambling, a mutation operation, and a generalized Arnold transform with keys. First, the key generator builds on a SHA-256 hash of the plain image and a random sequence; its output is used to derive the initial conditions and parameters of the proposed image encryption scheme. Second, the permutation and gray-level encryption architecture is built from a discrete Arnold map and a quantum chaotic map. Before the Arnold-transform permutation, pixel values are modified by a quantum chaos sequence. To achieve strong scrambling and randomness, the S-box and the mutation operation are applied in the gray-level encryption stage. The combination of linear and nonlinear transformations ensures the complexity of the proposed scheme and avoids harmful periodicity. Simulations show that the cipher-image has a fairly uniform histogram, correlation coefficients close to 0, and information entropy close to 8. The proposed cryptosystem provides a 2^256 key space and achieves high computational efficiency (speed = 11.920875 Mbit/s). Theoretical analyses and experimental results prove that the proposed scheme offers strong resistance to various existing attacks and a high level of security.
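    For the permutation stage alone, a minimal sketch of the keyed (generalized) Arnold transform is shown below; the parameters a and b act as keys, and the unit-determinant map guarantees lossless, invertible scrambling. The S-box, mutation, and quantum-chaotic stages of the paper are not reproduced here.

```python
# Minimal sketch of the keyed (generalised) Arnold permutation stage only;
# the S-box, mutation, and quantum-chaotic stages are not reproduced. The
# map [[1, a], [b, a*b + 1]] has determinant 1 mod N, so the pixel
# scrambling is lossless and invertible (and periodic).
import numpy as np

def arnold(img: np.ndarray, a: int, b: int, rounds: int) -> np.ndarray:
    """Scramble an N x N image by applying the keyed Arnold map 'rounds' times."""
    n = img.shape[0]
    out = img
    for _ in range(rounds):
        scrambled = np.empty_like(out)
        for x in range(n):
            for y in range(n):
                nx = (x + a * y) % n
                ny = (b * x + (a * b + 1) * y) % n
                scrambled[nx, ny] = out[x, y]
        out = scrambled
    return out

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
print(arnold(img, a=1, b=1, rounds=3))  # enough further rounds restore img
```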