66 research outputs found
Application of Butterfly Clos-Network in Network-on-Chip
This paper studies the topology of NoC (Network-on-Chip). By combining the characteristics of the Clos network and the butterfly network, it proposes a new topology named the BFC (Butterfly Clos-network) network. This topology integrates several modules, which belong to the same layer but different dimensions, into a new module. In the BFC network, information is exchanged over a bidirectional link rather than between different layers as in the original network. During routing, non-destination nodes can serve as intermediate stages that forward data packets, so the topology is multistage. Simulation analyses show that BFC inherits the rich path diversity of the Clos network and outperforms the butterfly network in throughput and delay under a heavily congested traffic pattern.
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition
Error correction techniques have been used to refine the output sentences
from automatic speech recognition (ASR) models and achieve a lower word error
rate (WER) than original ASR outputs. Previous works usually use a
sequence-to-sequence model to correct an ASR output sentence autoregressively,
which causes large latency and cannot be deployed in online ASR services. A
straightforward solution to reduce latency, inspired by non-autoregressive
(NAR) neural machine translation, is to use an NAR sequence generation model
for ASR error correction, which, however, comes at the cost of significantly
increased ASR error rate. In this paper, observing distinctive error patterns
and correction operations (i.e., insertion, deletion, and substitution) in ASR,
we propose FastCorrect, a novel NAR error correction model based on edit
alignment. In training, FastCorrect aligns each source token from an ASR output
sentence to the target tokens from the corresponding ground-truth sentence
based on the edit distance between the source and target sentences, and
extracts the number of target tokens corresponding to each source token during
edition/correction, which is then used to train a length predictor and to
adjust the source tokens to match the length of the target sentence for
parallel generation. In inference, the token number predicted by the length
predictor is used to adjust the source tokens for target sequence generation.
Experiments on the public AISHELL-1 dataset and an internal industrial-scale
ASR dataset show the effectiveness of FastCorrect for ASR error correction: 1)
it speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER
reduction) compared with the autoregressive correction model; and 2) it
outperforms the popular NAR models adopted in neural machine translation and
text edition by a large margin. Comment: NeurIPS 2021. Code URL: https://github.com/microsoft/NeuralSpeec
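The edit-alignment step described in this abstract can be illustrated with a minimal sketch. It computes a standard edit-distance table, backtraces one optimal path, and counts how many target tokens each source token maps to (0 for a deletion, 1 for a match or substitution, more than 1 when inserted tokens are attached to it). The function names, the tie-breaking order, and the rule that attaches an inserted token to the preceding source token are assumptions made for illustration; the abstract does not specify how FastCorrect chooses among equally good alignments.

```python
def edit_alignment_durations(source, target):
    """For each source token, count the target tokens aligned to it.

    Illustrative sketch only: one optimal edit path is chosen arbitrarily.
    """
    n, m = len(source), len(target)
    if n == 0:
        return []
    # dp[i][j] = edit distance between source[:i] and target[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + cost,  # match / substitution
                           dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1)         # insertion
    durations = [0] * n
    i, j = n, m
    while i > 0 or j > 0:
        cost = 0 if i > 0 and j > 0 and source[i - 1] == target[j - 1] else 1
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + cost:
            durations[i - 1] += 1      # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            i -= 1                     # deletion: source token maps to nothing
        else:
            # insertion: attach the extra target token to the preceding
            # source token (an assumed convention for this sketch)
            durations[max(i - 1, 0)] += 1
            j -= 1
    return durations


def adjust_source(source, durations):
    """Repeat or drop source tokens so the length matches the target."""
    return [tok for tok, d in zip(source, durations) for _ in range(d)]
```

In training, such counts would serve as labels for the length predictor; in inference, predicted counts would take their place when calling `adjust_source`.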
Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training
We propose gated language experts and curriculum training to enhance
multilingual transformer transducer models without requiring language
identification (LID) input from users during inference. Our method incorporates
a gating mechanism and LID loss, enabling transformer experts to learn
language-specific information. By combining gated transformer experts with
shared transformer layers, we construct multilingual transformer blocks and
utilize linear experts to effectively regularize the joint network. The
curriculum training scheme leverages LID to guide the gated experts in
improving their respective language performance. Experimental results on a
bilingual task involving English and Spanish demonstrate significant
improvements, with average relative word error reductions of 12.5% and 7.3%
compared to the baseline bilingual and monolingual models, respectively.
Notably, our method achieves performance comparable to the upper-bound model
trained and inferred with oracle LID. Extending our approach to trilingual,
quadrilingual, and pentalingual models reveals similar advantages to those
observed in the bilingual models, highlighting its ease of extension to
multiple languages.
On decoder-only architecture for speech-to-text and large language model integration
Large language models (LLMs) have achieved remarkable success in the field of
natural language processing, enabling better human-computer interaction using
natural language. However, the seamless integration of speech signals into LLMs
has not been explored well. The "decoder-only" architecture has also not been
well studied for speech processing tasks. In this research, we introduce
Speech-LLaMA, a novel approach that effectively incorporates acoustic
information into text-based large language models. Our method leverages
Connectionist Temporal Classification and a simple audio encoder to map the
compressed acoustic features to the continuous semantic space of the LLM. In
addition, we further probe the decoder-only architecture for speech-to-text
tasks by training a smaller scale randomly initialized speech-LLaMA model from
speech-text paired data alone. We conduct experiments on multilingual
speech-to-text translation tasks and demonstrate a significant improvement over
strong baselines, highlighting the potential advantages of decoder-only models
for speech-to-text conversion.
FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition
Error correction is widely used in automatic speech recognition (ASR) to
post-process the generated sentence, and can further reduce the word error rate
(WER). Although multiple candidates are generated by an ASR system through beam
search, current error correction approaches can only correct one sentence at a
time, failing to leverage the voting effect from multiple candidates to better
detect and correct error tokens. In this work, we propose FastCorrect 2, an
error correction model that takes multiple ASR candidates as input for better
correction accuracy. FastCorrect 2 adopts non-autoregressive generation for
fast inference, which consists of an encoder that processes multiple source
sentences and a decoder that generates the target sentence in parallel from the
adjusted source sentence, where the adjustment is based on the predicted
duration of each source token. However, there are some issues when handling
multiple source sentences. First, it is non-trivial to leverage the voting
effect from multiple source sentences since they usually vary in length. Thus,
we propose a novel alignment algorithm to maximize the degree of token
alignment among multiple sentences in terms of token and pronunciation
similarity. Second, the decoder can only take one adjusted source sentence as
input, while there are multiple source sentences. Thus, we develop a candidate
predictor to detect the most suitable candidate for the decoder. Experiments on
our inhouse dataset and AISHELL-1 show that FastCorrect 2 can further reduce
the WER over the previous correction model with single candidate by 3.2% and
2.6%, demonstrating the effectiveness of leveraging multiple candidates in ASR
error correction. FastCorrect 2 achieves better performance than the cascaded
re-scoring and correction pipeline and can serve as a unified post-processing
module for ASR. Comment: Findings of EMNLP 202
Inactivation of the positive LuxR-type oligomycin biosynthesis regulators OlmRI and OlmRII increases avermectin production in Streptomyces avermitilis
Minimum Information about a Biosynthetic Gene cluster
A wide variety of enzymatic pathways that produce specialized metabolites in bacteria, fungi and plants are known to be encoded in biosynthetic gene clusters. Information about these clusters, pathways and metabolites is currently dispersed throughout the literature, making it difficult to exploit. To facilitate consistent and systematic deposition and retrieval of data on biosynthetic gene clusters, we propose the Minimum Information about a Biosynthetic Gene cluster (MIBiG) data standard. Chemistry and Chemical Biolog
Quantum Image Encryption Scheme Using Arnold Transform and S-box Scrambling
The paper proposes a lossless quantum image encryption scheme based on substitution-table (S-box) scrambling, a mutation operation and a keyed general Arnold transform. First, the key generator is built on a SHA-256 hash of the plain-image and a random sequence. Its output is used to derive the initial conditions and parameters of the proposed image encryption scheme. Second, the permutation and gray-level encryption architecture is built from a discrete Arnold map and a quantum chaotic map. Before the Arnold-transform permutation, each pixel value is modified by a quantum chaos sequence. To achieve strong scrambling and randomness, the S-box and mutation operation are applied in the gray-level encryption stage. The combination of linear and nonlinear transformations ensures the complexity of the proposed scheme and avoids harmful periodicity. The simulation shows that the cipher-image has a fairly uniform histogram, correlation coefficients close to 0, and information entropy close to 8. The proposed cryptosystem provides a 2^256 key space and achieves high computational efficiency (speed = 11.920875 Mbit/s). Theoretical analyses and experimental results show that the proposed scheme has strong resistance to various existing attacks and a high level of security.
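The permutation stage mentioned in this abstract relies on a general Arnold transform. The following minimal sketch shows the classical form of that map on a square N x N image, (x, y) -> (x + a*y, b*x + (a*b + 1)*y) mod N, whose matrix has determinant 1 and is therefore invertible for any N. This is an illustration only: it is a classical pixel permutation, not the quantum circuit of the paper, and the function names and the use of `a`, `b` and `rounds` as key parameters are assumptions.

```python
def arnold_scramble(img, a, b, rounds=1):
    """Permute pixel positions of a square image with a general Arnold map."""
    n = len(img)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                nx = (x + a * y) % n
                ny = (b * x + (a * b + 1) * y) % n
                out[nx][ny] = img[x][y]   # move pixel to its mapped position
        img = out
    return img


def arnold_unscramble(img, a, b, rounds=1):
    """Invert arnold_scramble by reading each pixel back from its image."""
    n = len(img)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                nx = (x + a * y) % n
                ny = (b * x + (a * b + 1) * y) % n
                out[x][y] = img[nx][ny]   # undo one round of the permutation
        img = out
    return img
```

Because the map only moves pixels, the histogram of the scrambled image is unchanged, which is why the scheme pairs it with a separate gray-level encryption stage.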
Correction: Identification of ANXA3 as a biomarker associated with pyroptosis in ischemic stroke