Attention-Based End-to-End Speech Recognition on Voice Search
Recently, there has been a growing interest in end-to-end speech recognition
that directly transcribes speech to text without any predefined alignments. In
this paper, we explore the use of an attention-based encoder-decoder model for
Mandarin speech recognition on a voice search task. Previous attempts have
shown that applying attention-based encoder-decoder models to Mandarin speech
recognition is quite difficult due to the logographic orthography of Mandarin,
the large vocabulary and the conditional dependency of the attention model. In
this paper, we use character embedding to deal with the large vocabulary.
Several tricks are used for effective model training, including L2
regularization, Gaussian weight noise and frame skipping. We compare two
attention mechanisms and use attention smoothing to cover long context in the
attention model. Taken together, these tricks allow us to finally achieve a
character error rate (CER) of 3.58% and a sentence error rate (SER) of 7.43% on
the MiTV voice search dataset. Combined with a trigram language model, CER and
SER reach 2.81% and 5.77%, respectively
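The two metrics reported above can be illustrated with a minimal sketch. It assumes the conventional definitions: CER is the character-level edit distance divided by the number of reference characters, and SER is the fraction of sentences that are not transcribed exactly. The function names and the toy sentences are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer_and_ser(references, hypotheses):
    """Return (CER, SER) over paired reference/hypothesis strings."""
    pairs = list(zip(references, hypotheses))
    total_chars = sum(len(r) for r, _ in pairs)
    char_errors = sum(edit_distance(r, h) for r, h in pairs)
    wrong_sentences = sum(1 for r, h in pairs if r != h)
    return char_errors / total_chars, wrong_sentences / len(pairs)


# Toy data (hypothetical): one of two sentences has a single wrong character.
refs = ["打开电视", "播放音乐"]
hyps = ["打开电视", "播放音月"]
cer, ser = cer_and_ser(refs, hyps)
```

Because Mandarin text is scored per character, no word segmentation is needed before computing the edit distance.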
A meta learning scheme for fast accent domain expansion in Mandarin speech recognition
Spoken language shows significant variation across Mandarin and its accents.
Despite the high performance of Mandarin automatic speech recognition (ASR),
accented ASR is still a challenging task. In this paper, we introduce meta-learning
techniques for fast accent domain expansion in Mandarin speech recognition,
which extend coverage to new accents without degrading the performance of
Mandarin ASR. Meta-learning, or learning to learn, can learn general relations
across multiple domains rather than over-fitting a specific domain, so we select
meta-learning for the domain expansion task. This more fundamental form of
learning improves performance on accent domain expansion tasks. We combine
meta-learning with freezing of model parameters, which makes recognition
performance more stable across different cases and makes training about 20%
faster. Our approach significantly outperforms other methods, by about 3%
relative, on the accent domain expansion task. Compared to the baseline model,
it achieves a 37% relative improvement while results on the Mandarin test set
remain unchanged. In addition, the method proves effective on a
large amount of data, with a relative performance improvement of 4% on the
accent test set
Skipformer: A Skip-and-Recover Strategy for Efficient Speech Recognition
Conformer-based attention models have become the de facto backbone model for
Automatic Speech Recognition tasks. A blank symbol is usually introduced to
align the input and output sequences for CTC or RNN-T models. Unfortunately,
with the attention mechanism, computation and memory consumption grow
quadratically with the input length, so long inputs become costly. In this work, we propose a
"Skip-and-Recover" Conformer architecture, named Skipformer, to squeeze
the input sequence length dynamically and inhomogeneously. Skipformer uses an
intermediate CTC output as the criterion to split frames into three groups: crucial,
skipping and ignoring. The crucial group feeds into the next Conformer blocks, and
its output is joined with the skipping group in the original temporal order to form
the final encoder output. Experiments show that our model reduces the input sequence
length by a factor of 31 on Aishell-1 and 22 on the Librispeech corpus. Meanwhile,
the model can achieve better recognition accuracy and faster inference speed
than recent baseline models. Our code is open-sourced and available online.
Comment: Accepted by ICME202
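The recover step described in the Skipformer abstract can be sketched as follows. This is only an illustration under stated assumptions: the per-frame group labels are taken as given (the abstract does not state the exact rule that derives them from the intermediate CTC output), "ignoring" frames are assumed to be dropped from the final output, and `process_crucial` is a hypothetical stand-in for the later Conformer blocks.

```python
def skip_and_recover(frames, groups, process_crucial):
    """Process crucial frames, then merge them with skipping frames in time order.

    frames: list of per-frame features
    groups: list of "crucial" / "skipping" / "ignoring" labels, one per frame
    process_crucial: hypothetical stand-in for the later encoder blocks;
                     maps a list of frames to a list of the same length
    """
    crucial_idx = [i for i, g in enumerate(groups) if g == "crucial"]
    processed = process_crucial([frames[i] for i in crucial_idx])
    processed_by_idx = dict(zip(crucial_idx, processed))

    output = []
    for i, g in enumerate(groups):
        if g == "crucial":
            output.append(processed_by_idx[i])  # output of the later blocks
        elif g == "skipping":
            output.append(frames[i])            # bypasses the later blocks
        # "ignoring" frames are assumed to be dropped entirely
    return output


# Toy example with scalar "frames" and a dummy processing function.
frames = [1, 2, 3, 4, 5]
groups = ["ignoring", "crucial", "skipping", "crucial", "ignoring"]
recovered = skip_and_recover(frames, groups, lambda xs: [x * 10 for x in xs])
```

Only the crucial frames pay the attention cost of the later blocks, which is where the reduction in effective sequence length comes from.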
Key Frame Mechanism For Efficient Conformer Based End-to-end Speech Recognition
Recently, Conformer as a backbone network for end-to-end automatic speech
recognition achieved state-of-the-art performance. The Conformer block
leverages a self-attention mechanism to capture global information, along with
a convolutional neural network to capture local information, resulting in
improved performance. However, the Conformer-based model encounters an issue
with the self-attention mechanism, as computational complexity grows
quadratically with the length of the input sequence. Inspired by previous
Connectionist Temporal Classification (CTC) guided blank skipping during
decoding, we introduce intermediate CTC outputs as guidance into the
downsampling procedure of the Conformer encoder. We define a frame with
non-blank output as a key frame. Specifically, we introduce the key frame-based
self-attention (KFSA) mechanism, a novel method to reduce the computation of
the self-attention mechanism using key frames. The structure of our proposed
approach comprises two encoders. Following the initial encoder, we introduce an
intermediate CTC loss function to compute the label frame, enabling us to
extract the key frames and blank frames for KFSA. Furthermore, we introduce the
key frame-based downsampling (KFDS) mechanism to operate on high-dimensional
acoustic features directly and drop the frames corresponding to blank labels,
which results in new acoustic feature sequences as input to the second encoder.
The proposed method achieves comparable or higher performance
than the vanilla Conformer and other similar work such as Efficient Conformer.
Meanwhile, our proposed method can discard more than 60% useless frames during
model training and inference, which significantly accelerates inference.
The code for this work is available at
https://github.com/scufan1990/Key-Frame-Mechanism-For-Efficient-Conformer
Comment: This manuscript has been accepted by IEEE Signal Processing Letters
for publicatio
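The key frame-based downsampling idea above can be sketched in a few lines. The sketch assumes that a frame's label is the argmax of the intermediate CTC posterior and that the blank symbol has index 0. Both are assumptions for illustration, and the function name is hypothetical rather than taken from the authors' repository.

```python
def key_frame_downsample(features, ctc_posteriors, blank_id=0):
    """Keep only frames whose intermediate CTC label is non-blank.

    features: list of per-frame acoustic feature vectors
    ctc_posteriors: list of per-frame probability lists over the vocabulary
    blank_id: index of the CTC blank symbol (assumed to be 0 here)
    Returns the kept feature vectors and their original frame indices.
    """
    kept, key_idx = [], []
    for t, (feat, post) in enumerate(zip(features, ctc_posteriors)):
        label = max(range(len(post)), key=post.__getitem__)  # greedy argmax
        if label != blank_id:
            kept.append(feat)
            key_idx.append(t)
    return kept, key_idx


# Toy example: 4 frames, vocabulary of size 3 with blank at index 0.
features = [[0.0], [1.0], [2.0], [3.0]]
posteriors = [
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # key frame
    [0.7, 0.2, 0.1],    # blank
    [0.2, 0.1, 0.7],    # key frame
]
kept, key_idx = key_frame_downsample(features, posteriors)
dropped_fraction = 1 - len(kept) / len(features)
```

The shortened sequence would then be the input to the second encoder, so the self-attention cost there scales with the number of key frames rather than with the full input length.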
Comparison Study of Wide Bandgap Polymer (PBDB-T) and Narrow Bandgap Polymer (PBDTTT-EFT) as Donor for Perylene Diimide Based Polymer Solar Cells
Perylene diimide (PDI) derivatives, a promising class of non-fullerene acceptors (NFAs), have developed rapidly. However, most of the relevant work has focused on synthesizing novel PDI-based structures, and little attention has been paid to the selection of the polymer donor in PDI-based solar cells. The wide bandgap polymer PBDB-T and the narrow bandgap polymer PBDTTT-EFT are known as the most efficient polymer donors in polymer solar cells (PSCs). While PBDB-T works well with non-fullerene acceptors, achieving a power conversion efficiency (PCE) of more than 12%, PBDTTT-EFT is one of the best electron donors for fullerene acceptors, with a PCE of up to 10%. Despite their different absorption profiles, the working principles of these benchmark polymer donors with the same electron acceptor, especially PDI-based acceptors, have rarely been compared. To this end, we used PBDB-T and PBDTTT-EFT as the electron donors, and 1,1′-bis(2-methoxyethoxyl)-7,7′-(2,5-thienyl) bis-PDI (Bis-PDI-T-EG) as the electron acceptor to fabricate PSCs, and systematically compared their differences in device performance, carrier mobility, recombination mechanism, and film morphology
Benefits and risks of the hormetic effects of dietary isothiocyanates on cancer prevention
The isothiocyanate (ITC) sulforaphane (SFN) was shown at low levels (1-5 µM) to promote cell proliferation to 120-143% of the controls in a number of human cell lines, whilst at high levels (10-40 µM) it inhibited such cell proliferation. Similar dose responses were observed for cell migration, i.e. SFN at 2.5 µM increased cell migration in bladder cancer T24 cells to 128% whilst high levels inhibited cell migration. This hormetic action was also found in an angiogenesis assay where SFN at 2.5 µM promoted endothelial tube formation (118% of the control), whereas at 10-20 µM it caused significant inhibition. The precise mechanism by which SFN influences promotion of cell growth and migration is not known, but probably involves activation of autophagy since an autophagy inhibitor, 3-methyladenine, abolished the effect of SFN on cell migration. Moreover, low doses of SFN offered a protective effect against free-radical mediated cell death, an effect that was enhanced by co-treatment with selenium. These results suggest that SFN may either prevent or promote tumour cell growth depending on the dose and the nature of the target cells. In normal cells, the promotion of cell growth may be of benefit, but in transformed or cancer cells it may be an undesirable risk factor. In summary, ITCs have a biphasic effect on cell growth and migration. The benefits and risks of ITCs are not only determined by the doses, but are affected by interactions with Se and the measured endpoint