In search of Lin Zhao's soul = 尋找林昭的靈魂
Film Director: Hu Jie (胡杰)
Film Release Year: 2004
Learning Hard Alignments with Variational Inference
There has recently been significant interest in hard attention models for
tasks such as object recognition, visual captioning and speech recognition.
Hard attention can offer benefits over soft attention such as decreased
computational cost, but training hard attention models can be difficult because
of the discrete latent variables they introduce. Previous work used REINFORCE
and Q-learning to approach these issues, but those methods can provide
high-variance gradient estimates and be slow to train. In this paper, we tackle
the problem of learning hard attention for a sequential task using variational
inference methods, specifically the recently introduced VIMCO and NVIL.
Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We
demonstrate our method on a phoneme recognition task in clean and noisy
environments and show that it outperforms REINFORCE, with the difference being
greater for the more complicated task.
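As a concrete illustration of the variational approach, here is a minimal sketch of the VIMCO learning signal that a multi-sample objective of this kind uses for discrete latent alignments. It assumes we already have K sampled alignments and their log-weights log w_k = log p(x, z_k) - log q(z_k | x); the function and variable names are illustrative and not taken from the paper's implementation.

import numpy as np

def logsumexp(a):
    # Numerically stable log-sum-exp over a 1-D array.
    m = a.max()
    return m + np.log(np.exp(a - m).sum())

def vimco_learning_signals(log_w):
    """Multi-sample bound and per-sample baselined learning signals.

    log_w: array of shape [K] with log w_k = log p(x, z_k) - log q(z_k | x).
    """
    K = log_w.shape[0]
    # Multi-sample bound: L = log( (1/K) * sum_k w_k ).
    log_L = logsumexp(log_w) - np.log(K)
    signals = np.empty(K)
    for k in range(K):
        # Leave-one-out baseline: replace log w_k by the geometric mean of the
        # other samples' weights and recompute the bound without sample k.
        others = np.delete(log_w, k)
        log_L_without_k = logsumexp(np.append(others, others.mean())) - np.log(K)
        signals[k] = log_L - log_L_without_k
    return log_L, signals

Each signal would multiply the score-function gradient of log q(z_k | x) for its sample; the leave-one-out baseline is what keeps the estimator's variance lower than plain REINFORCE.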
Improving the Performance of Online Neural Transducer Models
Having a sequence-to-sequence model which can operate in an online fashion is
important for streaming applications such as Voice Search. The neural
transducer (NT) is a streaming sequence-to-sequence model, but it has shown a
significant degradation in performance compared to non-streaming models such as
Listen, Attend and Spell (LAS). In this paper, we present various improvements to NT.
Specifically, we look at increasing the window over which NT computes
attention, mainly by looking backwards in time so the model still remains
online. In addition, we explore initializing an NT model from a LAS-trained
model so that it is guided by a better alignment. Finally, we explore
incorporating stronger language models, such as wordpiece models, and applying
an external LM during the beam search. On a Voice Search task, we find that
with these improvements we can get NT to match the performance of LAS.
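As one concrete example of the external-LM integration mentioned above, here is a minimal shallow-fusion sketch in which candidate extensions are scored with a weighted sum of the ASR model's and the LM's log-probabilities during beam search. The interfaces and the lm_weight value are assumptions for illustration, not the paper's implementation.

import numpy as np

def extend_beam(beams, asr_log_probs, lm_log_probs, lm_weight=0.3, beam_size=8):
    """One beam-search step with shallow fusion of an external LM.

    beams:         list of (hypothesis_tuple, score) pairs
    asr_log_probs: dict mapping hypothesis -> np.array [vocab] of ASR next-token log-probs
    lm_log_probs:  dict mapping hypothesis -> np.array [vocab] of LM next-token log-probs
    """
    candidates = []
    for hyp, score in beams:
        # Fused score: log P_ASR(token | hyp, x) + lambda * log P_LM(token | hyp).
        fused = asr_log_probs[hyp] + lm_weight * lm_log_probs[hyp]
        for token, s in enumerate(fused):
            candidates.append((hyp + (token,), score + float(s)))
    # Keep only the top-scoring hypotheses for the next step.
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]

The fusion weight is typically tuned on a held-out set; with a wordpiece LM the same scheme applies at the wordpiece level.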
The Association between Virus Prevalence and Intercolonial Aggression Levels in the Yellow Crazy Ant, Anoplolepis Gracilipes (Jerdon).
The recent discovery of multiple viruses in ants, along with the widespread infection of their hosts across geographic ranges, provides an excellent opportunity to test whether viral prevalence in the field is associated with the complexity of social interactions in the ant population. In this study, we examined whether an association exists between the field prevalence of a virus and the intercolonial aggression of its ant host, using the yellow crazy ant (Anoplolepis gracilipes) and its natural viral pathogen (TR44839 virus) as a model system. We delimited the colony boundaries and composition of A. gracilipes at a total of 12 study sites in Japan (Okinawa), Taiwan, and Malaysia (Penang) through intercolonial aggression assays. The spatial distribution and prevalence level of the virus were then mapped for each site. The virus occurred at high prevalence in the surveyed colonies of Okinawa and Taiwan (100% infection rate across all sites), whereas virus prevalence was variable (30%-100%) or absent (0%) at the sites in Penang. Coincidentally, colonies in Okinawa and Taiwan displayed weak intercolonial boundaries, as aggression between colonies was generally low or moderate. In contrast, sites in Penang were found to harbor a high proportion of mutually aggressive colonies, a pattern potentially indicative of complex colony composition. Our statistical analyses further confirmed the observed correlation, implying that intercolonial interactions likely act as one of the effective facilitators of, or barriers to, virus prevalence in the field population of this ant species.
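The abstract does not specify the statistical test, but the site-level association it describes could be probed with something as simple as a rank correlation between virus prevalence and the proportion of aggressive colony pairings per site. The sketch below is purely illustrative, with placeholder numbers rather than the study's data.

from scipy.stats import spearmanr

# Per-site summaries (placeholder values, not data from the study):
# prevalence = fraction of sampled colonies infected at a site
# aggression = fraction of intercolonial pairings scored as aggressive
prevalence = [1.00, 1.00, 1.00, 0.60, 0.30, 0.00]
aggression = [0.10, 0.15, 0.20, 0.55, 0.70, 0.85]

rho, p_value = spearmanr(prevalence, aggression)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")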
Improved Noisy Student Training for Automatic Speech Recognition
Recently, a semi-supervised learning method known as "noisy student training"
has been shown to improve image classification performance of deep networks
significantly. Noisy student training is an iterative self-training method that
leverages augmentation to improve network performance. In this work, we adapt
and improve noisy student training for automatic speech recognition, employing
(adaptive) SpecAugment as the augmentation method. We find effective methods to
filter, balance and augment the data generated in between self-training
iterations. By doing so, we are able to obtain word error rates (WERs) of
4.2%/8.6% on the clean/noisy LibriSpeech test sets by using only the clean 100h
subset of LibriSpeech as the supervised set and the rest (860h) as the
unlabeled set. Furthermore, we are able to achieve WERs of 1.7%/3.4% on the
clean/noisy LibriSpeech test sets by using the unlab-60k subset of LibriLight
as the unlabeled set for LibriSpeech 960h. We are thus able to improve upon the
previous state-of-the-art clean/noisy test WERs achieved on LibriSpeech 100h
(4.74%/12.20%) and LibriSpeech (1.9%/4.1%).
Comment: 5 pages, 5 figures, 4 tables; v2: minor revisions, reference added
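The iterative self-training loop described above can be summarized in a few lines. The sketch below assumes stand-in callables for model training (with SpecAugment applied internally), pseudo-labeling, and filtering/balancing; it is not the paper's actual pipeline.

def noisy_student_training(labeled, unlabeled, train_fn, transcribe_fn,
                           filter_fn, num_generations=4):
    """Iterative self-training: each generation's student becomes the next teacher.

    train_fn(data)              -> trained model (augmentation applied during training)
    transcribe_fn(model, audio) -> list of (audio, pseudo_transcript) pairs
    filter_fn(pairs)            -> filtered/balanced subset of the generated pairs
    """
    # Generation 0: train the initial teacher on the supervised set only.
    teacher = train_fn(labeled)
    for _ in range(num_generations):
        # Pseudo-label the unlabeled audio with the current teacher.
        generated = transcribe_fn(teacher, unlabeled)
        # Filter and balance the generated transcripts (e.g. by confidence and length).
        generated = filter_fn(generated)
        # Train a noised student on labeled + pseudo-labeled data, then promote it.
        teacher = train_fn(labeled + generated)
    return teacher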