Noise adaptive training for subspace Gaussian mixture models
Noise adaptive training (NAT) is an effective approach to normalise the environmental distortions in the training data. This paper investigates the model-based NAT scheme using joint uncertainty decoding (JUD) for subspace Gaussian mixture models (SGMMs). A typical SGMM acoustic model has a very large number of surface Gaussian components, which makes it computationally infeasible to compensate each Gaussian explicitly. JUD tackles the problem by sharing the compensation parameters among the Gaussians, which reduces the computational and memory demands. For noise adaptive training, JUD is reformulated as a generative model, which leads to an efficient expectation-maximisation (EM) based algorithm for updating the SGMM acoustic model parameters. We evaluated the SGMMs with NAT on the Aurora 4 database and obtained higher recognition accuracy than systems without adaptive training. Index Terms: adaptive training, noise robustness, joint uncertainty decoding, subspace Gaussian mixture model
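The saving from sharing compensation parameters can be illustrated with a minimal sketch. It assumes the commonly used diagonal form of JUD, where a component m belonging to regression class r is scored as |A_r| N(A_r y + b_r; μ_m, Σ_m + Σ_b,r), so the observation is transformed once per class rather than once per Gaussian. The function names and the flat list of Gaussians are hypothetical: the SGMM structure, how the surface Gaussians are derived, and how A, b and Σ_b are estimated are not shown.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)

def jud_log_likelihoods(y, means, variances, comp_class, A, b, var_bias):
    """
    Hypothetical diagonal JUD compensation of a flat list of Gaussians.
    y:                noisy observation, shape (D,)
    means, variances: clean-speech component parameters, shape (M, D)
    comp_class:       regression class of each component, int array of shape (M,)
    A, b, var_bias:   per-class diagonal JUD parameters, shape (R, D)
    """
    # One feature transform per class (R transforms), shared by every component in it.
    y_hat = A * y + b
    # log|A_r| for each class; with diagonal A this is the sum of log|a_d|.
    log_det = np.sum(np.log(np.abs(A)), axis=1)
    return np.array([
        log_det[r] + log_gauss_diag(y_hat[r], means[m], variances[m] + var_bias[r])
        for m, r in enumerate(comp_class)
    ])
```

With R classes and M components, only R feature transforms are computed per frame, and each component merely adds its class's bias variance to its own, which is where the computational and memory savings come from when M is much larger than R.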
Joint Uncertainty Decoding with Unscented Transform for Noise Robust Subspace Gaussian Mixture Models
Common noise compensation techniques use vector Taylor series (VTS) to approximate the mismatch function. Recent work shows that the approximation accuracy may be improved by sampling. One such sampling technique is the unscented transform (UT), which draws samples deterministically from the clean speech and noise models to derive the noise-corrupted speech parameters. This paper applies UT to noise compensation of the subspace Gaussian mixture model (SGMM). Since UT requires a relatively small number of samples for accurate estimation, it has significantly lower computational cost than other random sampling techniques. However, the number of surface Gaussians in an SGMM is typically very large, making the direct application of UT to compensate individual Gaussian components computationally impractical. In this paper, we avoid this computational burden by employing UT in the framework of joint uncertainty decoding (JUD), which groups all the Gaussian components into a small number of classes that share compensation parameters. We evaluate the JUD-UT technique for an SGMM system using the Aurora 4 corpus. Experimental results indicate that UT can lead to increased accuracy compared to the VTS approximation if the JUD phase factor is untuned, and to similar accuracy if the phase factor is tuned empirically.
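The deterministic sampling that distinguishes UT from random sampling can be sketched as follows, using the standard 2n+1 sigma-point scheme with a single spread parameter kappa. The mismatch function here is a deliberately simplified per-dimension log-spectral model, y = log(exp(x) + exp(n)), with no channel term and no phase factor; it is an assumption for illustration, not the paper's cepstral mismatch function. In a JUD-UT setting the transform would be run once per regression class rather than once per Gaussian.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    """Propagate N(mean, cov) through a nonlinear function f using 2n+1 sigma points."""
    mean = np.asarray(mean, dtype=float)
    n = mean.shape[0]
    # Matrix square root of (n + kappa) * cov; its columns are the sigma-point offsets.
    root = np.linalg.cholesky((n + kappa) * np.asarray(cov, dtype=float))
    sigma = [mean] + [mean + root[:, i] for i in range(n)] + [mean - root[:, i] for i in range(n)]
    weights = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
    weights[0] = kappa / (n + kappa)
    outputs = np.array([f(s) for s in sigma])  # shape (2n+1, d_out)
    out_mean = weights @ outputs
    centred = outputs - out_mean
    out_cov = (weights[:, None] * centred).T @ centred
    return out_mean, out_cov

def log_spectral_mismatch(z):
    """Toy mismatch y = log(exp(x) + exp(n)) per dimension, with z = [x; n]."""
    d = z.shape[0] // 2
    return np.logaddexp(z[:d], z[d:])

# Example: 2-dimensional clean speech and noise, assumed independent.
mu_x, var_x = np.array([2.0, 1.0]), np.array([0.5, 0.3])
mu_n, var_n = np.array([1.0, 1.5]), np.array([0.2, 0.2])
joint_mean = np.concatenate([mu_x, mu_n])
joint_cov = np.diag(np.concatenate([var_x, var_n]))
y_mean, y_cov = unscented_transform(joint_mean, joint_cov, log_spectral_mismatch)
```

Only 2n+1 = 9 function evaluations are needed here, independent of any random seed, which is the cost advantage over Monte Carlo sampling mentioned above. For a linear function the sigma points reproduce the exact output mean and covariance.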
Metamaterials for Enhanced Polarization Conversion in Plasmonic Excitation
Efficient excitation of surface plasmons is typically expected to be strongly constrained to transverse magnetic (TM) polarized incidence, as demonstrated so far, owing to their intrinsic TM polarization. We report a designer plasmonic metamaterial, engineered at a deep-subwavelength scale for visible optical frequencies, that overcomes this fundamental limitation and allows transverse electric (TE) polarized incidence to couple strongly to surface plasmons. The experimental verification, which is consistent with the analytical and numerical models, demonstrates this enhanced TE-to-plasmon coupling with an efficiency close to 100%, far beyond what is possible with naturally available materials. This discovery will help to efficiently utilize the energy carried by TE-polarized light and drastically increase the overall excitation efficiency of future plasmonic devices.
Knowledge Distillation for Small-footprint Highway Networks
Deep learning has significantly advanced the state of the art in speech
recognition in the past few years. However, compared to conventional Gaussian
mixture acoustic models, neural network models are usually much larger, and are
therefore difficult to deploy on embedded devices. Previously, we investigated
a compact highway deep neural network (HDNN) for acoustic modelling, which is a
type of depth-gated feedforward neural network. We have shown that HDNN-based
acoustic models can achieve comparable recognition accuracy with a much smaller
number of model parameters than plain deep neural network (DNN) acoustic
models. In this paper, we push the boundary further by leveraging the
knowledge distillation technique, also known as teacher-student
training, i.e., we train the compact HDNN model under the supervision of a
high-accuracy but cumbersome model. We also investigate sequence training
and adaptation in the context of teacher-student training. Our experiments were
performed on the AMI meeting speech recognition corpus. With this technique, we
significantly improved the recognition accuracy of the HDNN acoustic model with
less than 0.8 million parameters, and narrowed the gap between this model and
the plain DNN with 30 million parameters. Comment: 5 pages, 2 figures, accepted to ICASSP 201
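Teacher-student training replaces (or supplements) the hard frame labels with the teacher's output posteriors. A minimal sketch of a common, Hinton-style loss is shown below: cross-entropy against temperature-softened teacher posteriors, interpolated with ordinary cross-entropy on hard labels. The temperature, the interpolation weight `alpha`, and the function names are assumptions for illustration; the exact loss, sequence-level variant and adaptation scheme used in the paper are not reproduced here.

```python
import numpy as np

def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Hypothetical teacher-student loss for a batch of frames."""
    # Soft targets: the teacher's temperature-softened posteriors over output classes.
    teacher_probs = np.exp(log_softmax(teacher_logits, temperature))
    soft = -np.mean(np.sum(teacher_probs * log_softmax(student_logits, temperature), axis=-1))
    soft *= temperature ** 2  # common convention to keep the soft term's scale comparable
    # Hard targets: ordinary cross-entropy against the frame labels.
    log_p = log_softmax(student_logits)
    hard = -np.mean(log_p[np.arange(len(labels)), labels])
    return alpha * soft + (1.0 - alpha) * hard

# Example: a batch of 2 frames over 3 output classes (values are made up).
teacher = np.array([[2.0, 0.0, -1.0], [0.5, 1.5, 0.0]])
student = np.array([[1.0, 0.2, -0.5], [0.0, 1.0, 0.3]])
loss = distillation_loss(student, teacher, np.array([0, 1]))
```

Setting `alpha=1.0` gives pure teacher-student training, where the soft term is smallest when the student's softened posteriors match the teacher's.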
Multiplicative LSTM for sequence modelling
We introduce multiplicative LSTM (mLSTM), a recurrent neural network
architecture for sequence modelling that combines the long short-term memory
(LSTM) and multiplicative recurrent neural network architectures. mLSTM is
characterised by its ability to have different recurrent transition functions
for each possible input, which we argue makes it more expressive for
autoregressive density estimation. We demonstrate empirically that mLSTM
outperforms standard LSTM and its deep variants for a range of character level
language modelling tasks. In this version of the paper, we regularise mLSTM to
achieve 1.27 bits/char on text8 and 1.24 bits/char on Hutter Prize. We also
apply a purely byte-level mLSTM to the WikiText-2 dataset to achieve a
character-level entropy of 1.26 bits/char, corresponding to a word-level
perplexity of 88.8, which is comparable to word-level LSTMs regularised in
similar ways on the same task.
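The idea of an input-dependent recurrent transition can be sketched as a single mLSTM step: an intermediate state m is formed as the elementwise product of a projection of the input and a projection of the previous hidden state, and m then takes the place of the previous hidden state in the usual LSTM gate computations. This is a minimal NumPy sketch under stated assumptions: biases are omitted, m has the same size as the hidden state, the parameter names are hypothetical, and no training or regularisation is shown.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_mlstm(input_dim, hidden_dim, seed=0):
    """Hypothetical small random initialisation of all mLSTM weight matrices."""
    rng = np.random.default_rng(seed)
    params = {"mx": rng.normal(0.0, 0.1, (hidden_dim, input_dim)),
              "mh": rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))}
    for gate in ("h", "i", "f", "o"):
        params[gate + "x"] = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        params[gate + "m"] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
    return params

def mlstm_step(x, h_prev, c_prev, p):
    """One mLSTM time step (biases omitted)."""
    # Multiplicative intermediate state: the effective recurrent matrix depends on x.
    m = (p["mx"] @ x) * (p["mh"] @ h_prev)
    h_hat = np.tanh(p["hx"] @ x + p["hm"] @ m)   # candidate cell input
    i = sigmoid(p["ix"] @ x + p["im"] @ m)       # input gate
    f = sigmoid(p["fx"] @ x + p["fm"] @ m)       # forget gate
    o = sigmoid(p["ox"] @ x + p["om"] @ m)       # output gate
    c = f * c_prev + i * h_hat
    h = o * np.tanh(c)
    return h, c

# Example: run a byte-level model over a short byte string with one-hot inputs.
params = init_mlstm(256, 32)
h, c = np.zeros(32), np.zeros(32)
for byte in b"hello":
    x = np.zeros(256)
    x[byte] = 1.0
    h, c = mlstm_step(x, h, c, params)
```

Because `p["mx"] @ x` scales each row of the recurrent projection differently for each input symbol, every possible input effectively selects its own recurrent transition, which is the property the abstract argues increases expressiveness.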
Female media use behavior and agreement with publicly promoted agenda-specific health messages.
This study set out to explore the relationship between female media use behavior and agreement with agenda-specific publicly promoted health messages. A random digit dial telephone cross-sectional survey was conducted using a nationally representative sample of female residents aged 25 and over. Respondents' agreement with health messages was measured with a six-item Health Information Scale (HIS). Data were analyzed using chi-square tests and multiple logistic regression. The survey achieved a response rate of 86% (n = 1074). In this study, the longest duration of daily television news watching (OR = 2.32), high self-efficacy (OR = 1.56), and greater attention to medical and health news (OR = 5.41) were all correlates of greater agreement with the selected health messages. Surprisingly, Internet use was not significant in the final model. Many of the women whom public health interventions need to target are not receptive to health information accessed through Internet searches; however, they may be more readily reached by television campaigns. Agenda-specific public health campaigns that aim to empower women to serve as nodes of information transmission, and to achieve efficient trickle-down through the family unit, might do better to invest more heavily in television promotion.
End-to-end neural segmental models for speech recognition
Segmental models are an alternative to frame-based models for sequence
prediction, where hypothesized path weights are based on entire segment scores
rather than a single frame at a time. Neural segmental models are segmental
models that use neural network-based weight functions. Neural segmental models
have achieved competitive results for speech recognition, and their end-to-end
training has been explored in several studies. In this work, we review neural
segmental models, which can be viewed as consisting of a neural network-based
acoustic encoder and a finite-state transducer decoder. We study end-to-end
segmental models with different weight functions, including ones based on
frame-level neural classifiers and on segmental recurrent neural networks. We
study how reducing the search space size impacts performance under different
weight functions. We also compare several loss functions for end-to-end
training. Finally, we explore training approaches, including multi-stage vs.
end-to-end training and multitask training that combines segmental and
frame-level losses.
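The difference between segmental and frame-based scoring can be illustrated with a minimal segmental (semi-Markov) Viterbi search, in which each hypothesis is scored one whole segment at a time and a maximum segment duration bounds the search space. The weight function below, summed frame-classifier scores minus a per-segment penalty, is a hypothetical stand-in loosely modelled on the frame-classifier-based weight functions mentioned above; the paper's finite-state transducer decoder, label transitions, language model and training losses are not shown.

```python
import numpy as np

def segmental_viterbi(num_frames, num_labels, seg_score, max_dur):
    """Best labelled segmentation of frames [0, num_frames) under a segment-level score."""
    best = np.full(num_frames + 1, -np.inf)  # best[t]: best score of any segmentation of [0, t)
    best[0] = 0.0
    back = [None] * (num_frames + 1)
    for t in range(1, num_frames + 1):
        # Only segments of length <= max_dur are considered: this bounds the search space.
        for d in range(1, min(max_dur, t) + 1):
            s = t - d
            for k in range(num_labels):
                cand = best[s] + seg_score(s, t, k)  # whole-segment score, not per frame
                if cand > best[t]:
                    best[t] = cand
                    back[t] = (s, k)
    segments, t = [], num_frames
    while t > 0:
        s, k = back[t]
        segments.append((s, t, k))
        t = s
    return float(best[num_frames]), segments[::-1]

def frame_sum_score(frame_scores, penalty=0.5):
    """Hypothetical weight function: summed frame scores for label k minus a segment penalty."""
    return lambda s, t, k: float(frame_scores[s:t, k].sum()) - penalty

frame_scores = np.array([[2, -2], [2, -2], [2, -2], [-2, 2], [-2, 2]], dtype=float)
score, segs = segmental_viterbi(5, 2, frame_sum_score(frame_scores), max_dur=5)
# score == 9.0, segs == [(0, 3, 0), (3, 5, 1)]
```

The search performs on the order of T x max_dur x K score evaluations, so lowering `max_dur` shrinks the search space; with `max_dur=2` in this example the first segment can no longer span three frames and the best score drops to 8.5.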
Plant-microbe networks in soil are weakened by century-long use of inorganic fertilizers.
Understanding changes in plant-microbe interactions is critically important for predicting ecosystem functioning in response to human-induced environmental changes such as nitrogen (N) addition. In this study, the effects of a century-long fertilization treatment (> 150 years) on the networks between plants and soil microbial functional communities, detected by GeoChip, were determined in grassland in the Park Grass Experiment at Rothamsted Research, UK. Our results showed that plants and soil microbes respond consistently to long-term fertilization: the richness and diversity of both plants and soil microbes decreased significantly, as did the microbial functional genes involved in soil carbon (C), nitrogen (N) and phosphorus (P) cycling. Network-based analyses showed that long-term fertilization decreased the complexity of networks between plant and microbial functional communities in terms of node number, connectivity, network density and clustering coefficient. Similarly, within the soil microbial community, the strength of microbial associations was weakened by long-term fertilization. Mantel path analysis showed that soil C and N contents were the main factors affecting the network between plants and microbes. Our results indicate that century-long fertilization weakens plant-microbe networks, a finding that improves our understanding of grassland ecosystem functions and stability under long-term agricultural management.
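The network complexity measures named above (node number, connectivity, density, clustering coefficient) can be computed from an adjacency matrix as in the sketch below. It assumes an undirected, unweighted network and uses average degree as the connectivity measure; how the network is built from the plant and GeoChip data (for example by thresholding correlations) is not shown, and the function name is hypothetical. Nodes with fewer than two neighbours are given a local clustering coefficient of zero, which is one common convention.

```python
import numpy as np

def network_summary(adj):
    """Basic complexity measures of an undirected, unweighted network."""
    A = np.asarray(adj) != 0
    A = (A | A.T).astype(int)   # treat edges as undirected
    np.fill_diagonal(A, 0)      # ignore self-loops
    n = A.shape[0]
    deg = A.sum(axis=1)
    n_edges = int(deg.sum() // 2)
    density = 2.0 * n_edges / (n * (n - 1)) if n > 1 else 0.0
    # (A^3)_ii counts closed walks of length 3 from node i: twice its number of triangles.
    triangles = np.diag(A @ A @ A) / 2.0
    possible = deg * (deg - 1) / 2.0
    local_cc = np.divide(triangles, possible, out=np.zeros(n), where=possible > 0)
    return {"nodes": n, "edges": n_edges, "avg_degree": float(deg.mean()),
            "density": float(density), "avg_clustering": float(local_cc.mean())}

# Example: a triangle of three nodes plus one isolated node.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 0],
                [0, 0, 0, 0]])
summary = network_summary(adj)
```

A weakened network in the sense of the abstract would show lower values of these quantities for the fertilized plots than for the control plots.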
Study of the dynamic aberrations of the human tear film
The dynamic aberrations introduced by the human tear film are studied by measuring the topography of the tear film surface in 14 subjects using a curvature sensing setup. The variation of the RMS wavefront error in the measured data is presented, showing the non-negligible contribution of the tear film to the overall aberrations of the eye. The tear film wavefronts are decomposed into their constituent Zernike terms, revealing stronger contributions from 4th-order terms and from terms with vertical symmetry, and the temporal behaviour of these aberrations is analysed.
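Once the wavefronts are decomposed into Zernike terms, the RMS wavefront error and the contribution of each radial order follow directly from the coefficients. The sketch below assumes Noll-ordered, orthonormally normalised Zernike coefficients, for which the RMS error over the pupil is the square root of the sum of squared coefficients excluding piston (j = 1). The function names and the example coefficient values are hypothetical; the curvature-sensing measurement and the fitting step are not shown.

```python
import math
import numpy as np

def noll_radial_order(j):
    """Radial order n of Noll index j (j = 1 is piston, n = 0)."""
    return (math.isqrt(8 * j - 7) - 1) // 2

def rms_by_order(coeffs):
    """
    coeffs: array of shape (T, J), one row of Noll-normalised Zernike
    coefficients (j = 1..J) per time sample.
    Returns the total RMS per sample and a dict {radial order: RMS per sample}.
    """
    coeffs = np.atleast_2d(np.asarray(coeffs, dtype=float))
    orders = np.array([noll_radial_order(j) for j in range(1, coeffs.shape[1] + 1)])
    total = np.sqrt(np.sum(coeffs[:, orders > 0] ** 2, axis=1))  # piston excluded
    per_order = {int(n): np.sqrt(np.sum(coeffs[:, orders == n] ** 2, axis=1))
                 for n in np.unique(orders) if n > 0}
    return total, per_order

# Example: two time samples of the first 6 Noll terms (made-up values, in micrometres).
series = np.array([[0.0, 0.10, 0.0, 0.05, 0.02, 0.0],
                   [0.0, 0.10, 0.0, 0.08, 0.02, 0.0]])
total, per_order = rms_by_order(series)
variation = total.std()  # one simple measure of the temporal variation of the RMS error
```

With enough terms in each row, `per_order[4]` would give the 4th-order contribution discussed above, tracked over time.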
