
    Noise adaptive training for subspace Gaussian mixture models

    Noise adaptive training (NAT) is an effective approach to normalising environmental distortions in the training data. This paper investigates a model-based NAT scheme using joint uncertainty decoding (JUD) for subspace Gaussian mixture models (SGMMs). A typical SGMM acoustic model has a much larger number of surface Gaussian components, which makes it computationally infeasible to compensate each Gaussian explicitly. JUD tackles the problem by sharing the compensation parameters among the Gaussians, and hence reduces the computational and memory demands. For noise adaptive training, JUD is reformulated as a generative model, which leads to an efficient expectation-maximisation (EM) based algorithm for updating the SGMM acoustic model parameters. We evaluated the SGMMs with NAT on the Aurora 4 database and obtained higher recognition accuracy than systems without adaptive training. Index Terms: adaptive training, noise robustness, joint uncertainty decoding, subspace Gaussian mixture model

    Joint Uncertainty Decoding with Unscented Transform for Noise Robust Subspace Gaussian Mixture Models

    Common noise compensation techniques use vector Taylor series (VTS) to approximate the mismatch function. Recent work shows that the approximation accuracy may be improved by sampling. One such sampling technique is the unscented transform (UT), which draws samples deterministically from the clean speech and noise models to derive the noise-corrupted speech parameters. This paper applies UT to noise compensation of the subspace Gaussian mixture model (SGMM). Since UT requires a relatively small number of samples for accurate estimation, it has significantly lower computational cost than random sampling techniques. However, the number of surface Gaussians in an SGMM is typically very large, making the direct application of UT to compensate individual Gaussian components computationally impractical. In this paper, we avoid the computational burden by employing UT in the framework of joint uncertainty decoding (JUD), which groups all the Gaussian components into a small number of classes and shares the compensation parameters by class. We evaluate the JUD-UT technique for an SGMM system using the Aurora 4 corpus. Experimental results indicate that UT can lead to increased accuracy compared to the VTS approximation if the JUD phase factor is untuned, and to similar accuracy if the phase factor is tuned empirically.
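
    The deterministic sampling described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes diagonal Gaussians for clean speech and noise, a scalar output, and a simplified log-domain mismatch function y = log(exp(x) + exp(n)) with no phase term. The names `unscented_transform`, `log_add` and `kappa` are hypothetical.

```python
import math

def unscented_transform(mean, var, f, kappa=1.0):
    """Propagate a diagonal Gaussian through a scalar function f
    using 2n+1 deterministically chosen sigma points."""
    n = len(mean)
    scale = n + kappa
    points = [list(mean)]          # central sigma point
    weights = [kappa / scale]
    for i in range(n):
        offset = math.sqrt(scale * var[i])
        for sign in (1.0, -1.0):   # one point on each side of the mean
            p = list(mean)
            p[i] += sign * offset
            points.append(p)
            weights.append(1.0 / (2.0 * scale))
    ys = [f(p) for p in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var

def log_add(v):
    """Assumed simplified mismatch function: v = [clean speech, noise]."""
    x, n = v
    return math.log(math.exp(x) + math.exp(n))

# Illustrative values only: clean speech and noise statistics in one log-spectral bin.
noisy_mean, noisy_var = unscented_transform([2.0, 1.0], [0.5, 0.2], log_add)
```

    With two input dimensions only five function evaluations are needed, which is the source of the low cost relative to random sampling. For a linear function the sketch reproduces the exact mean and variance.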

    Metamaterials for Enhanced Polarization Conversion in Plasmonic Excitation

    Efficient excitation of surface plasmons is typically expected to be strongly constrained to transverse magnetic (TM) polarized incidence, as demonstrated so far, because of their intrinsic TM polarization. We report a designer plasmonic metamaterial, engineered at a deep-subwavelength scale at visible optical frequencies, that overcomes this fundamental limitation and allows transverse electric (TE) polarized incidence to be strongly coupled to surface plasmons. The experimental verification, which is consistent with the analytical and numerical models, demonstrates this enhanced TE-to-plasmon coupling with efficiency close to 100%, far beyond what is possible with naturally available materials. This discovery will help to efficiently utilize the energy falling into TE polarization and drastically increase the overall excitation efficiency of future plasmonic devices.

    Knowledge Distillation for Small-footprint Highway Networks

    Deep learning has significantly advanced the state of the art in speech recognition in the past few years. However, compared to conventional Gaussian mixture acoustic models, neural network models are usually much larger, and are therefore difficult to deploy on embedded devices. Previously, we investigated a compact highway deep neural network (HDNN) for acoustic modelling, which is a type of depth-gated feedforward neural network. We have shown that HDNN-based acoustic models can achieve comparable recognition accuracy with a much smaller number of model parameters than plain deep neural network (DNN) acoustic models. In this paper, we push the boundary further by leveraging the knowledge distillation technique, also known as teacher-student training, i.e., we train the compact HDNN model with the supervision of a high-accuracy cumbersome model. Furthermore, we also investigate sequence training and adaptation in the context of teacher-student training. Our experiments were performed on the AMI meeting speech recognition corpus. With this technique, we significantly improved the recognition accuracy of the HDNN acoustic model with less than 0.8 million parameters, and narrowed the gap between this model and the plain DNN with 30 million parameters. Comment: 5 pages, 2 figures, accepted to icassp 201
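
    Teacher-student training as summarised in this abstract can be sketched as a per-frame loss that pulls the student's output distribution towards the teacher's. The sketch below is an assumption-laden illustration, not the authors' recipe: it uses the KL divergence between softmax outputs, and the `temperature` parameter and the function names are hypothetical additions that the abstract does not mention.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional (assumed) temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """KL(teacher || student) for a single frame.

    The teacher posterior acts as a soft target for the student.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Illustrative logits over three output classes for one acoustic frame.
loss = distillation_loss([1.0, 0.5, -0.2], [2.0, 0.1, -1.0])
```

    The loss is zero when the student matches the teacher exactly and positive otherwise. In practice it would be averaged over frames and minimised with respect to the student's parameters.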

    Multiplicative LSTM for sequence modelling

    We introduce multiplicative LSTM (mLSTM), a recurrent neural network architecture for sequence modelling that combines the long short-term memory (LSTM) and multiplicative recurrent neural network architectures. mLSTM is characterised by its ability to have different recurrent transition functions for each possible input, which we argue makes it more expressive for autoregressive density estimation. We demonstrate empirically that mLSTM outperforms standard LSTM and its deep variants on a range of character-level language modelling tasks. In this version of the paper, we regularise mLSTM to achieve 1.27 bits/char on text8 and 1.24 bits/char on Hutter Prize. We also apply a purely byte-level mLSTM to the WikiText-2 dataset to achieve a character-level entropy of 1.26 bits/char, corresponding to a word-level perplexity of 88.8, which is comparable to word-level LSTMs regularised in similar ways on the same task.

    Female media use behavior and agreement with publicly promoted agenda-specific health messages.

    This study set out to explore the relationship between female media use behavior and agreement with agenda-specific publicly promoted health messages. A random digit dial telephone cross-sectional survey was conducted using a nationally representative sample of female residents aged 25 and over. Respondents' agreement with health messages was measured by a six-item Health Information Scale (HIS). Data were analyzed using chi-square tests and multiple logistic regression. The survey achieved a response rate of 86% (n = 1074). The longest duration of daily television news watching (OR = 2.32), high self-efficacy (OR = 1.56), and greater attention to medical and health news (OR = 5.41) were all correlates of greater agreement with the selected health messages. Surprisingly, Internet use was not significant in the final model. Many of the women whom public health interventions need to target are not receptive to health information that can be accessed through Internet searches. However, they may be more readily reached by television campaigns. Agenda-specific public health campaigns aiming to empower women to serve as nodes of information transmission and achieve efficient trickle-down through the family unit might do better to invest more heavily in television promotion.

    End-to-end neural segmental models for speech recognition

    Segmental models are an alternative to frame-based models for sequence prediction, where hypothesized path weights are based on entire segment scores rather than a single frame at a time. Neural segmental models are segmental models that use neural network-based weight functions. Neural segmental models have achieved competitive results for speech recognition, and their end-to-end training has been explored in several studies. In this work, we review neural segmental models, which can be viewed as consisting of a neural network-based acoustic encoder and a finite-state transducer decoder. We study end-to-end segmental models with different weight functions, including ones based on frame-level neural classifiers and on segmental recurrent neural networks. We study how reducing the search space size impacts performance under different weight functions. We also compare several loss functions for end-to-end training. Finally, we explore training approaches, including multi-stage vs. end-to-end training and multitask training that combines segmental and frame-level losses.

    Plant-microbe networks in soil are weakened by century-long use of inorganic fertilizers.

    Understanding the changes in plant-microbe interactions is critically important for predicting ecosystem functioning in response to human-induced environmental changes such as nitrogen (N) addition. In this study, the effects of a century-long fertilization treatment (> 150 years) on the networks between plants and soil microbial functional communities, detected by GeoChip, were determined in grassland in the Park Grass Experiment at Rothamsted Research, UK. Our results showed that plants and soil microbes have a consistent response to long-term fertilization: both richness and diversity of plants and soil microbes are significantly decreased, as are microbial functional genes involved in soil carbon (C), nitrogen (N) and phosphorus (P) cycling. The network-based analyses showed that long-term fertilization decreased the complexity of networks between plant and microbial functional communities in terms of node numbers, connectivity, network density and the clustering coefficient. Similarly, within the soil microbial community, the strength of microbial associations was also weakened in response to long-term fertilization. Mantel path analysis showed that soil C and N contents were the main factors affecting the network between plants and microbes. Our results indicate that century-long fertilization weakens the plant-microbe networks, which is important for improving our understanding of grassland ecosystem functions and stability under long-term agricultural management.