
    Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model

    Multilingual models for Automatic Speech Recognition (ASR) are attractive as they have been shown to benefit from more training data, and better lend themselves to adaptation to under-resourced languages. However, initialisation from monolingual context-dependent models leads to an explosion of context-dependent states. Connectionist Temporal Classification (CTC) is a potential solution to this as it performs well with monophone labels. We investigate multilingual CTC in the context of adaptation and regularisation techniques that have been shown to be beneficial in more conventional contexts. The multilingual model is trained to model a universal International Phonetic Alphabet (IPA)-based phone set using the CTC loss function. Learning Hidden Unit Contribution (LHUC) is investigated to perform language adaptive training. In addition, dropout during cross-lingual adaptation is studied as a way to mitigate overfitting. Experiments show that the performance of the universal phoneme-based CTC system can be improved by applying LHUC and that the system is extensible to new phonemes during cross-lingual adaptation. Updating all the parameters shows consistent improvement on limited data. Applying dropout during adaptation can further improve the system and achieve competitive performance with Deep Neural Network / Hidden Markov Model (DNN/HMM) systems on limited data.
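
    The LHUC idea mentioned in the abstract above can be sketched as follows. This is a minimal illustration, assuming the commonly used LHUC form in which each hidden unit is rescaled by an amplitude 2*sigmoid(r) with one learnable vector r per language. The abstract does not give the exact parameterisation, and the names `lhuc_scale`, `lang_a`, and `lang_b` are hypothetical.

```python
import numpy as np

def lhuc_scale(hidden, r):
    """Rescale hidden activations by LHUC amplitudes 2*sigmoid(r), each in (0, 2)."""
    amplitude = 2.0 / (1.0 + np.exp(-r))
    return hidden * amplitude

# Hypothetical setup: 4 frames, 3 hidden units, one LHUC vector per language.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((4, 3))
lhuc_params = {
    "lang_a": np.zeros(3),                 # r = 0 gives amplitude 1 (no change)
    "lang_b": np.array([-1.0, 0.0, 2.0]),  # damp unit 0, keep unit 1, boost unit 2
}

identity_out = lhuc_scale(hidden, lhuc_params["lang_a"])
adapted_out = lhuc_scale(hidden, lhuc_params["lang_b"])
```

    In language adaptive training of this kind, only the small per-language vectors would differ between languages, while the shared network weights stay common.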

    Activation and regioselectivity of five-membered cyclic thionocarbamates to nucleophilic attack

    The cyclic thionocarbamate of alaninol undergoes nucleophilic attack by sulfur nucleophiles at 5-C to give 1-thiopropyl-2-amine derivatives when derivatised on nitrogen with a Boc group. Iodide under microwave conditions causes a rearrangement to the isomeric thiazolidinone, while "hard" nucleophiles react at the thione group to yield a variety of product types by subsequent C–N or C–O cleavage. X-ray crystallography studies showed that the N-Boc group reduces delocalisation of electron density from nitrogen into the thione group, and thus promotes activation of the ring to nucleophilic attack.

    Superior sperm competitors sire higher-quality young

    The evolution of polyandry remains controversial. This is because, unlike in males, multiple mating by females often does not increase fecundity and inevitably involves some costs. As a result, a large number of indirect benefit models have been proposed to explain polyandry. One of these, the good sperm hypothesis, posits that high-quality males are better sperm competitors and sire higher-quality offspring. Hence, by mating multiply, females produce offspring of superior quality. Despite being potentially widely applicable across species, this idea has received little attention. In a laboratory experiment with yellow dung flies (Scathophaga stercoraria) we found that males that were more successful in sperm competition also had offspring that developed faster. There was no relationship between paternal success in sperm competition and the ability of offspring to survive post-emergence starvation. Since faster development times are likely to be advantageous in this species, our data provide some support for polyandry evolving as a means of producing higher-quality offspring via sperm competition.

    Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees

    This paper addresses the problem of ad hoc microphone array calibration where only partial information about the distances between microphones is available. We construct a matrix consisting of the pairwise distances and propose to estimate the missing entries with a novel Euclidean distance matrix (EDM) completion algorithm that alternates between low-rank matrix completion and projection onto the Euclidean distance space. This approach confines the recovered matrix to the EDM cone at each iteration of the matrix completion algorithm. Theoretical guarantees on the calibration performance are obtained for random and locally structured missing entries as well as for measurement noise on the known distances. This study elucidates the links between the calibration error and the number of microphones, the noise level, and the ratio of missing distances. Thorough experiments on real data recordings and simulated setups are conducted to demonstrate these theoretical insights. The proposed Euclidean distance matrix completion algorithm achieves a significant improvement over state-of-the-art techniques for ad hoc microphone array calibration. Comment: In Press, available online, August 1, 2014. http://www.sciencedirect.com/science/article/pii/S0165168414003508, Signal Processing, 201
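
    The alternation described in the abstract above can be sketched roughly as follows. This is not the authors' algorithm, only a minimal illustration under stated assumptions: the matrix holds squared distances, the low-rank step is a truncated SVD to rank dim + 2, and the EDM projection uses the classical multidimensional scaling construction. The names `project_to_edm` and `complete_edm` are hypothetical, and no convergence guarantee is implied.

```python
import numpy as np

def project_to_edm(D, dim):
    """Map a squared-distance estimate to the EDM of points embedded in `dim` dimensions."""
    n = D.shape[0]
    D = (D + D.T) / 2.0                           # enforce symmetry
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    G = -0.5 * J @ D @ J                          # Gram matrix (classical MDS)
    w, V = np.linalg.eigh(G)
    idx = np.argsort(w)[::-1][:dim]               # `dim` largest eigenvalues
    X = V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))  # point coordinates
    sq = np.sum(X ** 2, axis=1)
    return sq[:, None] + sq[None, :] - 2.0 * X @ X.T

def complete_edm(D_obs, mask, dim=3, n_iter=50):
    """Alternate data consistency, rank truncation, and EDM projection."""
    rank = dim + 2                                # rank bound of an EDM in `dim` dimensions
    D = np.full_like(D_obs, D_obs[mask].mean())   # crude initial guess
    for _ in range(n_iter):
        D = np.where(mask, D_obs, D)              # keep the measured entries
        U, s, Vt = np.linalg.svd(D)
        D = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # low-rank step
        D = project_to_edm(D, dim)                # back onto the EDM cone
    return D

# Hypothetical example: 12 microphones in 3-D with about 30% of the distances missing.
rng = np.random.default_rng(0)
points = rng.standard_normal((12, 3))
diff = points[:, None, :] - points[None, :, :]
D_true = np.sum(diff ** 2, axis=-1)
upper = np.triu(rng.random((12, 12)) < 0.7, k=1)
mask = upper | upper.T | np.eye(12, dtype=bool)   # symmetric, diagonal always known
D_hat = complete_edm(np.where(mask, D_true, 0.0), mask)
```

    Because the last operation in each iteration is the EDM projection, the returned matrix is always a valid squared-distance matrix, which mirrors the abstract's point about staying in the EDM cone at every iteration.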

    Surrogate Gradient Spiking Neural Networks as Encoders for Large Vocabulary Continuous Speech Recognition

    Compared to conventional artificial neurons that produce dense and real-valued responses, biologically-inspired spiking neurons transmit sparse and binary information, which can also lead to energy-efficient implementations. Recent research has shown that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method. They have shown promising results on speech command recognition tasks. Using the same technique, we show that they are scalable to large vocabulary continuous speech recognition, where they are capable of replacing LSTMs in the encoder with only minor loss of performance. This suggests that they may be applicable to more involved sequence-to-sequence tasks. Moreover, in contrast to their recurrent non-spiking counterparts, they show robustness to exploding gradient problems without the need to use gates.
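
    The surrogate gradient method mentioned in the abstract above can be illustrated with a small sketch. The abstract does not state which neuron model or surrogate function is used, so this assumes a leaky integrate-and-fire neuron with reset by subtraction and a fast-sigmoid style surrogate derivative. All function names and constants are hypothetical.

```python
import numpy as np

def spike(v, threshold=1.0):
    """Forward pass: binary spike when the membrane potential reaches the threshold."""
    return (v >= threshold).astype(float)

def surrogate_grad(v, threshold=1.0, slope=10.0):
    """Backward pass stand-in: smooth derivative used in place of the step function's."""
    return 1.0 / (1.0 + slope * np.abs(v - threshold)) ** 2

def lif_forward(inputs, decay=0.9, threshold=1.0):
    """Run leaky integrate-and-fire neurons over a (time, units) input array."""
    v = np.zeros(inputs.shape[1])
    spikes = np.zeros_like(inputs)
    for t, x in enumerate(inputs):
        v = decay * v + x                  # leaky integration of the input current
        spikes[t] = spike(v, threshold)
        v = v - spikes[t] * threshold      # reset by subtraction after a spike
    return spikes

# Hypothetical input: 20 time steps, 5 neurons, small positive currents.
rng = np.random.default_rng(0)
inputs = rng.random((20, 5)) * 0.5
spikes = lif_forward(inputs)
```

    The step function in `spike` has zero derivative almost everywhere, so training by backpropagation would substitute `surrogate_grad` for it in the backward pass while keeping the binary output in the forward pass.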

    A t-distribution based operator for enhancing out of distribution robustness of neural network classifiers

    Neural Network (NN) classifiers can assign extreme probabilities to samples that have not appeared during training (out-of-distribution samples), resulting in erroneous and unreliable predictions. One of the causes of this unwanted behaviour lies in the use of the standard softmax operator, which pushes the posterior probabilities to be either zero or unity and hence fails to model uncertainty. The statistical derivation of the softmax operator relies on the assumption that the distributions of the latent variables for a given class are Gaussian with known variance. However, it is possible to use different assumptions in the same derivation and draw on other families of distributions as well. This allows derivation of novel operators with more favourable properties. Here, a novel operator is proposed that is derived using t-distributions, which are capable of providing a better description of uncertainty. It is shown that classifiers that adopt this novel operator can be more robust to out-of-distribution samples, often outperforming NNs that use the standard softmax operator. These enhancements can be reached with minimal changes to the NN architecture. Comment: 5 pages, 5 figures, to be published in IEEE Signal Processing Letters, reproducible code https://github.com/idiap/tsoftma
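
    The contrast drawn in the abstract above can be illustrated with a toy calculation. This is not the operator proposed in the paper, only a sketch of the underlying intuition: class posteriors are computed by Bayes' rule with equal priors, once under Gaussian class-conditional densities with shared variance (which reduces to a softmax) and once under heavy-tailed Student-t densities. The class means, the degrees of freedom `nu`, and all function names are hypothetical.

```python
import numpy as np

def class_posteriors(z, means, log_density):
    """Bayes posteriors with equal priors from unnormalised class log-densities."""
    log_p = np.array([log_density(z - m) for m in means])
    log_p -= log_p.max()                   # stabilise before exponentiating
    p = np.exp(log_p)
    return p / p.sum()

def gaussian_log_density(diff, sigma=1.0):
    """Isotropic Gaussian log-density up to a constant shared by all classes."""
    return -0.5 * np.dot(diff, diff) / sigma ** 2

def student_t_log_density(diff, nu=2.0):
    """Multivariate Student-t log-density up to a constant shared by all classes."""
    d = diff.size
    return -0.5 * (nu + d) * np.log1p(np.dot(diff, diff) / nu)

# Two hypothetical class means and a latent point far from both of them.
means = [np.array([-1.0, 0.0]), np.array([1.0, 0.0])]
far_point = np.array([50.0, 0.0])

gaussian_post = class_posteriors(far_point, means, gaussian_log_density)
student_post = class_posteriors(far_point, means, student_t_log_density)
```

    For the far-away point, the Gaussian posterior is essentially one-hot, while the Student-t posterior stays close to uniform, which is the kind of behaviour the abstract associates with better uncertainty modelling.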

    Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding

    Recently, large pretrained language models have demonstrated strong language understanding capabilities. This is particularly reflected in their zero-shot and in-context learning abilities on downstream tasks through prompting. To assess their impact on spoken language understanding (SLU), we evaluate several such models, including ChatGPT and OPT at different sizes, on multiple benchmarks. We verify the emergent ability unique to the largest models: given oracle transcripts, they can reach intent classification accuracy close to that of supervised models with zero or few shots on various languages. By contrast, the results for smaller models that fit on a single GPU fall far behind. We note that the error cases often arise from the annotation scheme of the dataset; responses from ChatGPT are still reasonable. We show, however, that the model is worse at slot filling, and its performance is sensitive to ASR errors, suggesting serious challenges for the application of those textual models to SLU. Comment: 6 pages, 2 figures; Accepted by Interspeech 202