117 research outputs found

    Analyzing analytical methods: The case of phonology in neural models of spoken language

    Given the fast development of analysis techniques for NLP and speech processing systems, few systematic studies have been conducted to compare the strengths and weaknesses of each method. As a step in this direction, we study the case of representations of phonology in neural network models of spoken language. We use two commonly applied analytical techniques, diagnostic classifiers and representational similarity analysis, to quantify to what extent neural activation patterns encode phonemes and phoneme sequences. We manipulate two factors that can affect the outcome of analysis. First, we investigate the role of learning by comparing neural activations extracted from trained versus randomly initialized models. Second, we examine the temporal scope of the activations by probing both local activations corresponding to a few milliseconds of the speech signal, and global activations pooled over the whole utterance. We conclude that reporting analysis results with randomly initialized models is crucial, and that global-scope methods tend to yield more consistent results; we recommend their use as a complement to local-scope diagnostic methods. Comment: ACL 202

    Spoken command recognition for robotics

    In this thesis, I investigate spoken command recognition technology for robotics. While high robustness is expected, the distant and noisy conditions in which the system has to operate make the task very challenging. Unlike commercial systems, which all rely on a "wake-up" word to initiate the interaction, the pipeline proposed here directly detects and recognizes commands from the continuous audio stream. In order to keep the task manageable despite low-resource conditions, I propose to focus on a limited set of commands, thus trading off flexibility of the system against robustness. Domain and speaker adaptation strategies based on a multi-task regularization paradigm are first explored. More precisely, two different methods are proposed, both of which rely on a tied loss function that penalizes the distance between the outputs of several networks. The first method considers each speaker or domain as a task. A canonical task-independent network is jointly trained with task-dependent models, allowing both types of networks to improve by learning from one another. While an improvement of 3.2% in the frame error rate (FER) of the task-independent network is obtained, this only partially carried over to the phone error rate (PER), with an improvement of 1.5%. Similarly, a second method explored the parallel training of the canonical network with a privileged model having access to i-vectors. This method proved less effective, with an improvement of only 1.2% in the FER. In order to make the developed technology more accessible, I also investigated the use of a sequence-to-sequence (S2S) architecture for command classification. The use of an attention-based encoder-decoder model reduced the classification error by 40% relative to a strong convolutional neural network (CNN)-hidden Markov model (HMM) baseline, showing the relevance of S2S architectures in such a context.
    In order to improve the flexibility of the trained system, I also explored strategies for few-shot learning, which allow the set of commands to be extended with minimal data requirements. By retraining a model on the combination of original and new commands, I achieved 40.5% accuracy on the new commands with only 10 examples for each of them. This score goes up to 81.5% accuracy with a larger set of 100 examples per new command. An alternative strategy, based on model adaptation, achieved even better scores, with 68.8% and 88.4% accuracy with 10 and 100 examples respectively, while being faster to train. This high performance is obtained at the expense of the original categories, though, on which the accuracy deteriorated. These results are very promising, as the methods make it easy to extend an existing S2S model with minimal resources. Finally, a full spoken command recognition system (named iCubrec) has been developed for the iCub platform. The pipeline relies on a voice activity detection (VAD) system to offer a fully hands-free experience. By segmenting only regions that are likely to contain commands, the VAD module also greatly reduces the computational cost of the pipeline. Command candidates are then passed to the deep neural network (DNN)-HMM command recognition system for transcription. The VoCub dataset has been specifically gathered to train a DNN-based acoustic model for our task. Through multi-condition training with the CHiME4 dataset, an accuracy of 94.5% is reached on the VoCub test set. A filler model, complemented by a rejection mechanism based on a confidence score, is finally added to the system to reject non-command speech in a live demonstration of the system.

    Speech Recognition for the iCub Platform

    This paper describes open source software (available at https://github.com/robotology/natural-speech) to build automatic speech recognition (ASR) systems and run them within the YARP platform. The toolkit is designed (i) to allow non-ASR experts to easily create their own ASR system and run it on iCub, and (ii) to build deep learning-based models specifically addressing the main challenges an ASR system faces in the context of verbal human-iCub interactions. The toolkit mostly consists of Python and C++ code and shell scripts integrated in YARP. As an additional contribution, a second codebase (written in Matlab) is provided for more expert ASR users who want to experiment with bio-inspired and developmental learning-inspired ASR systems. Specifically, we provide code for two distinct kinds of speech recognition: "articulatory" and "unsupervised" speech recognition. The first is largely inspired by influential neurobiological theories of speech perception, which assume speech perception to be mediated by brain motor cortex activities. Our articulatory systems have been shown to outperform strong deep learning-based baselines. The second type of recognition system, the "unsupervised" systems, do not use any supervised information (contrary to most ASR systems, including our articulatory systems). To some extent, they mimic an infant who has to discover the basic speech units of a language by herself. In addition, we provide resources consisting of pre-trained deep learning models for ASR, and a 2.5-hour speech dataset of spoken commands, the VoCub dataset, which can be used to adapt an ASR system to the typical acoustic environments in which iCub operates.

    Digital terrain analysis of the Haute-Mentue catchment and scale effect for hydrological modelling with TOPMODEL

    It is widely recognised that topography plays an important role in the generation of runoff. Several studies have found that the scale of a digital elevation model affects the results of hydrological modelling. In particular, it has been shown that the representation of the statistical distribution of the topographic index used by TOPMODEL is sensitive to the scale of the digital terrain model. The objectives of this study are to develop an analysis of the topography and scale effects for the Haute-Mentue catchment and to test the role of different spatial resolutions in parameter calibration. The major result is that the spatial scale is important for the parameter values, but not decisive for the modelling results if a pertinent methodology is adopted for determining the digital watershed representation. Keywords: digital elevation model, topographic index, scale problems, TOPMODEL

    Stitching proteins into membranes, not sew simple

    Most integral membrane proteins located within the endomembrane system of eukaryotic cells are first assembled co-translationally into the endoplasmic reticulum (ER) before being sorted and trafficked to other organelles. The assembly of membrane proteins is mediated by the ER translocon, which allows passage of lumenal domains through, and lateral integration of transmembrane (TM) domains into, the ER membrane. It may be convenient to imagine membrane proteins containing multiple TM domains being assembled by inserting their first TM domain in the correct orientation, with subsequent TM domains inserting with alternating orientations. However, a simple threading model of assembly, with sequential insertion of one TM domain into the membrane after another, does not universally stand up to scrutiny. In this article, we review some of the literature illustrating the complexities of membrane protein assembly. We also present our own thoughts on aspects that we feel are poorly understood. In short, we hope to convince readers that threading of membrane proteins into membranes is 'not sew simple' and is a topic that requires further investigation.

    A Single Polar Residue and Distinct Membrane Topologies Impact the Function of the Infectious Bronchitis Coronavirus E Protein

    The coronavirus E protein is a small membrane protein with a single predicted hydrophobic domain (HD), and has a poorly defined role in infection. The E protein is thought to promote virion assembly, which occurs in the Golgi region of infected cells. It has also been implicated in the release of infectious particles after budding. The E protein has ion channel activity in vitro, although a role for channel activity in infection has not been established. Furthermore, the membrane topology of the E protein is a matter of considerable debate, and the protein may adopt more than one topology during infection. We previously showed that the HD of the infectious bronchitis virus (IBV) E protein is required for the efficient release of infectious virus, an activity that correlated with disruption of the secretory pathway. Here we report that a single residue within the hydrophobic domain, Thr16, is required for secretory pathway disruption. Substitutions of other residues for Thr16 were not tolerated. Mutations of Thr16 did not impact virus assembly as judged by virus-like particle production, suggesting that alteration of the secretory pathway and assembly are independent activities. We also examined how the membrane topology of IBV E affected its function by generating mutant versions that adopted either a transmembrane or membrane hairpin topology. We found that a transmembrane topology was required for disrupting the secretory pathway, but was less efficient for virus-like particle production. The hairpin version of E was unable to disrupt the secretory pathway or produce particles. The findings reported here identify properties of the E protein that are important for its function, and provide insight into how the E protein may perform multiple roles during infection.