Analyzing analytical methods: The case of phonology in neural models of spoken language
Given the fast development of analysis techniques for NLP and speech
processing systems, few systematic studies have been conducted to compare the
strengths and weaknesses of each method. As a step in this direction we study
the case of representations of phonology in neural network models of spoken
language. We use two commonly applied analytical techniques, diagnostic
classifiers and representational similarity analysis, to quantify to what
extent neural activation patterns encode phonemes and phoneme sequences. We
manipulate two factors that can affect the outcome of analysis. First, we
investigate the role of learning by comparing neural activations extracted from
trained versus randomly-initialized models. Second, we examine the temporal
scope of the activations by probing both local activations corresponding to a
few milliseconds of the speech signal, and global activations pooled over the
whole utterance. We conclude that reporting analysis results with randomly
initialized models is crucial, and that global-scope methods tend to yield more
consistent results; we recommend their use as a complement to local-scope
diagnostic methods.
Comment: ACL 202
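To illustrate the kind of comparison the abstract above describes, here is a minimal sketch of representational similarity analysis (RSA) scored for both a trained and a randomly initialized model. It is not the authors' implementation: the choice of cosine similarity and Pearson correlation, and the names `rsa_score` and `rsa_with_random_baseline`, are assumptions made for illustration only.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def pairwise_similarities(vectors):
    """Similarity of every unordered pair of items (upper triangle of the matrix)."""
    return [cosine(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rsa_score(activations, reference):
    """Correlate the similarity structure of two representation spaces.

    `activations` and `reference` hold one vector per stimulus, in the same order.
    """
    return pearson(pairwise_similarities(activations),
                   pairwise_similarities(reference))


def rsa_with_random_baseline(trained, random_init, reference):
    """Return the RSA score of the trained model next to the random-init baseline."""
    return rsa_score(trained, reference), rsa_score(random_init, reference)
```

Reporting both numbers side by side shows how much of the score is attributable to learning rather than to the architecture or the input itself.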
Spoken command recognition for robotics
In this thesis, I investigate spoken command recognition technology for robotics. While high
robustness is expected, the distant and noisy conditions in which the system has to operate
make the task very challenging. Unlike commercial systems which all rely on a "wake-up"
word to initiate the interaction, the pipeline proposed here directly detects and recognizes
commands from the continuous audio stream. In order to keep the task manageable despite
low-resource conditions, I propose to focus on a limited set of commands, thus trading off
flexibility of the system against robustness.
Domain and speaker adaptation strategies based on a multi-task regularization paradigm
are first explored. More precisely, two different methods are proposed which rely on a tied
loss function which penalizes the distance between the output of several networks. The first
method considers each speaker or domain as a task. A canonical task-independent network is
jointly trained with task-dependent models, allowing both types of networks to improve by
learning from one another. While the frame error rate (FER) of the task-independent
network improved by 3.2%, this only partially carried over to the phone error
rate (PER), which improved by 1.5%. Similarly, a second method explored the parallel
training of the canonical network with a privileged model having access to i-vectors. This
method proved less effective, improving the FER by only 1.2%.
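A tied loss of the kind described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the distance measure or the weighting, so the squared Euclidean distance, the `tie_weight` parameter, and all function names here are hypothetical.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the correct class index."""
    return -math.log(probs[target])


def squared_distance(p, q):
    """Squared Euclidean distance between two output vectors (assumed measure)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def tied_loss(canonical_out, task_out, target, tie_weight=0.5):
    """Sum of both networks' classification losses plus a tying penalty.

    The penalty grows with the distance between the output of the
    task-independent (canonical) network and the task-dependent network.
    """
    return (cross_entropy(canonical_out, target)
            + cross_entropy(task_out, target)
            + tie_weight * squared_distance(canonical_out, task_out))
```

When the two networks produce identical outputs the penalty vanishes and the loss reduces to the sum of the two classification losses; a larger `tie_weight` pulls the two networks closer together.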
In order to make the developed technology more accessible, I also investigated the use
of a sequence-to-sequence (S2S) architecture for command classification. The use of an
attention-based encoder-decoder model reduced the classification error by 40% relative to a
strong convolutional neural network (CNN)-hidden Markov model (HMM) baseline, showing
the relevance of S2S architectures in such a context. In order to improve the flexibility of the
trained system, I also explored strategies for few-shot learning, which allow the set of
commands to be extended with minimal data requirements. Retraining a model on the
combination of original and new commands, I achieved 40.5% accuracy on the
new commands with only 10 examples for each of them. This score rises to 81.5%
accuracy with a larger set of 100 examples per new command. An alternative strategy, based
on model adaptation, achieved even better scores, with 68.8% and 88.4% accuracy with 10
and 100 examples respectively, while being faster to train. This high performance is obtained
at the expense of the original categories though, on which the accuracy deteriorated. These
results are very promising, as the methods make it easy to extend an existing S2S model with
minimal resources.
Finally, a full spoken command recognition system (named iCubrec) has been developed
for the iCub platform. The pipeline relies on a voice activity detection (VAD) system to
offer a fully hands-free experience. By segmenting only regions that are likely to contain
commands, the VAD module also greatly reduces the computational cost of the
pipeline. Command candidates are then passed to the deep neural network (DNN)-HMM
command recognition system for transcription. The VoCub dataset has been specifically
gathered to train a DNN-based acoustic model for our task. Through multi-condition training
with the CHiME4 dataset, an accuracy of 94.5% is reached on the VoCub test set. A filler model,
complemented by a rejection mechanism based on a confidence score, is finally added to the
system to reject non-command speech in a live demonstration of the system.
Speech Recognition for the iCub Platform
This paper describes open source software (available at https://github.com/robotology/natural-speech) to build automatic speech recognition (ASR) systems and run them within the YARP platform. The toolkit is designed (i) to allow non-ASR experts to easily create their own ASR system and run it on iCub, and (ii) to build deep learning-based models specifically addressing the main challenges an ASR system faces in the context of verbal human-iCub interactions. The toolkit mostly consists of Python and C++ code and shell scripts integrated in YARP. As an additional contribution, a second codebase (written in Matlab) is provided for more expert ASR users who want to experiment with bio-inspired and developmental learning-inspired ASR systems. Specifically, we provide code for two distinct kinds of speech recognition: "articulatory" and "unsupervised" speech recognition. The first is largely inspired by influential neurobiological theories of speech perception which assume speech perception to be mediated by brain motor cortex activities. Our articulatory systems have been shown to outperform strong deep learning-based baselines. The second type of recognition systems, the "unsupervised" systems, do not use any supervised information (contrary to most ASR systems, including our articulatory systems). To some extent, they mimic an infant who has to discover the basic speech units of a language by herself. In addition, we provide resources consisting of pre-trained deep learning models for ASR, and a 2.5-hour speech dataset of spoken commands, the VoCub dataset, which can be used to adapt an ASR system to the typical acoustic environments in which iCub operates.
Digital terrain analysis of the Haute-Mentue catchment and scale effect for hydrological modelling with TOPMODEL
It is widely recognised that topography plays an important role in the generation of runoff. Several studies have found that the scale of a digital elevation model affects the results of hydrological modelling. In particular, it has been shown that the representation of the statistical distribution of the topographic index used by TOPMODEL is sensitive to the scale of the digital terrain model. The objectives of this study are to develop an analysis of the topography and scale effects for the Haute-Mentue catchment and to test the role of different spatial resolutions on parameter calibration. The major result is that the spatial scale is important for the parameter values, but not decisive for the modelling results if a pertinent methodology is adopted for the determination of the digital watershed representation. Keywords: digital elevation model, topographic index, scale problems, TOPMODEL
Stitching proteins into membranes, not sew simple
Most integral membrane proteins located within the endomembrane system of eukaryotic cells are first assembled co-translationally into the endoplasmic reticulum (ER) before being sorted and trafficked to other organelles. The assembly of membrane proteins is mediated by the ER translocon, which allows passage of lumenal domains through and lateral integration of transmembrane (TM) domains into the ER membrane. It may be convenient to imagine membrane proteins containing multiple TM domains being assembled by inserting their first TM domain in the correct orientation, with subsequent TM domains inserting with alternating orientations. However, a simple threading model of assembly, with sequential insertion of one TM domain into the membrane after another, does not universally stand up to scrutiny. In this article we review some of the literature illustrating the complexities of membrane protein assembly. We also present our own thoughts on aspects that we feel are poorly understood. In short, we hope to convince the readers that threading of membrane proteins into membranes is 'not sew simple' and a topic that requires further investigation.
A Single Polar Residue and Distinct Membrane Topologies Impact the Function of the Infectious Bronchitis Coronavirus E Protein
The coronavirus E protein is a small membrane protein with a single predicted hydrophobic domain (HD), and has a poorly defined role in infection. The E protein is thought to promote virion assembly, which occurs in the Golgi region of infected cells. It has also been implicated in the release of infectious particles after budding. The E protein has ion channel activity in vitro, although a role for channel activity in infection has not been established. Furthermore, the membrane topology of the E protein is a matter of considerable debate, and the protein may adopt more than one topology during infection. We previously showed that the HD of the infectious bronchitis virus (IBV) E protein is required for the efficient release of infectious virus, an activity that correlated with disruption of the secretory pathway. Here we report that a single residue within the hydrophobic domain, Thr16, is required for secretory pathway disruption. Substitutions of other residues for Thr16 were not tolerated. Mutations of Thr16 did not impact virus assembly as judged by virus-like particle production, suggesting that alteration of the secretory pathway and assembly are independent activities. We also examined how the membrane topology of IBV E affected its function by generating mutant versions that adopted either a transmembrane or membrane hairpin topology. We found that a transmembrane topology was required for disrupting the secretory pathway, but was less efficient for virus-like particle production. The hairpin version of E was unable to disrupt the secretory pathway or produce particles. The findings reported here identify properties of the E protein that are important for its function, and provide insight into how the E protein may perform multiple roles during infection.
Repair or destruction: an intimate liaison between ubiquitin ligases and molecular chaperones in proteostasis
Cellular differentiation, developmental processes, and environmental factors challenge the integrity of the proteome in every eukaryotic cell. The maintenance of protein homeostasis, or proteostasis, involves folding and degradation of damaged proteins, and is essential for cellular function, organismal growth, and viability [1, 2]. Misfolded proteins that cannot be refolded by chaperone machineries are degraded by specialized proteolytic systems. A major degradation pathway regulating cellular proteostasis is the ubiquitin/proteasome-system (UPS), which regulates turnover of damaged proteins that accumulate upon stress and during aging. Despite the large number of structurally unrelated substrates, ubiquitin conjugation is remarkably selective. Substrate selectivity is mainly provided by the group of E3 enzymes. Several observations indicate that numerous E3 ubiquitin ligases intimately collaborate with molecular chaperones to maintain the cellular proteome. In this Review, we provide an overview of specialized quality control E3 ligases playing a critical role in the degradation of damaged proteins. The process of substrate recognition and turnover, the type of chaperones they team up with, and the potential pathogeneses associated with their malfunction will be further discussed.