1,152 research outputs found
Encoding of phonology in a recurrent neural model of grounded speech
We study the representation and encoding of phonemes in a recurrent neural
network model of grounded speech. We use a model which processes images and
their spoken descriptions, and projects the visual and auditory representations
into the same semantic space. We perform a number of analyses on how
information about individual phonemes is encoded in the MFCC features extracted
from the speech signal, and the activations of the layers of the model. Via
experiments with phoneme decoding and phoneme discrimination we show that
phoneme representations are most salient in the lower layers of the model,
where low-level signals are processed at a fine-grained level, although a large
amount of phonological information is retain at the top recurrent layer. We
further find out that the attention mechanism following the top recurrent layer
significantly attenuates encoding of phonology and makes the utterance
embeddings much more invariant to synonymy. Moreover, a hierarchical clustering
of phoneme representations learned by the network shows an organizational
structure of phonemes similar to those proposed in linguistics.Comment: Accepted at CoNLL 201
Recommended from our members
Modelling vocabulary acquisition: an explanation of the link between the phonological loop and long-term memory
Phonological Level wav2vec2-based Mispronunciation Detection and Diagnosis Method
The automatic identification and analysis of pronunciation errors, known as
Mispronunciation Detection and Diagnosis (MDD) plays a crucial role in Computer
Aided Pronunciation Learning (CAPL) tools such as Second-Language (L2) learning
or speech therapy applications. Existing MDD methods relying on analysing
phonemes can only detect categorical errors of phonemes that have an adequate
amount of training data to be modelled. With the unpredictable nature of the
pronunciation errors of non-native or disordered speakers and the scarcity of
training datasets, it is unfeasible to model all types of mispronunciations.
Moreover, phoneme-level MDD approaches have a limited ability to provide
detailed diagnostic information about the error made. In this paper, we propose
a low-level MDD approach based on the detection of speech attribute features.
Speech attribute features break down phoneme production into elementary
components that are directly related to the articulatory system leading to more
formative feedback to the learner. We further propose a multi-label variant of
the Connectionist Temporal Classification (CTC) approach to jointly model the
non-mutually exclusive speech attributes using a single model. The pre-trained
wav2vec2 model was employed as a core model for the speech attribute detector.
The proposed method was applied to L2 speech corpora collected from English
learners from different native languages. The proposed speech attribute MDD
method was further compared to the traditional phoneme-level MDD and achieved a
significantly lower False Acceptance Rate (FAR), False Rejection Rate (FRR),
and Diagnostic Error Rate (DER) over all speech attributes compared to the
phoneme-level equivalent
Computer simulations of developmental change: The contributions of working memory capacity and long-term knowledge
Increasing working memory (WM) capacity is often cited as a major influence on children’s development and yet WM capacity is difficult to examine independently of long-term knowledge. A computational model of children’s nonword repetition (NWR) performance is presented that independently manipulates long-term knowledge and WM capacity to determine the relative contributions of each in explaining the developmental data. The simulations show that (1) both mechanisms independently cause the same overall developmental changes in NWR performance; (2) increase in long-term knowledge provides the better fit to the child data; and (3) varying both long-term knowledge and WM capacity adds no significant gains over varying long-term knowledge alone. Given that increases in long-term knowledge must occur during development, the results indicate that increases in WM capacity may not be required to explain developmental differences. An increase in WM capacity should only be cited as a mechanism of developmental change when there are clear empirical reasons for doing so
- …