
    Encoding of phonology in a recurrent neural model of grounded speech

    We study the representation and encoding of phonemes in a recurrent neural network model of grounded speech. We use a model that processes images and their spoken descriptions and projects the visual and auditory representations into the same semantic space. We perform a number of analyses of how information about individual phonemes is encoded in the MFCC features extracted from the speech signal and in the activations of the layers of the model. Via experiments with phoneme decoding and phoneme discrimination, we show that phoneme representations are most salient in the lower layers of the model, where low-level signals are processed at a fine-grained level, although a large amount of phonological information is retained at the top recurrent layer. We further find that the attention mechanism following the top recurrent layer significantly attenuates the encoding of phonology and makes the utterance embeddings much more invariant to synonymy. Moreover, a hierarchical clustering of the phoneme representations learned by the network reveals an organizational structure of phonemes similar to those proposed in linguistics. Comment: Accepted at CoNLL 201
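
    The abstract does not say how phoneme discrimination is scored. One common setup is an ABX test: given representations A and B of two different phonemes and a probe X of the same phoneme as A, the test counts as correct when X is closer to A than to B. The sketch below assumes cosine distance and plain Python lists as stand-ins for layer activations; the function names are hypothetical and this is not presented as the authors' exact procedure.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def abx_accuracy(triples: List[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of (a, b, x) triples in which x (same phoneme as a) is closer to a than to b."""
    if not triples:
        raise ValueError("need at least one (a, b, x) triple")
    correct = sum(
        1 for a, b, x in triples if cosine_distance(a, x) < cosine_distance(b, x)
    )
    return correct / len(triples)


# Toy 2-D "activations": the first probe sits near A, the second drifts toward B.
triples = [
    ([1.0, 0.0], [0.0, 1.0], [0.9, 0.1]),
    ([1.0, 0.0], [0.0, 1.0], [0.2, 0.8]),
]
print(abx_accuracy(triples))  # 0.5
```

    Running the same scoring on activations taken from each layer (and on the MFCC input) is one way to compare how discriminable phonemes are at different depths.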

    Phonological Level wav2vec2-based Mispronunciation Detection and Diagnosis Method

    The automatic identification and analysis of pronunciation errors, known as Mispronunciation Detection and Diagnosis (MDD), plays a crucial role in Computer-Aided Pronunciation Learning (CAPL) tools such as Second-Language (L2) learning or speech therapy applications. Existing MDD methods that analyse phonemes can only detect categorical errors for phonemes with enough training data to be modelled. Given the unpredictable nature of the pronunciation errors made by non-native or disordered speakers and the scarcity of training datasets, it is infeasible to model all types of mispronunciation. Moreover, phoneme-level MDD approaches have a limited ability to provide detailed diagnostic information about the error made. In this paper, we propose a low-level MDD approach based on the detection of speech attribute features. Speech attribute features break phoneme production down into elementary components that are directly related to the articulatory system, which leads to more formative feedback for the learner. We further propose a multi-label variant of the Connectionist Temporal Classification (CTC) approach to jointly model the non-mutually exclusive speech attributes using a single model. The pre-trained wav2vec2 model was employed as the core model for the speech attribute detector. The proposed method was applied to L2 speech corpora collected from English learners with different native languages. Compared with the traditional phoneme-level MDD, the proposed speech attribute MDD method achieved a significantly lower False Acceptance Rate (FAR), False Rejection Rate (FRR), and Diagnostic Error Rate (DER) across all speech attributes.
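
    The abstract reports FAR, FRR, and DER without defining them. A minimal sketch, assuming the count-based definitions often used in the MDD literature: FAR = FA / (FA + TR), FRR = FR / (TA + FR), and DER = DE / (CD + DE), where true rejections TR = CD + DE. The class and function names are hypothetical, and the same counts could be collected per speech attribute rather than per phoneme.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class MddCounts:
    true_acceptance: int    # correct pronunciation, accepted as correct
    false_rejection: int    # correct pronunciation, flagged as an error
    false_acceptance: int   # mispronunciation, missed by the detector
    correct_diagnosis: int  # mispronunciation detected, error label correct
    diagnosis_error: int    # mispronunciation detected, error label wrong

    @property
    def true_rejection(self) -> int:
        # Every detected mispronunciation is a true rejection, whatever its diagnosis.
        return self.correct_diagnosis + self.diagnosis_error


def _ratio(num: int, den: int) -> float:
    # Guard against empty categories instead of raising ZeroDivisionError.
    return num / den if den else 0.0


def mdd_rates(c: MddCounts) -> Dict[str, float]:
    """Assumed FAR/FRR/DER definitions; not taken from the paper itself."""
    return {
        "FAR": _ratio(c.false_acceptance, c.false_acceptance + c.true_rejection),
        "FRR": _ratio(c.false_rejection, c.true_acceptance + c.false_rejection),
        "DER": _ratio(c.diagnosis_error, c.true_rejection),
    }


# Made-up counts for illustration only.
counts = MddCounts(90, 10, 5, 12, 3)
print(mdd_rates(counts))  # {'FAR': 0.25, 'FRR': 0.1, 'DER': 0.2}
```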

    Speech and neural network dynamics


    Computer simulations of developmental change: The contributions of working memory capacity and long-term knowledge

    Increasing working memory (WM) capacity is often cited as a major influence on children's development, yet WM capacity is difficult to examine independently of long-term knowledge. A computational model of children's nonword repetition (NWR) performance is presented that manipulates long-term knowledge and WM capacity independently to determine the relative contribution of each in explaining the developmental data. The simulations show that (1) each mechanism on its own produces the same overall developmental changes in NWR performance; (2) an increase in long-term knowledge provides the better fit to the child data; and (3) varying both long-term knowledge and WM capacity yields no significant gain over varying long-term knowledge alone. Given that long-term knowledge must increase during development, the results indicate that increases in WM capacity may not be required to explain developmental differences. An increase in WM capacity should only be cited as a mechanism of developmental change when there are clear empirical reasons for doing so.