
    How Phonotactics Affect Multilingual and Zero-shot ASR Performance

    The idea of combining multiple languages' recordings to train a single automatic speech recognition (ASR) model brings the promise of the emergence of universal speech representation. Recently, a Transformer encoder-decoder model has been shown to leverage multilingual data well in IPA transcriptions of languages presented during training. However, the representations it learned were not successful in zero-shot transfer to unseen languages. Because that model lacks an explicit factorization of the acoustic model (AM) and language model (LM), it is unclear to what degree the performance suffered from differences in pronunciation or the mismatch in phonotactics. To gain more insight into the factors limiting zero-shot ASR transfer, we replace the encoder-decoder with a hybrid ASR system consisting of a separate AM and LM. Then, we perform an extensive evaluation of monolingual, multilingual, and crosslingual (zero-shot) acoustic and language models on a set of 13 phonetically diverse languages. We show that the gain from modeling crosslingual phonotactics is limited, and imposing too strong a model can hurt zero-shot transfer. Furthermore, we find that a multilingual LM hurts a multilingual ASR system's performance, and retaining only the target language's phonotactic data in LM training is preferable. Comment: Accepted for publication in IEEE ICASSP 2021. The first 2 authors contributed equally to this work.

    Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

    Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

    Computer analysis of children's non-native English speech for language learning and assessment

    ASR for children's speech appears to be more challenging than for adults', and it is even more difficult for non-native children's speech. This research investigates different techniques to compensate for the effects of non-native and child speakers on the performance of ASR systems. The study mainly utilises hybrid DNN-HMM systems with conventional DNNs, LSTMs and more advanced TDNN models. This work uses the CALL-ST corpus and TLT-school corpus to study children's non-native English speech. Initially, data augmentation was explored on the CALL-ST corpus to address the lack-of-data problem using the AMI corpus and PF-STAR German corpus. Feature selection, acoustic model adaptation and selection were also investigated on CALL-ST. More aspects of the ASR system, including pronunciation modelling, acoustic modelling, language modelling and system fusion, were explored on the TLT-school corpus, as this corpus contains more data. Then, the relationships between the CALL-ST and TLT-school corpora were studied and utilised to improve ASR performance. The other part of the present work is text processing for non-native children's English speech. We focused on providing accept/reject feedback to learners based on the text generated by the ASR system from learners' spoken responses. A rule-based and a machine learning-based system were proposed for making the judgement, and several aspects of the systems were evaluated. The influence of the ASR system on the text processing system was also explored.

    Neuropsychological Studies of Reading and Writing

    This thesis investigates the reading and writing of two patients with brain injuries due to cerebro-vascular accidents. Background tests show both patients to be moderately anomic and to have severe impairments in reading and writing nonwords. Investigations of the locus of impairment in AN's nonword reading showed her to have normal orthographic analysis capabilities but impairments in converting single and multiple graphemes into phonemes and in phonemic blending. The central issue studied was the role of lexical but non-semantic processes in reading aloud, writing to dictation and copying. For this purpose a "familiar nonword" paradigm was developed in which the patients learned to read or write a small set of nonwords either with or without any associated semantics. Both AN and AM were able to learn to read nonwords to which no meanings were attached but they could still not read novel nonwords. Both patients were unable to report any meanings for the familiar nonwords when they read them and there was no evidence that learning to read them improved their sub-lexical processing abilities. These results are evidence for a direct lexical route from print to sound that is dedicated to processing whole familiar words. It was also shown with AN that if nonwords are given meanings then learning is faster than if they are not given meanings. Experiments designed to test the hypothesis that nonwords are read by analogy to words found no support for it. Both patients have severe impairments in writing novel nonwords to dictation. As they can repeat spoken nonwords after they have failed to write them, this is not due to a short-term memory impairment. Despite their nonword writing impairments, both patients were able to write to dictation the meaningless nonwords that they had previously learned to read at the first attempt, and AN did so one month after learning to read them. 
Neither patient, however, could write novel nonwords made by reordering the letters of the familiar nonwords. Furthermore, the familiar nonwords used spellings that are of a priori low probability. The familiar nonwords must therefore have been written using lexical knowledge. Tests of semantic association showed that the familiar nonwords evoked no semantic information that the patients could report. Function words dictated to AN evoked little semantic information but she wrote them to dictation significantly better than nonwords made by reordering their letters. These results are evidence for a direct lexical route for writing to dictation. Copying was studied both with and without a five-second delay between presentation and response. AN was better at delayed copying of meaningless but familiar nonwords than she was at copying novel nonwords. She was also better at delayed copying of six-letter, bi-syllabic nonwords that she had been trained to copy than she was at copying novel nonwords made by recombining the first and second halves of the familiar nonwords such that these halves retained their positions from the parent nonwords. AN was better at copying function words than nonwords made by reordering their letters. She was also better at copying function words than she was at reading or writing them to dictation. These results are evidence for a direct lexical route for copying. AN and AM were both able to write to dictation nonwords that they had never heard or written before but with which they had been made visually familiar during a visual discrimination task. They must have used lexical knowledge to do so because the spellings used were of a priori very low probability. The creation of lexical orthographic information which can be retrieved from novel auditory input raises difficulties for current models and various possible interpretations are discussed.
Finally, some of the possible implications of the re-learning abilities shown by these patients for rehabilitation procedures are discussed briefly.