1,225 research outputs found

Perception and Acquisition of Natural Authentic English Speech for Chinese Learners Using DIT's Speech Technologies

Author: Wang Yi
Publication venue: Dublin Institute of Technology
Publication date: 01/05/2010

Given that Chinese language learners are greatly influenced by their mother tongue, which is a tone language rather than an intonation language, learning and coping with authentic English speech seems more difficult for them than for learners from other language backgrounds. The focus of the current research is, on the basis of an analysis of the nature of spoken English and spoken Chinese, to help Chinese learners derive benefit from ICT tools developed by the Dublin Institute of Technology (DIT). The thesis concentrates on investigating the application of speech technologies in bridging the gap between students’ internalised, idealised formulations and natural, authentic English speech. Part of the testing carried out by the present author demonstrates the acceptability of a slow-down algorithm in helping Chinese learners of English to reproduce formulaic language. This algorithm is useful because it can slow down audio files to any desired speed between 100% and 40% without distortion, allowing language learners to pay attention to the real, rapid flow of ‘messy’ speech and follow the intonation patterns it contains. The rationale for and the application of natural, dialogic native-to-native English speech to language learning are also explored. The Chinese language learners involved in this study are exposed to authentic, native speech patterns through access to real, informal dialogue in various contexts. In the course of this analysis, the influence of speed of delivery and pitch range on the categorisation of formulaic language is also investigated. The study investigates the potential of the speech tools available to the present author as an effective EFL learning facility, especially for speakers of tone languages, and their role in helping language learners achieve confluent interaction in an English L1 environment.

Arrow@TUDublin

Recent Trends in Deep Learning Based Personality Detection

Authors: Cambria Erik; Gelbukh Alexander; Majumder Navonil; Mehta Yash
Publication date: 27/08/2019

Recently, the automatic prediction of personality traits has received a lot of attention. Specifically, personality trait prediction from multimodal data has emerged as a hot topic within the field of affective computing. In this paper, we review significant machine learning models that have been employed for personality detection, with an emphasis on deep learning-based methods. This review paper provides an overview of the most popular approaches to automated personality detection, the available computational datasets, industrial applications, and state-of-the-art machine learning models for personality detection, with a specific focus on multimodal approaches. Personality detection is a very broad and diverse topic: this survey focuses only on computational approaches and leaves out psychological studies on personality detection.

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Methods for pronunciation assessment in computer aided language learning

Author: Peabody Mitchell A. (Mitchell Aaron)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 149-176). Learning a foreign language is a challenging endeavor that entails acquiring a wide range of new knowledge including words, grammar, gestures, sounds, etc. Mastering these skills requires extensive practice by the learner, and opportunities may not always be available. Computer Aided Language Learning (CALL) systems provide non-threatening environments where foreign language skills can be practiced wherever and whenever a student desires. These systems often have several technologies to identify the different types of errors made by a student. This thesis focuses on the problem of identifying mispronunciations made by a foreign language student using a CALL system. We make several assumptions about the nature of the learning activity: it takes place using a dialogue system, it is a task- or game-oriented activity, the student should not be interrupted by the pronunciation feedback system, and the goal of the feedback system is to identify severe mispronunciations with high reliability. Detecting mispronunciations requires a corpus of speech with human judgements of pronunciation quality. Typical approaches to collecting such a corpus use an expert phonetician to both phonetically transcribe and assign judgements of quality to each phone in a corpus. This is time consuming and expensive. It also places an extra burden on the transcriber. We describe a novel method for obtaining phone level judgements of pronunciation quality by utilizing non-expert, crowd-sourced, word level judgements of pronunciation. Foreign language learners typically exhibit high variation and pronunciation shapes distinct from those of native speakers, which makes analysis for mispronunciation difficult.
We detail a simple but effective method for transforming the vowel space of non-native speakers to make mispronunciation detection more robust and accurate. We show that this transformation not only enhances performance on a simple classification task, but also results in distributions that can be better exploited for mispronunciation detection. This transformation of the vowel space is exploited to train a mispronunciation detector using a variety of features derived from acoustic model scores and vowel class distributions. We confirm that the transformation technique results in a more robust and accurate identification of mispronunciations than traditional acoustic models. By Mitchell A. Peabody. Ph.D.

DSpace@MIT

EFL listening development through diagnosis: an assessment-based study of listening sub-skills using Rasch measurement

Author: Guan Yuanyuan
Publication date: 01/01/2019

The lack of informed knowledge about listening subskills and their relationships has hindered the development of the diagnostic English language track assessment (DELTA) in three participating Hong Kong universities. This study investigates English as a foreign language (EFL) learners' listening proficiency development in understanding different spoken genres in the Hong Kong Chinese tertiary contexts. It aims to: i) identify the subskills and/or cognitive processes that underlie student performance on the DELTA listening component; ii) examine the difficulty levels of the DELTA listening subskills and, consequently, their hierarchical order; iii) investigate the impact of text type on difficulty level and the hierarchical order of the subskills; and iv) infer principles underlying the development of listening proficiency in the Hong Kong tertiary education contexts. A multi-method approach was employed for data collection and analysis. The primary quantitative data were derived from the DELTA listening component items answered by 2830 Chinese EFL learners who were in their first or second year of study at the DELTA participating universities in the 2013-14 academic year. The item pool included 207 multiple-choice questions (MCQ) from 33 texts of three text types: conversation, interview and lecture. Each MCQ is intended to measure a particular listening subskill, including: 1) identifying specific information (SSK1); 2) understanding main idea and supporting ideas (SSK2); 3) understanding information and making an inference (SSK3); 4) interpreting a word or phrase as used by the speaker (SSK4); 5) inferring the attitude or intention of the speaker (SSK5); and 6) inferring the speaker's reasoning (SSK6). By adopting inter-related Rasch analyses using Winsteps and Facets, all test items were calibrated and analysed to determine their difficulty measures and their respective difficulties across the three text types.
Qualitative Stimulated Recall Protocol (SRP) discussions were then conducted one month later with 62 examinees of varying estimated listening abilities, in a simulated test situation where the test-taking process was video-recorded and the participants were asked to recall and verbalise the thought processes and strategies they used to answer each question. The SRP results reveal an array of both cognitive processes and test-taking strategies in the listening comprehension and test-answering process. Firstly, various combinations of cognitive processes were utilised by both the high and low ability examinees to answer questions targeting the same listening subskill; however, the dominant cognitive process that was reported to have been used to answer each question corresponded with the particular listening subskill intended by DELTA item writers. Secondly, an array of test-taking strategies, best identified as elimination and guessing, was reported as used by examinees during the test. While this finding might not be surprising given the exam-oriented atmosphere prevailing in Hong Kong secondary school education, it alerted the researcher to scrutinise the validity of the DELTA listening component. The most striking observation from the listening test analysis is that the DELTA listening subskills are measurably separable from each other, and a hierarchical pattern is established. In terms of their interaction with text type, the results showed that SSK1 and SSK6 were, respectively, the easiest and the most difficult subskills, whereas the hierarchical orders of the other four subskills varied across the three text types. More generally, these findings provide empirical evidence for the proposition that EFL listening comprehension is composed of multiple listening subskills, which operate interactively and interdependently in the listening process.
The results regarding the difficulty level and the hierarchy of listening subskills corroborate the findings of prior research that low-level processing, such as identifying specific information, poses less challenge than high-level processing, such as summarising and inferencing. Because of the complexity of the interaction between text type and listening subskills, it is difficult to identify an overarching hierarchical order of the six listening subskills across the three text types. A general pattern, however, is that the difficulty increased from SSK1, SSK2 to SSK6 irrespective of the text type, and this corresponds to the general subskill hierarchy. The study will benefit teachers and students with diagnostic profiling and bridge the gap in diagnostic test design with targeted items of appropriate difficulty for predicting learners' listening development. It will extend second language acquisition theory with a hierarchical trajectory of listening proficiency growth. Limitations and future research recommendations are discussed.

ResearchOnline at James Cook University

Speech Recognition

Publication venue: IntechOpen
Publication date: 20/04/2021

Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other applications that are able to operate in real-world environments, like mobile communication services and smart homes.

Directory of Open Access Books (DOAB)