
    Self-managed Speech Therapy

    Speech defects are typically addressed by having the patient or learner undergo several sessions with speech therapists, who apply specialized therapeutic tools. Speech therapies tend to be expensive, require the scheduling of appointments, and do not lend themselves easily to self-paced self-improvement. This disclosure presents techniques that automatically provide speech-improvement feedback, thereby enabling self-managed speech therapy. Given a speech utterance by a user, the techniques cause display of a sequence of images of speech-organ positions, e.g., tongue, lips, throat muscles, etc., that correspond to the actual utterance as well as a targeted, ideal utterance. Further phonetic feedback is provided to the user using visual, tactile, spectrogram, or other modes, such that a speaker who is hard of hearing can work towards a target pronunciation. The techniques also apply to foreign language learning.
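    One of the feedback modes mentioned is a spectrogram display. As a rough illustration only, the sketch below renders side-by-side log-magnitude spectrograms of a user's utterance and a target utterance using librosa and matplotlib; the file names and the 16 kHz sample rate are assumptions, and this is not the disclosure's actual implementation.

```python
# A minimal sketch of spectrogram-based pronunciation feedback, assuming two
# WAV files ("user.wav", "target.wav") exist; names are illustrative.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

def show_spectrogram_feedback(user_wav="user.wav", target_wav="target.wav"):
    fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
    for ax, path, title in zip(axes, (user_wav, target_wav),
                               ("Your utterance", "Target utterance")):
        y, sr = librosa.load(path, sr=16000)
        # Log-magnitude STFT spectrogram, shown side by side for comparison.
        S = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
        librosa.display.specshow(S, sr=sr, x_axis="time", y_axis="hz", ax=ax)
        ax.set_title(title)
    fig.tight_layout()
    plt.show()

show_spectrogram_feedback()
```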

    I hear you eat and speak: automatic recognition of eating condition and food type, use-cases, and impact on ASR performance

    We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database, featuring 1.6k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender-balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech; the database is made publicly available for research purposes. We first demonstrate that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We then propose automatic classification using both brute-forced low-level acoustic features and higher-level features related to intelligibility, obtained from an automatic speech recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of eating condition (i.e., eating or not eating) can be solved easily, independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, reaching up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food as well as not eating. Early fusion of the intelligibility-related features with the brute-forced acoustic feature set improves performance on read speech, reaching a 66.4% average recall on the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with a determination coefficient of up to 56.2%.
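    As a concrete illustration of the evaluation protocol, the following sketch runs an SVM in a leave-one-speaker-out loop with scikit-learn and scores it by unweighted average recall. The feature matrix, labels, and speaker IDs are random placeholders, not the iHEARu-EAT data, and the linear kernel is an assumption.

```python
# A minimal sketch of leave-one-speaker-out SVM classification scored by
# unweighted average recall (UAR); all data below are random placeholders.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC
from sklearn.metrics import recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))            # placeholder acoustic features
y = rng.integers(0, 7, size=300)          # 7 classes: six foods + not eating
speakers = rng.integers(0, 30, size=300)  # 30 speakers, as in the corpus

y_pred = np.empty_like(y)
for train, test in LeaveOneGroupOut().split(X, y, groups=speakers):
    clf = SVC(kernel="linear")            # kernel choice is an assumption
    clf.fit(X[train], y[train])
    y_pred[test] = clf.predict(X[test])

# "Average recall" here is the unweighted mean of per-class recalls.
uar = recall_score(y, y_pred, average="macro")
print(f"Unweighted average recall: {uar:.3f}")
```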

    The use of scaffolding-based software in developing pronunciation

    This research looked at the use of scaffolding-based software in helping learners to develop pronunciation and fluency modelled on standard American English. The study used Vygotsky’s zone of proximal development (ZPD) theory and scaffolding learning principles as a basis for observing how learners of English progressed through the learning process. Firstly, the research examined an accent-reduction software package to find out how the software design supports scaffolding principles. To determine the effectiveness of the software on learners’ general pronunciation, a pre-test and a post-test were used; the data obtained showed a significant improvement in learners’ general pronunciation after use of the pronunciation-learning software. Secondly, case studies were conducted to investigate Persian ESL learners’ progress in pronouncing English consonants that are absent from the phonemic inventory of Persian. The selected cases were recorded during class time while they were working with the software. The recordings were then analysed using PRAAT, a speech-analysis programme, and two raters helped the researcher to determine the quality of the sounds produced by the learners. The results from the case studies showed that, with the appropriate scaffolds provided by the software in the form of explicit instruction, native models, and multimodal feedback, the learners showed microgenetic improvement towards the native model and progressed within the ZPD to pronounce the consonants absent from the phonemic inventory of their first language. Finally, learners’ perceptions of the software were elicited in an interview session after the instructional programme; based on their responses, the learners positively perceived the use of the scaffolding-based accent-reduction software to improve their general pronunciation.
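    For illustration, the sketch below shows the kind of acoustic measurement PRAAT supports, here via the parselmouth Python bindings: loading a recording and reading first and second formant values, one common way to characterise segment quality. The file name and the midpoint measurement are illustrative assumptions, not the study's actual procedure.

```python
# A minimal sketch of a PRAAT-style formant measurement using parselmouth
# (Praat bindings for Python); the recording name is hypothetical.
import parselmouth

snd = parselmouth.Sound("learner_utterance.wav")  # hypothetical recording
formant = snd.to_formant_burg()                   # Burg formant tracking

# Sample F1/F2 at the midpoint of the recording; in practice the analyst
# would inspect the segment containing the target consonant or vowel.
t = snd.duration / 2
f1 = formant.get_value_at_time(1, t)
f2 = formant.get_value_at_time(2, t)
print(f"F1 = {f1:.0f} Hz, F2 = {f2:.0f} Hz at t = {t:.2f} s")
```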

    Perceptions about Self-recording Videos to Develop EFL Speaking Skills in Two Ecuadorian Universities

    The present study explores the perceptions of EFL students from two Ecuadorian universities on the use of Self-Recording Videos (SRV) to develop speaking skills. As students do not have the opportunity to talk in the target language outside their classes, the authors analyzed the participants’ viewpoints on SRV as a means of improving their conversational abilities. There is still limited research on the use of SRV for English speaking practice in a foreign country, so the researchers aim to fill this gap in the literature and contribute to further studies on the topic. The authors consider it essential to acknowledge the positive aspects of using this technique from the learners’ perspectives. For this purpose, participants were required to self-record a video related to the content of the class during the week and submit it to the Moodle platform.

    Spontal-N: A Corpus of Interactional Spoken Norwegian

    Spontal-N is a corpus of spontaneous, interactional Norwegian. To our knowledge, it is the first corpus of Norwegian in which the majority of speakers have spent significant parts of their lives in Sweden, and in which the recorded speech displays varying degrees of interference from Swedish. The corpus consists of studio-quality audio and video recordings of four 30-minute free conversations between acquaintances, together with a manual orthographic transcription of the entire material. On the basis of the orthographic transcriptions, we automatically annotated approximately 50 percent of the material at the phoneme level, by means of a forced alignment between the acoustic signal and pronunciations listed in a dictionary. Approximately seven percent of the automatic transcription was manually corrected. Taking the manual correction as a gold standard, we evaluated several sources of pronunciation variants for the automatic transcription. Spontal-N is intended as a general-purpose speech resource that is also suitable for investigating phonetic detail.
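    To illustrate how an automatic phoneme-level transcription can be evaluated against a manually corrected gold standard, the sketch below aligns two phoneme sequences by Levenshtein distance and reports a phoneme error rate; the phoneme strings are invented examples, and the paper's own comparison of pronunciation-variant sources may use a different metric.

```python
# A minimal sketch of phoneme-level evaluation against a gold standard via
# Levenshtein alignment; the phoneme sequences below are invented.
def phoneme_error_rate(gold, auto):
    """Edit distance between phoneme sequences, normalised by gold length."""
    m, n = len(gold), len(auto)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if gold[i - 1] == auto[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[m][n] / m

gold = ["s", "p", "u", "n", "t", "A", "l"]  # invented gold transcription
auto = ["s", "p", "O", "n", "t", "A", "l"]  # invented automatic output
print(f"Phoneme error rate: {phoneme_error_rate(gold, auto):.2%}")
```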