
    The Pronunciation Accuracy of Interactive Dialog System for Malaysian Primary School Students

    This project examines the accuracy of an existing speech recognition engine used in an interactive dialog system (IDS) for English as a second language (ESL) literacy education of Malaysian primary school students. Students are keen to learn literacy with a computer that supports spoken dialog, as it motivates them to be more confident in reading and pronunciation without depending solely on teachers. This computer-assisted learning improves students' oral reading ability through the speech recognition in the IDS: using the system, students can learn to read and pronounce words correctly and independently, without seeking help from teachers. The study was conducted at Sungai Berembang Primary School and involved all 16 female and 18 male Standard 2 students, aged 8 years old. These students have varying reading and pronunciation abilities and experience of English, with Malay as their first language. The main objective of this study is to examine the accuracy of using an existing speech recognition engine for ESL Malaysian students in literacy education; the specific objectives are to identify the requirements for, and to evaluate, a speech-recognition-based dialog system for reading accuracy. This kind of speech recognition technology aims to provide teacher-like tutoring of children's phonemic awareness, vocabulary building, word comprehension, and fluent reading. The method used in this study, the System Development Research Method, has five stages: construct a framework; develop the system architecture; analyse and design the system; build and implement the prototype; and, lastly, observe and test the system. The results of the study found that 85% of the students felt the IDS had helped their English after using the system.
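
    A minimal Python sketch of the word-level reading accuracy measure described above. The abstract does not name the speech recognition engine, so recognize() below is a hypothetical stand-in for it, and the example words and transcripts are illustrative only.

        # Hypothetical stand-in for the IDS's speech recognition engine: a real
        # system would return the engine's transcription of the recorded clip.
        def recognize(audio_clip):
            return audio_clip["transcript"]

        def reading_accuracy(target_words, audio_clips):
            """Percentage of target words the recogniser heard the pupil read correctly."""
            correct = sum(
                1 for word, clip in zip(target_words, audio_clips)
                if recognize(clip).strip().lower() == word.lower()
            )
            return 100.0 * correct / len(target_words)

        # Example: a pupil reads three words aloud and the engine mishears one.
        words = ["cat", "house", "river"]
        clips = [{"transcript": "cat"}, {"transcript": "house"}, {"transcript": "liver"}]
        print(f"Reading accuracy: {reading_accuracy(words, clips):.0f}%")  # 67%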

    Human Factors of Integrating Speech and Manual Input Devices: The Case of Computer Aided Design

    This thesis investigates integrating speech input and manual input devices in human-computer systems, using the domain of computer-aided design (CAD) as a case study. A methodology for the empirical evaluation of CAD systems is presented. The methodology is based on a framework that describes the input/output processes presumed to underlie performance in design activities, using behaviour protocols and performance indices as data. For modelling system behaviour, a framework derived from the Blackboard architecture of design is described. The framework employs knowledge sources to represent the different behaviour types recruited during CAD performance, and variability in user behaviour throughout the investigation is explained with reference to the model. The problems that expert CAD users experience with manual input devices are first documented in an observational study conducted at their workplace; this demonstrates that the unitary use of manual input resulted in non-optimal behaviour. Possible solutions to these problems, using speech input for some command and data entry tasks, are explored in three experiments, each comparing alternative systems using data obtained from naive and novice users. In Experiment 1, the use of speech as a unitary solution to the problems of manual input was also found to result in non-optimal behaviour and performance. The solution explored in Experiment 2 was to allocate some commands and alphanumeric data to each input device, following the frequency-of-use principle; this approach, however, entailed the additional problem of remembering which device to use. Experiment 3 evaluated the separate allocation of commands to speech input and of numeric plus graphical data to manual input, with performance aids and feedback facilities additionally provided to users; a sketch of this allocation follows below. This clear-cut assignment of devices to task characteristics, together with such aids, enhanced speech performance as well as improving behaviour. The findings from this research are used to develop guidelines for an integrated CAD system involving speech and manual input. The guidelines, intended for use by end users, CAD implementors, and system designers, were validated in the workplace by the latter. Lastly, the thesis contextualises the research within an ergonomics framework, mapping the research development from problem specification to application and synthesis; problems with the investigation are also discussed, along with suggestions as to how these might be resolved.
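
    A minimal Python sketch of the Experiment 3 allocation (commands to speech; numeric and graphical data to manual input) with the reminder-style feedback facility the thesis mentions. The thesis reports user studies rather than code, so InputEvent, ALLOCATION, and route() are illustrative names, not the author's implementation.

        from dataclasses import dataclass

        @dataclass
        class InputEvent:
            modality: str   # "speech" or "manual"
            kind: str       # "command", "numeric", or "graphical"
            payload: object

        # Clear-cut assignment of each task type to exactly one input device.
        ALLOCATION = {
            "command": "speech",    # all commands are spoken
            "numeric": "manual",    # numbers are typed on the keyboard
            "graphical": "manual",  # geometry is picked with the pointing device
        }

        def route(event: InputEvent) -> str:
            expected = ALLOCATION[event.kind]
            if event.modality != expected:
                # Feedback facility: remind the user which device handles this task.
                return f"Use {expected} input for {event.kind} entry."
            return f"Accepted {event.kind} via {event.modality}: {event.payload}"

        print(route(InputEvent("speech", "command", "extrude")))  # accepted
        print(route(InputEvent("speech", "numeric", 42)))         # redirected to manual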

    Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

    Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

    Improving Dysarthric Speech Recognition by Enriching Training Datasets

    Dysarthria is a motor speech disorder that results from disruptions in the neuro-motor interface and is characterised by poor articulation of phonemes and hypernasality; it is characteristically different from normal speech. Many modern automatic speech recognition (ASR) systems focus on a narrow range of speech diversity and consequently, when building training datasets, exclude groups of speakers who deviate in gender, race, age, or speech impairment. This study attempts to develop an ASR system that handles dysarthric speech using only limited dysarthric speech data. Speech utterances collected from the TORGO database are used to conduct experiments on a wav2vec2.0 model trained only on the Librispeech 960h dataset, to obtain a baseline word error rate (WER) for recognising dysarthric speech. A version of the Librispeech model fine-tuned on multi-language datasets was then tested to see whether it would improve accuracy, and achieved a top WER reduction of 24.15% for one of the male dysarthric speakers in the dataset. Transfer learning with speech recognition models, and preprocessing dysarthric speech to improve its intelligibility using generative adversarial networks, were limited in their potential by the lack of a dysarthric speech dataset of adequate size. The main conclusion drawn from this study is that a large, diverse dysarthric speech dataset, comparable in size to datasets such as Librispeech that are used to train machine learning ASR systems, and containing different types of speech, scripted and unscripted, is required to improve performance.
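
    A minimal sketch of the baseline measurement described above, assuming the publicly available Hugging Face checkpoint of wav2vec2.0 trained on Librispeech 960h and the jiwer package for WER; the audio file name and reference prompt are placeholders, not items from TORGO.

        import torch
        import soundfile as sf
        from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC
        from jiwer import wer

        # wav2vec2.0 trained only on Librispeech 960h, as in the baseline above.
        processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
        model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

        speech, rate = sf.read("utterance.wav")  # placeholder clip, 16 kHz mono
        inputs = processor(speech, sampling_rate=rate, return_tensors="pt")

        with torch.no_grad():
            logits = model(inputs.input_values).logits
        prediction = processor.batch_decode(torch.argmax(logits, dim=-1))[0]

        reference = "THE QUICK BROWN FOX"  # placeholder prompt text
        print(f"WER: {wer(reference, prediction):.2%}")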

    Towards structured neural spoken dialogue modelling.

    In this thesis, we try to alleviate some of the weaknesses of current approaches to dialogue modelling, one of the most challenging areas of Artificial Intelligence. We target three different types of dialogues (open-domain, task-oriented, and coaching sessions), and mainly use machine learning algorithms to train dialogue models. One challenge of open-domain chatbots is their lack of response variety, which can be tackled using Generative Adversarial Networks (GANs). We present two methodological contributions in this regard. On the one hand, we develop a method to circumvent the non-differentiability of text-processing GANs. On the other hand, we extend the conventional task of discriminators, which often operate at the level of a single response, to the batch level. Meanwhile, two crucial aspects of task-oriented systems are their understanding capabilities (they need to correctly interpret what the user is looking for and their constraints) and their dialogue strategy. We propose a simple yet powerful way to improve spoken understanding and adapt the dialogue strategy by explicitly processing the user's speech signal through audio-processing transformer neural networks. Finally, coaching dialogues share properties of open-domain and task-oriented dialogues: they are somewhat task-oriented, but there is no rush to complete the task, and it is more important to converse calmly and make users aware of their own problems. In this context, we describe our collaboration in the EMPATHIC project, in which a Virtual Coach capable of carrying out coaching dialogues about nutrition was built using a modular Spoken Dialogue System. We also model such dialogues with an end-to-end system based on Transfer Learning.
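
    A minimal PyTorch sketch of the batch-level discriminator idea, assuming candidate responses are already encoded as fixed-size vectors; the abstract gives no architectural details, so every name and dimension here is illustrative.

        import torch
        import torch.nn as nn

        class BatchDiscriminator(nn.Module):
            """Scores a whole batch of candidate responses jointly, rather than one
            response at a time, so low variety across the batch can be penalised."""
            def __init__(self, dim: int = 128):
                super().__init__()
                self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                            nn.Linear(dim, 1))

            def forward(self, responses):          # responses: (batch, dim)
                pooled = responses.mean(dim=0)     # summary of the whole batch
                joint = torch.cat([responses, pooled.expand_as(responses)], dim=-1)
                # Per-response realness score, conditioned on the batch context.
                return torch.sigmoid(self.scorer(joint))

        batch = torch.randn(16, 128)               # 16 encoded candidate responses
        print(BatchDiscriminator()(batch).shape)   # torch.Size([16, 1])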

    The Language Machine
