5 research outputs found

    Automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu

    Automatic speech recognition (ASR) is potentially helpful for children with dyslexia. However, the highly phonetically similar errors in dyslexic children's reading affect ASR accuracy. This study therefore aims to evaluate whether ASR achieves acceptable accuracy when using automatic transcription and phonetic labelling of dyslexic children's reading in Bahasa Melayu (BM). Three objectives were set: first, to produce manual transcription and phonetic labelling; second, to construct automatic transcription and phonetic labelling using forced alignment; and third, to compare the accuracy obtained with automatic transcription and phonetic labelling against that obtained with manual transcription and phonetic labelling. The methods used to accomplish these goals include manual speech labelling and segmentation, forced alignment, and Hidden Markov Model (HMM) and Artificial Neural Network (ANN) training; ASR accuracy was measured using Word Error Rate (WER) and False Alarm Rate (FAR). A total of 585 speech files were used for manual transcription, forced alignment and the training experiment. The ASR engine using automatic transcription and phonetic labelling obtained an optimum recognition result of 76.04%, with a WER as low as 23.96% and a FAR of 17.9%. These results are almost the same as those of the ASR engine using manual transcription, namely 76.26%, with a WER as low as 23.97% and a FAR of 17.9%. In conclusion, the accuracy of automatic transcription and phonetic labelling is acceptable for helping dyslexic children learn using ASR in Bahasa Melayu (BM).
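
    The abstract above reports Word Error Rate (WER) without defining it. As a rough illustration only, WER is commonly computed as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below assumes that common definition; the function name `word_error_rate` and the sample sentences are hypothetical and are not taken from the study, whose exact scoring procedure is not described in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes the common WER definition; the study's own scoring tool
    may differ (e.g. in text normalisation).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one of four reference words is dropped.
print(word_error_rate("saya suka baca buku", "saya suka buku"))
```

    Under this definition a score of 0 means the output matches the reference exactly. The reported 76.04% result and 23.96% WER are consistent with accuracy being taken as 100% minus WER, although the abstract does not state this.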

    Comparison of forced-alignment speech recognition and humans for generating reference VAD

    This paper aims to answer the question of whether forced-alignment speech recognition can be used as an alternative to humans in generating reference Voice Activity Detection (VAD) transcriptions. The level of agreement between automatic/manual VAD transcriptions and the reference ones produced by a human expert was investigated. Thereafter, statistical analysis was applied to the automatically produced and the collected manual transcriptions. Experimental results confirmed that forced-alignment speech recognition can provide accurate and consistent VAD labels.

    Malay articulation system for early screening diagnostic using hidden markov model and genetic algorithm

    Speech recognition is an important technology that can be a great aid for individuals with sight or hearing disabilities. There has been extensive research interest and development in this area over the past decades. However, its usage and exposure in Malaysia are still immature, even though there is demand from the medical and healthcare sector. The aim of this research is to assess the quality and impact of using a computerized method for early screening of speech articulation disorders among Malaysians, such as omission, substitution, addition and distortion in their speech. In this study, a statistical probabilistic approach using the Hidden Markov Model (HMM) was adopted with a newly designed Malay corpus for articulation disorder cases, following the SAMPA and IPA guidelines. The front-end processing for feature vector selection was improved by applying a silence region calibration algorithm for start and end point detection. The classifier was also modified significantly by incorporating Viterbi search with a Genetic Algorithm (GA) to obtain high accuracy in recognition results and for lexical unit classification. The results were evaluated following National Institute of Standards and Technology (NIST) benchmarking. The tests show that recognition accuracy improved by 30% to 40% using the Genetic Algorithm technique compared with the conventional technique. A new corpus was built in this study, with verification and justification from a medical expert. In conclusion, a computerized method for early screening can ease human effort in tackling speech disorders, and the proposed Genetic Algorithm technique has been shown to improve recognition performance in search and classification tasks.

    A Minimum Boundary Error Framework for Automatic Phonetic Segmentation


    Stabilizing Forces in Acoustic Cultural Evolution: Comparing Humans and Birds

    Learned acoustic communication systems, like birdsong and spoken human language, can be described from two seemingly contradictory perspectives. On one hand, learned acoustic communication systems can be remarkably consistent. Substantive and descriptive generalizations can be made which hold for a majority of populations within a species. On the other hand, learned acoustic communication systems are often highly variable. The degree of variation is often so great that few, if any, substantive generalizations hold for all populations in a species. Within my dissertation, I explore the interplay of variation and uniformity in three vocal learning species: budgerigars (Melopsittacus undulatus), house finches (Haemorhous mexicanus), and humans (Homo sapiens). Budgerigars are well known for their versatile mimicry skills, house finch song organization is uniform across populations, and human language has been described by some as the prime example of variability, while others see only subtle variations of a largely uniform system. For each of these species, I address several questions related to variability and uniformity: What is the typical range of variation? What are the limits of variation? How are those two issues related? And what mechanisms underlie variability and uniformity? In chapter 3, I investigate a potential domain of uniformity in budgerigar warble: the segment. Segments, units divided by acoustic transitions rather than silence, have been largely ignored in non-human animal communication. I find that budgerigars can achieve a high degree of complexity and variability by combining and arranging these small, more stereotyped units. Furthermore, I find that budgerigar segment organization is not only consistent across independent budgerigar populations but is consistent with patterns found in human language. In chapter 4, I investigate variability in house finch song. I present data showing that house finches learn sound patterns which are absent in wild house finch populations. These data suggest that cross-population variation in house finch song is narrower than what is permitted by the house finch song learning program. Finally, in chapter 5, I focus on human language, the most well-described communication system. Here, I research a sound pattern that is absent in the majority of known languages. I find that the rare pattern has independently developed at least six times. In every case, the historical pathway which led to the rare pattern was the same. The historical development in these six linguistic lineages suggests that the overall rarity of the sound pattern is the result of acoustic similarity. These data illuminate the evolutionary forces that give rise to, and limit, variation. The results of this dissertation have wide-ranging implications, from necessary revisions of linguistic theories, to understanding epigenetic interactions, to the application of evolutionary theory to complex behavior. While the projects within the dissertation are all different, evidence from all three supports the following claims: (i) cross-population commonality is not evidence for what a species is able to learn; (ii) peripheral mechanisms have a strong influence in limiting cross-population variability; and (iii) high degrees of variation can emerge from uniform traits.