
    SKOPE: A connectionist/symbolic architecture of spoken Korean processing

    Spoken language processing requires the integration of speech and natural language. Moreover, spoken Korean calls for a unique processing methodology due to its linguistic characteristics. This paper presents SKOPE, a connectionist/symbolic spoken Korean processing engine, which emphasizes that: 1) connectionist and symbolic techniques must be selectively applied according to their relative strengths and weaknesses, and 2) the linguistic characteristics of Korean must be fully considered in phoneme recognition, speech and language integration, and morphological/syntactic processing. The design and implementation of SKOPE demonstrate how connectionist/symbolic hybrid architectures can be constructed for spoken agglutinative language processing. SKOPE also presents many novel ideas for speech and language processing. Phoneme recognition, morphological analysis, and syntactic analysis experiments show that SKOPE is a viable approach to spoken Korean processing.
    Comment: 8 pages, LaTeX, uses aaai.sty & aaai.bst, bibfile: nlpsp.bib; to be presented at the IJCAI-95 workshops on new approaches to learning for natural language processing
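    As a rough illustration of the division of labor the abstract describes, the sketch below pairs a stand-in "connectionist" phoneme scorer with a symbolic, lexicon-driven morphological segmenter for an agglutinative word. Everything here is invented for illustration (the phoneme set, the toy lexicon, the untrained linear scorer); it is not SKOPE's code.

```python
# Illustrative connectionist/symbolic split: a neural-style scorer handles
# noisy phoneme recognition, while a symbolic lexicon-driven segmenter
# handles morphological analysis of an agglutinative word form.
import numpy as np

# Hypothetical phoneme inventory for the "connectionist" stage.
PHONEMES = ["k", "a", "n", "d", "u"]

def phoneme_posteriors(frame: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """One linear layer + softmax, standing in for a trained phoneme network."""
    logits = weights @ frame
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Hypothetical morpheme lexicon for the symbolic stage (stem + particle).
LEXICON = {"kan": "stem", "du": "particle"}

def segment(word: str):
    """Greedy longest-match segmentation against the morpheme lexicon."""
    morphs, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in LEXICON:
                morphs.append((word[i:j], LEXICON[word[i:j]]))
                i = j
                break
        else:
            return None  # symbolic stage rejects the hypothesis
    return morphs

rng = np.random.default_rng(0)
frame = rng.normal(size=8)
weights = rng.normal(size=(len(PHONEMES), 8))
print(phoneme_posteriors(frame, weights))  # connectionist output
print(segment("kandu"))                    # symbolic output
```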

    Developing a Distributed Java-based Speech Recognition Engine

    The development of speech recognition engines has traditionally been the territory of low-level development languages such as C. Until recently, Java would not have been considered a candidate language for the development of such a speech engine, because its security restrictions limited its sound processing features. The release of the Java Sound API as part of the Java Media Framework, and the subsequent integration of the Sound API into the standard Java development kit, gives Java the sound processing tools necessary to perform speech recognition. This paper documents our development of a speech recognition engine in the Java programming language. We discuss the theory of speech recognition engines using stochastic techniques such as Hidden Markov Models, which we employ in our Java-based implementation of speech signal processing algorithms such as the Fast Fourier Transform and Mel Frequency Cepstral Coefficients. Furthermore, we describe our design goals and the implementation of a distributed speech engine component, which provides a client-server approach to speech recognition. The distributed architecture allows us to deliver speech recognition technology and applications to a range of low-powered devices such as PDAs and mobile phones, which otherwise may not have the requisite onboard computing power to perform speech recognition.
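    The front end the abstract names (an FFT feeding Mel Frequency Cepstral Coefficients) can be sketched in a few lines. This is the textbook MFCC pipeline, not the paper's Java implementation; the sample rate, FFT size, and filter count are illustrative defaults.

```python
# Minimal MFCC front-end sketch: FFT -> mel filterbank -> log -> DCT.
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_frame(frame, sample_rate=16000, n_fft=512, n_filters=26, n_ceps=13):
    """Compute one frame of MFCCs from a windowed audio frame."""
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Triangular mel filterbank spanning 0 Hz .. Nyquist.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energies = np.log(fbank @ spectrum + 1e-10)
    return dct(log_energies, norm="ortho")[:n_ceps]  # keep low-order cepstra

frame = np.random.randn(400)  # one 25 ms frame at 16 kHz
print(mfcc_frame(frame)[:5])
```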

    Design and development of automatic speech recognition system for Tamil language using CMU Sphinx 4

    This paper presents the design and development of a speech recognition system for the Tamil language. The system is based on CMU Sphinx 4, an open source automatic speech recognition (ASR) engine developed by Carnegie Mellon University, and is adapted to speaker-specific continuous speech. One of its main components is a core Tamil speech recognizer that can be trained with field-specific data. The target domain is the accent spoken by illiterate Tamil speakers from the Eastern area of Sri Lanka. A phonetically rich and balanced sentence text corpus was developed and recorded under controlled conditions to set up a speaker-specific speech corpus. Using this speech corpus, the system was trained and tested with speaker-specific data (testing with the same words uttered by the same person) and speaker-independent data (testing with different words uttered by different people). The system currently gives a satisfactory peak performance of 39.5% Word Error Rate (WER) for speaker-specific data and an unsatisfactory rate for speaker-independent data, which is comparable with the best word error rates of most continuous speech recognition systems available for any language.
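    Word Error Rate, the metric the abstract reports, is the word-level Levenshtein distance between the reference transcript and the recognizer's hypothesis, divided by the reference length. A minimal sketch of the standard computation (not code from the paper):

```python
# Word Error Rate: (substitutions + deletions + insertions) / reference length,
# computed via word-level edit distance with dynamic programming.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 deletions / 6 words
```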

    Design of the "Let's Say" Game Application with Speech Recognition Interaction

    Computer games have become widespread; today, almost any device with a screen can run games that entertain the public through interactive experiences. Games do not always have a negative effect: a game application with a learning function can be useful and easily accepted by society. This study was conducted to determine how computer games can be a fun medium for information and learning by making use of speech recognition interaction. The "Let's Say" game application was developed using the multimedia development method, consisting of the concept, design, material collection, implementation, testing, and distribution stages. The application was designed using UML modeling and an ERD, and developed with the Visual Basic.NET programming language, a WPF interface, and an SQLite database. The application uses Windows Speech Recognition as its speech recognition engine. Functional testing of the application was done using the black-box method, and user testing was done through a questionnaire. The test results show that the application works well according to its specified functions. User testing indicates that the application is informative and easy to use, with a reasonably attractive interface; it also shows that the speech recognition works well and is helpful.
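    The game's core interaction (prompt a word, recognize the player's utterance, score the match) can be sketched as below. The paper implements this with Windows Speech Recognition from Visual Basic.NET/WPF; this stand-in uses the Python speech_recognition package purely to illustrate the flow.

```python
# One round of a say-the-word game: prompt, listen, recognize, compare.
import speech_recognition as sr

def play_round(target_word: str) -> bool:
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:  # requires PyAudio for microphone access
        print(f"Say the word: {target_word}")
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.listen(source)
    try:
        heard = recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return False  # speech was unintelligible to the recognizer
    return heard == target_word.lower()

if play_round("apple"):
    print("Correct!")
else:
    print("Try again.")
```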

    The Pronunciation Accuracy of Interactive Dialog System for Malaysian Primary School Students

    This project examines the accuracy of using an existing speech recognition engine in an interactive dialog system (IDS) for the literacy education of Malaysian primary school students learning English as a second language (ESL). Students are interested in learning literacy on a computer that supports spoken dialog, as it motivates them to be more confident in reading and pronunciation without depending solely on teachers. This computer-assisted learning improves students' oral reading ability through the speech recognition in the IDS: students can learn to read and pronounce words correctly and independently, without seeking help from teachers. The study was conducted at Sungai Berembang Primary School and involved all 16 female and 18 male Standard 2 students, aged 8 years. These students possess varied pronunciation, reading abilities, and experience in English, with Malay as their first language. The main objective of this study is to examine the accuracy of using an existing speech recognition engine for ESL Malaysian students in literacy education; the specific objectives are to identify requirements for, and evaluate, a speech recognition based dialog system for reading accuracy. This kind of speech recognition technology aims to provide teacher-like tutoring of children's phonemic awareness, vocabulary building, word comprehension, and fluent reading. The study uses the System Development Research Method, which has five stages: constructing a conceptual framework, developing the system architecture, analyzing and designing the system, building the prototype, and observing and testing the system. The results of the study and implementation found that the IDS helped 85% of the students with their English after they used the system.
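    One plausible way to score reading accuracy with a recognizer, in the spirit of the study's evaluation, is the fraction of reference words the recognizer heard in order, computed from a word-level alignment. The sketch below is illustrative only, not the study's scoring code.

```python
# Reading-accuracy score: align the recognized word sequence against the
# reference text and report the fraction of reference words matched in order.
from difflib import SequenceMatcher

def reading_accuracy(reference: str, recognized: str) -> float:
    ref, hyp = reference.lower().split(), recognized.lower().split()
    matcher = SequenceMatcher(None, ref, hyp)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)

print(reading_accuracy("the cat sat on the mat",
                       "the cat sit on the mat"))  # 5/6 words read correctly
```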

    Arabic Speaker-Independent Continuous Automatic Speech Recognition Based on a Phonetically Rich and Balanced Speech Corpus

    This paper describes and proposes an efficient and effective framework for the design and development of a speaker-independent continuous automatic Arabic speech recognition system based on a phonetically rich and balanced speech corpus. The speech corpus contains a total of 415 sentences recorded by 40 (20 male and 20 female) native Arabic speakers from 11 different Arab countries representing the three major regions (Levant, Gulf, and Africa) of the Arab world. The proposed Arabic speech recognition system is based on the Carnegie Mellon University (CMU) Sphinx tools; the Cambridge HTK tools were also used at some testing stages. The speech engine uses 3-emitting-state Hidden Markov Models (HMMs) for tri-phone-based acoustic models. Based on experimental analysis of about 7 hours of training speech data, the best acoustic model uses a continuous observation probability model with 16 Gaussian mixture distributions, with the state distributions tied to 500 senones. The language model contains both bi-grams and tri-grams. For similar speakers but different sentences, the system obtained word recognition accuracies of 92.67% and 93.88% and Word Error Rates (WER) of 11.27% and 10.07% with and without diacritical marks, respectively. For different speakers with similar sentences, the system obtained word recognition accuracies of 95.92% and 96.29% and WERs of 5.78% and 5.45% with and without diacritical marks, respectively. For different speakers and different sentences, the system obtained word recognition accuracies of 89.08% and 90.23% and WERs of 15.59% and 14.44% with and without diacritical marks, respectively.
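    The bi-gram/tri-gram language model the abstract mentions is built from n-gram counts over the training sentences. A toy maximum-likelihood version (no smoothing, unlike a production CMU/HTK language model) looks like this:

```python
# Count bi-grams and tri-grams over a toy corpus and estimate a
# maximum-likelihood tri-gram probability P(w3 | w1, w2).
from collections import Counter

def ngram_counts(sentences, n):
    counts = Counter()
    for sent in sentences:
        tokens = ["<s>"] * (n - 1) + sent.split() + ["</s>"]
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

corpus = ["he reads the book", "she reads the letter"]
bigrams, trigrams = ngram_counts(corpus, 2), ngram_counts(corpus, 3)

def p_trigram(w1, w2, w3):
    return trigrams[(w1, w2, w3)] / bigrams[(w1, w2)]

print(p_trigram("reads", "the", "book"))  # 0.5 in this toy corpus
```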

    Statistical parametric evaluation on new corpus design for Malay speech articulation disorder early diagnosis

    Speech-to-text, better known as speech recognition, plays an important role nowadays, especially in the medical area and specifically in speech impairment. In this study, a Malay-language speech-to-text system was designed using a Hidden Markov Model (HMM) as the statistical engine, with emphasis on the design of a Malay speech corpus specifically for Malay articulation speech disorders. The study also describes and tests the appropriate number of HMM states, analyzing how this choice changes the recognition accuracy of current Malay speech recognition. A statistical parametric representation method was utilized, and the Malay corpus database was constructed to be balanced across all the places and manners of articulation that appear in Malay speech articulation therapy. The results were achieved by conducting several experiments with samples collected from 80 patient speakers (children and adults), comprising almost 30,720 training samples.
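    The state-count experiment the abstract describes (varying the number of HMM states and comparing performance) can be sketched with the hmmlearn package as a stand-in for the study's engine; the random features below merely stand in for real MFCC sequences.

```python
# Train Gaussian HMMs with different numbers of states on feature sequences
# and compare training log-likelihoods across state counts.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# Stand-in for MFCC data: 20 utterances, 50 frames each, 13 coefficients.
X = rng.normal(size=(20 * 50, 13))
lengths = [50] * 20

for n_states in (3, 5, 7):
    model = hmm.GaussianHMM(n_components=n_states, covariance_type="diag",
                            n_iter=20, random_state=0)
    model.fit(X, lengths)
    print(n_states, "states: log-likelihood", round(model.score(X, lengths), 1))
```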