An autopoietic approach to the development of speech recognition
The focus of this research is the implementation of speech recognition through an autopoietic approach. The work has culminated in the introduction of a neural network architecture named the Homunculus Network. This network was used in the development of a speech recognition system for Bahasa Melayu. The system is an isolated-word, phoneme-level speech recognizer that is speaker-independent and has a vocabulary of 15 words. The research has identified several issues worth pursuing further; these issues also form the basis for the design and development of the new autopoietic speech recognition system.
Automatic voice recognition using traditional and artificial neural network approaches
The main objective of this research is to develop an algorithm for isolated-word recognition. The research focuses on digital signal analysis rather than linguistic analysis of speech. Feature extraction is carried out by applying a Linear Predictive Coding (LPC) algorithm of order 10. Continuous-word and speaker-independent recognition will be considered in a future study once this isolated-word research is complete. To examine the similarity between the reference and training sets, two approaches are explored. The first implements traditional pattern recognition techniques: a dynamic time warping algorithm aligns the two sets, and the probability of a match is calculated by measuring the Euclidean distance between them. The second implements a three-layer backpropagation artificial neural network as the pattern classifier; the adaptation rule in this network is the generalized least mean square (LMS) rule. The first approach has been accomplished: a vocabulary of 50 words was selected and tested, and the accuracy of the algorithm was found to be around 85 percent. The second approach is currently in progress.
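The DTW-plus-Euclidean-distance matching described in the first approach can be sketched as follows. This is a minimal illustration using synthetic 10-dimensional vectors standing in for order-10 LPC frames; the data and function names are assumptions, not the authors' implementation.

```python
import numpy as np

def dtw_distance(ref, test):
    """Align two feature sequences with dynamic time warping and
    return the cumulative Euclidean distance along the optimal path."""
    n, m = len(ref), len(test)
    # Local Euclidean distance between every pair of frames.
    cost = np.linalg.norm(ref[:, None, :] - test[None, :, :], axis=2)
    # Accumulated cost matrix with the standard step pattern.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j],      # insertion
                acc[i, j - 1],      # deletion
                acc[i - 1, j - 1],  # match
            )
    return acc[n, m]

# Hypothetical 10-dimensional LPC frame sequences.
rng = np.random.default_rng(0)
a = rng.normal(size=(20, 10))
b = np.vstack([a, a[-1:]])        # a slightly time-stretched copy of `a`
c = rng.normal(size=(25, 10))     # an unrelated utterance

# The stretched copy aligns far more cheaply than the unrelated word.
print(dtw_distance(a, b) < dtw_distance(a, c))  # True
```

In a recognizer, the test utterance would be matched against one reference template per vocabulary word and assigned to the word with the smallest DTW distance.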
Speaker Independent Speech Recognition Using Neural Network
In spite of the advances accomplished over the last few decades, automatic speech recognition (ASR) remains a challenging and difficult task when systems are deployed in the real world. Differing requirements across applications drive researchers to explore more effective methods for each particular application. Attempts to apply artificial neural networks (ANNs) as a classification tool have been proposed to increase the reliability of such systems. This project studies the use of neural networks for speaker-independent isolated-word recognition on small vocabularies and proposes a method for using a simple MLP as the speech recognizer. Our approach overcomes the current limitation of MLPs regarding the choice of input buffer size by proposing a frame-selection method. Linear predictive coding (LPC) is applied at an early stage to represent the speech signal as frames. Features from the selected frames are used to train a multilayer perceptron (MLP) feedforward back-propagation (FFBP) neural network during the training stage. The same routine is applied to the speech signal during the recognition stage, and the unknown test pattern is classified to the nearest known pattern. In short, the selected frames represent the local features of the speech signal, and together they contribute to the global similarity measure for the whole signal. The analysis, design, and a PC-based voice-dialling system are developed using MATLAB®.
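The frame-selection idea, mapping a variable-length utterance onto a fixed-size MLP input vector, can be illustrated roughly as follows. The uniform-spacing rule here is an assumption for the sketch, since the abstract does not specify how frames are chosen.

```python
import numpy as np

def select_frames(features, k):
    """Pick k frames at evenly spaced positions so that utterances of
    any length map to the same fixed-size MLP input vector."""
    idx = np.linspace(0, len(features) - 1, k).round().astype(int)
    return features[idx].ravel()

# Hypothetical LPC feature matrices (frames x coefficients) for two
# utterances of different durations.
short = np.random.default_rng(1).normal(size=(18, 10))
long_ = np.random.default_rng(2).normal(size=(47, 10))

# Both map to the same input dimensionality regardless of duration.
print(select_frames(short, 8).shape, select_frames(long_, 8).shape)  # (80,) (80,)
```

The flattened vector then becomes the MLP's input layer, sidestepping the mismatch between variable utterance length and a fixed input buffer.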
The Optimal Performance of Multi-Layer Neural Network for Speaker-Independent Isolated Spoken Malay Parliamentary Speech
This paper describes speech recognizer modeling techniques suited to high-performance, robust isolated-word recognition in a speaker-independent manner. In this study, a speech recognition system is presented, specifically an isolated spoken Malay word recognizer that uses spontaneous and formal speech collected from the Parliament of Malaysia. Currently the vocabulary is limited to ten words that can be pronounced exactly as written, chosen to control the distribution of the vocalic segments. The speech segmentation task is achieved by adopting an energy-based parameter and a zero-crossing rate measure, with modifications to better locate the beginning and ending points of speech in the spoken words. The training and recognition processes are realized using Multi-Layer Perceptron (MLP) neural networks in a two-layer feedforward configuration, trained with stochastic error back-propagation that adjusts the weights and biases after the presentation of every training example. Mel-frequency cepstral coefficients (MFCCs) were chosen as the feature extraction approach, computed from each segmented utterance as the characteristic features for the word recognizer. The MLP performance is analyzed to determine the optimal cepstral order and number of hidden neurons. Recognition results showed that the performance of the two-layer network increased as the number of hidden neurons increased. Experimental results also showed that cepstral orders of 12 to 14 were appropriate for speech feature extraction for the data in this study.
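The energy- and zero-crossing-rate-based endpoint detection described above can be sketched minimally as follows. The frame length and thresholds are illustrative assumptions, not the paper's values.

```python
import numpy as np

def endpoints(signal, frame_len=256, energy_thresh=0.01, zcr_thresh=0.25):
    """Locate the first and last speech frames using short-time energy
    and zero-crossing rate (ZCR)."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.mean(frames ** 2, axis=1)
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # A frame counts as speech if it is energetic (voiced sounds) or
    # busy with sign changes (unvoiced fricatives).
    speech = (energy > energy_thresh) | (zcr > zcr_thresh)
    idx = np.flatnonzero(speech)
    return (idx[0], idx[-1]) if idx.size else (None, None)

# Synthetic utterance: half a second of silence, one second of a
# 200 Hz tone, then silence again, at an 8 kHz sampling rate.
sr = 8000
t = np.arange(sr) / sr
sig = np.concatenate([np.zeros(sr // 2),
                      0.5 * np.sin(2 * np.pi * 200 * t),
                      np.zeros(sr // 2)])
start, end = endpoints(sig)
print(start, end)  # the tone spans roughly frames 15 to 46
```

Trimming the utterance to `[start, end]` before feature extraction removes leading and trailing silence, which otherwise dilutes the MFCC features.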
Articulatory and bottleneck features for speaker-independent ASR of dysarthric speech
Rapid population aging has stimulated the development of assistive devices that provide personalized medical support to people suffering from various etiologies. One prominent clinical application is a computer-assisted speech training system that enables personalized speech therapy for patients impaired by communicative disorders in the patient's home environment. Such a system relies on robust automatic speech recognition (ASR) technology to provide accurate articulation feedback. With the long-term aim of developing off-the-shelf ASR systems that can be deployed in a clinical context without prior speaker information, we compare the ASR performance of speaker-independent bottleneck and articulatory features on dysarthric speech, used in conjunction with dedicated neural network-based acoustic models that have been shown to be robust against spectrotemporal deviations. We report the ASR performance of these systems on two dysarthric speech datasets of different characteristics to quantify the achieved performance gains. Despite the remaining performance gap between dysarthric and normal speech, significant improvements are reported on both datasets using speaker-independent ASR architectures.

Comment: to appear in Computer Speech & Language - https://doi.org/10.1016/j.csl.2019.05.002 - arXiv admin note: substantial text overlap with arXiv:1807.1094
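Bottleneck features of the kind compared above are simply the activations of a deliberately narrow hidden layer of a trained acoustic network, taken in place of the classification output. A toy sketch with random, untrained weights, purely to show the mechanics; the layer sizes and dimensionalities are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights of a pretrained acoustic network whose narrow
# middle layer (the "bottleneck") yields compact features per frame.
W1, b1 = rng.normal(size=(40, 128)), np.zeros(128)
W2, b2 = rng.normal(size=(128, 13)), np.zeros(13)   # 13-dim bottleneck
W3, b3 = rng.normal(size=(13, 500)), np.zeros(500)  # classification head

def bottleneck_features(frames):
    """Forward frames up to the bottleneck layer and return its
    activations, discarding the classification head (W3, b3)."""
    h1 = np.tanh(frames @ W1 + b1)
    return np.tanh(h1 @ W2 + b2)

# 100 frames of hypothetical 40-dimensional filterbank input.
feats = bottleneck_features(rng.normal(size=(100, 40)))
print(feats.shape)  # (100, 13)
```

In practice the network is first trained on a phone- or senone-classification task; the narrow layer is then frozen and its activations serve as input features for a downstream acoustic model.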
Towards deep learning on speech recognition for Khmer language
To perform speech recognition well, a large amount of transcribed speech and textual data in the target language must be available for system training. This high demand for language resources constrains the development of speech recognition systems for new languages. In this thesis, the development of a low-resource isolated-word recognition system for the Khmer language is investigated. Speech data collected via mobile phone, containing 194 vocabulary words, is used in our experiments. Data pre-processing based on Voice Activity Detection (VAD) is discussed. As by-products of this work, a phoneme-based pronunciation lexicon and a state-tying question set for the Khmer speech recognizer are built from scratch. In addition to conventional statistical acoustic modeling using a Gaussian Mixture Model and hidden Markov Model (GMM-HMM), a hybrid acoustic model based on a Deep Neural Network (DNN-HMM), trained to predict context-dependent triphone states, is evaluated. Dropout is used to improve the robustness of the DNN, and cross-lingual transfer learning that makes use of auxiliary training data in English is also investigated. As the first effort in using DNN-HMM for low-resource isolated-word recognition for the Khmer language, the system currently achieves 93.31% word accuracy in speaker-independent mode on our test set.
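Dropout, mentioned above as the regularizer for the DNN, can be sketched in its standard inverted form; this is the textbook technique, not the thesis's specific configuration or rates.

```python
import numpy as np

def dropout(h, rate, rng, train=True):
    """Inverted dropout: during training, randomly zero a fraction
    `rate` of activations and rescale the survivors by 1/(1-rate),
    so the forward pass at inference time needs no change."""
    if not train or rate == 0.0:
        return h
    mask = rng.random(h.shape) >= rate
    return h * mask / (1.0 - rate)

rng = np.random.default_rng(42)
h = np.ones((4, 10))                       # a batch of hidden activations
h_train = dropout(h, 0.5, rng)             # units are either 0.0 or 2.0
h_infer = dropout(h, 0.5, rng, train=False)  # identical to the input
```

Randomly disabling units prevents co-adaptation, which is especially valuable when, as here, the training set is small.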