Search CORE

16,532 research outputs found

Mismatched Training Data Enhancement for Automatic Recognition of Children’s Speech using DNN-HMM

Authors: Dai Lirong; McLoughlin Ian; Qian Mengjie; Quo Wu
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/10/2016

The increasing profusion of commercial automatic speech recognition applications has been driven by big-data techniques that rely on high-quality labelled speech datasets. Children's speech shows greater time- and frequency-domain variability than typical adult speech, lacks good large-scale training data, and is harder to capture at high quality. Each of these factors reduces the performance of systems that automatically recognise children's speech. In this paper, children's speech recognition is investigated using a hybrid acoustic modelling approach based on deep neural networks and Gaussian mixture models with hidden Markov model back ends. We explore incorporating mismatched training data to obtain a better acoustic model and improve performance when training data is limited, as well as augmenting the training data with noise. We also explore two arrangements for vocal tract length normalisation and a gender-based data selection technique suitable for training a children's speech recogniser.
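The abstract mentions augmenting the training data with noise but does not say which noise types or signal-to-noise ratios (SNRs) were used. As a minimal sketch, assuming plain additive noise mixed at a chosen SNR, the idea can be written in dependency-free Python as follows. The function names and the SNR values are illustrative assumptions, not the authors' settings.

```python
import math
import random


def signal_power(samples: list[float]) -> float:
    """Mean squared amplitude of a waveform."""
    return sum(s * s for s in samples) / len(samples)


def add_noise_at_snr(clean: list[float], noise: list[float], snr_db: float) -> list[float]:
    """Return clean + g * noise, with g chosen so the mixture has the requested SNR (dB).

    The noise is tiled (repeated) or truncated to match the clean signal's length.
    """
    if not noise:
        raise ValueError("noise must be non-empty")
    tiled = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = signal_power(clean)
    p_noise = signal_power(tiled)
    if p_noise == 0.0:
        raise ValueError("noise has zero power")
    # SNR_dB = 10*log10(p_clean / (g^2 * p_noise))  =>  solve for the gain g.
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, tiled)]


def augment(utterances: list[list[float]], noise: list[float],
            snr_choices=(0.0, 5.0, 10.0, 20.0), seed: int = 0) -> list[list[float]]:
    """Keep each clean utterance and add one noisy copy at a randomly chosen SNR.

    The SNR choices are illustrative placeholders, not values from the paper.
    """
    rng = random.Random(seed)
    out = []
    for utt in utterances:
        out.append(utt)
        out.append(add_noise_at_snr(utt, noise, rng.choice(snr_choices)))
    return out
```

In a real pipeline the noisy copies would then pass through the same feature extraction as the clean data before acoustic-model training.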

Crossref

Kent Academic Repository

Proposing a hybrid approach for emotion classification using audio and video data

Authors: Azimi Khojasteh Rezvan; Naji Alobaidi; Rafeh Reza
Publication venue: AIRCC Digital Library
Publication date: 30/11/2019

Emotion recognition has been an active research topic in Human-Computer Interaction (HCI) in recent years. Computers have become an inseparable part of human life, and users need human-like interaction to communicate with them effectively. Many researchers have therefore studied emotion recognition and classification using different sources of data, and a hybrid approach combining audio and text has been introduced recently. All such approaches aim to increase the accuracy and appropriateness of emotion classification. In this study, a hybrid approach combining audio and video is applied to emotion recognition. The novelty of the approach lies in selecting characteristics and features from both the audio and the video and using them together as a single specification for classification. A support vector machine (SVM) is used to classify the data in the SAVEE database. The experimental results show a maximum classification accuracy of 91.63% for audio data alone, rising to 99.26% with the hybrid approach.
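The abstract does not say how the audio and video features are combined or which SVM kernel is used. As a hedged illustration, assuming simple feature-level fusion (per-feature z-scoring, then concatenation) and a linear SVM trained with the Pegasos stochastic sub-gradient method in a one-vs-rest scheme, the idea can be sketched in dependency-free Python. All function names are hypothetical, and real experiments on SAVEE would more likely use an established SVM library.

```python
import math
import random


def zscore_columns(rows):
    """Standardise each feature column to zero mean and unit variance."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        std = math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
        stds.append(std if std > 0 else 1.0)  # avoid dividing by zero for constant features
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def fuse(audio_feats, video_feats):
    """Feature-level fusion: normalise both modalities, then concatenate per sample."""
    return [a + v for a, v in zip(zscore_columns(audio_feats), zscore_columns(video_feats))]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic sub-gradient descent on the regularised hinge loss.

    Labels must be +1/-1. A constant 1.0 feature is appended to act as a bias
    (so, for simplicity, the bias is regularised too).
    """
    rng = random.Random(seed)
    data = [(x + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # gradient step on the regulariser
            if margin < 1.0:  # hinge loss active (margin < 1): step along t * x
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per emotion class (class vs. all others)."""
    return {c: train_linear_svm(X, [1 if l == c else -1 for l in labels], **kwargs)
            for c in sorted(set(labels))}


def predict_ovr(models, x):
    """Pick the class whose SVM gives the largest decision value."""
    xb = x + [1.0]
    return max(models, key=lambda c: sum(wi * xi for wi, xi in zip(models[c], xb)))
```

For a fair evaluation, the z-score statistics would be estimated on the training split only and reused on the test split; this sketch normalises whatever rows it is given.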

Crossref

Wintec Research Archive

Using multiple visual tandem streams in audio-visual speech recognition

Authors: Erdoğan Hakan; Topkaya İbrahim Saygın
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

The so-called "tandem approach" in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model (HMM). We study the effect of visual tandem features in audio-visual speech recognition using a novel setup in which multiple classifiers produce multiple visual tandem features. We adopt multi-stream HMMs, treating the visual tandem features from two different classifiers as additional streams in the model. Our experiments show that using multiple visual tandem features improves recognition accuracy under various noise conditions. In addition, to handle asynchrony between the audio and visual observations, we employ coupled HMMs and obtain improved performance compared with the synchronous model.
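The two ideas in this abstract can be illustrated briefly: tandem observations are commonly formed by taking the logarithm of a classifier's posteriors, and a multi-stream HMM scores a state by a weighted sum of per-stream log-likelihoods. The sketch below assumes a single diagonal Gaussian per stream instead of the Gaussian mixtures a real system would use; the stream weights are illustrative, the function names are hypothetical, and the coupled HMM used for asynchrony is not shown.

```python
import math


def tandem_features(posteriors, floor=1e-10):
    """Tandem observations for one frame: log of classifier posteriors.

    Real tandem systems usually follow this with a decorrelating transform such as PCA,
    which is omitted here.
    """
    return [math.log(max(p, floor)) for p in posteriors]


def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def multistream_log_emission(streams, state_models, weights):
    """Multi-stream HMM emission score for one state and one frame.

    log b_j(o_t) = sum_s w_s * log b_js(o_st), where each stream s (e.g. audio,
    tandem features from classifier 1, tandem features from classifier 2) has
    its own observation vector, its own model, and an exponent weight w_s.
    state_models holds one (mean, var) pair per stream.
    """
    return sum(w * diag_gaussian_logpdf(o, m, v)
               for o, (m, v), w in zip(streams, state_models, weights))
```

Setting one weight to 1 and the rest to 0 reduces the score to a single-stream HMM, which is a convenient sanity check when tuning stream weights.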

CiteSeerX

Sabanci University Research Database

A statistical multiresolution approach for face recognition using structural hidden Markov models

Authors: Amira A; Bouchaffra D; Nicholl P; Perrott R H
Publication venue: Hindawi Limited
Publication date: 01/01/2007

This paper introduces a novel methodology that combines the multiresolution property of the discrete wavelet transform (DWT) with the local interactions of facial structures, expressed through the structural hidden Markov model (SHMM). A range of wavelet filters, such as Haar, biorthogonal 9/7 and Coiflet, as well as Gabor, were implemented in search of the best performance. SHMMs perform a thorough probabilistic analysis of a sequential pattern by revealing both its inner and outer structures simultaneously. Unlike traditional HMMs, SHMMs do not assume that the visible observations are conditionally independent given the hidden states; this is achieved through the concept of local structures that SHMMs introduce. As a result, the long-range dependency problem inherent to traditional HMMs is drastically reduced. SHMMs had not previously been applied to face identification. The results reported show that the SHMM outperforms the traditional HMM, with a 73% increase in accuracy.
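The multiresolution idea can be made concrete with the simplest of the filters listed, the Haar wavelet. The following is a minimal sketch, not the authors' code, of one level of a separable 2-D Haar DWT in plain Python. Applying it again to the LL sub-band gives the next, coarser resolution level. How the sub-bands are turned into SHMM observation sequences is not described in the abstract and is not shown. Sub-band naming (LH versus HL) varies between texts, and the function names are hypothetical.

```python
def haar_1d(signal):
    """One level of the orthonormal Haar transform on an even-length sequence."""
    s = 2 ** -0.5
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def haar_2d(image):
    """One level of the separable 2-D Haar DWT for an image with even height and width.

    Returns the (LL, LH, HL, HH) sub-bands, each half the height and width of the input.
    """
    # Filter every row into low-pass and high-pass halves.
    row_lo, row_hi = zip(*(haar_1d(list(row)) for row in image))

    def filter_columns(block):
        # Transpose, transform each column, then transpose the results back to rows.
        cols = [list(c) for c in zip(*block)]
        lo, hi = zip(*(haar_1d(c) for c in cols))
        return [list(r) for r in zip(*lo)], [list(r) for r in zip(*hi)]

    ll, lh = filter_columns(row_lo)
    hl, hh = filter_columns(row_hi)
    return ll, lh, hl, hh
```

Because the transform is orthonormal, the total energy (sum of squared pixel values) is preserved across the four sub-bands, while most of it concentrates in LL for smooth images.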

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Brunel University Research Archive

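The relationship between learning styles and academic achievement among vocational-stream students (Hubungan gaya pembelajaran dengan pencapaian akademik pelajar aliran vokasional)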

Author: Anwar Nurul Farahanaa
Publication date: 01/01/2013

An analysis of the 2011 Sijil Pelajaran Malaysia (SPM) results showed a decline in achievement at vocational secondary schools. This study was therefore carried out to examine the relationship between students' learning styles and their academic achievement. It also aimed to identify the most dominant learning style practised by the students and to examine differences in learning style by gender. A total of 131 Form Four vocational-course students at Sekolah Menengah Vokasional Segamat in Johor took part in the study. The 44-item Index of Learning Style (ILS) questionnaire developed by Felder and Silverman (1991) was used. Students' learning styles are described along four dimensions, each made up of two opposing sub-scales: Active/Reflective, Concrete/Intuitive, Verbal/Visual and Sequential/Global. The data were analysed using the Statistical Package for Social Science for Windows, release 20.0 (SPSS 20.0). A Pearson correlation test was used to examine the relationship between learning style and academic achievement. The p-values obtained for this relationship ranged from 0.1 to 0.4, indicating no significant relationship between the two variables. The study also found that the learning style most commonly practised by the students was the Sequential style. The results also showed no significant difference in learning style between male and female students.
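The study's key test is a Pearson correlation between learning-style scores and achievement, run in SPSS. As a hedged, dependency-free sketch of what that test computes (not a reproduction of the study's analysis), the code below returns r, the t statistic with n - 2 degrees of freedom, and an approximate two-sided p-value. The p-value uses a normal approximation to the t distribution, an assumption that is reasonable only for fairly large samples such as n = 131; exact values need a t-distribution CDF. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_test(x, y):
    """Return (r, t, approximate two-sided p) for H0: no linear correlation.

    t = r * sqrt((n - 2) / (1 - r^2)) follows a t distribution with n - 2 degrees
    of freedom under H0. Here p is approximated with the standard normal CDF.
    Undefined when |r| == 1 (division by zero).
    """
    n = len(x)
    r = pearson_r(x, y)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return r, t, p
```

With the conventional 0.05 threshold, p-values between 0.1 and 0.4, as reported above, mean the null hypothesis of no correlation is not rejected.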

UTHM Institutional Repository