Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages
This paper provides an overall introduction to our Automatic Speech
Recognition (ASR) systems for Southeast Asian languages. As little existing
work has been carried out on such regional languages, several difficulties
must be addressed before building the systems: limited speech and text
resources, a lack of linguistic knowledge, etc. This work takes Bahasa Indonesia
and Thai as examples to illustrate strategies for collecting the various
resources required to build ASR systems. Comment: Published in the 2017 IEEE
International Conference on Orange Technologies (ICOT 2017).
Neural networks for distant speech recognition
Distant conversational speech recognition is challenging owing to the presence of multiple, overlapping talkers, additional non-speech acoustic sources, and the effects of reverberation. In this paper we review work on distant speech recognition, with an emphasis on approaches which combine multichannel signal processing with acoustic modelling, and investigate the use of hybrid neural network / hidden Markov model acoustic models for distant speech recognition of meetings recorded using microphone arrays. In particular we investigate the use of convolutional and fully-connected neural networks with different activation functions (sigmoid, rectified linear, and maxout). We performed experiments on the AMI and ICSI meeting corpora, with results indicating that neural network models are capable of significant improvements in accuracy compared with discriminatively trained Gaussian mixture models. Index Terms — convolutional neural networks, distant speech recognition, rectifier unit, maxout networks, beamforming, meetings, AMI corpus, ICSI corpus
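The maxout activation evaluated in this abstract takes the maximum over small groups of linear pieces, so the network learns its own piecewise-linear activation shape. A minimal numpy sketch (illustrative only, not the authors' implementation; the group size `k` is an assumed hyperparameter):

```python
import numpy as np

def maxout(z, k=2):
    """Maxout activation: partition the pre-activations of each frame
    into groups of k linear pieces and keep the per-group maximum."""
    b, u = z.shape
    assert u % k == 0, "unit count must be divisible by group size k"
    return z.reshape(b, u // k, k).max(axis=-1)

# Toy pre-activations for one frame: two groups of two pieces each.
z = np.array([[1.0, -2.0, 0.5, 3.0]])
print(maxout(z, k=2))  # each output unit is the max of its group: [[1. 3.]]
```

With k pieces per unit, maxout halves (for k=2) the layer width at the output while letting each unit approximate a convex activation.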
Differentiable pooling for unsupervised speaker adaptation
This paper proposes a differentiable pooling mechanism to perform model-based neural network speaker adaptation. The proposed technique learns a speaker-dependent combination of activations within pools of hidden units, works well in unsupervised settings, and does not require speaker-adaptive training. We have conducted a set of experiments on the TED talks data, as used in the IWSLT evaluations. Our results indicate that the approach can reduce word error rates (WERs) on standard IWSLT test sets by about 5–11% relative compared to speaker-independent systems, and was found complementary to the recently proposed learning hidden units contribution (LHUC) approach, reducing WER by 6–13% relative. Both methods were also found to work well when adapting with small amounts of unsupervised data: as little as 10 seconds is able to decrease the WER by 5% relative compared to the baseline speaker-independent system.
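The LHUC approach mentioned above re-scales each hidden unit by a speaker-dependent amplitude while the rest of the network stays frozen. A minimal sketch of that re-scaling step, assuming the commonly used 2*sigmoid parametrisation (the function name and shapes here are illustrative):

```python
import numpy as np

def lhuc_scale(h, a):
    """LHUC: re-scale each hidden unit's activation by a speaker-dependent
    amplitude 2*sigmoid(a) in [0, 2]; only the vector a is learned per
    speaker, so the adaptation adds one parameter per hidden unit."""
    return (2.0 / (1.0 + np.exp(-a))) * h

h = np.array([[0.5, -1.0, 2.0]])  # hidden activations for one frame
a = np.zeros(3)                   # a = 0 gives scale 1: unadapted network
print(lhuc_scale(h, a))
```

Initialising `a` at zero recovers the speaker-independent network exactly, which is why LHUC adapts safely even from a few seconds of data.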
Unsupervised Domain Discovery Using Latent Dirichlet Allocation for Acoustic Modelling in Speech Recognition
Speech recognition systems are often highly domain dependent, a fact widely reported in the literature. However, the concept of domain is complex and not bound to clear criteria. Hence it is often not evident whether data should be considered out-of-domain. While both acoustic and language models can be domain specific, the work in this paper concentrates on acoustic modelling. We present a novel method to perform unsupervised discovery of domains using Latent Dirichlet Allocation (LDA) modelling. Here a set of hidden domains is assumed to exist in the data, whereby each audio segment can be considered a weighted mixture of domain properties. The classification of audio segments into domains allows the creation of domain-specific acoustic models for automatic speech recognition. Experiments are conducted on a dataset of diverse speech covering radio and TV broadcasts, telephone conversations, meetings, lectures and read speech, with a joint training set of 60 hours and a test set of 6 hours. Maximum A Posteriori (MAP) adaptation to LDA-based domains was shown to yield relative Word Error Rate (WER) improvements of up to 16% compared to pooled training, and up to 10% compared with models adapted using human-labelled prior domain knowledge.
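The "segment as a weighted mixture of hidden domains" idea maps directly onto standard LDA tooling once each audio segment is represented as a bag of discrete counts (e.g. of quantised acoustic units). A hedged sketch using scikit-learn's `LatentDirichletAllocation` on synthetic counts; the count matrix and the choice of 3 domains are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical bag-of-"acoustic-words" counts: one row per audio
# segment, one column per quantised acoustic unit (synthetic data).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(20, 50))

# Fit an LDA model with a small assumed number of hidden domains.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
weights = lda.fit_transform(counts)  # per-segment domain mixture weights

# Each segment is a weighted mixture of domains; each row sums to 1.
print(weights.shape)   # (20, 3)
print(weights[0])
```

Hard-assigning each segment to `weights.argmax(axis=1)` would then give the domain labels used to train or MAP-adapt domain-specific acoustic models.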
Edge-Based Health Care Monitoring System: Ensemble of Classifier Based Model
A Health Monitoring System (HMS) is a tool that can save lives. It uses transmitters to gather information and transmits it wirelessly to a receiver. It is far more practical than the large equipment the majority of hospitals now employ, and it checks a patient's health data continuously, 24/7. The primary goal of this research is to develop a three-layered Ensemble of Classifier model on an Edge-based Healthcare Monitoring System (ECEHMS) with a Gauss Iterated Pelican Optimization Algorithm (GIPOA), comprising a data collection layer, a data analytics layer, and a presentation layer. In ECEHMS-GIPOA, the healthcare dataset is collected from the UCI repository. The data analytics layer performs preprocessing, feature extraction, dimensionality reduction, and classification. Data normalization is performed in the preprocessing step. Statistical features (min/max, standard deviation, mean, median), improved higher-order statistical features (skewness, kurtosis, entropy), and technical-indicator-based features are extracted during the feature extraction step. Improved Fuzzy C-Means (FCM) clustering handles dimensionality reduction by clustering the appropriate feature set from the extracted features. An ensemble model is introduced to predict the disease stage, combining a Deep Maxout Network (DMN), an Improved Deep Belief Network (IDBN), and a Recurrent Neural Network (RNN). The enhancement in prediction/classification accuracy is ensured via optimal training, for which GIPOA is introduced. Finally, ECEHMS-GIPOA performance is compared with conventional approaches such as ASO, BWO, SLO, SSO, FPA, and POA.
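The abstract does not specify its "improved" FCM variant, but the plain fuzzy C-means baseline it builds on alternates two closed-form updates: weighted centroids from fuzzy memberships, then memberships from inverse distances. A minimal numpy sketch under those standard updates (function name, fuzzifier m=2, and fixed iteration count are assumptions):

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means (not the paper's improved variant):
    alternate membership and centroid updates for a fixed budget."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)           # fuzzy memberships
    for _ in range(iters):
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted centroids
        d = np.linalg.norm(X[:, None] - C[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))      # inverse-distance update
        U /= U.sum(axis=1, keepdims=True)
    return U, C

# Two well-separated 1-D clusters: memberships should split them cleanly.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
U, C = fuzzy_cmeans(X, c=2)
print(np.round(U, 2))
```

Unlike hard k-means, each sample keeps a membership in every cluster, which is what lets a cluster-based feature selection keep partially relevant features.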
Differentiable Pooling for Unsupervised Acoustic Model Adaptation
We present a deep neural network (DNN) acoustic model that includes
parametrised and differentiable pooling operators. Unsupervised acoustic model
adaptation is cast as the problem of updating the decision boundaries
implemented by each pooling operator. In particular, we experiment with two
types of pooling parametrisations: learned Lp-norm pooling and weighted
Gaussian pooling, in which the weights of both operators are treated as
speaker-dependent. We perform investigations using three different large
vocabulary speech recognition corpora: AMI meetings, TED talks and Switchboard
conversational telephone speech. We demonstrate that differentiable pooling
operators provide a robust and relatively low-dimensional way to adapt acoustic
models, with relative word error rate reductions ranging from 5–20% with
respect to unadapted systems, which themselves are better than the baseline
fully-connected DNN-based acoustic models. We also investigate how the proposed
techniques work under various adaptation conditions including the quality of
adaptation data and complementarity to other feature- and model-space
adaptation methods, as well as providing an analysis of the characteristics of
each of the proposed approaches.Comment: 11 pages, 7 Tables, 7 Figures in IEEE/ACM Transactions on Audio,
Speech, and Language Processing, vol. 24, num. 11, 201
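The learned Lp-norm pooling parametrisation described above can be sketched in a few lines: each pool of k hidden units is collapsed with an Lp norm whose order p is a learnable, and hence speaker-adaptable, parameter. The numpy sketch below is illustrative (names, group size, and per-pool p vector are assumptions, not the paper's exact formulation):

```python
import numpy as np

def lp_pool(h, p, k=2):
    """Lp-norm pooling over non-overlapping groups of k hidden units.
    h: (batch, units) activations; p: one learned pooling order per pool.
    Treating p as speaker-dependent makes the pooling boundary adaptable."""
    b, u = h.shape
    g = h.reshape(b, u // k, k)
    return (np.abs(g) ** p[None, :, None]).sum(-1) ** (1.0 / p[None, :])

h = np.array([[1.0, 1.0, 3.0, 4.0]])
p = np.array([1.0, 2.0])    # pool 1: plain sum; pool 2: Euclidean norm
print(lp_pool(h, p))        # [[2. 5.]]
```

Because the pooling output is smooth in p, the same gradient machinery that trains the network can update only p (per speaker) at adaptation time, which keeps the adapted parameter count low, consistent with the "relatively low-dimensional" claim in the abstract.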