Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages
This paper provides an overall introduction to our Automatic Speech
Recognition (ASR) systems for Southeast Asian languages. As little existing
work has been carried out on such regional languages, several difficulties
must be addressed before building the systems: limited speech and text
resources, a lack of linguistic knowledge, etc. This work takes Bahasa Indonesia
and Thai as examples to illustrate strategies for collecting the various
resources required to build ASR systems. Comment: Published in the 2017 IEEE
International Conference on Orange Technologies (ICOT 2017).
Neural networks for distant speech recognition
Distant conversational speech recognition is challenging owing to the presence of multiple, overlapping talkers, additional non-speech acoustic sources, and the effects of reverberation. In this paper we review work on distant speech recognition, with an emphasis on approaches which combine multichannel signal processing with acoustic modelling, and investigate the use of hybrid neural network / hidden Markov model acoustic models for distant speech recognition of meetings recorded using microphone arrays. In particular we investigate the use of convolutional and fully-connected neural networks with different activation functions (sigmoid, rectified linear, and maxout). We performed experiments on the AMI and ICSI meeting corpora, with results indicating that neural network models are capable of significant improvements in accuracy compared with discriminatively trained Gaussian mixture models. Index Terms — convolutional neural networks, distant speech recognition, rectifier unit, maxout networks, beamforming, meetings, AMI corpus, ICSI corpus
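The maxout activation evaluated in this abstract takes the maximum over small groups of linear pieces, so the network learns its own piecewise-linear activation shape. A minimal numpy sketch (illustrative only, not the authors' implementation; the group size `k` is an assumed hyperparameter):

```python
import numpy as np

def maxout(z, k=2):
    """Maxout activation: partition the pre-activations of each frame
    into groups of k linear pieces and keep the per-group maximum."""
    b, u = z.shape
    assert u % k == 0, "unit count must be divisible by group size k"
    return z.reshape(b, u // k, k).max(axis=-1)

# Toy pre-activations for one frame: two groups of two pieces each.
z = np.array([[1.0, -2.0, 0.5, 3.0]])
print(maxout(z, k=2))  # each output unit is the max of its group: [[1. 3.]]
```

With k pieces per unit, maxout halves (for k=2) the layer width at the output while letting each unit approximate a convex activation.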
Differentiable pooling for unsupervised speaker adaptation
This paper proposes a differentiable pooling mechanism to perform model-based neural network speaker adaptation. The proposed technique learns a speaker-dependent combination of activations within pools of hidden units, works well in unsupervised settings, and does not require speaker-adaptive training. We have conducted a set of experiments on the TED talks data, as used in the IWSLT evaluations. Our results indicate that the approach can reduce word error rates (WERs) on standard IWSLT test sets by about 5–11% relative compared to speaker-independent systems, and was found complementary to the recently proposed learning hidden units contribution (LHUC) approach, reducing WER by 6–13% relative. Both methods were also found to work well when adapting with small amounts of unsupervised data: as little as 10 seconds is able to decrease the WER by 5% relative compared to the baseline speaker-independent system.
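The LHUC approach mentioned above re-scales each hidden unit by a speaker-dependent amplitude while the rest of the network stays frozen. A minimal sketch of that re-scaling step, assuming the commonly used 2*sigmoid parametrisation (the function name and shapes here are illustrative):

```python
import numpy as np

def lhuc_scale(h, a):
    """LHUC: re-scale each hidden unit's activation by a speaker-dependent
    amplitude 2*sigmoid(a) in [0, 2]; only the vector a is learned per
    speaker, so the adaptation adds one parameter per hidden unit."""
    return (2.0 / (1.0 + np.exp(-a))) * h

h = np.array([[0.5, -1.0, 2.0]])  # hidden activations for one frame
a = np.zeros(3)                   # a = 0 gives scale 1: unadapted network
print(lhuc_scale(h, a))
```

Initialising `a` at zero recovers the speaker-independent network exactly, which is why LHUC adapts safely even from a few seconds of data.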
Unsupervised Domain Discovery Using Latent Dirichlet Allocation for Acoustic Modelling in Speech Recognition
Speech recognition systems are often highly domain dependent, a fact widely reported in the literature. However, the concept of domain is complex and not bound to clear criteria. Hence it is often not evident whether data should be considered out-of-domain. While both acoustic and language models can be domain specific, the work in this paper concentrates on acoustic modelling. We present a novel method to perform unsupervised discovery of domains using Latent Dirichlet Allocation (LDA) modelling. Here a set of hidden domains is assumed to exist in the data, whereby each audio segment can be considered a weighted mixture of domain properties. The classification of audio segments into domains allows the creation of domain-specific acoustic models for automatic speech recognition. Experiments are conducted on a dataset of diverse speech covering radio and TV broadcasts, telephone conversations, meetings, lectures and read speech, with a joint training set of 60 hours and a test set of 6 hours. Maximum A Posteriori (MAP) adaptation to LDA-based domains was shown to yield relative Word Error Rate (WER) improvements of up to 16% compared to pooled training, and up to 10% compared with models adapted using human-labelled prior domain knowledge.
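The "segment as a weighted mixture of hidden domains" idea maps directly onto standard LDA tooling once each audio segment is represented as a bag of discrete counts (e.g. of quantised acoustic units). A hedged sketch using scikit-learn's `LatentDirichletAllocation` on synthetic counts; the count matrix and the choice of 3 domains are illustrative assumptions, not the paper's setup:

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical bag-of-"acoustic-words" counts: one row per audio
# segment, one column per quantised acoustic unit (synthetic data).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=(20, 50))

# Fit an LDA model with a small assumed number of hidden domains.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
weights = lda.fit_transform(counts)  # per-segment domain mixture weights

# Each segment is a weighted mixture of domains; each row sums to 1.
print(weights.shape)   # (20, 3)
print(weights[0])
```

Hard-assigning each segment to `weights.argmax(axis=1)` would then give the domain labels used to train or MAP-adapt domain-specific acoustic models.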
Edge-Based Health Care Monitoring System: Ensemble of Classifier Based Model
A Health Monitoring System (HMS) is a tool that can save lives. It uses transmitters to gather information and transmits it wirelessly to a receiver. It is far more practical than the large equipment the majority of hospitals now employ, and it checks a patient's health data continuously, 24/7. The primary goal of this research is to develop a three-layered Ensemble of Classifier model on an Edge-based Healthcare Monitoring System (ECEHMS) with a Gauss Iterated Pelican Optimization Algorithm (GIPOA), comprising a data collection layer, a data analytics layer, and a presentation layer. In ECEHMS-GIPOA, the healthcare dataset is collected from the UCI repository. The data analytics layer performs preprocessing, feature extraction, dimensionality reduction, and classification. Data normalization is performed in the preprocessing step. Statistical features (min/max, standard deviation, mean, median), improved higher-order statistical features (skewness, kurtosis, entropy), and technical-indicator-based features are extracted during the feature extraction step. Improved Fuzzy C-Means (FCM) clustering handles dimensionality reduction by clustering the appropriate feature set from the extracted features. An ensemble model is introduced to predict the disease stage, combining a Deep Maxout Network (DMN), an Improved Deep Belief Network (IDBN), and a Recurrent Neural Network (RNN). The enhancement in prediction/classification accuracy is ensured via optimal training, for which GIPOA is introduced. Finally, ECEHMS-GIPOA performance is compared with conventional approaches such as ASO, BWO, SLO, SSO, FPA, and POA.
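The abstract does not specify its "improved" FCM variant, but the plain fuzzy C-means baseline it builds on alternates two closed-form updates: weighted centroids from fuzzy memberships, then memberships from inverse distances. A minimal numpy sketch under those standard updates (function name, fuzzifier m=2, and fixed iteration count are assumptions):

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50, seed=0):
    """Plain fuzzy C-means (not the paper's improved variant):
    alternate membership and centroid updates for a fixed budget."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)           # fuzzy memberships
    for _ in range(iters):
        W = U ** m
        C = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted centroids
        d = np.linalg.norm(X[:, None] - C[None], axis=2) + 1e-12
        U = 1.0 / (d ** (2.0 / (m - 1.0)))      # inverse-distance update
        U /= U.sum(axis=1, keepdims=True)
    return U, C

# Two well-separated 1-D clusters: memberships should split them cleanly.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
U, C = fuzzy_cmeans(X, c=2)
print(np.round(U, 2))
```

Unlike hard k-means, each sample keeps a membership in every cluster, which is what lets a cluster-based feature selection keep partially relevant features.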
Differentiable Pooling for Unsupervised Acoustic Model Adaptation
We present a deep neural network (DNN) acoustic model that includes
parametrised and differentiable pooling operators. Unsupervised acoustic model
adaptation is cast as the problem of updating the decision boundaries
implemented by each pooling operator. In particular, we experiment with two
types of pooling parametrisations: learned Lp-norm pooling and weighted
Gaussian pooling, in which the weights of both operators are treated as
speaker-dependent. We perform investigations using three different large
vocabulary speech recognition corpora: AMI meetings, TED talks and Switchboard
conversational telephone speech. We demonstrate that differentiable pooling
operators provide a robust and relatively low-dimensional way to adapt acoustic
models, with relative word error rate reductions ranging from 5–20% with
respect to unadapted systems, which themselves are better than the baseline
fully-connected DNN-based acoustic models. We also investigate how the proposed
techniques work under various adaptation conditions including the quality of
adaptation data and complementarity to other feature- and model-space
adaptation methods, as well as providing an analysis of the characteristics of
each of the proposed approaches.Comment: 11 pages, 7 Tables, 7 Figures in IEEE/ACM Transactions on Audio,
Speech, and Language Processing, vol. 24, num. 11, 201
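The learned Lp-norm pooling parametrisation described above can be sketched in a few lines: each pool of k hidden units is collapsed with an Lp norm whose order p is a learnable, and hence speaker-adaptable, parameter. The numpy sketch below is illustrative (names, group size, and per-pool p vector are assumptions, not the paper's exact formulation):

```python
import numpy as np

def lp_pool(h, p, k=2):
    """Lp-norm pooling over non-overlapping groups of k hidden units.
    h: (batch, units) activations; p: one learned pooling order per pool.
    Treating p as speaker-dependent makes the pooling boundary adaptable."""
    b, u = h.shape
    g = h.reshape(b, u // k, k)
    return (np.abs(g) ** p[None, :, None]).sum(-1) ** (1.0 / p[None, :])

h = np.array([[1.0, 1.0, 3.0, 4.0]])
p = np.array([1.0, 2.0])    # pool 1: plain sum; pool 2: Euclidean norm
print(lp_pool(h, p))        # [[2. 5.]]
```

Because the pooling output is smooth in p, the same gradient machinery that trains the network can update only p (per speaker) at adaptation time, which keeps the adapted parameter count low, consistent with the "relatively low-dimensional" claim in the abstract.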