    Deep maxout networks for low-resource speech recognition

    Cloud-based Automatic Speech Recognition Systems for Southeast Asian Languages

    This paper provides an overall introduction to our Automatic Speech Recognition (ASR) systems for Southeast Asian languages. Since little existing work has been carried out on such regional languages, a few difficulties must be addressed before building the systems: limited speech and text resources, a lack of linguistic knowledge, etc. This work takes Bahasa Indonesia and Thai as examples to illustrate the strategies for collecting the various resources required to build ASR systems. Comment: Published at the 2017 IEEE International Conference on Orange Technologies (ICOT 2017)

    Neural networks for distant speech recognition

    Distant conversational speech recognition is challenging owing to the presence of multiple, overlapping talkers, additional non-speech acoustic sources, and the effects of reverberation. In this paper we review work on distant speech recognition, with an emphasis on approaches which combine multichannel signal processing with acoustic modelling, and investigate the use of hybrid neural network / hidden Markov model acoustic models for distant speech recognition of meetings recorded using microphone arrays. In particular we investigate the use of convolutional and fully-connected neural networks with different activation functions (sigmoid, rectified linear, and maxout). We performed experiments on the AMI and ICSI meeting corpora, with results indicating that neural network models are capable of significant improvements in accuracy compared with discriminatively trained Gaussian mixture models. Index Terms — convolutional neural networks, distant speech recognition, rectifier unit, maxout networks, beamforming, meetings, AMI corpus, ICSI corpus
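    The maxout activation named in the abstract above can be sketched as follows; this is a minimal illustration of the standard formulation (a maxout unit outputs the maximum over several affine projections of its input), and all shapes and variable names here are assumptions, not taken from the paper.

    ```python
    import numpy as np

    def maxout(x, W, b):
        """Maxout activation: elementwise max over several affine pieces.

        x: input vector, shape (d_in,)
        W: piece weights, shape (num_pieces, d_out, d_in)
        b: piece biases, shape (num_pieces, d_out)
        returns: hidden activations, shape (d_out,)
        """
        z = np.einsum('pki,i->pk', W, x) + b   # one affine projection per piece
        return z.max(axis=0)                   # max over the pieces

    # Hypothetical usage: 16 inputs, 8 maxout units, 3 pieces each.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16)
    W = rng.standard_normal((3, 8, 16))
    b = rng.standard_normal((3, 8))
    h = maxout(x, W, b)   # h has shape (8,)
    ```

    Because the output is a max of linear pieces, the unit is piecewise linear and learns its own activation shape, which is one reason maxout pairs well with the low-resource and distant-speech settings above.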

    Differentiable pooling for unsupervised speaker adaptation

    This paper proposes a differentiable pooling mechanism to perform model-based neural network speaker adaptation. The proposed technique learns a speaker-dependent combination of activations within pools of hidden units, was shown to work well unsupervised, and does not require speaker-adaptive training. We have conducted a set of experiments on the TED talks data, as used in the IWSLT evaluations. Our results indicate that the approach can reduce word error rates (WERs) on standard IWSLT test sets by about 5–11% relative compared to speaker-independent systems, and was found complementary to the recently proposed learning hidden units contribution (LHUC) approach, reducing WER by 6–13% relative. Both methods were also found to work well when adapting with small amounts of unsupervised data: 10 seconds is enough to decrease the WER by 5% relative compared to the baseline speaker-independent system.
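    The LHUC approach mentioned above can be sketched as follows. This is a minimal illustration assuming the commonly described formulation, in which each hidden unit is re-scaled by a speaker-dependent amplitude r = 2·sigmoid(a); the function and variable names are hypothetical.

    ```python
    import numpy as np

    def lhuc_scale(hidden, a):
        """LHUC re-scaling: each hidden unit h_k is multiplied by a
        speaker-dependent amplitude r_k = 2*sigmoid(a_k), constrained
        to the range (0, 2). Only the vector a is learned per speaker."""
        r = 2.0 / (1.0 + np.exp(-a))
        return r * hidden

    # With a = 0 every amplitude is 1, recovering the unadapted,
    # speaker-independent network.
    h = np.array([0.5, -1.2, 2.0])
    adapted = lhuc_scale(h, np.zeros(3))
    ```

    Because only one scalar per hidden unit is adapted, the number of speaker-dependent parameters stays small, which is what makes adaptation from as little as 10 seconds of data plausible.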

    Unsupervised Domain Discovery Using Latent Dirichlet Allocation for Acoustic Modelling in Speech Recognition

    Speech recognition systems are often highly domain dependent, a fact widely reported in the literature. However, the concept of a domain is complex and not bound to clear criteria, so it is often not evident whether data should be considered out-of-domain. While both acoustic and language models can be domain specific, the work in this paper concentrates on acoustic modelling. We present a novel method to perform unsupervised discovery of domains using Latent Dirichlet Allocation (LDA) modelling. Here a set of hidden domains is assumed to exist in the data, whereby each audio segment can be considered a weighted mixture of domain properties. The classification of audio segments into domains allows the creation of domain-specific acoustic models for automatic speech recognition. Experiments are conducted on a dataset of diverse speech covering radio and TV broadcasts, telephone conversations, meetings, lectures and read speech, with a joint training set of 60 hours and a test set of 6 hours. Maximum A Posteriori (MAP) adaptation to LDA-based domains was shown to yield relative Word Error Rate (WER) improvements of up to 16% compared to pooled training, and up to 10% compared with models adapted using human-labelled prior domain knowledge.
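    The MAP adaptation step applied to the discovered domains can be sketched for a single Gaussian mean as below. This is a hedged illustration of the textbook MAP mean-update rule, not code from the paper; the function name, the prior weight tau, and all shapes are assumptions.

    ```python
    import numpy as np

    def map_adapt_mean(mu_prior, frames, posteriors, tau=10.0):
        """MAP adaptation of one Gaussian mean: interpolate between the
        prior (pooled) mean and the data mean, weighted by the soft
        occupation count gamma. Large tau trusts the prior; large gamma
        trusts the domain's adaptation data."""
        gamma = posteriors.sum()
        data_mean = (posteriors[:, None] * frames).sum(axis=0) / gamma
        return (tau * mu_prior + gamma * data_mean) / (tau + gamma)

    mu0 = np.zeros(2)          # pooled (prior) mean
    frames = np.ones((5, 2))   # adaptation frames assigned to one LDA domain
    post = np.ones(5)          # frame-level occupation probabilities
    mu_map = map_adapt_mean(mu0, frames, post, tau=5.0)  # gamma == tau: halfway
    ```

    With little domain data the adapted mean stays close to the pooled model, which is what makes MAP adaptation safe when an LDA domain contains only a few hours of speech.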

    Edge-Based Health Care Monitoring System: Ensemble of Classifier Based Model

    A Health Monitoring System (HMS) is an excellent tool that can save lives. It uses transmitters to gather information and sends it wirelessly to a receiver, and it is far more practical than the large equipment most hospitals currently employ, checking a patient's health data continuously, 24/7. The primary goal of this research is to develop a three-layered Ensemble of Classifiers model for an Edge-based Healthcare Monitoring System (ECEHMS) with a Gauss Iterated Pelican Optimization Algorithm (GIPOA), comprising a data collection layer, a data analytics layer, and a presentation layer. In our ECEHMS-GIPOA, the healthcare dataset is collected from the UCI repository. The data analytics layer performs preprocessing, feature extraction, dimensionality reduction and classification. Data normalization is performed in the preprocessing step. Statistical features (min/max, standard deviation, mean, median), improved higher-order statistical features (skewness, kurtosis, entropy), and technical-indicator-based features are extracted in the feature extraction step. Improved Fuzzy C-Means (FCM) clustering handles dimensionality reduction by clustering the appropriate feature set from the extracted features. An ensemble model comprising a Deep Maxout Network (DMN), an Improved Deep Belief Network (IDBN), and a Recurrent Neural Network (RNN) is introduced to predict the disease stage. The improvement in prediction/classification accuracy is further ensured via optimal training, for which the GIPOA is introduced. Finally, ECEHMS-GIPOA performance is compared with conventional approaches such as ASO, BWO, SLO, SSO, FPA, and POA.
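    The statistical features listed above can be sketched for a single signal window as follows. This is a hypothetical illustration: the bin count, the names, and the exact definitions of the "improved" variants are not specified in the abstract.

    ```python
    import numpy as np

    def extract_features(x, bins=16):
        """One window's statistical features: min/max, mean, median, SD,
        plus skewness, kurtosis, and a histogram-based entropy estimate."""
        mu, sd = x.mean(), x.std()
        skew = ((x - mu) ** 3).mean() / sd ** 3        # third standardised moment
        kurt = ((x - mu) ** 4).mean() / sd ** 4        # fourth standardised moment
        p, _ = np.histogram(x, bins=bins)
        p = p / p.sum()
        p = p[p > 0]                                   # drop empty bins before logs
        entropy = -(p * np.log2(p)).sum()
        return {'min': x.min(), 'max': x.max(), 'mean': mu,
                'median': np.median(x), 'sd': sd,
                'skew': skew, 'kurtosis': kurt, 'entropy': entropy}

    feats = extract_features(np.array([-1.0, 0.0, 1.0]))
    ```

    A symmetric window gives zero skewness, so these moments summarise the shape of the sensor signal's distribution rather than its raw values.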

    Differentiable Pooling for Unsupervised Acoustic Model Adaptation

    We present a deep neural network (DNN) acoustic model that includes parametrised and differentiable pooling operators. Unsupervised acoustic model adaptation is cast as the problem of updating the decision boundaries implemented by each pooling operator. In particular, we experiment with two types of pooling parametrisations: learned L_p-norm pooling and weighted Gaussian pooling, in which the weights of both operators are treated as speaker-dependent. We perform investigations using three different large vocabulary speech recognition corpora: AMI meetings, TED talks and Switchboard conversational telephone speech. We demonstrate that differentiable pooling operators provide a robust and relatively low-dimensional way to adapt acoustic models, with relative word error rate (WER) reductions ranging from 5–20% with respect to unadapted systems, which themselves are better than the baseline fully-connected DNN-based acoustic models. We also investigate how the proposed techniques work under various adaptation conditions, including the quality of adaptation data and complementarity to other feature- and model-space adaptation methods, and provide an analysis of the characteristics of each of the proposed approaches. Comment: 11 pages, 7 Tables, 7 Figures in IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, num. 11, 201
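    The L_p-norm pooling named above can be sketched as below, assuming the common formulation (mean of |a|^p, then the p-th root); treating p, or the per-unit pool weights, as learnable speaker-dependent parameters is the adaptation idea the abstract describes. Names here are illustrative only.

    ```python
    import numpy as np

    def lp_pool(activations, p):
        """L_p-norm pooling over one pool of hidden-unit activations:
        (mean(|a|^p))^(1/p). p = 1 gives the mean magnitude; as p grows
        the pool behaves increasingly like max pooling."""
        a = np.abs(activations)
        return np.mean(a ** p) ** (1.0 / p)

    pool = np.array([1.0, 2.0, 3.0])
    avg_like = lp_pool(pool, 1.0)     # mean of magnitudes
    max_like = lp_pool(pool, 100.0)   # approaches max(pool)
    ```

    Because the pooling order is a single differentiable scalar per pool, adapting it moves each pool smoothly between average-like and max-like behaviour, which is what keeps the speaker-dependent parameter count low.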