
    Statistical models for HMM/ANN hybrids

    We present a theoretical investigation into the use of normalised artificial neural network (ANN) outputs in the context of hidden Markov models (HMMs). The work is motivated by the pursuit of a more theoretically rigorous understanding of the Kullback-Leibler (KL)-HMM. Two possible models are considered, based respectively on HMM states storing categorical distributions and Dirichlet distributions. Training and recognition algorithms are derived, and possible relationships with KL-HMM are briefly discussed.
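    As a minimal illustration of the kind of local score a KL-HMM uses (a sketch only; the distributions and values below are invented for the example), the divergence between an ANN posterior vector and a state's stored categorical distribution can be computed as:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two categorical distributions over the same phoneme set.

    A small epsilon guards against log(0) when a component is zero.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical ANN posterior for one frame, and a state's stored categorical
# distribution; in a KL-HMM this divergence plays the role of the local
# emission score for the frame.
ann_posterior = [0.7, 0.2, 0.1]
state_dist = [0.6, 0.3, 0.1]

score = kl_divergence(ann_posterior, state_dist)  # non-negative, 0 iff equal
```

Which direction of the (asymmetric) divergence to use, and whether states store categorical or Dirichlet distributions, is exactly the modelling choice the abstract above investigates.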

    Using KL-divergence and multilingual information to improve ASR for under-resourced languages

    Setting out from the point of view that automatic speech recognition (ASR) ought to benefit from data in languages other than the target language, we propose a novel Kullback-Leibler (KL) divergence based method that is able to exploit multilingual information in the form of universal phoneme posterior probabilities conditioned on the acoustics. We formulate a means to train a recognizer on several different languages, and subsequently recognize speech in a target language for which only a small amount of data is available. Taking the Greek SpeechDat(II) data as an example, we show that the proposed formulation is sound and that it outperforms a current state-of-the-art HMM/GMM system. We also use a hybrid Tandem-like system to further understand the source of the benefit.

    Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation

    Recent studies have shown that speech recognizers may benefit from data in languages other than the target language through efficient acoustic model- or feature-level adaptation. Crosslingual Tandem-Subspace Gaussian Mixture Models (SGMMs) are able to successfully combine acoustic model- and feature-level adaptation techniques. More specifically, we focus on under-resourced languages (Afrikaans in our case) and perform feature-level adaptation through the estimation of phone class posterior features with a Multilayer Perceptron that was trained on data from a similar language with large amounts of available speech data (Dutch in our case). The same Dutch data can also be exploited at the acoustic model level by training globally-shared SGMM parameters in a crosslingual way. The two adaptation techniques are indeed complementary and result in a crosslingual Tandem-SGMM system that yields a relative improvement of about 22% compared to a standard speech recognizer on an Afrikaans phoneme recognition task. Interestingly, eventual score-level combination of the individual SGMM systems yields an additional 3% relative improvement.

    Automatic Speech Recognition and Translation of a Swiss German Dialect: Walliserdeutsch

    Walliserdeutsch is a Swiss German dialect spoken in the south west of Switzerland. To investigate the potential of automatic speech processing of Walliserdeutsch, a small database was collected based mainly on broadcast news from a local radio station. Experiments suggest that automatic speech recognition is feasible: use of another (Swiss German) database shows that the small data size lends itself to bootstrapping from other data; use of Kullback-Leibler HMMs suggests that phoneme mapping techniques can compensate for a grapheme-based dictionary. Experiments also indicate that statistical machine translation is feasible; the difficulty of small data size is offset by the close proximity to (high) German.

    Boosting under-resourced speech recognizers by exploiting out of language data - Case study on Afrikaans

    Under-resourced speech recognizers may benefit from data in languages other than the target language. In this paper, we boost the performance of an Afrikaans speech recognizer by using already available data from other languages. To successfully exploit available multilingual resources, we use posterior features, estimated by multilayer perceptrons that are trained on similar languages. For two different acoustic modeling techniques, Tandem and Kullback-Leibler divergence based HMMs, the proposed multilingual system yields more than 10% relative improvement compared to the corresponding monolingual systems trained only on Afrikaans.

    Using out-of-language data to improve an under-resourced speech recognizer

    Under-resourced speech recognizers may benefit from data in languages other than the target language. In this paper, we report how to boost the performance of an Afrikaans automatic speech recognition system by using already available Dutch data. We successfully exploit available multilingual resources through 1) posterior features, estimated by multilayer perceptrons (MLP) and 2) subspace Gaussian mixture models (SGMMs). Both the MLPs and the SGMMs can be trained on out-of-language data. We use three different acoustic modeling techniques, namely Tandem, Kullback-Leibler divergence based HMMs (KL-HMM) as well as SGMMs, and show that the proposed multilingual systems yield 12% relative improvement compared to a conventional monolingual HMM/GMM system trained only on Afrikaans. We also show that KL-HMMs are extremely powerful for under-resourced languages: using only six minutes of Afrikaans data (in combination with out-of-language data), KL-HMM yields about 30% relative improvement compared to conventional maximum likelihood linear regression and maximum a posteriori based acoustic model adaptation.
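    As a sketch of how MLP posterior features can feed a KL-HMM (this uses one common variant of the training criterion, and the posterior vectors below are invented): for the divergence direction KL(y_t || a), the state's categorical distribution that minimizes the summed divergence over the frames assigned to it is simply the arithmetic mean of those frame posteriors.

```python
def state_distribution(frame_posteriors):
    """Categorical distribution minimizing sum_t KL(y_t || a) over the frames
    assigned to a state: the arithmetic mean of the frame posterior vectors."""
    n = len(frame_posteriors)
    dim = len(frame_posteriors[0])
    return [sum(y[k] for y in frame_posteriors) / n for k in range(dim)]

# Hypothetical MLP posterior vectors (e.g. over universal phone classes)
# aligned to one HMM state.
frames = [[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]
dist = state_distribution(frames)  # ≈ [0.7, 0.2, 0.1]
```

Because the state model reduces to averaging posterior vectors, very little target-language data is needed to estimate it, which is consistent with the six-minutes-of-Afrikaans result reported above.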

    Application of Subspace Gaussian Mixture Models in Contrastive Acoustic Scenarios

    This paper describes experimental results of applying Subspace Gaussian Mixture Models (SGMMs) in two completely diverse acoustic scenarios: (a) a Large Vocabulary Continuous Speech Recognition (LVCSR) task over (well-resourced) English meeting data and (b) acoustic modeling of under-resourced Afrikaans telephone data. In both cases, the performance of SGMM models is compared with a conventional context-dependent HMM/GMM approach exploiting the same kind of information available during training. LVCSR systems are evaluated on the standard NIST Rich Transcription dataset. For under-resourced Afrikaans, SGMM and HMM/GMM acoustic systems are additionally compared to KL-HMM and multilingual Tandem techniques boosted using supplemental out-of-domain data. Experimental results clearly show that the SGMM approach (having considerably fewer model parameters) outperforms the conventional HMM/GMM system in both scenarios and for all examined training conditions. In the under-resourced scenario, the SGMM trained using only in-domain data is superior to the other tested approaches boosted by out-of-domain data.

    Impact of deep MLP architecture on different acoustic modeling techniques for under-resourced speech recognition

    Posterior based acoustic modeling techniques such as Kullback-Leibler divergence based HMM (KL-HMM) and Tandem are able to exploit out-of-language data through posterior features, estimated by a Multi-Layer Perceptron (MLP). In this paper, we investigate the performance of posterior based approaches in the context of under-resourced speech recognition when a standard three-layer MLP is replaced by a deeper five-layer MLP. The deeper MLP architecture yields similar gains of about 15% (relative) for Tandem, KL-HMM as well as for a hybrid HMM/MLP system that directly uses the posterior estimates as emission probabilities. The best performing system, a bilingual KL-HMM based on a deep MLP, jointly trained on Afrikaans and Dutch data, performs 13% better than a hybrid system using the same bilingual MLP and 26% better than a subspace Gaussian mixture system trained only on Afrikaans data. Index Terms: KL-HMM, Tandem, hybrid system, deep MLPs, under-resourced speech recognition.

    Regulation of human mTOR complexes by DEPTOR

    The vertebrate-specific DEP domain-containing mTOR interacting protein (DEPTOR), an oncoprotein or tumor suppressor, has important roles in metabolism, immunity, and cancer. It is the only protein that binds and regulates both complexes of mammalian target of rapamycin (mTOR), a central regulator of cell growth. Biochemical analysis and cryo-EM reconstructions of DEPTOR bound to human mTOR complex 1 (mTORC1) and mTORC2 reveal that both structured regions of DEPTOR, the PDZ domain and the DEP domain tandem (DEPt), are involved in mTOR interaction. The PDZ domain binds tightly with a mildly activating effect, but then acts as an anchor for DEPt association that allosterically suppresses mTOR activation. The binding interfaces of the PDZ domain and DEPt also support further regulation by other signaling pathways. A separate, substrate-like mode of interaction for DEPTOR phosphorylation by mTOR complexes rationalizes inhibition of non-stimulated mTOR activity at higher DEPTOR concentrations. The multifaceted interplay between DEPTOR and mTOR provides a basis for understanding the divergent roles of DEPTOR in physiology and opens new routes for targeting the mTOR-DEPTOR interaction in disease.

    Comparative Study on Sentence Boundary Prediction for German and English Broadcast News

    We present a comparative study on sentence boundary prediction for German and English broadcast news that explores generalization across different languages. In the feature extraction stage, word pause duration is first extracted from word-aligned speech, and forward and backward language models are utilized to extract textual features. Then a gradient boosted machine, optimized by grid search, maps these features to punctuation marks. Experimental results confirm that word pause duration is a simple yet effective feature for predicting whether there is a sentence boundary after a word. We found that the Bayes risk derived from the pause duration distributions of sentence boundary words and non-boundary words is an effective measure to assess the inherent difficulty of sentence boundary prediction. The proposed method achieved F-measures of over 90% on reference text and around 90% on ASR transcripts for both the German broadcast news corpus and the English multi-genre broadcast news corpus, demonstrating state-of-the-art performance.
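    A minimal sketch of the pause-duration idea above (the durations are invented, and the empirical minimum-error threshold search here is only a stand-in for the Bayes risk computed from fitted duration distributions): if boundary and non-boundary pauses separate well, the best single-threshold classifier on pause duration alone already has low error, signalling an easy prediction task.

```python
def pause_bayes_risk(boundary_pauses, non_boundary_pauses):
    """Minimum error of a single-threshold classifier 'boundary iff pause >= t',
    an empirical stand-in for the Bayes risk of pause duration as a feature."""
    candidates = sorted(set(boundary_pauses) | set(non_boundary_pauses) | {0.0})
    n = len(boundary_pauses) + len(non_boundary_pauses)
    best = 1.0
    for t in candidates:
        # Misses: boundaries with pauses below t; false alarms: non-boundaries at or above t.
        errors = sum(d < t for d in boundary_pauses) + \
                 sum(d >= t for d in non_boundary_pauses)
        best = min(best, errors / n)
    return best

# Invented pause durations in seconds: boundary words tend to pause longer.
risk = pause_bayes_risk([0.5, 0.8, 0.6], [0.05, 0.1, 0.0, 0.2])  # 0.0: fully separable
```

When the two distributions overlap completely, the risk rises toward chance, which is what makes this quantity a useful difficulty measure for a corpus.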