38 research outputs found
Varieties of deep artificial neural networks for speech recognition systems
This paper presents a survey of basic methods for acoustic and language model development based on artificial neural networks for automatic speech recognition systems. The hybrid and tandem approaches to combining Hidden Markov Models and artificial neural networks for acoustic modelling are presented, and the creation of language models using feedforward and recurrent neural networks is described. The survey of research conducted in this field shows that applying artificial neural networks at both the acoustic and the language modeling stage reduces the word error rate
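As a minimal illustration of the hybrid HMM/ANN approach the abstract mentions (the dimensions and data here are illustrative, not from the paper): the network's per-frame state posteriors P(state | frame) are divided by the state priors to obtain scaled likelihoods, which can then stand in for GMM likelihoods during HMM decoding.

```python
import numpy as np

def posteriors_to_scaled_likelihoods(posteriors, state_priors, floor=1e-10):
    """Convert ANN outputs to scaled likelihoods for hybrid HMM/ANN decoding.

    posteriors:   (n_frames, n_states) softmax outputs of the network.
    state_priors: (n_states,) relative state frequencies from training data.
    Returns P(frame | state) up to a constant, via P(state | frame) / P(state).
    """
    return posteriors / np.maximum(state_priors, floor)

# Toy posteriors: random logits pushed through a softmax
rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 4))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
priors = np.array([0.4, 0.3, 0.2, 0.1])

scaled = posteriors_to_scaled_likelihoods(posteriors, priors)
print(scaled.shape)  # (5, 4)
```

Dividing by the prior penalizes states the network favors merely because they were frequent in training, which is why the hybrid recipe uses scaled rather than raw posteriors.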
Improving bottleneck features for Vietnamese large vocabulary continuous speech recognition system using deep neural networks
In this paper, a pre-training method based on denoising auto-encoders is investigated and shown to be a good way of initializing the bottleneck networks of a Vietnamese speech recognition system, resulting in better recognition performance than the base bottleneck features reported previously. The experiments are carried out on a dataset containing speech from the Voice of Vietnam (VOV) channel. The results show that DBNF extraction for Vietnamese recognition reduces the word error rate by 14% and 39% relative compared to the base bottleneck features and the MFCC baseline, respectively
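A hedged sketch of the denoising auto-encoder pre-training idea (this is not the paper's architecture; the layer sizes, tied weights, and synthetic data are assumptions for illustration): each input frame is corrupted with noise, and the encoder/decoder is trained to reconstruct the clean frame. The learned encoder weights would then initialize a bottleneck layer of the recognition network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden = 39, 16                      # e.g. 39-dim frames, 16-unit bottleneck
W = rng.normal(0.0, 0.1, (n_in, n_hidden))   # tied weights: decoder uses W.T
b_h = np.zeros(n_hidden)
b_v = np.zeros(n_in)

# Stand-in "acoustic frames" with low-rank structure the auto-encoder can learn
Z = rng.normal(size=(256, 8))
A = rng.normal(size=(8, n_in))
X = Z @ A / np.sqrt(8.0)

lr, losses = 0.1, []
for _ in range(300):
    noisy = X + rng.normal(0.0, 0.3, X.shape)    # Gaussian corruption
    H = sigmoid(noisy @ W + b_h)                 # encode the corrupted input
    R = H @ W.T + b_v                            # linear decode
    err = R - X                                  # reconstruct the *clean* frame
    losses.append(float(np.mean(err ** 2)))
    dpre = (err @ W) * H * (1.0 - H)             # backprop through the sigmoid
    gW = (noisy.T @ dpre + err.T @ H) / len(X)   # both paths of the tied weight
    W -= lr * gW
    b_h -= lr * dpre.mean(axis=0)
    b_v -= lr * err.mean(axis=0)

print(f"reconstruction MSE: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the target is the clean frame while the input is corrupted, the hidden layer is pushed toward noise-robust representations, which is what makes such layers useful as bottleneck feature extractors.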
Real-Time ASR from Meetings
The AMI(DA) system is a meeting room speech recognition system that has been developed and evaluated in the context of the NIST Rich Transcription (RT) evaluations. Recently, the "Distant Access" requirements of the AMIDA project have necessitated that the system operate in real time. Another, more difficult, requirement is that the system fit into a live meeting transcription scenario. We describe an infrastructure that has allowed the AMI(DA) system to evolve into one that fulfils these extra requirements. We emphasise the components that address the live and real-time aspects
Feature and Score Level Combination of Subspace Gaussians in LVCSR Task
In this paper, we investigate the employment of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, we first focus on exploiting various types of complex features estimated using neural networks, combined with conventional cepstral features, and modeled by standard HMM/GMMs and SGMMs. Then, outputs (word sequences) from individual recognizers trained on different features are also combined at the score level using ROVER for both acoustic modeling techniques. Experimental results indicate three important findings: (1) SGMMs consistently outperform HMM/GMMs (a relative improvement of about 6% WER on average) when both techniques are applied to single features; (2) SGMMs benefit much less from feature-level combination (1% relative improvement) than HMM/GMMs (4% relative improvement), which can eventually match the performance of SGMMs; (3) SGMMs can be significantly improved when individual systems are combined at the score level. This suggests that the SGMM systems provide complementary recognition outputs. The overall relative improvements of the combined SGMM and HMM/GMM systems are 21% and 17%, respectively, compared to a standard ASR baseline
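A simplified sketch of the score-level combination and the relative-improvement arithmetic used above. Real ROVER first aligns the hypotheses into a word transition network with dynamic programming; here the hypotheses are assumed to be already aligned position by position (a strong simplification), and each slot is decided by majority vote.

```python
from collections import Counter

def majority_vote(hypotheses):
    """hypotheses: equal-length word lists from different recognizers.
    Returns the word winning the per-slot majority vote."""
    return [Counter(words).most_common(1)[0][0] for words in zip(*hypotheses)]

def rel_improvement(wer_base, wer_sys):
    """Relative WER improvement in percent, as quoted in the abstract."""
    return 100.0 * (wer_base - wer_sys) / wer_base

# Three toy recognizer outputs (illustrative, not from the paper)
hyps = [
    "the cat sat on the mat".split(),
    "the cat sat in the mat".split(),
    "a cat sat on the mat".split(),
]
print(" ".join(majority_vote(hyps)))  # the cat sat on the mat

# e.g. a baseline WER of 20.0% reduced to 15.8% is a 21% relative improvement
print(rel_improvement(20.0, 15.8))
```

Voting only helps when the individual systems make different errors, which is why the complementary outputs of the SGMM systems are the key observation in finding (3).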