Exploration of End-to-End ASR for OpenSTT -- Russian Open Speech-to-Text Dataset
This paper presents an exploration of end-to-end automatic speech recognition
(ASR) systems for the largest open-source Russian language data set -- OpenSTT.
We evaluate different existing end-to-end approaches such as joint
CTC/Attention, RNN-Transducer, and Transformer. All of them are compared with
the strong hybrid ASR system based on LF-MMI TDNN-F acoustic model. For the
three available validation sets (phone calls, YouTube, and books), our best
end-to-end model achieves word error rate (WER) of 34.8%, 19.1%, and 18.1%,
respectively. Under the same conditions, the hybrid ASR system demonstrates
33.5%, 20.9%, and 18.6% WER. Comment: Accepted by SPECOM 202
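The WER figures quoted above follow the usual definition: the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal sketch of that computation is given here for orientation. The function name and the whitespace tokenisation are assumptions for illustration, not the scoring pipeline used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: plain whitespace tokenisation
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Multiplying the returned value by 100 gives a percentage comparable in form to the numbers reported in the abstract.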
KNOWLEDGE TRANSFER FOR RUSSIAN CONVERSATIONAL TELEPHONE AUTOMATIC SPEECH RECOGNITION
This paper describes a method of knowledge transfer from an ensemble of neural network acoustic models to a student network. The method is used to reduce computational cost and improve the quality of the speech recognition system. The experiments consider two ways of generating class labels from the ensemble of models: interpolation with the alignment, and the posterior probabilities. The quality of the models was also studied in relation to the smoothing coefficient, which was built into the output log-linear classifier of the neural network (the softmax layer) and was used in both the ensemble and the student network. Additionally, the initial and final learning rates were analyzed. We established a relationship between the use of the smoothing coefficient for generating the posterior probabilities and the learning rate parameters. Finally, applying knowledge transfer to the automatic recognition of Russian conversational telephone speech reduced the WER (word error rate) by 2.49% in comparison with the model trained on the alignment from the ensemble of neural networks.
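The role of a smoothing coefficient in the softmax layer can be sketched as follows. This is a minimal illustration, assuming the coefficient acts as a temperature that divides the logits and that the ensemble targets are a plain average of the member posteriors. Both are assumptions; the abstract does not give the exact formulation, and all function names are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax over logits / temperature (temperature is the assumed smoothing coefficient)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_soft_targets(ensemble_logits, temperature):
    """Average the smoothed posteriors of all ensemble members (assumed combination rule)."""
    posteriors = [softmax_with_temperature(l, temperature) for l in ensemble_logits]
    return [sum(column) / len(posteriors) for column in zip(*posteriors)]

def student_loss(soft_targets, student_logits, temperature):
    """Cross-entropy between the ensemble soft targets and the smoothed student output."""
    probs = softmax_with_temperature(student_logits, temperature)
    return -sum(t * math.log(p) for t, p in zip(soft_targets, probs))
```

A larger temperature flattens the target distribution, so each training frame carries more information about the non-maximal classes.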
Varieties of deep artificial neural networks for speech recognition systems
This paper presents a survey of the basic methods for acoustic and language model development based on artificial neural networks for automatic speech recognition systems. The hybrid and tandem approaches for combining hidden Markov models and artificial neural networks for acoustic modelling are presented. The creation of language models using feedforward and recurrent neural networks is described. The survey of research conducted in this field shows that applying artificial neural networks at both the acoustic and language modelling stages decreases the word error rate.
Computational Intelligence and Human-Computer Interaction: Modern Methods and Applications
The present book contains all of the articles that were accepted and published in the Special Issue of MDPI’s journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques to the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering as well as for those who are interested in how these domains are connected in real-life situations
Recognition and cortical haemodynamics of vocal emotions - an fNIRS perspective
Normal-hearing listeners rely heavily on variations in the fundamental frequency (F0) of speech to identify vocal emotions. Without reliable F0 cues, as is the case for cochlear implant users, listeners’ ability to extract emotional meaning from speech is reduced. This thesis describes the development of an objective measure of vocal emotion recognition. The program of three experiments investigates: 1) NH listeners’ abilities to use F0, intensity, and speech-rate cues to recognise emotions; 2) cortical activity associated with individual vocal emotions assessed using functional near-infrared spectroscopy (fNIRS); 3) cortical activity evoked by vocal emotions in natural speech and in speech with uninformative F0 using fNIRS
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on finding approaches that use data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.
Learning representations for speech recognition using artificial neural networks
Learning representations is a central challenge in machine learning. For speech
recognition, we are interested in learning robust representations that are stable
across different acoustic environments, recording equipment and irrelevant inter-
and intra-speaker variabilities. This thesis is concerned with representation
learning for acoustic model adaptation to speakers and environments, construction
of acoustic models in low-resource settings, and learning representations from
multiple acoustic channels. The investigations are primarily focused on the hybrid
approach to acoustic modelling based on hidden Markov models and artificial
neural networks (ANN).
The first contribution concerns acoustic model adaptation. This comprises
two new adaptation transforms operating in ANN parameters space. Both operate
at the level of activation functions and treat a trained ANN acoustic model as
a canonical set of fixed-basis functions, from which one can later derive variants
tailored to the specific distribution present in adaptation data. The first technique,
termed Learning Hidden Unit Contributions (LHUC), depends on learning
distribution-dependent linear combination coefficients for hidden units. This
technique is then extended to altering groups of hidden units with parametric and
differentiable pooling operators. We found that the proposed adaptation techniques
possess many desirable properties: they are relatively low-dimensional, do not overfit
and can work in both a supervised and an unsupervised manner. For LHUC we
also present extensions to speaker adaptive training and environment factorisation.
On average, depending on the characteristics of the test set, relative
word error rate reductions (WERR) of 5-25% are obtained in an unsupervised two-pass
adaptation setting.
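The idea of learning per-unit combination coefficients for hidden units can be illustrated with a short sketch. It assumes the common LHUC parameterisation in which each hidden unit's output is multiplied by an amplitude 2*sigmoid(r), with r learned per speaker while the trained network weights stay fixed. That parameterisation and the function names are assumptions here, since the abstract does not state them.

```python
import math

def lhuc_scale(r: float) -> float:
    """Amplitude in (0, 2); r = 0 gives 1.0, i.e. the unadapted network."""
    return 2.0 / (1.0 + math.exp(-r))

def adapt_hidden_layer(hidden_activations, lhuc_params):
    """Rescale each hidden unit by its speaker-dependent amplitude.

    hidden_activations: outputs of one hidden layer of the fixed, trained model.
    lhuc_params: one learnable scalar per hidden unit (hypothetical name).
    """
    if len(hidden_activations) != len(lhuc_params):
        raise ValueError("one LHUC parameter is needed per hidden unit")
    return [lhuc_scale(r) * h for h, r in zip(hidden_activations, lhuc_params)]
```

Only one scalar per hidden unit is estimated during adaptation, which is consistent with the low dimensionality noted in the abstract.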
The second contribution concerns building acoustic models in low-resource
data scenarios. In particular, we are concerned with insufficient amounts of
transcribed acoustic material for estimating acoustic models in the target language
– thus assuming resources like lexicons or texts to estimate language models
are available. First, we propose an ANN with a structured output layer
which models both context-dependent and context-independent speech units,
with the context-independent predictions used at runtime to aid the prediction
of context-dependent states. We also propose to perform multi-task adaptation
with a structured output layer. We obtain consistent WERR reductions of up to
6.4% in low-resource speaker-independent acoustic modelling. Adapting those
models in a multi-task manner with LHUC reduces WER by an additional
13.6% relative, compared to 12.7% for non-multi-task LHUC. We then demonstrate that
one can build better acoustic models with unsupervised multi- and cross-lingual
initialisation and find that pre-training is largely language-independent. Up to
14.4% WERR reductions are observed, depending on the amount of the available
transcribed acoustic data in the target language.
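Because the figures in this abstract are relative reductions, a short sketch of the arithmetic may help. The function name is hypothetical, and the numbers in the usage note are invented for illustration; they are not results from the thesis.

```python
def relative_wer_reduction(baseline_wer: float, adapted_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - adapted) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer
```

For example, moving from a made-up baseline of 20.0% WER to 18.0% WER is a 2.0 point absolute reduction but a 10% relative reduction.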
The third contribution concerns building acoustic models from multi-channel
acoustic data. For this purpose we investigate various ways of integrating and
learning multi-channel representations. In particular, we investigate channel concatenation
and the applicability of convolutional layers for this purpose. We
propose a multi-channel convolutional layer with cross-channel pooling, which
can be seen as a data-driven non-parametric auditory attention mechanism. We
find that for unconstrained microphone arrays, our approach is able to match the
performance of the comparable models trained on beamform-enhanced signals
Planning in Cold War Europe
This volume aims to enlarge our understanding of planning, engineering and, more generally, regulation ideas by looking at possible forms of mutual interest and the circulation of ideas and models in a “pan-European” perspective. Contributions will emphasize the role played by actors from both sides of Europe at the micro- as well as macro-level and will highlight the role played by international organizations as fora and platforms where ideas and know-how were exchanged. The volume will also look at development projects in developing countries as a field where European conceptions of planned development were competing but also converging, especially in the eyes of the countries that benefited from these policies.
Assessing speech production in English as a foreign language: an analysis of international proficiency tests and guidelines
Dissertation (master's) - Universidade Federal de Santa Catarina, Centro de Comunicação e Expressão, Programa de Pós-Graduação em Letras/Inglês e Literatura Correspondente, Florianópolis, 2010. This study investigated the components of speaking ability addressed in the speaking scales of two tests of proficiency in English as a foreign language (TOEFL and IELTS) and two guidelines for teaching, learning, and testing (ACTFL and CEFR). To achieve the study's objective, each speaking scale was first analyzed using the checklist and rating instrument for communicative language ability proposed by Bachman (1995). This analysis revealed the degree of involvement of each component of communicative language ability in all the speaking scales. The speaking scales were also analyzed using the framework for describing the speaking construct proposed by Fulcher (2003). The analyses showed that the ability components of the TOEFL and IELTS scales are similar, while those of the ACTFL and CEFR are also highly comparable. In addition, the IELTS speaking scale is more comparable to the ACTFL and CEFR speaking scales than to the TOEFL speaking scale. The main results of this study may contribute to a better understanding, by teachers and students, of the components of speaking ability present in international English proficiency exams and in international guidelines for teaching, learning, and testing.