Numerical Simulation of Thermally Induced Workpiece Deviations in Turning with Different Cooling-Lubrication Methods
Machining, turning, cooling lubrication, workpiece temperature, thermally induced deviations, FE modeling, inverse problems
Magdeburg, Univ., Faculty of Mechanical Engineering, dissertation, 2003, by Viktor Sukayl
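As a first-order illustration only, the textbook linear thermal-expansion relation (not the dissertation's FE model) estimates the diameter error left by turning a heated workpiece. It assumes a uniform temperature rise, and the steel expansion coefficient below is an assumed typical value:

```latex
\Delta d \approx \alpha_T \, d_0 \, \Delta T
% assumed example values: steel, \alpha_T \approx 12 \times 10^{-6}\,\mathrm{K^{-1}},
% d_0 = 50\,\mathrm{mm}, \Delta T = 20\,\mathrm{K}
\Delta d \approx 12 \times 10^{-6}\,\mathrm{K^{-1}} \cdot 50\,\mathrm{mm} \cdot 20\,\mathrm{K} = 0.012\,\mathrm{mm} = 12\,\mu\mathrm{m}
```

A part turned to its nominal diameter while hot therefore ends up roughly this much undersize once it cools to room temperature. Real temperature fields are nonuniform and depend on the cooling-lubrication method, so a single \Delta T gives only a rough estimate.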
Natural Language Technology to Ensure the Safety of Speech Information
This paper focuses on Natural Language Processing (NLP) and speech: it describes the most prominent approaches and techniques, sets out requirements for datasets used to train text and speech models, compares major toolkits and techniques, and describes trends in the NLP and speech domains.
Analysis of Automatic Speech Recognition Methods
This paper outlines the structures of different automatic speech recognition systems, both hybrid and end-to-end, and the pros and cons of each, including a comparison of their training-data and computational-resource requirements. Three main approaches to speech recognition are considered: hybrid Hidden Markov Model – Deep Neural Network (HMM-DNN), end-to-end Connectionist Temporal Classification (CTC), and Sequence-to-Sequence. The Listen, Attend, and Spell approach is chosen as an example of a Sequence-to-Sequence model.
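The CTC approach emits one label (or a special blank) per audio frame and then collapses that sequence into text. Below is a minimal sketch of greedy (best-path) CTC decoding. It assumes the acoustic model's per-frame scores are already available as plain lists and that label index 0 is the blank. This is a generic illustration, not code from the paper:

```python
from itertools import groupby
from typing import Sequence


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    alphabet: str,
    blank: int = 0,
) -> str:
    """Best-path (greedy) CTC decoding.

    frame_scores[t][k] is the score (e.g. log-probability) of label k at frame t.
    alphabet[k] is the character for label k; index `blank` is the CTC blank.
    """
    # 1. Pick the most likely label in every frame (the "best path").
    best_path = [max(range(len(scores)), key=lambda k: scores[k]) for scores in frame_scores]
    # 2. Merge consecutive repeats, e.g. a a _ a -> a _ a.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks; a blank between two identical labels keeps both of them.
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Usage: one-hot scores over the alphabet "_ab" ("_" stands for the blank).
frames = [[0, 1, 0], [0, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1], [1, 0, 0]]
print(ctc_greedy_decode(frames, "_ab"))  # prints "aab"
```

The blank between the two `a` runs is what lets CTC output a doubled letter. Without it, consecutive identical labels merge into one. In practice, beam search with a language model usually decodes more accurately than this greedy version.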
Automated Pipeline for Training Dataset Creation from Unlabeled Audios for Automatic Speech Recognition
In this paper, we present a software pipeline that automates the creation of speech recognition training datasets from the desired unlabeled audio, for low-resource languages and domain-specific areas. As speech recognition becomes commoditized, more teams build domain-specific models as well as models for local languages. At the same time, the lack of training datasets for low- and middle-resource languages significantly reduces the opportunity to exploit the latest achievements and frameworks in speech recognition and limits how many software engineers can work on speech recognition problems. This problem is even more critical for domain-specific datasets. The pipeline was tested by building Ukrainian speech recognition, which confirmed that the design is adaptable to different data source formats and extensible to integrate with existing frameworks.
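One stage such a pipeline needs is turning segmented, transcribed audio into a training manifest. The sketch below makes several assumptions. It writes a JSON-lines manifest with `audio_filepath`, `duration` and `text` keys, the layout used by NVIDIA NeMo, chosen here only as an example. It assumes each segment already carries a transcript and a hypothetical confidence score. The `Segment` class, `write_manifest` and all thresholds are illustrative and are not taken from the paper:

```python
import json
from dataclasses import dataclass
from typing import Iterable, TextIO


@dataclass
class Segment:
    """One audio segment (hypothetical structure for this sketch)."""
    audio_filepath: str
    duration: float    # seconds
    text: str          # transcript, e.g. from aligned subtitles or a pseudo-labeling model
    confidence: float  # hypothetical quality score in [0, 1]


def write_manifest(
    segments: Iterable[Segment],
    out: TextIO,
    min_duration: float = 1.0,    # illustrative thresholds, tune per dataset
    max_duration: float = 20.0,
    min_confidence: float = 0.9,
) -> int:
    """Write accepted segments as JSON lines and return how many were kept."""
    kept = 0
    for seg in segments:
        # Normalize case and whitespace so identical words map to identical tokens.
        text = " ".join(seg.text.lower().split())
        if not text:
            continue  # nothing to train on
        if not min_duration <= seg.duration <= max_duration:
            continue  # too short or too long for typical training batches
        if seg.confidence < min_confidence:
            continue  # transcript likely unreliable
        record = {"audio_filepath": seg.audio_filepath, "duration": seg.duration, "text": text}
        # ensure_ascii=False keeps Cyrillic (e.g. Ukrainian) text readable in the file.
        out.write(json.dumps(record, ensure_ascii=False) + "\n")
        kept += 1
    return kept
```

Passing a file opened with `encoding="utf-8"` as `out` produces a manifest on disk. Keeping the filters as parameters lets each data source in the pipeline use its own thresholds.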
Prototyping Methodology of End-to-End Speech Analytics Software
This paper presents a prototype for end-to-end speech recognition, storage, and post-processing tasks to build speech analytics, real-time agent augmentation, and other speech-related products. Moving ASR models from the development environment into production requires both research and architectural knowledge, which slows down and limits companies' ability to benefit from speech recognition and NLP advances in fundamental business operations. This paper proposes a fast and flexible prototype that can be easily implemented and used to serve trained ASR/NLP models to solve business problems. Various compatibility problems between software solutions were solved while assembling the experimental setup, and a working prototype was built and tested. An architectural diagram of the solution is presented, and the performance, limitations, and implementation challenges are described.
Transferability Evaluation of Speech Emotion Recognition Between Different Languages
Advances in automatic speech recognition have significantly accelerated the automation of contact centers, creating a need for robust Speech Emotion Recognition (SER) as an integral part of measuring customer net promoter score (NPS). However, training SER for a specific language requires a labeled emotion dataset in that language, which is a significant limitation. Emotion detection datasets cover only English, German, Mandarin, and Indian. Our results show a difference between predicting two and four emotions, which leads us to narrow datasets down to particular practical use cases rather than training the model on the whole dataset. We found that good transfer of emotion from a source language to a target language does not imply equally good transfer in the reverse direction, so engineers cannot expect the same transferability in the mirror direction. Chinese-language datasets are the hardest to transfer to other languages. Transferability from English datasets is among the lowest, so in a production environment engineers cannot rely on a model trained on English for their own language. This paper reports more than 140 experiments across seven languages that evaluate the transferability of speech emotion recognition models trained on different languages, providing a clear framework for choosing which starting dataset to use to achieve good accuracy in practical implementations. The novelty of this study lies in the fact that models for different languages had not previously been compared with each other.
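Directional asymmetry can be checked mechanically once every source-to-target experiment has been scored. The sketch below assumes results are stored as a dict keyed by (source language, target language), with one accuracy value per experiment. The data layout, the use of accuracy as the metric, and both function names are assumptions for illustration. No values from the paper are reproduced here:

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (source language, target language)


def transfer_asymmetry(acc: Dict[Pair, float]) -> List[Tuple[str, str, float]]:
    """Return (a, b, acc[a->b] - acc[b->a]) for every language pair evaluated
    in both directions, largest absolute gap first."""
    langs = sorted({lang for pair in acc for lang in pair})
    gaps = [
        (a, b, acc[(a, b)] - acc[(b, a)])
        for a, b in combinations(langs, 2)
        if (a, b) in acc and (b, a) in acc
    ]
    return sorted(gaps, key=lambda g: abs(g[2]), reverse=True)


def rank_sources(acc: Dict[Pair, float], target: str) -> List[Tuple[str, float]]:
    """Rank candidate source (training) languages for a target language
    by their cross-lingual accuracy, best first."""
    candidates = [(src, a) for (src, tgt), a in acc.items() if tgt == target and src != target]
    return sorted(candidates, key=lambda c: c[1], reverse=True)
```

A large gap flags a pair where good transfer in one direction does not carry over to the reverse direction. `rank_sources` answers the practical question of which starting dataset to try first for a given target language.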