    An audio-visual corpus for multimodal automatic speech recognition

    Improving the translation environment for professional translators

    When using computer-aided translation systems in a typical professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view and from a purely technological side. This paper describes the SCATE research on improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
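
    Fuzzy matching, mentioned above, is the mechanism by which a CAT tool retrieves similar previously translated segments from a translation memory. As a point of reference, here is a minimal sketch of the conventional edit-distance-based baseline such work builds on; the token-level distance and the 0.7 threshold are illustrative assumptions, not SCATE's improved metrics.

    ```python
    # Minimal sketch of edit-distance-based fuzzy matching (a hypothetical
    # baseline, not SCATE's metric). Scores a source sentence against
    # translation-memory segments.

    def levenshtein(a, b):
        # Token-level edit distance via dynamic programming over two rows.
        prev = list(range(len(b) + 1))
        for i in range(1, len(a) + 1):
            curr = [i] + [0] * len(b)
            for j in range(1, len(b) + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                curr[j] = min(prev[j] + 1,         # deletion
                              curr[j - 1] + 1,     # insertion
                              prev[j - 1] + cost)  # substitution
            prev = curr
        return prev[-1]

    def fuzzy_score(source, tm_segment):
        # Similarity in [0, 1]; 1.0 means an exact token-level match.
        s, t = source.split(), tm_segment.split()
        if not s and not t:
            return 1.0
        return 1.0 - levenshtein(s, t) / max(len(s), len(t))

    def best_match(source, memory, threshold=0.7):
        # Return (score, segment) for the best match at or above the
        # threshold, or None when nothing in the memory is close enough.
        score, seg = max((fuzzy_score(source, seg), seg) for seg in memory)
        return (score, seg) if score >= threshold else None

    tm = ["press the green button", "open the main menu"]
    print(best_match("press the red button", tm))  # (0.75, 'press the green button')
    ```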

    Multimodal database of emotional speech, video and gestures

    People express emotions through different modalities. Integrating verbal and non-verbal communication channels creates a system in which the message is easier to understand, and expanding the focus to several expression forms can facilitate research on emotion recognition as well as human-machine interaction. In this article, the authors present a Polish emotional database composed of three modalities: facial expressions, body movement and gestures, and speech. The corpus contains recordings registered in studio conditions, acted out by 16 professional actors (8 male and 8 female). The data is labeled with six basic emotion categories, following Ekman's classification. To check the quality of the performances, all recordings were evaluated by experts and volunteers. The database is available to the academic community and may be useful in studies on audio-visual emotion recognition.

    Definition of Requirements for Accessing Multilingual Information and Opinions

    With the development of the Internet and satellite television, access to thousands of programs and messages in different languages has become widespread. Unfortunately, even well-educated people rarely speak more than two or three foreign languages, and most know only one, which significantly limits access to this information. In this paper, we define requirements for an automated system for Accessing Multilingual Information and opinionS (AMIS) that will help users understand multimedia content transmitted in different languages, with simultaneous comparison to counterparts in the user's native language. The concept of understanding we employ will provide access to any information, regardless of the language in which it is presented. We believe that the AMIS project can have an immense and positive impact on the integration and awareness of society in social and cultural terms.

    Audio-visual speech processing system for Polish applicable to human-computer interaction

    This paper describes an audio-visual speech recognition (AVASR) system for the Polish language and a set of performance tests under various acoustic conditions. We first present the overall structure of AVASR systems, with three main areas: audio feature extraction, visual feature extraction, and audio-visual speech integration. We present MFCC features for the audio stream with the standard HMM modeling technique, then describe appearance-based and shape-based visual features. Subsequently, we present two feature integration techniques: feature concatenation and model fusion. We also discuss the results of a set of experiments conducted to select the best system setup for Polish under noisy audio conditions. The experiments simulate human-computer interaction in a computer-control scenario with voice commands in difficult audio environments. With an Active Appearance Model (AAM) and a multistream Hidden Markov Model (HMM), the system reduces the Word Error Rate by more than 30% compared to audio-only speech recognition when the Signal-to-Noise Ratio drops to 0 dB.
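
    Of the two integration techniques, feature concatenation (early integration) is the simpler: audio and visual feature vectors are aligned in time and stacked into a single observation vector for a one-stream HMM. The sketch below illustrates the idea; the frame rates and feature dimensions are illustrative assumptions, not the paper's exact configuration.

    ```python
    import numpy as np

    # Minimal sketch of feature concatenation (early integration). Assumed
    # setup: 13 MFCCs at 100 frames/s and 20 AAM parameters at 25 frames/s,
    # both already extracted.

    def align_and_concatenate(audio_feats, visual_feats):
        # audio_feats:  (T_a, D_a) MFCC frames
        # visual_feats: (T_v, D_v) AAM shape/appearance frames
        # returns:      (T_a, D_a + D_v) joint observation vectors
        t_a = np.linspace(0.0, 1.0, len(audio_feats))
        t_v = np.linspace(0.0, 1.0, len(visual_feats))
        # Linearly interpolate each visual dimension onto the audio timeline.
        upsampled = np.stack(
            [np.interp(t_a, t_v, visual_feats[:, d])
             for d in range(visual_feats.shape[1])],
            axis=1)
        return np.concatenate([audio_feats, upsampled], axis=1)

    # 3 s of speech: 300 audio frames x 13 MFCCs, 75 video frames x 20 params.
    av = align_and_concatenate(np.random.randn(300, 13), np.random.randn(75, 20))
    assert av.shape == (300, 33)  # fed to a single-stream HMM as one vector
    ```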

    UCSY-SC1: A Myanmar speech corpus for automatic speech recognition

    This paper introduces a speech corpus developed for Myanmar Automatic Speech Recognition (ASR) research. ASR research is conducted around the world to improve language technologies, and speech corpora are essential to it; creating them is particularly necessary for low-resourced languages. Myanmar can be regarded as low-resourced because it lacks pre-existing resources for speech processing research. In this work, a speech corpus named UCSY-SC1 (University of Computer Studies Yangon - Speech Corpus 1) is created for Myanmar ASR research. The corpus covers two domains, news and daily conversations, and totals over 42 hours of speech: 25 hours of web news, collected from 177 female and 84 male speakers, and 17 hours of recorded conversational data, collected from 42 female and 4 male speakers. The corpus was used as training data for developing Myanmar ASR. Three types of acoustic models were built and compared: Gaussian Mixture Model - Hidden Markov Model (GMM-HMM), Deep Neural Network (DNN), and Convolutional Neural Network (CNN) models. Experiments were conducted on different data sizes, and evaluation was carried out on two test sets: TestSet1 (web news) and TestSet2 (recorded conversational data). Myanmar ASR systems trained on this corpus gave satisfactory results on both sets, with word error rates of 15.61% on TestSet1 and 24.43% on TestSet2.
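
    The word error rates above follow the standard WER definition: the word-level edit distance between reference and hypothesis transcripts, normalized by the reference length. A minimal sketch of that metric:

    ```python
    # Standard word error rate: word-level edit distance between reference
    # and hypothesis, normalized by reference length (assumed non-empty).

    def wer(reference, hypothesis):
        ref, hyp = reference.split(), hypothesis.split()
        # d[i][j] = edits turning the first i reference words into the
        # first j hypothesis words.
        d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            d[i][0] = i
        for j in range(len(hyp) + 1):
            d[0][j] = j
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
                d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
        return d[len(ref)][len(hyp)] / len(ref)

    print(wer("the cat sat", "the cat sat down"))  # one insertion -> 0.333...
    ```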

    TectoMT – a deep-­linguistic core of the combined Chimera MT system

    Chimera is a machine translation system that combines the TectoMT deep-linguistic core with the phrase-based MT system Moses. For the English–Czech pair, it also uses the Depfix post-correction system. All components run on the Unix/Linux platform and are open source (available from the Perl repository CPAN and from the LINDAT/CLARIN repository). The main website is https://ufal.mff.cuni.cz/tectomt. Development is currently supported by the QTLeap 7th FP project (http://qtleap.eu).

    Speaker diarization and speech recognition in the semi-automatization of audio description: an exploratory study on future possibilities?

    This article presents an overview of the technological components used in the process of audio description, and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.
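
    A minimal sketch of the two-stage pipeline described above, assuming hypothetical diarize and transcribe callables that stand in for the actual diarization and ASR engines (neither interface comes from the article):

    ```python
    from dataclasses import dataclass

    # Hypothetical two-stage pipeline: diarization labels who speaks when,
    # then ASR transcribes only the audio describer's segments. `diarize`
    # and `transcribe` are illustrative stand-ins, not tools from the paper.

    @dataclass
    class Segment:
        start: float   # seconds
        end: float     # seconds
        speaker: str   # diarization label, e.g. "describer" or "dialogue"

    def semi_automatic_transcript(audio, diarize, transcribe, target="describer"):
        # diarize(audio)          -> list[Segment]
        # transcribe(audio, s, e) -> text for the span [s, e]
        # Returns timed draft lines that still require human revision.
        lines = []
        for seg in diarize(audio):
            if seg.speaker == target:
                lines.append((seg.start, seg.end,
                              transcribe(audio, seg.start, seg.end)))
        return lines
    ```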