4,228 research outputs found

Automated Testing of Speech-to-Speech Machine Translation in Telecom Networks

Author: Tirronen Silja
Publication venue: Aalto-yliopisto
Publication date: 01/01/2011

In a globalizing world, the ability to communicate across language barriers is increasingly important. Learning languages is laborious, which is why there is a strong desire to develop automatic machine translation applications. Ericsson has developed a speech-to-speech translation prototype called the Real-Time Interpretation System (RTIS). The service runs in a mobile network and translates travel phrases between two languages in speech format. State-of-the-art machine translation systems suffer from relatively poor performance, so evaluation plays a large role in machine translation development. The purpose of evaluation is to ensure that the system preserves translational equivalence and, in the case of a speech-to-speech system, adequate speech quality. Evaluation is most reliably done by human judges; however, human-conducted evaluation is costly and subjective. In this thesis, an automated test environment for the Ericsson Real-Time Interpretation System prototype is designed and analyzed. The goals are to investigate whether RTIS verification can be conducted automatically and whether the test environment can truthfully measure the end-to-end performance of the system, that is, the translation quality perceived by the user. The results show that methods used in end-to-end speech quality verification in mobile networks cannot be optimally adapted for machine translation evaluation. With current knowledge, human-conducted evaluation is the only method that can truthfully measure translational equivalence and speech intelligibility. Automating machine translation evaluation needs further research; until then, human-conducted evaluation should remain the preferred method in RTIS verification.

Aaltodoc Publication Archive

Extracting Information from Spoken User Input: A Machine Learning Approach

Author: Lendvai P.K.
Publication venue: [n.n.]
Publication date: 01/01/2004

We propose a module that performs automatic analysis of user input in spoken dialogue systems using machine learning algorithms. The input to the module is material received from the speech recogniser and the dialogue manager of the spoken dialogue system, the output is a four-level pragmatic-semantic representation of the user utterance. Our investigation shows that when the four interpretation levels are combined in a complex machine learning task, the performance of the module is significantly better than the score of an informed baseline strategy. However, via a systematic, automatised search for the optimal subtask combinations we can gain substantial improvement produced by both classifiers for all four interpretation subtasks. A case study is conducted on dialogues between an automatised, experimental system that gives information on the phone about train connections in the Netherlands, and its users who speak in Dutch. We find that drawing on unsophisticated, potentially noisy features that characterise the dialogue situation, and by performing automatic optimisation of the formulated machine learning task it is possible to extract sophisticated information of practical pragmatic-semantic value from spoken user input with robust performance. This means that our module can with a good score interpret whether the user of the system is giving slot-filling information, and for which query slots (e.g., departure station, departure time, etc.), whether the user gave a positive or a negative answer to the system, or whether the user signals that there are problems in the interaction.

Tilburg University Repository

Acquiring and Maintaining Knowledge by Natural Multimodal Dialog

Author: Holzapfel Hartwig
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009

KITopen

Fuzzy GMM-based Confidence Measure Towards Keywords Spotting Application

Author: Abida Mohamed Kacem
Publication venue: University of Waterloo
Publication date: 01/01/2007

The increasing need for more natural human-machine interfaces has generated intensive research directed toward designing and implementing natural speech-enabled systems. The spectrum of speech recognition applications ranges from understanding simple commands to extracting all the information in the speech signal, such as words, meaning, and the emotional state of the user. Because it is very hard to constrain a speaker when expressing a voice-based request, speech recognition systems have to be able to handle (by filtering out) out-of-vocabulary words in the user's speech utterance and extract only the necessary information (keywords) related to the application in order to deal correctly with the user query. In this thesis, we investigate an approach that can be deployed in keyword spotting systems. We propose a confidence measure feedback module that provides confidence values to be compared against existing Automatic Speech Recognizer word confidences. The feedback module mainly consists of a soft-computing-based system using fuzzy Gaussian mixture models to identify all English phonemes. Testing has been carried out on the JULIUS system, and the preliminary results show that our feedback module outperforms JULIUS confidence measures for both the correctly spotted words and the falsely mapped ones. The results obtained could be refined even further using other types of confidence measures, and the whole system could be used in a Natural Language Understanding based module for speech understanding applications.

University of Waterloo's Institutional Repository

Toward an affect-sensitive multimodal human-computer interaction

Author: Pantic M, Rothkrantz L
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2003

This paper argues that next-generation human-computer interaction (HCI) designs need to include the essence of emotional intelligence, namely the ability to recognize a user's affective states, in order to become more human-like, more effective, and more efficient. Affective arousal modulates all nonverbal communicative cues (facial expressions, body movements, and vocal and physiological reactions). In a face-to-face interaction, humans detect and interpret those interactive signals of their communicator with little or no effort. Yet the design and development of an automated system that accomplishes these tasks is rather difficult. This paper surveys past work on solving these problems by computer and provides a set of recommendations for developing the first part of an intelligent multimodal HCI: an automatic personalized analyzer of a user's nonverbal affective feedback.

CiteSeerX

Crossref

TU Delft Repository

Spiral - Imperial College Digital Repository

PLPrepare: A Grammar Checker for Challenging Cases

Author: Hoyos Jacob
Publication venue: Digital Commons @ East Tennessee State University
Publication date: 01/05/2021

This study investigates one of the Polish language’s most arbitrary cases: the genitive masculine inanimate singular. It collects and ranks several guidelines to help language learners discern its proper usage, and it introduces a framework for providing detailed feedback on arbitrary cases. The study tests this framework by implementing and evaluating a hybrid grammar checker called PLPrepare. PLPrepare performs similarly to other grammar checkers and is able to detect genitive case usages and provide feedback based on a number of error classifications.

East Tennessee State University