43 research outputs found

Design and evaluation of acceleration strategies for speeding up the development of dialog applications

Author: Agah
Bohus
Chung
D’Haro
Javier Ferreiros
José Manuel Pardo
Jung
Luis Fernando D’Haro
McTear
Pargellis
Ricardo de Córdoba
Rubén San-Segundo
Tsai
Wang
Wolters
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011
Field of study

In this paper, we describe a complete development platform that features different innovative acceleration strategies, not included in any other current platform, that simplify and speed up the definition of the different elements required to design a spoken dialog service. The proposed accelerations are mainly based on using the information from the backend database schema and contents, as well as cumulative information produced throughout the different steps in the design. Thanks to these accelerations, the interaction between the designer and the platform is improved, and in most cases the design is reduced to simple confirmations of the “proposals” that the platform dynamically provides at each step. In addition, the platform provides several other accelerations, such as configurable templates that can be used to define the different tasks in the service or the dialogs to obtain or show information to the user, automatic proposals for the best way to request slot contents from the user (i.e. using mixed-initiative forms or directed forms), an assistant that offers the set of most probable actions required to complete the definition of the different tasks in the application, and another assistant for solving specific modality details such as confirmations of user answers or how to present the lists of retrieved results to users after querying the backend database. Additionally, the platform allows the creation of speech grammars and prompts, database access functions, and the possibility of using mixed-initiative and over-answering dialogs. In the paper we also describe each assistant in the platform in detail, emphasizing the different kinds of methodologies followed to facilitate the design process in each one. Finally, we describe the results obtained in both a subjective and an objective evaluation with different designers that confirm the viability, usefulness, and functionality of the proposed accelerations. Thanks to the accelerations, the design time is reduced by more than 56% and the number of keystrokes by 84%.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Application of backend database contents and structure to the design of spoken dialog services

Author: D’Haro
Gorin
Javier Ferreiros
José Manuel Pardo
Juan Manuel Montero
Jung
Luis Fernando D’Haro
López-Cózar
McTear
Paternò
Ricardo de Córdoba
Wang
Wolters
Zajicek
Publication venue: 'Elsevier BV'
Publication date: 01/01/2012

Current development platforms for designing spoken dialog services feature different kinds of strategies to help designers build, test, and deploy their applications. In general, these platforms are made up of several assistants that handle the different design stages (e.g. definition of the dialog flow, prompt and grammar definition, database connection, or debugging and testing of the running application). In spite of all the advances in this area, designing spoken dialog services remains a time-consuming task that needs to be accelerated. In this paper we describe a complete development platform that reduces the design time through different types of acceleration strategies based on information from the data model structure and database contents, as well as cumulative information obtained throughout the successive steps in the design. Thanks to these accelerations, the interaction with the platform is simplified and the design is reduced, in most cases, to simple confirmations of the “proposals” that the platform automatically provides at each stage. Different kinds of proposals are available to complete the application flow, such as the possibility of selecting which information slots should be requested from the user together, predefined templates for common dialogs, the most probable actions that make up each state defined in the flow, and different solutions to specific speech-modality problems such as the presentation of the lists of retrieved results after querying the backend database. The platform also includes accelerations for creating speech grammars and prompts, and the SQL queries for accessing the database at runtime. Finally, we describe the setup and results of simultaneous summative, subjective, and objective evaluations with different designers, used to test the usability of the proposed accelerations as well as their contribution to reducing the design time and interaction.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Unsupervised crosslingual adaptation of tokenisers for spoken language recognition

Author: Raymond W.M. Ng
Mauro Nicolao
Thomas Hain
Ambikairajah
Anderson
BenZeghiba
BenZeghiba
Caraballo
Corboda
Davis
Dehak
D’Haro
D’Haro
Fék
Ferrer
Gauvain
Gibson
Glembek
Hazen
Hermansky
Joachims
Knill
Li
Li
Lööf
Ma
Muthusamy
Navrátil
Ng
Ng
Richardson
Schultz
Schwarz
Singer
Suzuki
Torres-Carrasquillo
Torres-Carrasquillo
Veselý
Vu
Xue
Zissman
Zissman
Publication venue: 'Elsevier BV'
Publication date: 01/11/2017

Phone tokenisers are used in spoken language recognition (SLR) to obtain elementary phonetic information. We present a study on the use of deep neural network tokenisers. Unsupervised crosslingual adaptation was performed to adapt the baseline tokeniser trained on English conversational telephone speech data to different languages. Two training and adaptation approaches, namely cross-entropy adaptation and state-level minimum Bayes risk adaptation, were tested in a bottleneck i-vector and a phonotactic SLR system. The SLR systems using the tokenisers adapted to different languages were combined using score fusion, giving a 7-18% reduction in minimum detection cost function (minDCF) compared with the baseline configurations without adapted tokenisers. Analysis of the results showed that the ensemble tokenisers gave diverse representations of phonemes, thus bringing complementary effects when SLR systems with different tokenisers were combined. SLR performance was also shown to be related to the quality of the adapted tokenisers.

Crossref

Biblioteca Digital de la Comunidad de Madrid

White Rose Research Online

Unsupervised crosslingual adaptation of tokenisers for spoken language recognition

Author: Ambikairajah
Anderson
BenZeghiba
BenZeghiba
Caraballo
Corboda
Davis
Dehak
D’Haro
D’Haro
Ferrer
Fék
Gauvain
Gibson
Glembek
Hazen
Hermansky
Joachims
Knill
Li
Li
Lööf
Ma
Mauro Nicolao
Muthusamy
Navrátil
Ng
Ng
Raymond W.M. Ng
Richardson
Schultz
Schwarz
Singer
Suzuki
Thomas Hain
Torres-Carrasquillo
Torres-Carrasquillo
Veselý
Vu
Xue
Zissman
Zissman
Publication venue: 'Elsevier BV'
Publication date: 01/11/2017

Crossref

White Rose Research Online

Design, development and field evaluation of a Spanish into sign language translation system

Author: A. García
D. Sánchez
DI Fels
E Efthimiou
F Casacuberta
F. Fernández
H Hermansky
J Och
J Wong
J. M. Montero
JB Mariño
JL Gauvain
L. F. D’Haro
R San-Segundo
R San-Segundo
R. Córdoba
R. San-Segundo
S Möller
V. López-Ludeña
V. Sama
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

This paper describes the design, development and field evaluation of a machine translation system from Spanish to Spanish Sign Language (LSE: Lengua de Signos Española). The developed system focuses on helping Deaf people when they want to renew their driver’s license. The system is made up of a speech recognizer (for decoding the spoken utterance into a word sequence), a natural language translator (for converting a word sequence into a sequence of signs belonging to the sign language), and a 3D avatar animation module (for playing back the signs). For the natural language translator, three technological approaches have been implemented and evaluated: an example-based strategy, a rule-based translation method and a statistical translator. For the final version, the implemented language translator combines all the alternatives into a hierarchical structure. This paper includes a detailed description of the field evaluation, which was carried out in the Local Traffic Office in Toledo and involved real government employees and Deaf people. The evaluation includes objective measurements from the system and subjective information from questionnaires. The paper details the main problems found and discusses how to solve them (some of them specific to LSE).

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Archivo Digital UPM

Speech to sign language translation system for Spanish

Author: Abdel-Fattah
Casacuberta
Christopoulos
Cole
Engberg-Pedersen
F. Fernández
Granström
Gustafson
J. Ferreiros
J. Macías-Guarasa
J.M. Lucas
J.M. Montero
J.M. Pardo
Koehn
L.F. D’Haro
Masataka
Och
Prillwitz
Pyers
R. Barra
R. Córdoba
R. San-Segundo
Reyes
Sylvie
Zens
Publication venue: 'Elsevier BV'

Crossref

AT&T Labs Research,

Author: Florham Park
Luis Fernando D’haro
Michael Johnston
Publication date: 01/04/2008

research. att.co

CiteSeerX

Low-resource language recognition using a fusion of phoneme posteriorgram counts, acoustic and glottal-based i-vectors

Author: J. M. Pardo
L. F. D’haro
M. A. Caraballo
R. Cordoba
Publication date: 27/11/2013

This paper presents a description of our system for the Albayzin 2012 LRE competition. One of the main characteristics of this evaluation was the reduced number of available files for training the system, especially for the empty condition, where no training data set was provided but only a development set. In addition, the whole database was created from online videos, and around one third of the training data was labeled as noisy files. Our primary system was the fusion of three different i-vector based systems: one acoustic system based on MFCCs, a phonotactic system using trigrams of phone-posteriorgram counts, and another acoustic system based on RPLPs that improved robustness against noise. A contrastive system that included new features based on the glottal source was also presented. Official and post-evaluation results for all the conditions, using the proposed metrics for the evaluation and the Cavg metric, are presented in the paper. Index Terms: LID system, noise robustness, scarce data, posteriorgram counts, i-vectors

CiteSeerX

Crossref

Deep AM-FM: Toolkit for Automatic Dialogue Evaluation

Author: H Palangi
LF D’Haro
P Bojanowski
RE Banchs
S Hochreiter
TK Landauer
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 17/04/2020

Crossref

ScholarBank@NUS