36 research outputs found
Generation of a synthetic voice in Castilian Spanish based on HSMM for the Albayzín 2008 Evaluation: text-to-speech conversion
This article describes the process of generating a Castilian Spanish voice using the UPC ESMA corpus from UPC, provided for the Albayzín 2008 Evaluation: Text-to-Speech Conversion. One voice based on unit selection was implemented using Festival's Multisyn package, and another based on Hidden Semi-Markov Models (HSMM) using HTS. After a brief evaluation of the quality of both voices, the main characteristics of the HSMM-based voice, the final system submitted to the evaluation, are described.
Proposing a speech to gesture translation architecture for Spanish deaf people.
This article describes an architecture for translating speech into Spanish Sign Language (SSL). The proposed architecture is made up of four modules: speech recognizer, semantic analysis, gesture sequence generation and gesture playing. For the speech recognizer and the semantic analysis modules, we use software developed by IBM and CSLR (Center for Spoken Language Research at the University of Colorado), respectively. Gesture sequence generation and gesture animation are the modules on which we have focused our main effort. Gesture sequence generation uses semantic concepts (obtained from the semantic analysis), associating them with several SSL gestures. This association is carried out based on a number of generation rules. For gesture animation, we have developed an animated agent (a virtual representation of a human) and a strategy for reducing the effort required for gesture animation. This strategy consists of making the system automatically generate all the agent positions necessary for the gesture animation. In this process, the system uses a few main agent positions (two or three per second) and some interpolation strategies, both previously created by the service developer (the person who adapts the architecture proposed in this paper to a specific domain). Related to this module, we propose a distance measure between agent positions and a measure of gesture complexity. This measure can be used to analyze gesture perception versus gesture complexity. With the proposed architecture, we are not trying to build a domain-independent translator, but a system able to translate speech utterances into gesture sequences in a restricted domain: railway, flight or weather information.
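The two modules the article focuses on can be sketched briefly. The following is a minimal illustration only, assuming a rule table and keyframe representation of my own invention: the concept names, gesture labels and joint-coordinate tuples are hypothetical placeholders, not the article's actual rule set or animation format.

```python
# Hypothetical sketch: rule-based gesture sequence generation plus
# keyframe interpolation for the animated agent. All names are invented.

# Generation rules: map a semantic concept to one or more SSL gesture labels.
GENERATION_RULES = {
    "GREETING": ["HELLO"],
    "DESTINATION": ["CITY", "POINT"],
    "TIME": ["CLOCK"],
}

def generate_gesture_sequence(concepts):
    """Concatenate the gestures associated with each semantic concept."""
    sequence = []
    for concept in concepts:
        sequence.extend(GENERATION_RULES.get(concept, []))
    return sequence

def interpolate_positions(keyframes, steps_between):
    """Linearly interpolate agent positions between consecutive keyframes.

    Each keyframe is a tuple of joint coordinates; the service developer
    supplies only a few keyframes per second, and the system fills in the
    intermediate frames automatically.
    """
    frames = []
    for a, b in zip(keyframes, keyframes[1:]):
        for step in range(steps_between):
            t = step / steps_between
            frames.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    frames.append(keyframes[-1])
    return frames
```

Linear interpolation stands in here for whatever interpolation strategies the developer actually supplies; the point is only that dense frame sequences are derived automatically from sparse keyframes.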
Real Field Deployment of a Smart Fiber Optic Surveillance System for Pipeline Integrity Threat Detection: Architectural Issues and Blind Field Test Results
This paper presents an on-line augmented surveillance system aimed at real-time monitoring of activities along a pipeline. The system is deployed in a fully realistic scenario and exposed to real activities carried out at unknown places and unknown times within a given test interval (so-called blind field tests). We describe the system architecture, which includes specific modules to deal with the fact that continuous on-line monitoring needs to be carried out, while addressing the need to limit false alarms to reasonable rates. To the best of our knowledge, this is the first published work in which a pipeline integrity threat detection system is deployed in a realistic scenario (using a fiber optic cable along an active gas pipeline) and is thoroughly and objectively evaluated under realistic blind conditions. The system integrates two operation modes: the machine+activity identification mode identifies the machine that is carrying out a certain activity along the pipeline, and the threat detection mode directly identifies whether the activity along the pipeline is a threat or not. The blind field tests were carried out in two different pipeline sections: the first corresponds to the case where the sensor is close to the sensed area, while in the second the sensed area is about 35 km from the sensor. The machine+activity identification mode showed an average machine+activity classification rate of 46.6%. For the threat detection mode, 8 out of 10 threats were correctly detected, with only 1 false alarm in a 55.5-hour sensed period.
European Commission, Ministerio de Economía y Competitividad, Comunidad de Madrid
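The distinction between the two operation modes can be illustrated schematically. This is a sketch under assumptions of my own: the label pairs, the probabilistic threat score and the threshold are invented placeholders, since the paper's actual classifiers and signal features are not specified here.

```python
# Hypothetical sketch of the two operation modes. The classifiers are
# passed in as callables; machine/activity labels are invented examples.

def machine_activity_mode(features, ma_classifier):
    """Machine+activity identification: return a (machine, activity)
    pair for one windowed segment of the fiber-optic signal."""
    return ma_classifier(features)

def threat_mode(features, threat_classifier, threshold=0.5):
    """Threat detection: directly decide whether the sensed activity is
    a threat. Raising the threshold trades detection rate against the
    false-alarm rate, reflecting the need to keep false alarms low
    during continuous on-line monitoring."""
    return threat_classifier(features) >= threshold
```

The point of the sketch is that the second mode is a direct binary decision, not a post-processing of the first mode's output.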
Word Pair Speech
In this paper we present a speech understanding system that accepts continuous speech sentences as input to command a HIFI set. The string of words obtained from the recogniser is sent to the understanding system, which tries to fill in a set of frames specifying the triplet (SUBSYSTEM, PARAMETER, VALUE). The understanding module follows the philosophy presented in [1]. The triplets are finally translated into infrared commands by an actuator module and sent to the HIFI set, composed of a radio, a three-deck CD player and a two-tape cassette recorder/player. All circumstances (understanding incompleteness, HIFI set status, result of the command execution) are confirmed back to the user via a text-to-speech system with substitutable-concept, pattern-based generated messages. We have introduced a response module because some of the final users will be blind people, and because we are studying the possibility of establishing restricted dialogues with the users in order to complete or correct the commands. The understanding engine is based on semantic-like tagging.
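The frame-filling idea can be sketched as follows. This is a minimal illustration, assuming an invented semantic lexicon: the words, tags and slot fillers below are hypothetical and do not come from the paper.

```python
# Hypothetical sketch of semantic-like tagging driving frame filling
# into the (SUBSYSTEM, PARAMETER, VALUE) triplet. Lexicon is invented.

SEMANTIC_LEXICON = {
    "radio": ("SUBSYSTEM", "RADIO"),
    "cd": ("SUBSYSTEM", "CD_PLAYER"),
    "volume": ("PARAMETER", "VOLUME"),
    "up": ("VALUE", "INCREASE"),
    "down": ("VALUE", "DECREASE"),
}

def fill_frame(words):
    """Fill the triplet from the recogniser's word string. Missing slots
    stay None, so a response module can ask the user to complete or
    correct the command in a restricted dialogue."""
    frame = {"SUBSYSTEM": None, "PARAMETER": None, "VALUE": None}
    for word in words:
        tag = SEMANTIC_LEXICON.get(word.lower())
        if tag is not None:
            slot, value = tag
            if frame[slot] is None:  # keep the first filler per slot
                frame[slot] = value
    return frame
```

An incomplete frame (e.g. a missing VALUE) is exactly the "understanding incompleteness" circumstance that the system confirms back to the user via text-to-speech.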
EFFICIENT NN-BASED SEARCH SPACE REDUCTION IN A LARGE VOCABULARY SPEECH RECOGNITION SYSTEM
In very large vocabulary speech recognition systems using the hypothesis-verification paradigm, the verification stage is usually the most time-consuming. State-of-the-art systems combine fixed-size hypothesized search spaces with advanced pruning techniques. In this paper we propose a novel strategy to dynamically calculate the hypothesized search space, using neural networks as the estimation module and designing the input feature set with a careful greedy selection approach. The main achievement has been a statistically significant relative decrease in error rate of 33.53%, while obtaining a relative decrease in average computational demands of up to 19.40%.
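The estimation module can be sketched as a small feed-forward network mapping utterance-level features to a search-space size. This is an illustration only, assuming an arbitrary one-hidden-layer topology with invented weights and clipping range; the paper's actual network, features and ranges are not reproduced here.

```python
import math

# Hypothetical sketch: a tiny MLP estimates the hypothesized search
# space size per utterance. Weights and clipping bounds are invented;
# a trained network would replace them.

def mlp_estimate(features, w_hidden, w_out, min_len=50, max_len=2000):
    """One hidden tanh layer followed by a linear output, clipped to a
    plausible range of search-space sizes."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)))
              for row in w_hidden]
    raw = sum(w * h for w, h in zip(w_out, hidden))
    return max(min_len, min(max_len, int(round(raw))))
```

Clipping guards the verification stage against pathological estimates in either direction, which matters because an underestimate costs accuracy while an overestimate costs computation.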
Improved Variable Preselection List Length Estimation Using NNs
In very large vocabulary hypothesis-verification systems, the fine acoustic matcher is usually the most time-consuming stage, so the main concern is reducing the preselection list length as much as possible. Traditionally, these systems use an overly large fixed preselection list length, increasing computational demands beyond what is really needed. The idea we propose is to estimate a different preselection list length for every utterance, so that we can lower the average computational effort needed for the recognition process. As we will show, it is even possible for the resulting system to outperform the fixed-length one in error rate, even while reducing computational cost. This paper presents a detailed study of an NN-based approach to variable preselection list length estimation. The main achievement has been a relative decrease in error rate of up to 40%, while obtaining a relative decrease in average preselection list length of up to 31%.
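The fixed-versus-variable trade-off can be made concrete with a toy evaluation. This sketch assumes a simplified error model of my own: an utterance is counted correct whenever the true word's rank falls inside the preselection list, which ignores the fine acoustic matcher's own errors.

```python
# Hypothetical sketch: compare a fixed preselection list length with a
# per-utterance variable one. Ranks and lengths below are toy data.

def evaluate(ranks, lengths):
    """ranks[i]: rank of the correct word for utterance i (1 = best);
    lengths[i]: preselection list length used for that utterance.
    Returns (error_rate, average_list_length)."""
    errors = sum(1 for r, n in zip(ranks, lengths) if r > n)
    return errors / len(ranks), sum(lengths) / len(lengths)

# Toy comparison: four utterances, same errors, much shorter lists.
ranks = [3, 120, 40, 800]
fixed = evaluate(ranks, [500] * 4)       # one length for everyone
variable = evaluate(ranks, [10, 200, 100, 500])  # per-utterance estimates
```

In this toy case the variable scheme matches the fixed scheme's error rate at far lower average list length, which is the effect the paper quantifies.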