
    Generación de una voz sintética en Castellano basada en HSMM para la Evaluación Albayzín 2008: conversión texto a voz

    This article describes the process of building a Spanish voice using UPC's UPC ESMA corpus, provided by the Albayzín 2008 Evaluation: Text-to-Speech Conversion. Two voices were implemented: one based on unit selection, using Festival's Multisyn package, and another based on Hidden Semi-Markov Models (HSMM), using HTS. After a brief evaluation of the quality of both voices, the article details the main characteristics of the HSMM-based voice, which was the final system submitted to the evaluation.

    Proposing a speech to gesture translation architecture for Spanish deaf people.

    This article describes an architecture for translating speech into Spanish Sign Language (SSL). The proposed architecture is made up of four modules: speech recognizer, semantic analysis, gesture sequence generation and gesture playing. For the speech recognizer and the semantic analysis modules, we use software developed by IBM and CSLR (Center for Spoken Language Research at University of Colorado), respectively. Gesture sequence generation and gesture animation are the modules on which we have focused our main effort. Gesture sequence generation takes the semantic concepts obtained from the semantic analysis and associates them with several SSL gestures. This association is carried out by a number of generation rules. For gesture animation, we have developed an animated agent (a virtual representation of a human) and a strategy for reducing the effort in gesture animation. This strategy consists of making the system automatically generate all agent positions necessary for the gesture animation. In this process, the system uses a few main agent positions (two or three per second) and some interpolation strategies, both defined in advance by the service developer (the person who adapts the architecture proposed in this paper to a specific domain). Related to this module, we propose a distance between agent positions and a measure of gesture complexity. This measure can be used to analyze gesture perception versus gesture complexity. With the proposed architecture, we are not trying to build a domain-independent translator but a system able to translate speech utterances into gesture sequences in a restricted domain: railway, flights or weather information.
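
    The idea of generating all agent positions from a few main positions can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify how positions are represented, which interpolation strategies are used, or how the distance between positions is defined. Here a position is assumed to be a list of joint values, the interpolation is assumed to be linear, and the distance is assumed to be Euclidean. The names `interpolate` and `distance` are hypothetical.

```python
import math
from typing import List, Sequence

# Assumption: an agent position is a flat list of joint values (e.g. angles).
Position = Sequence[float]


def interpolate(keyframes: List[Position], steps: int) -> List[List[float]]:
    """Linearly fill in frames between consecutive main positions.

    `steps` is the number of frames emitted per keyframe interval,
    counting the starting keyframe of that interval.
    """
    frames: List[List[float]] = []
    for start, end in zip(keyframes, keyframes[1:]):
        for s in range(steps):
            t = s / steps  # fraction of the way from start to end
            frames.append([a + t * (b - a) for a, b in zip(start, end)])
    frames.append(list(keyframes[-1]))  # close with the last main position
    return frames


def distance(p: Position, q: Position) -> float:
    """Euclidean distance between two positions (an assumed definition)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    main_positions = [[0.0, 0.0], [10.0, 20.0]]
    for frame in interpolate(main_positions, steps=2):
        print(frame)
    print(distance([0.0, 0.0], [3.0, 4.0]))
```

    With two or three main positions per second, as the abstract mentions, `steps` would be chosen to reach the playback frame rate.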

    Real Field Deployment of a Smart Fiber Optic Surveillance System for Pipeline Integrity Threat Detection: Architectural Issues and Blind Field Test Results

    This paper presents an on-line augmented surveillance system that aims at real-time monitoring of activities along a pipeline. The system is deployed in a fully realistic scenario and exposed to real activities carried out in unknown places at unknown times within a given test time interval (so-called blind field tests). We describe the system architecture, which includes specific modules to deal with the need for continuous on-line monitoring while keeping false alarms at reasonable rates. To the best of our knowledge, this is the first published work in which a pipeline integrity threat detection system is deployed in a realistic scenario (using a fiber optic along an active gas pipeline) and is thoroughly and objectively evaluated in realistic blind conditions. The system integrates two operation modes: the machine+activity identification mode identifies the machine that is carrying out a certain activity along the pipeline, and the threat detection mode directly identifies whether the activity along the pipeline is a threat or not. The blind field tests are carried out in two different pipeline sections: in the first section the sensor is close to the sensed area, while in the second the sensed area is about 35 km from the sensor. Results of the machine+activity identification mode showed an average machine+activity classification rate of 46.6%. For the threat detection mode, 8 out of 10 threats were correctly detected, with only 1 false alarm appearing in a 55.5-hour sensed period. Funding: European Commission; Ministerio de Economía y Competitividad; Comunidad de Madri

    Word Pair Speech

    In this paper we present a speech understanding system that accepts continuous speech sentences as input to command a HIFI set. The string of words obtained from the recogniser is sent to the understanding system, which tries to fill in a set of frames specifying the triplet (SUBSYSTEM, PARAMETER, VALUE). The understanding module follows the philosophy presented in [1]. The triplets are finally translated into infrared commands by an actuator module and sent to the HIFI set, which consists of a radio, a three-deck CD player and a two-tape cassette recorder/player. All circumstances (understanding incompleteness, HIFI set status, result of the command execution) are confirmed back to the user via a text-to-speech system whose messages are generated from patterns with substitutable concepts. We have introduced a response module because some of the end users will be blind people, and because we are studying the possibility of establishing restricted dialogues with the users in order to complete or correct the commands. The understanding engine is based on semantic-like tagging.
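
    The frame-filling step can be illustrated with a minimal sketch. It assumes a word-level lexicon that maps each recognised word to a (slot, concept) tag; the lexicon entries, the concept names and the function `fill_frame` are hypothetical and are not taken from the paper, which only states that the engine is based on semantic-like tagging.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tagging lexicon: word -> (frame slot, concept).
LEXICON: Dict[str, Tuple[str, str]] = {
    "radio": ("SUBSYSTEM", "RADIO"),
    "cd": ("SUBSYSTEM", "CD"),
    "tape": ("SUBSYSTEM", "TAPE"),
    "volume": ("PARAMETER", "VOLUME"),
    "track": ("PARAMETER", "TRACK"),
    "up": ("VALUE", "UP"),
    "down": ("VALUE", "DOWN"),
    "high": ("VALUE", "HIGH"),
}

SLOTS = ("SUBSYSTEM", "PARAMETER", "VALUE")


def fill_frame(words: List[str]) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Tag each word and fill the (SUBSYSTEM, PARAMETER, VALUE) frame.

    Returns the frame and the list of slots left empty, which a response
    module could use to ask the user to complete the command.
    """
    frame: Dict[str, Optional[str]] = {slot: None for slot in SLOTS}
    for word in words:
        tag = LEXICON.get(word.lower())
        if tag is None:
            continue  # word carries no concept in this toy lexicon
        slot, concept = tag
        if frame[slot] is None:  # keep the first concept found per slot
            frame[slot] = concept
    missing = [slot for slot in SLOTS if frame[slot] is None]
    return frame, missing


if __name__ == "__main__":
    print(fill_frame("set the radio volume to high".split()))
    print(fill_frame("volume up".split()))
```

    Returning the missing slots separately reflects the incompleteness feedback the abstract describes: an incomplete triplet can be reported back to the user rather than sent to the actuator.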

    EFFICIENT NN-BASED SEARCH SPACE REDUCTION IN A LARGE VOCABULARY SPEECH RECOGNITION SYSTEM

    In very large vocabulary speech recognition systems using the hypothesis-verification paradigm, the verification stage is usually the most time-consuming. State-of-the-art systems combine fixed-size hypothesized search spaces with advanced pruning techniques. In this paper we propose a novel strategy to dynamically calculate the hypothesized search space, using neural networks as the estimation module and designing the input feature set with a careful greedy-based selection approach. The main achievement has been a statistically significant relative decrease in error rate of 33.53%, together with a relative decrease in average computational demands of up to 19.40%.

    Improved Variable Preselection List Length Estimation Using NNs

    In very large vocabulary hypothesis-verification systems, the fine acoustic matcher is usually the most time-consuming stage, so the main concern is reducing the preselection list length as much as possible. Traditionally, these systems use a fixed preselection list length that is too high, increasing computational demands beyond what is really needed. The idea we propose is to estimate a different preselection list length for every utterance, so that we can lower the average computational effort needed for the recognition process. As we will show, it is even possible for the resulting system to outperform the fixed-length one in error rate while reducing computational cost. This paper presents a detailed study of an NN-based approach to variable preselection list length estimation. The main achievement has been a relative decrease in error rate of up to 40%, together with a relative decrease in average preselection list length of up to 31%.
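
    For readers comparing the figures in the two abstracts above, a relative decrease is conventionally computed against the baseline value. The formula below is the standard definition, not one quoted from the papers, and the numbers in the worked line are hypothetical, chosen only to show the arithmetic.

```latex
\[
  \Delta_{\mathrm{rel}} = \frac{x_{\mathrm{baseline}} - x_{\mathrm{new}}}{x_{\mathrm{baseline}}} \times 100\%
\]
% Hypothetical example: a baseline error rate of 5.0% reduced to 3.0%
\[
  \Delta_{\mathrm{rel}} = \frac{5.0 - 3.0}{5.0} \times 100\% = 40\%
\]
```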