5 research outputs found

Contribuciones al reconocimiento robusto de habla en redes de comunicaciones mediante transparametrización

Author: Gómez Cajas Diego Ferney
Publication date: 01/01/2011

The growing influence of communication networks in every area of modern life means that more and more services are offered through them, and since speech is the most natural form of human communication, speech technologies play an important role in our society. For this reason, this thesis presents a series of contributions to speech recognition in communication network environments, using the technique of recognition through transparameterization (reconocimiento mediante transparametrización, RMT) over the two types of networks with the widest coverage today: the Internet and cellular telephony. In particular, we improve the already demonstrated robustness of RMT against coding distortion and transmission errors, and extend the analysis to cases with ambient noise. First, we propose an improved energy estimation procedure. Second, we apply a technique complementary to RMT, consisting of filtering the modulation spectrum, and demonstrate its effectiveness in the Internet environment. In addition, specifically for the UMTS environment, we propose a parameter extension based on the protection applied by the standard channel coder, which makes effective use of the parameters most protected by the channel coder to benefit the robustness of the recognition system.

Modern communication networks play an outstanding role in everyday life, and the number of services offered through them is continuously increasing. As the interfaces to these services become more natural, they tend to embed speech technologies so that human-to-machine communication mimics (to some extent) human-to-human communication.
In this context, this thesis tackles the problem of automatic speech recognition (ASR) in communication-centered environments. In particular, our contributions focus on the bitstream-based approach to ASR, which has already proved to be robust, in two of the most relevant communication scenarios: the Internet and Universal Mobile Telecommunications System (UMTS) networks. We propose several techniques to improve the robustness of ASR systems against the distortions resulting from source coding and transmission errors. For the voice over IP scenario, we propose an improved method for energy estimation and an additional technique based on filtering the modulation spectrum, so that we are able to deal jointly with communication-related distortions and background noise. For the UMTS scenario, besides an improved energy estimation method, we propose an extended feature vector that relies on the unequal error protection mechanism implemented in the channel codec. This extended feature vector makes effective use of the most protected parameters in the bitstream to provide the ASR system with enhanced robustness.
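
The abstract does not say which filter is applied to the modulation spectrum. As a rough illustration only, the sketch below shows the general idea of filtering each feature coefficient along the time axis, which shapes the spectrum of its temporal trajectory (the modulation spectrum). The function name, the FIR structure, the edge handling, and the 5-tap moving-average low-pass taps are all assumptions made for this example and are not taken from the thesis.

```python
from typing import List


def filter_modulation_spectrum(
    features: List[List[float]], taps: List[float]
) -> List[List[float]]:
    """Apply an FIR filter along the time axis of a feature matrix.

    features[t][k] is coefficient k at frame t. Filtering the trajectory of
    each coefficient over time attenuates or keeps modulation frequencies
    according to the filter's response.
    """
    n_frames = len(features)
    if n_frames == 0:
        return []
    n_coeffs = len(features[0])
    half = len(taps) // 2
    out = [[0.0] * n_coeffs for _ in range(n_frames)]
    for t in range(n_frames):
        for k in range(n_coeffs):
            acc = 0.0
            for i, w in enumerate(taps):
                # Clamp at the edges by repeating the first or last frame.
                src = min(max(t + i - half, 0), n_frames - 1)
                acc += w * features[src][k]
            out[t][k] = acc
    return out


# Hypothetical low-pass taps (5-frame moving average); an assumption,
# not the filter used in the thesis.
LOWPASS_TAPS = [0.2] * 5

# A slowly varying trajectory (constant) and a fast one (sign flips every frame).
slow = [[3.0] for _ in range(20)]
fast = [[1.0 if t % 2 == 0 else -1.0] for t in range(20)]

slow_filtered = filter_modulation_spectrum(slow, LOWPASS_TAPS)
fast_filtered = filter_modulation_spectrum(fast, LOWPASS_TAPS)
```

With these assumed taps, the slowly varying trajectory passes through essentially unchanged, while the frame-to-frame fluctuation is strongly attenuated. That is the kind of behaviour a modulation-spectrum filter relies on to suppress components unlikely to come from speech.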

Universidad Carlos III de Madrid e-Archivo

A configurable vector processor for accelerating speech coding algorithms

Author: Konstantia Koutsomyti (7201031)
Publication date: 01/01/2007

The growing demand for voice-over-packet (VoIP) services and multimedia-rich applications has made the efficient, real-time implementation of low-bit-rate speech coders on embedded VLSI platforms increasingly important. Such speech coders are designed to substantially reduce bandwidth requirements, thus enabling dense multichannel gateways in a small form factor. This, however, comes at a high computational cost, which mandates the use of very high-performance embedded processors. This thesis investigates the potential acceleration of two major ITU-T speech coding algorithms, namely G.729A and G.723.1, through their efficient implementation on a configurable, extensible vector embedded CPU architecture. New scalar and vector ISAs were introduced, which resulted in up to an 80% reduction in the dynamic instruction count of both workloads. These instructions were subsequently encapsulated into a parametric, hybrid SISD (scalar processor)–SIMD (vector) processor. This work presents the research and implementation of the vector datapath of this vector coprocessor, which is tightly coupled to a SPARC V8-compliant CPU; the optimization and simulation methodologies employed; and the use of Electronic System Level (ESL) techniques to rapidly design SIMD datapaths.
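
To give a feel for why vector instructions can cut the dynamic instruction count, the toy model below counts multiply-accumulate operations for a dot product, a kernel that appears throughout speech coders, in a scalar loop and in a loop that processes several elements per operation. It is only an illustrative cost model: the lane width of 8, the function names, and the choice to count the final reduction as a single operation are assumptions, and the counts it produces are not the thesis's measurements or its ISA.

```python
from typing import List, Tuple


def dot_scalar(a: List[int], b: List[int]) -> Tuple[int, int]:
    """Return (result, op_count) using one multiply-accumulate per element."""
    acc = 0
    ops = 0
    for x, y in zip(a, b):
        acc += x * y
        ops += 1
    return acc, ops


def dot_vector(a: List[int], b: List[int], lanes: int = 8) -> Tuple[int, int]:
    """Return (result, op_count) where one op handles up to `lanes` elements.

    `lanes` is a hypothetical vector width chosen for illustration.
    """
    partial = [0] * lanes
    ops = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        for lane, (x, y) in enumerate(zip(chunk_a, chunk_b)):
            partial[lane] += x * y
        ops += 1  # one vector multiply-accumulate over all lanes
    ops += 1  # final horizontal reduction, counted as one op in this model
    return sum(partial), ops


a = list(range(40))
b = [2] * 40
scalar_result, scalar_ops = dot_scalar(a, b)
vector_result, vector_ops = dot_vector(a, b, lanes=8)
```

Both versions compute the same value, but the vector version issues far fewer operations in this model. Real savings depend on the actual instruction set, memory access patterns, and how much of each algorithm can be vectorized.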