Fundamentos de teoría de la comunicación (Fundamentals of Communication Theory)
135 pp. This book is a guide to the lectures of the Communication Theory course taught in the Bachelor's Degree in Telecommunication Technology Engineering at the Bilbao School of Engineering (University of the Basque Country/Euskal Herriko Unibertsitatea). Its content is therefore the one designed by the teaching staff responsible for the lectures since the current curriculum was introduced.
The Communication Theory course describes, from a formal and mathematical point of view, the basic mechanisms that make it possible to transmit information in modern telecommunication systems (digital radio and television, data transmission, telephone communications, etc.).
Komunikazioaren teoriaren oinarriak (Fundamentals of Communication Theory)
This book is a guide to the lectures of the Communication Theory course taught in the Bachelor's Degree in Telecommunication Technology Engineering at the Bilbao School of Engineering (University of the Basque Country, UPV/EHU). Its contents therefore match those designed by the lecturers who have taught the lectures since the new teaching plan was introduced. The Communication Theory course covers the fundamental concepts of telecommunications. Taking a formal and mathematical point of view as its starting point, it describes the basic mechanisms that transmit information in modern telecommunication systems (digital radio and television, data transmission, telephone communications, etc.).
Evaluation of Tacotron Based Synthesizers for Spanish and Basque
In this paper, we describe the implementation and evaluation of Text-to-Speech synthesizers based on neural networks for Spanish and Basque. Several voices were built, all of them using a limited amount of data. The system applies Tacotron 2 to compute mel-spectrograms from the input sequence, followed by WaveGlow as a neural vocoder to obtain the audio signals from the spectrograms. The limited amount of training data leads to synthesis errors in some sentences. To detect those errors automatically, we developed a new method that finds the sentences that have lost the alignment during the inference process. To mitigate the problem, we implemented a guided attention mechanism that provides the system with the explicit duration of the phonemes. The resulting system was evaluated with both objective and subjective measures to assess its robustness, quality and naturalness. The results show that the system can produce good-quality, natural-sounding audio. This work was funded by the Basque Government (Project refs. PIBA 2018-035, IT-1355-19). This work is part of the project Grant PID 2019-108040RB-C21 funded by MCIN/AEI/10.13039/501100011033
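The abstract does not say how lost alignments are detected. The sketch below is purely illustrative. It shows one common kind of heuristic, which inspects the attention matrix produced at inference time. A healthy alignment advances monotonically through the input symbols and ends on the last of them. The function names, the `max_jump` and `end_margin` thresholds, and the heuristic itself are assumptions. They are not the authors' method.

```python
from typing import List, Sequence

# Attention matrix: one row per decoder step, one column per input symbol.
Attention = Sequence[Sequence[float]]


def attention_path(attn: Attention) -> List[int]:
    """Index of the most-attended input symbol at each decoder step."""
    return [max(range(len(row)), key=row.__getitem__) for row in attn]


def alignment_lost(attn: Attention, max_jump: int = 3, end_margin: int = 1) -> bool:
    """Hypothetical heuristic: flag an utterance whose attention path moves
    backwards, skips more than `max_jump` symbols forwards, or stops more
    than `end_margin` symbols before the end of the input."""
    path = attention_path(attn)
    n_symbols = len(attn[0])
    for prev, cur in zip(path, path[1:]):
        if cur < prev or cur - prev > max_jump:
            return True
    return path[-1] < n_symbols - 1 - end_margin


def one_hot(path: Sequence[int], n_symbols: int) -> List[List[float]]:
    """Build a toy attention matrix that follows `path` exactly."""
    return [[1.0 if j == p else 0.0 for j in range(n_symbols)] for p in path]


if __name__ == "__main__":
    print(alignment_lost(one_hot([0, 0, 1, 1, 2, 2, 3, 3], 4)))  # False
    print(alignment_lost(one_hot([0, 0, 1, 1, 1, 1, 1, 1], 4)))  # True: stuck
```

The thresholds would have to be tuned on real attention matrices. Soft attention rarely looks one-hot, which is why the check uses the argmax path rather than the raw weights.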
An on-line VAD based on Multi-Normalisation Scoring (MNS) of observation likelihoods
Preprint of the article published online on 31 May 2018. Voice activity detection (VAD) is an essential task in expert systems that rely on spoken interfaces. The VAD module detects the presence of human speech and separates speech segments from silences and non-speech noises. Most popular current on-line VAD systems are based on adaptive parameters that seek to cope with varying channel and noise conditions. This approach has two main disadvantages. It needs some initialisation time to adjust the parameters properly to the incoming signal, and its performance is uncertain when the initial parameters are poorly estimated. In this paper we propose a novel on-line VAD that is based only on previous training and does not introduce any delay. The technique is based on a strategy that we have called Multi-Normalisation Scoring (MNS). It consists of obtaining a vector of multiple observation likelihood scores from normalised mel-cepstral coefficients previously computed from different databases. A classifier then labels the incoming observation likelihood vector. Encouraging results have been obtained with a Multi-Layer Perceptron (MLP). The technique generalises to unseen noise levels and types. A validation experiment against two current standard ITU-T VAD algorithms demonstrates the good performance of the method. Lower classification error rates are obtained for non-speech frames, while results for speech frames are similar. This work was partially supported by the EU (ERDF) under grant TEC2015-67163-C2-1-R (RESTORE) (MINECO/ERDF, EU) and by the Basque Government under grant KK-2017/00043 (BerbaOla)
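The following is a minimal sketch of the scoring idea under stated assumptions. Each training database is assumed to be represented by its own normalisation statistics and a diagonal-Gaussian model of the normalised mel-cepstral features. The MNS vector stacks the log-likelihood of the current frame under every such model. The class and function names are hypothetical. The linear `classify` function is only a stand-in for the MLP used in the paper. Each score depends only on the current frame and on pre-computed statistics, so no adaptation window (and hence no start-up delay) is involved.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DatabaseModel:
    """Hypothetical per-database model: normalisation statistics plus a
    diagonal Gaussian over the normalised mel-cepstral features."""
    norm_mean: List[float]
    norm_std: List[float]
    model_mean: List[float]
    model_var: List[float]

    def log_likelihood(self, frame: Sequence[float]) -> float:
        total = 0.0
        for x, nm, ns, mm, mv in zip(frame, self.norm_mean, self.norm_std,
                                     self.model_mean, self.model_var):
            z = (x - nm) / ns  # normalise with this database's statistics
            total += -0.5 * (math.log(2.0 * math.pi * mv) + (z - mm) ** 2 / mv)
        return total


def mns_vector(frame: Sequence[float], models: Sequence[DatabaseModel]) -> List[float]:
    """One observation log-likelihood score per database model."""
    return [m.log_likelihood(frame) for m in models]


def classify(scores: Sequence[float], weights: Sequence[float], bias: float) -> int:
    """Stand-in linear classifier (1 = speech, 0 = non-speech); the paper
    reports results with an MLP instead."""
    return int(sum(w * s for w, s in zip(weights, scores)) + bias > 0.0)
```

In an on-line setting, `mns_vector` would be evaluated once per incoming frame, and the classifier's weights would come from offline training.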
Modelo de duración para conversión de texto a voz en euskera (Duration model for text-to-speech conversion in Basque)
This paper presents the modelling of phone durations in standard Basque for use in a text-to-speech system. The statistical modelling was carried out with binary regression trees trained on a large corpus of 57,300 phones. Several prediction experiments were performed, testing different sets of predicting factors. Duration prediction with this model achieves an RMSE of 22.23 ms. This work was partially funded by the Ministerio de Ciencia y Tecnología (Spanish Ministry of Science and Technology) (TIC2000-1005-C03-03 and TIC2000-1669-C04-03).
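Binary regression trees of the CART family are grown by repeatedly choosing the yes/no question about a predicting factor that most reduces the squared error of the durations. Each leaf then predicts the mean duration of its samples. The sketch below shows that mechanism and the RMSE computation on a toy data set. The factor names, the durations, and the `min_leaf`/`max_depth` settings are invented for illustration. They are not the paper's factors or corpus.

```python
import math
from typing import Dict, Optional, Sequence, Tuple, Union

Sample = Tuple[Dict[str, str], float]  # (factor values, duration in ms)
Tree = Union[float, tuple]             # leaf mean, or (factor, value, yes, no)


def sse(ys: Sequence[float]) -> float:
    """Sum of squared deviations from the mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_question(samples: Sequence[Sample], min_leaf: int) -> Optional[Tuple[str, str, float]]:
    """Question 'factor == value' with the lowest total SSE, or None."""
    best = None
    for factor in samples[0][0]:
        for value in sorted({x[factor] for x, _ in samples}):
            yes = [y for x, y in samples if x[factor] == value]
            no = [y for x, y in samples if x[factor] != value]
            if len(yes) < min_leaf or len(no) < min_leaf:
                continue
            cost = sse(yes) + sse(no)
            if best is None or cost < best[2]:
                best = (factor, value, cost)
    return best


def grow(samples: Sequence[Sample], min_leaf: int = 2, depth: int = 0, max_depth: int = 3) -> Tree:
    """Split recursively while a question reduces the squared error."""
    ys = [y for _, y in samples]
    q = best_question(samples, min_leaf) if depth < max_depth else None
    if q is None or q[2] >= sse(ys):
        return sum(ys) / len(ys)  # leaf: mean duration
    factor, value, _ = q
    yes = [s for s in samples if s[0][factor] == value]
    no = [s for s in samples if s[0][factor] != value]
    return (factor, value,
            grow(yes, min_leaf, depth + 1, max_depth),
            grow(no, min_leaf, depth + 1, max_depth))


def predict(tree: Tree, x: Dict[str, str]) -> float:
    """Follow the yes/no answers down to a leaf."""
    while isinstance(tree, tuple):
        factor, value, yes, no = tree
        tree = yes if x[factor] == value else no
    return tree


def rmse(pred: Sequence[float], ref: Sequence[float]) -> float:
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))


# Invented toy data: stressed vowels long, consonants short.
TOY_DATA = [
    ({"cls": "vowel", "stress": "yes"}, 110.0), ({"cls": "vowel", "stress": "yes"}, 120.0),
    ({"cls": "vowel", "stress": "no"}, 80.0), ({"cls": "vowel", "stress": "no"}, 90.0),
    ({"cls": "cons", "stress": "no"}, 60.0), ({"cls": "cons", "stress": "no"}, 70.0),
]
```

On `TOY_DATA` the first question asked is about `stress`, and the unstressed branch is then split by `cls`. A real duration model would be evaluated on held-out phones rather than on its training samples.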
Enrichment of Oesophageal Speech: Voice Conversion with Duration-Matched Synthetic Speech as Target
Pathological speech such as Oesophageal Speech (OS) is difficult to understand due to the presence of undesired artefacts and the lack of normal healthy speech characteristics. Modern speech technologies and machine learning enable us to transform pathological speech to improve its intelligibility and quality. We used a neural-network-based voice conversion method with the aim of improving intelligibility and reducing the listening effort (LE) of four OS speakers of varying speaking proficiency. The novelty of this method is the use of synthetic speech matched in duration with the source OS as the target, instead of parallel aligned healthy speech. We evaluated the converted samples from this system using a collection of Automatic Speech Recognition (ASR) systems, an objective intelligibility metric (STOI) and a subjective test. The ASR evaluation shows that the proposed system had significantly better word recognition accuracy than unprocessed OS and than baseline systems that used aligned healthy speech as the target. STOI scores improved by at least 15%. This indicates higher intelligibility for the proposed system than for unprocessed OS, and higher target similarity for the proposed system than for the baseline systems. The subjective test reveals a significant preference for the proposed system over unprocessed OS for all OS speakers except one, the least proficient OS speaker in the data set. This project was supported by funding from the European Union’s H2020 research and innovation programme under the MSCA GA 675324 (the ENRICH network: www.enrich-etn.eu (accessed on 25 June 2021)), and the Basque Government (PIBA_2018_1_0035 and IT355-19)
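The abstract does not specify how the synthetic target is matched in duration to the oesophageal source. One simple possibility, assumed here purely for illustration, is to time-warp the synthetic feature sequence uniformly. The warped sequence then has the same number of frames as the source utterance, with linear interpolation between neighbouring frames. The function name is hypothetical. A real system might instead control phone durations inside the synthesiser, which uniform warping cannot do.

```python
from typing import List, Sequence

Frame = List[float]  # e.g. one spectral feature vector per analysis frame


def resample_frames(frames: Sequence[Frame], target_len: int) -> List[Frame]:
    """Hypothetical uniform time-warp: linearly interpolate a feature
    sequence so that it has exactly `target_len` frames."""
    n = len(frames)
    if n == 0 or target_len <= 0:
        raise ValueError("need a non-empty input and a positive target length")
    if target_len == 1:
        return [list(frames[0])]
    out: List[Frame] = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional source index
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        w = pos - lo                          # interpolation weight
        out.append([(1.0 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out
```

For example, a synthetic utterance of 80 frames would be stretched with `resample_frames(synthetic, 100)` to pair frame by frame with a 100-frame OS source.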
Intelligibility and Listening Effort of Spanish Oesophageal Speech
Communication is a huge challenge for oesophageal speakers, be it for interactions with fellow humans or with digital voice assistants. We aim to quantify these communication challenges (both human-human and human-machine interactions) by measuring the intelligibility and Listening Effort (LE) of Oesophageal Speech (OS) in comparison to Healthy Laryngeal Speech (HS). We conducted two listening tests (one web-based, the other in laboratory settings) to collect these measurements. Participants performed a sentence recognition task and an LE rating task in each test. Intelligibility, calculated as Word Error Rate, showed a significant correlation with self-reported LE ratings. Speaker type (healthy or oesophageal) had a major effect on intelligibility and effort. More LE was reported for OS than for HS, even when OS intelligibility was close to that of HS. Listeners familiar with OS reported less effort when listening to OS than unfamiliar listeners did. However, no such advantage of familiarity was observed for intelligibility. Automatic speech recognition scores were higher for OS than for HS. This project was supported by funding from the EU's H2020 research and innovation programme under the MSCA GA 675324 (the ENRICH network: www.enrich-etn.eu), the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R) and the Basque Government (DL4NLP KK-2019/00045, PIBA_2018_1_0035 and IT355-19)
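Intelligibility in this study is reported as Word Error Rate. WER is the word-level Levenshtein distance between the reference sentence and the listener's (or recogniser's) transcription, divided by the number of reference words, i.e. (S + D + I) / N. The following is a minimal sketch. Whitespace tokenisation and case-sensitive comparison are simplifying assumptions, since the abstract does not describe how transcriptions were normalised before scoring.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of reference word r
                         cur[j - 1] + 1,          # insertion of hypothesis word h
                         prev[j - 1] + (r != h))  # substitution (or match)
        prev = cur
    return prev[-1] / len(ref)
```

For example, scoring the transcription "el gato come" against the reference "el perro come pan" gives one substitution and one deletion, i.e. a WER of 2/4.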
RESTORE Project: REpair, STOrage and REhabilitation of speech
RESTORE is a project that aims to improve the quality of communication for people with difficulties producing speech, providing them with tools and alternative communication services. At the same time, progress will be made in research on techniques for the restoration and rehabilitation of disordered speech. The ultimate goal of the project is to offer new possibilities for the rehabilitation and reintegration into society of patients with speech pathologies, especially laryngectomised patients. It will do so by designing new intervention strategies aimed at favouring their communication with the environment, ultimately increasing their quality of life. This project has been funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R and TEC2015-67163-C2-2-R).
- …