16 research outputs found

    Use of the harmonic phase in synthetic speech detection

    Special Session paper: recent PhD thesis description. This PhD dissertation was written by Jon Sanchez and supervised by Inma Hernáez and Ibon Saratxaga. It was defended at the University of the Basque Country on the 5th of February 2016. The committee members were Dr. Alfonso Ortega Giménez (UniZar), Dr. Daniel Erro Eslava (UPV/EHU) and Dr. Enric Monte Moreno (UPC). The dissertation was awarded a "sobresaliente cum laude" qualification. This work has been partially funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R) and the Basque Government (ELKAROLA project, KK-2015/00098).

    Use of the harmonic phase in synthetic speech detection (PhD dissertation)

    156 p. Speaker verification (SV) systems face the possibility of being attacked through spoofing techniques. Nowadays, voice conversion and speaker-adaptive speech synthesis technologies have advanced enough to create voices capable of deceiving an SV system. This thesis proposes a synthetic speech detection (SSD) module that can be used as a complement to an SV system but is also able to operate independently. It consists of a GMM-based classifier equipped with models of human and synthetic speech. Each input is compared against both models and, if the likelihood difference exceeds a given threshold, it is accepted as human; otherwise it is rejected. The developed system is speaker independent. RPS parameters are used to build the models. A technique is proposed to reduce the complexity of the training process, avoiding the need to build an adapted TTS or a voice converter for each speaker. Since most modern adaptation and synthesis systems make use of vocoders, human signals are transcoded with vocoders to obtain synthetic versions of them, which are then used to train the synthetic models of the classifier. It is shown that synthetic signals can be detected by detecting that they were created with a vocoder. The performance of the system is tested under different conditions: with the transcoded signals themselves or with TTS attacks. Finally, strategies for training the models of SSD systems are discussed.
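    A minimal sketch of the likelihood-ratio decision described above, assuming a Python/scikit-learn setup (the feature arrays, the diagonal-covariance choice and the number of Gaussian components are illustrative, not taken from the thesis):

```python
# Sketch of the GMM likelihood-ratio decision for synthetic speech detection.
# Feature extraction (e.g. RPS parameters) is assumed to be done elsewhere;
# X_human, X_synth and X_test are (n_frames, n_dims) arrays of such features.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ssd_models(X_human, X_synth, n_components=32):
    """Fit one GMM per class: human speech and (vocoder-transcoded) synthetic speech."""
    gmm_h = GaussianMixture(n_components, covariance_type="diag").fit(X_human)
    gmm_s = GaussianMixture(n_components, covariance_type="diag").fit(X_synth)
    return gmm_h, gmm_s

def is_human(X_test, gmm_h, gmm_s, threshold=0.0):
    """Accept as human if the average log-likelihood difference exceeds a threshold."""
    llr = np.mean(gmm_h.score_samples(X_test) - gmm_s.score_samples(X_test))
    return llr > threshold, llr
```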

    Duration model for text-to-speech conversion in Basque

    This paper presents the modelling of phone durations in standard Basque, to be included in a text-to-speech system. The statistical modelling has been done using binary regression trees and a large corpus containing 57,300 phones. Several experiments have been performed, testing different sets of predicting factors. The resulting model predicts durations with an RMSE of 22.23 ms. This work has been partially funded by the Spanish Ministry of Science and Technology (TIC2000-1005-C03-03 and TIC2000-1669-C04-03).
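    A rough sketch of the duration-modelling step, assuming a Python/scikit-learn setup (the contextual feature encoding and the tree settings are placeholders; the paper's actual predicting factors are not reproduced here):

```python
# Sketch: phone duration modelling with a binary regression tree.
# Each row of X encodes contextual factors for one phone (identity, stress,
# position in syllable/word/phrase, neighbouring phones, ...); y holds the
# observed durations in milliseconds. The feature encoding is illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

def train_duration_model(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = DecisionTreeRegressor(min_samples_leaf=20).fit(X_tr, y_tr)
    rmse = np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2))  # duration error in ms
    return model, rmse
```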

    Automatic Classification of Synthetic Voices for Voice Banking Using Objective Measures

    Speech is the most common way of communication among humans. People who cannot communicate through speech due to partial or total loss of the voice can benefit from Alternative and Augmentative Communication devices and Text-to-Speech technology. One problem of using these technologies is that the included synthetic voices might be impersonal and badly adapted to the user in terms of age, accent or even gender. In this context, the use of synthetic voices from voice banking systems is an attractive alternative. New voices can be obtained by applying adaptation techniques to recordings from people with a healthy voice (donors) or from the users themselves before they lose their own voice. In this way, the goal is to offer a wide voice catalog to potential users. However, as there is no control over the recording or the adaptation processes, some method to control the final quality of the voice is needed. We present the work developed to automatically select the best synthetic voices using a set of objective measures and a subjective Mean Opinion Score (MOS) evaluation. A MOS prediction algorithm has been built whose output correlates with the subjective scores at a level similar to that of the most correlated individual measure. This work has been funded by the Basque Government under projects ref. PIBA 2018-035 and IT-1355-19. This work is part of the project Grant PID 2019-108040RB-C21 funded by MCIN/AEI/10.13039/501100011033.
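    A possible sketch of such a MOS predictor, assuming a Python setup with scikit-learn and scipy (the choice of linear regression and five-fold cross-validation is an assumption made here for illustration, not the algorithm of the paper):

```python
# Sketch: predicting MOS from a set of objective measures and comparing the
# correlation of the combined predictor against the best individual measure.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

def evaluate_mos_predictor(objective_measures, mos):
    """objective_measures: (n_voices, n_measures) array; mos: (n_voices,) subjective scores."""
    predicted = cross_val_predict(LinearRegression(), objective_measures, mos, cv=5)
    r_pred, _ = pearsonr(predicted, mos)                     # combined predictor
    r_best = max(abs(pearsonr(objective_measures[:, j], mos)[0])
                 for j in range(objective_measures.shape[1]))  # best single measure
    return r_pred, r_best
```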

    Synthetic speech detection using phase information

    Taking advantage of the fact that most speech processing techniques neglect phase information, we seek to detect phase perturbations in order to prevent synthetic impostors from attacking Speaker Verification systems. Two Synthetic Speech Detection (SSD) systems that use spectral phase-related information are reviewed and evaluated in this work: one based on the Modified Group Delay (MGD), and the other based on the Relative Phase Shift (RPS). A classical magnitude-based MFCC system is also used as a baseline. Different training strategies are proposed and evaluated using both real spoofing samples and signals copy-synthesized from the natural ones, aiming to alleviate the difficulty of obtaining real data to train the systems. The recently published ASVSpoof2015 database is used for training and evaluation. Performance with completely unrelated data is also checked using synthetic speech from the Blizzard Challenge as evaluation material. The results prove that phase information can be successfully used for the SSD task even with unknown attacks. This work has been partially supported by the Basque Government (ElkarOla Project, KK-2015/00098) and the Spanish Ministry of Economy and Competitiveness (Restore project, TEC2015-67163-C2-1-R).
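    As an illustration of one of the phase representations mentioned, a simplified frame-level modified group delay computation is sketched below (the cepstral-smoothing order and the alpha/gamma values are common choices from the MGD literature, not necessarily those used in this work):

```python
# Sketch: modified group delay (MGD) of a single windowed speech frame.
# alpha, gamma and the cepstral smoothing order are illustrative values.
import numpy as np

def modified_group_delay(frame, n_fft=512, alpha=0.4, gamma=0.9, n_ceps=30):
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)          # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)      # spectrum of n*x[n]
    # Cepstrally smoothed magnitude spectrum |S| used in the denominator
    log_mag = np.log(np.abs(X) + 1e-10)
    ceps = np.fft.irfft(log_mag)
    ceps[n_ceps:-n_ceps] = 0.0             # keep only the low-quefrency part
    S = np.exp(np.fft.rfft(ceps).real)
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma) + 1e-10)
    return np.sign(tau) * np.abs(tau) ** alpha
```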

    The observation likelihood of silence: analysis and prospects for VAD applications

    This paper presents a study of the behaviour of the observation likelihoods generated by the central state of a silence HMM (Hidden Markov Model) trained for Automatic Speech Recognition (ASR) using cepstral mean and variance normalization (CMVN). We have seen that this observation likelihood shows a stable behaviour under different recording conditions, and this characteristic can be used to discriminate between speech and silence frames. We present several experiments which prove that the mere use of a decision threshold produces robust results for very different recording channels and noise conditions. The results have also been compared with those obtained by two standard VAD systems, showing promising prospects. All in all, observation likelihood scores could be useful as the basis for the development of future VAD systems, with further research and analysis to refine the results. This work has been partially supported by the EU (FEDER) under grant TEC2015-67163-C2-1-R (RESTORE) (MINECO/FEDER, UE) and by the Basque Government under grant KK-2017/00043 (BerbaOla).
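    A minimal sketch of the threshold decision described above, assuming the central silence state is modelled by a scikit-learn-style GMM and that cepstral features have already been extracted (feature extraction and HMM training are outside the scope of this sketch):

```python
# Sketch: frame-level speech/silence decision from the observation likelihood
# of a silence model. `silence_gmm` stands for the GMM of the central state of
# a silence HMM trained for ASR (assumed to expose score_samples, as in
# sklearn.mixture.GaussianMixture); features are cepstra of shape (n_frames, n_dims).
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalisation over the utterance."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-10)

def vad_decisions(features, silence_gmm, threshold):
    """Label a frame as speech when its silence log-likelihood falls below the threshold."""
    loglik = silence_gmm.score_samples(cmvn(features))  # per-frame log-likelihood
    return loglik < threshold                            # True = speech frame
```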

    Personalized synthetic voices: description of an experience

    The voice is so essential for human communication that its loss drastically affects the integration of people in society. Text-to-speech can provide a synthetic voice for people with oral disabilities. The most common solutions usually provide a standard voice, and users have difficulties identifying themselves with it. For this reason, we need to create personalized synthetic voices and offer a catalogue of voices to people with oral disabilities so that they can choose one that suits their needs. The objective of the ZureTTS project is to provide these personalized voices, both in Spanish and in Basque. Through the AhoMyTTS web portal, people who are going to lose their voice, or altruistic people who want to provide voices to those who do not have one, record 100 carefully selected sentences. A synthetic voice with characteristics similar to the recorded voice is generated by applying an adaptation process. The user is provided with a synthesis engine along with that personalized voice, so that they can use it in applications that require oral message generation. In addition, we offer a catalogue of voices to choose from for those who are no longer able to record. More than 1,200 people have used the system to obtain a personalized voice and 58 of them have been selected to be included in the catalogue. User surveys show satisfaction with various aspects of the synthetic voice: most think that the synthetic voice is similar to the original, pleasant and clear, although a bit robotic. This work contributes mainly to Sustainable Development Goal 10 by reducing inequality within and among countries. It also contributes to Sustainable Development Goal 4, providing tools that facilitate access for all to an inclusive, equitable and quality education.

    IMPACT-Global Hip Fracture Audit: Nosocomial infection, risk prediction and prognostication, minimum reporting standards and global collaborative audit. Lessons from an international multicentre study of 7,090 patients conducted in 14 nations during the COVID-19 pandemic


    Automatic emotion recognition using prosodic parameters

    This paper presents the experiments made to automatically identify emotion in an emotional speech database for Basque. Three different classifiers have been built: one using spectral features and GMM, another with prosodic features and SVM, and the last one with prosodic features and GMM. 86 prosodic features were calculated and then an algorithm to select the most relevant ones was applied. The first classifier gives the best result, with 98.4% accuracy when using 512 mixtures, but the classifier built with the best 6 prosodic features achieves an accuracy of 92.3% in spite of its simplicity, showing that prosodic information is very useful for identifying emotions. This work has been partially funded by the Spanish Ministry of Science and Technology (TIC2003-08382-C05-03) and the University of the Basque Country (UPV-0147.345-E-14895/2002).
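    A sketch of the prosodic classifier described above, assuming a Python/scikit-learn setup (the univariate selection criterion and the RBF-SVM settings are illustrative; the paper's feature-selection algorithm may differ):

```python
# Sketch: selecting the most relevant prosodic features and training an SVM
# emotion classifier. X holds the 86 prosodic features per utterance, y the
# emotion labels; the selection score and SVM parameters are illustrative.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def prosodic_svm_accuracy(X, y, k=6):
    clf = make_pipeline(StandardScaler(),
                        SelectKBest(f_classif, k=k),   # keep the k most relevant features
                        SVC(kernel="rbf"))
    return cross_val_score(clf, X, y, cv=5).mean()     # mean classification accuracy
```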