16 research outputs found

    Use of the harmonic phase in synthetic speech detection

    Special Session paper: recent PhD thesis description. This PhD dissertation was written by Jon Sanchez and supervised by Inma Hernáez and Ibon Saratxaga. It was defended at the University of the Basque Country on the 5th of February 2016. The committee members were Dr. Alfonso Ortega Giménez (UniZar), Dr. Daniel Erro Eslava (UPV/EHU) and Dr. Enric Monte Moreno (UPC). The dissertation was awarded a "sobresaliente cum laude" qualification. This work has been partially funded by the Spanish Ministry of Economy and Competitiveness with FEDER support (RESTORE project, TEC2015-67163-C2-1-R) and the Basque Government (ELKAROLA project, KK-2015/00098).

    Use of the harmonic phase in synthetic speech detection (PhD dissertation)

    156 p. Speaker verification (SV) systems face the possibility of being attacked through spoofing techniques. Nowadays, voice conversion and speaker-adaptive speech synthesis technologies have advanced enough to create voices capable of deceiving an SV system. This thesis proposes a synthetic speech detection (SSD) module that can be used as a complement to an SV system but is also able to operate independently. It consists of a GMM-based classifier equipped with models of human and synthetic speech. Each input is compared against both models and, if the likelihood difference exceeds a given threshold, it is accepted as human; otherwise it is rejected. The developed system is speaker independent. RPS parameters are used to build the models. A technique is proposed to reduce the complexity of the training process, avoiding the need to build an adapted TTS or a voice converter for each speaker. Since most modern adaptation and synthesis systems make use of vocoders, human signals are transcoded with vocoders to obtain synthetic versions of them, which are then used to train the synthetic models of the classifier. It is shown that synthetic signals can be detected by detecting that they were created with a vocoder. The performance of the system is tested under different conditions: with the transcoded signals themselves or with TTS attacks. Finally, strategies for training the models of SSD systems are discussed.
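    A minimal sketch of the likelihood-ratio decision described above, assuming a Python/scikit-learn setup (the feature arrays, the diagonal-covariance choice and the number of Gaussian components are illustrative, not taken from the thesis):

```python
# Sketch of the GMM likelihood-ratio decision for synthetic speech detection.
# Feature extraction (e.g. RPS parameters) is assumed to be done elsewhere;
# X_human, X_synth and X_test are (n_frames, n_dims) arrays of such features.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ssd_models(X_human, X_synth, n_components=32):
    """Fit one GMM per class: human speech and (vocoder-transcoded) synthetic speech."""
    gmm_h = GaussianMixture(n_components, covariance_type="diag").fit(X_human)
    gmm_s = GaussianMixture(n_components, covariance_type="diag").fit(X_synth)
    return gmm_h, gmm_s

def is_human(X_test, gmm_h, gmm_s, threshold=0.0):
    """Accept as human if the average log-likelihood difference exceeds a threshold."""
    llr = np.mean(gmm_h.score_samples(X_test) - gmm_s.score_samples(X_test))
    return llr > threshold, llr
```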

    Duration model for text-to-speech conversion in Basque

    This paper presents the modelling of phone durations in standard Basque, to be included in a text-to-speech system. The statistical modelling has been done using binary regression trees and a large corpus containing 57,300 phones. Several experiments have been performed, testing different sets of predicting factors. The resulting model predicts durations with an RMSE of 22.23 ms. This work has been partially funded by the Spanish Ministry of Science and Technology (TIC2000-1005-C03-03 and TIC2000-1669-C04-03).
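    A rough sketch of the duration-modelling step, assuming a Python/scikit-learn setup (the contextual feature encoding and the tree settings are placeholders; the paper's actual predicting factors are not reproduced here):

```python
# Sketch: phone duration modelling with a binary regression tree.
# Each row of X encodes contextual factors for one phone (identity, stress,
# position in syllable/word/phrase, neighbouring phones, ...); y holds the
# observed durations in milliseconds. The feature encoding is illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split

def train_duration_model(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = DecisionTreeRegressor(min_samples_leaf=20).fit(X_tr, y_tr)
    rmse = np.sqrt(np.mean((model.predict(X_te) - y_te) ** 2))  # duration error in ms
    return model, rmse
```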

    Automatic Classification of Synthetic Voices for Voice Banking Using Objective Measures

    Speech is the most common way of communication among humans. People who cannot communicate through speech due to partial or total loss of the voice can benefit from Alternative and Augmentative Communication devices and Text-to-Speech technology. One problem of using these technologies is that the included synthetic voices might be impersonal and badly adapted to the user in terms of age, accent or even gender. In this context, the use of synthetic voices from voice banking systems is an attractive alternative. New voices can be obtained by applying adaptation techniques to recordings from people with a healthy voice (donors) or from the users themselves before they lose their own voice. In this way, the goal is to offer a wide voice catalog to potential users. However, as there is no control over the recording or the adaptation processes, some method to control the final quality of the voice is needed. We present the work developed to automatically select the best synthetic voices using a set of objective measures and a subjective Mean Opinion Score (MOS) evaluation. A MOS prediction algorithm has been built whose output correlates with the subjective scores at a level similar to that of the most correlated individual measure. This work has been funded by the Basque Government under projects ref. PIBA 2018-035 and IT-1355-19. This work is part of the project Grant PID 2019-108040RB-C21 funded by MCIN/AEI/10.13039/501100011033.
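    A possible sketch of such a MOS predictor, assuming a Python setup with scikit-learn and scipy (the choice of linear regression and five-fold cross-validation is an assumption made here for illustration, not the algorithm of the paper):

```python
# Sketch: predicting MOS from a set of objective measures and comparing the
# correlation of the combined predictor against the best individual measure.
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict

def evaluate_mos_predictor(objective_measures, mos):
    """objective_measures: (n_voices, n_measures) array; mos: (n_voices,) subjective scores."""
    predicted = cross_val_predict(LinearRegression(), objective_measures, mos, cv=5)
    r_pred, _ = pearsonr(predicted, mos)                     # combined predictor
    r_best = max(abs(pearsonr(objective_measures[:, j], mos)[0])
                 for j in range(objective_measures.shape[1]))  # best single measure
    return r_pred, r_best
```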

    Synthetic speech detection using phase information

    Taking advantage of the fact that most speech processing techniques neglect phase information, we seek to detect phase perturbations in order to prevent synthetic impostors from attacking Speaker Verification systems. Two Synthetic Speech Detection (SSD) systems that use spectral phase-related information are reviewed and evaluated in this work: one based on the Modified Group Delay (MGD), and the other based on the Relative Phase Shift (RPS). A classical magnitude-based MFCC system is also used as a baseline. Different training strategies are proposed and evaluated using both real spoofing samples and signals copy-synthesized from the natural ones, aiming to alleviate the difficulty of obtaining real data to train the systems. The recently published ASVSpoof2015 database is used for training and evaluation. Performance with completely unrelated data is also checked using synthetic speech from the Blizzard Challenge as evaluation material. The results prove that phase information can be successfully used for the SSD task even with unknown attacks. This work has been partially supported by the Basque Government (ElkarOla Project, KK-2015/00098) and the Spanish Ministry of Economy and Competitiveness (Restore project, TEC2015-67163-C2-1-R).
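    As an illustration of one of the phase representations mentioned, a simplified frame-level modified group delay computation is sketched below (the cepstral-smoothing order and the alpha/gamma values are common choices from the MGD literature, not necessarily those used in this work):

```python
# Sketch: modified group delay (MGD) of a single windowed speech frame.
# alpha, gamma and the cepstral smoothing order are illustrative values.
import numpy as np

def modified_group_delay(frame, n_fft=512, alpha=0.4, gamma=0.9, n_ceps=30):
    frame = np.asarray(frame, dtype=float)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)          # spectrum of x[n]
    Y = np.fft.rfft(n * frame, n_fft)      # spectrum of n*x[n]
    # Cepstrally smoothed magnitude spectrum |S| used in the denominator
    log_mag = np.log(np.abs(X) + 1e-10)
    ceps = np.fft.irfft(log_mag)
    ceps[n_ceps:-n_ceps] = 0.0             # keep only the low-quefrency part
    S = np.exp(np.fft.rfft(ceps).real)
    tau = (X.real * Y.real + X.imag * Y.imag) / (S ** (2 * gamma) + 1e-10)
    return np.sign(tau) * np.abs(tau) ** alpha
```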

    The observation likelihood of silence: analysis and prospects for VAD applications

    This paper presents a study of the behaviour of the observation likelihoods generated by the central state of a silence HMM (Hidden Markov Model) trained for Automatic Speech Recognition (ASR) using cepstral mean and variance normalization (CMVN). We have seen that this observation likelihood shows a stable behaviour under different recording conditions, and this characteristic can be used to discriminate between speech and silence frames. We present several experiments which prove that the mere use of a decision threshold produces robust results for very different recording channels and noise conditions. The results have also been compared with those obtained by two standard VAD systems, showing promising prospects. All in all, observation likelihood scores could be useful as the basis for the development of future VAD systems, with further research and analysis to refine the results. This work has been partially supported by the EU (FEDER) under grant TEC2015-67163-C2-1-R (RESTORE) (MINECO/FEDER, UE) and by the Basque Government under grant KK-2017/00043 (BerbaOla).
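    A minimal sketch of the threshold decision described above, assuming the central silence state is modelled by a scikit-learn-style GMM and that cepstral features have already been extracted (feature extraction and HMM training are outside the scope of this sketch):

```python
# Sketch: frame-level speech/silence decision from the observation likelihood
# of a silence model. `silence_gmm` stands for the GMM of the central state of
# a silence HMM trained for ASR (assumed to expose score_samples, as in
# sklearn.mixture.GaussianMixture); features are cepstra of shape (n_frames, n_dims).
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalisation over the utterance."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + 1e-10)

def vad_decisions(features, silence_gmm, threshold):
    """Label a frame as speech when its silence log-likelihood falls below the threshold."""
    loglik = silence_gmm.score_samples(cmvn(features))  # per-frame log-likelihood
    return loglik < threshold                            # True = speech frame
```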

    Personalized synthetic voices: description of an experience

    The voice is so essential for human communication that its loss drastically affects the integration of people in society. Text-to-speech can provide a synthetic voice for people with oral disabilities. The most common solutions usually provide a standard voice, and users have difficulties identifying themselves with it. For this reason, we need to create personalized synthetic voices and offer a catalogue of voices to people with oral disabilities so that they can choose one that suits their needs. The objective of the ZureTTS project is to provide these personalized voices, both in Spanish and in Basque. Through the AhoMyTTS web portal, people who are going to lose their voice, or altruistic people who want to provide voices to those who do not have one, record 100 carefully selected sentences. A synthetic voice with characteristics similar to the recorded voice is generated by applying an adaptation process. The user is provided with a synthesis engine along with that personalized voice, so that they can use it in applications that require oral message generation. In addition, we offer a catalogue of voices to choose from for those who are no longer able to record. More than 1,200 people have used the system to obtain a personalized voice and 58 of them have been selected to be included in the catalogue. User surveys show satisfaction with various aspects of the synthetic voice: most think that the synthetic voice is similar to the original, pleasant and clear, although a bit robotic. This work contributes mainly to Sustainable Development Goal 10 by reducing inequality within and among countries. It also contributes to Sustainable Development Goal 4, providing tools that facilitate access for all to an inclusive, equitable and quality education.

    IMPACT-Global Hip Fracture Audit: Nosocomial infection, risk prediction and prognostication, minimum reporting standards and global collaborative audit. Lessons from an international multicentre study of 7,090 patients conducted in 14 nations during the COVID-19 pandemic


    Automatic emotion recognition using prosodic parameters

    This paper presents the experiments made to automatically identify emotion in an emotional speech database for Basque. Three different classifiers have been built: one using spectral features and GMM, another with prosodic features and SVM, and the last one with prosodic features and GMM. 86 prosodic features were calculated and then an algorithm to select the most relevant ones was applied. The first classifier gives the best result, with 98.4% accuracy when using 512 mixtures, but the classifier built with the best 6 prosodic features achieves an accuracy of 92.3% in spite of its simplicity, showing that prosodic information is very useful for identifying emotions. This work has been partially funded by the Spanish Ministry of Science and Technology (TIC2003-08382-C05-03) and the University of the Basque Country (UPV-0147.345-E-14895/2002).
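    A sketch of the prosodic classifier described above, assuming a Python/scikit-learn setup (the univariate selection criterion and the RBF-SVM settings are illustrative; the paper's feature-selection algorithm may differ):

```python
# Sketch: selecting the most relevant prosodic features and training an SVM
# emotion classifier. X holds the 86 prosodic features per utterance, y the
# emotion labels; the selection score and SVM parameters are illustrative.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def prosodic_svm_accuracy(X, y, k=6):
    clf = make_pipeline(StandardScaler(),
                        SelectKBest(f_classif, k=k),   # keep the k most relevant features
                        SVC(kernel="rbf"))
    return cross_val_score(clf, X, y, cv=5).mean()     # mean classification accuracy
```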