22 research outputs found

    Towards Speaking Style Transplantation in Speech Synthesis

    One of the biggest challenges in speech synthesis is the production of natural-sounding synthetic voices. The resulting voice must not only be of high enough quality but must also capture the natural expressiveness of human speech. This paper focuses on the expressiveness problem by proposing a set of techniques for extrapolating the expressiveness of proven high-quality speaking style models to neutral speakers in HMM-based synthesis. As an additional advantage, the proposed techniques are based on adaptation approaches, which means that they can be used with little training data (around 15 minutes of training data per style in this paper). For the final implementation, a set of 4 speaking styles was considered: news broadcasts, live sports commentary, interviews and parliamentary speech. Finally, the implementations of the 5 techniques were tested through a perceptual evaluation, which shows that the deviations between neutral and speaking style average models can be learned and used to imbue expressiveness into target neutral speakers as intended. Index Terms: expressive speech synthesis, speaking styles, adaptation, expressiveness transplantation

    Generacion de una voz sintetica en Castellano basada en HSMM para la Evaluacion Albayzin 2008: conversion texto a voz

    This paper describes the process of building a Spanish voice using the UPC ESMA corpus from UPC, provided by the Albayzín 2008 Evaluation: Text-to-Speech Conversion. One voice was implemented using unit selection with Festival's Multisyn package, and another using Hidden Semi-Markov Models (HSMM) with HTS. After a brief evaluation of the quality of both voices, the paper details the main characteristics of the HSMM-based voice, which was the final system submitted to the evaluation.

    Enhanced multiclass SVM with thresholding fusion for speech-based emotion classification

    As an essential approach to understanding human interactions, emotion classification is a vital component of behavioral studies and is also important in the design of context-aware systems. Recent studies have shown that speech contains rich information about emotion, and numerous speech-based emotion classification methods have been proposed. However, classification performance still falls short of what is needed for the algorithms to be used in real systems. We present an emotion classification system that uses several one-against-all support vector machines with a thresholding fusion mechanism to combine the individual outputs. The mechanism can increase emotion classification accuracy at the expense of rejecting some samples as unclassified. Results show that the proposed system outperforms three state-of-the-art methods and that the thresholding fusion mechanism can effectively improve emotion classification, which is important for applications that require very high accuracy but do not require that all samples be classified. We evaluate the system in several challenging scenarios, including speaker-independent tests, tests on noisy speech signals, and tests using non-professional acted recordings, to demonstrate the performance of the system and the effectiveness of the thresholding fusion mechanism in real scenarios.
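The accuracy-versus-rejection trade-off described in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual fusion rule, so the rule below (accept the top-scoring class only if its score reaches a threshold) is an assumption, and the names `fuse_with_threshold` and `accuracy_and_coverage` are hypothetical. The per-class scores stand in for the outputs of the one-against-all classifiers.

```python
from typing import Dict, List, Optional, Tuple


def fuse_with_threshold(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Pick the best-scoring class, or return None to reject the sample.

    `scores` maps each emotion label to the decision score of its
    one-against-all classifier. ASSUMPTION: this simple "top score must
    reach the threshold" rule is illustrative, not the paper's rule.
    """
    if not scores:
        return None
    best_label = max(scores, key=lambda label: scores[label])
    if scores[best_label] < threshold:
        return None  # rejected as unclassified
    return best_label


def accuracy_and_coverage(
    predictions: List[Optional[str]], truths: List[str]
) -> Tuple[float, float]:
    """Accuracy over accepted samples, and the fraction of samples accepted."""
    accepted = [(p, t) for p, t in zip(predictions, truths) if p is not None]
    coverage = len(accepted) / len(truths) if truths else 0.0
    accuracy = (
        sum(p == t for p, t in accepted) / len(accepted) if accepted else 0.0
    )
    return accuracy, coverage
```

Raising the threshold rejects more low-confidence samples, so accuracy is measured over fewer but more reliable decisions. The scores used to exercise this sketch would be made-up values, not results from the paper.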

    [multi’vocal]: reflections on engaging everyday people in the development of a collective non-binary synthesized voice

    The growing field of Human-Computer Interaction (HCI) is moving beyond conventional screen-based interactions, creating new scenarios in which voice synthesis and voice recognition become important elements. Such voices are commonly created through concatenative or parametric synthesis methods, which draw on large voice corpora pre-recorded by a single professional voice actor. These designed voices arguably propagate binary representations of gender identity. In this paper we present our project, [multi’vocal], which aims to challenge the current gender binary representations in synthesized voices. More specifically, we explore whether it is possible to create a non-binary synthesized voice by engaging everyday people of diverse backgrounds in giving voice to a collective synthesized voice of all genders, ages and accents.

    Elective cancer surgery in COVID-19-free surgical pathways during the SARS-CoV-2 pandemic: An international, multicenter, comparative cohort study

    PURPOSE As cancer surgery restarts after the first COVID-19 wave, health care providers urgently require data to determine where elective surgery is best performed. This study aimed to determine whether COVID-19–free surgical pathways were associated with lower postoperative pulmonary complication rates compared with hospitals with no defined pathway. PATIENTS AND METHODS This international, multicenter cohort study included patients who underwent elective surgery for 10 solid cancer types without preoperative suspicion of SARS-CoV-2. Participating hospitals included patients from local emergence of SARS-CoV-2 until April 19, 2020. At the time of surgery, hospitals were defined as having a COVID-19–free surgical pathway (complete segregation of the operating theater, critical care, and inpatient ward areas) or no defined pathway (incomplete or no segregation, areas shared with patients with COVID-19). The primary outcome was 30-day postoperative pulmonary complications (pneumonia, acute respiratory distress syndrome, unexpected ventilation). RESULTS Of 9,171 patients from 447 hospitals in 55 countries, 2,481 were operated on in COVID-19–free surgical pathways. Patients who underwent surgery within COVID-19–free surgical pathways were younger with fewer comorbidities than those in hospitals with no defined pathway but with similar proportions of major surgery. After adjustment, pulmonary complication rates were lower with COVID-19–free surgical pathways (2.2% v 4.9%; adjusted odds ratio [aOR], 0.62; 95% CI, 0.44 to 0.86). This was consistent in sensitivity analyses for low-risk patients (American Society of Anesthesiologists grade 1/2), propensity score–matched models, and patients with negative SARS-CoV-2 preoperative tests. The postoperative SARS-CoV-2 infection rate was also lower in COVID-19–free surgical pathways (2.1% v 3.6%; aOR, 0.53; 95% CI, 0.36 to 0.76). 
    CONCLUSION Within available resources, dedicated COVID-19–free surgical pathways should be established to provide safe elective cancer surgery during current and before future SARS-CoV-2 outbreaks.

    Analysis of the alignment configuration in a statistical translation system of Spanish into Spanish Sign Language (LSE)

    This paper analyzes the effect of the alignment configuration in a statistical translation system from Spanish into Spanish Sign Language (LSE). The translation system uses a phrase-based translation model. The paper describes how the system's configuration parameters were tuned for this specific translation problem (Spanish-LSE); the choice of alignment type is a critical factor in the translation results. The alignment type is selected while generating the word-based translation model, a preliminary step to generating the phrase-based translation model that is finally used in translation. The system was evaluated with several metrics: WER (Word Error Rate), BLEU (“BiLingual Evaluation Understudy”) and NIST. The results show a final WER of 28.29%, a relative reduction of more than 35% in that error rate. This work was funded by: Plan Avanza Exp No: PAV-070000-2007-567, ROBONAUTA (DPI2007-66846-c02-02) and SD-TEAM (TIN2008-06856-C05-03).
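For readers unfamiliar with the WER metric and the notion of a relative reduction mentioned in this abstract, the sketch below shows one common way to compute both. It uses the standard word-level edit distance (substitutions, deletions and insertions divided by the reference length). The function names are hypothetical, and this is not the evaluation tooling used by the authors, which the abstract does not describe.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level edit distance divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(reference)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the `baseline` error rate."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline
```

A relative reduction compares the improvement to the starting error rate, so it differs from the absolute difference in percentage points between the two rates.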