19 research outputs found

    Karaoker: Alignment-free singing voice synthesis with speech training data

    Existing singing voice synthesis (SVS) models are usually trained on singing data and depend on either error-prone time-alignment and duration features or explicit music score information. In this paper, we propose Karaoker, a multispeaker Tacotron-based model conditioned on voice characteristic features that is trained exclusively on spoken data without requiring time-alignments. Karaoker synthesizes singing voice following a multi-dimensional template extracted from a source waveform of an unseen speaker/singer. The model is jointly conditioned with a single deep convolutional encoder on continuous data including pitch, intensity, harmonicity, formants, cepstral peak prominence and octaves. We extend the text-to-speech training objective with feature reconstruction, classification and speaker identification tasks that guide the model to an accurate result. Beyond multi-tasking, we also employ a Wasserstein GAN training scheme as well as new losses on the acoustic model's output to further refine the quality of the model. Comment: Submitted to INTERSPEECH 202

    Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification

    Emotion detection in textual data has received growing interest in recent years, as it is pivotal for developing empathetic human-computer interaction systems. This paper introduces a method for categorizing emotions from text that accounts for the varying degrees of similarity and difference among emotions. Initially, we establish a baseline by training a transformer-based model for standard emotion classification, achieving state-of-the-art performance. We argue that not all misclassifications are of the same importance, as there are perceptual similarities among emotional classes. We thus redefine the emotion labeling problem by shifting it from a traditional classification model to an ordinal classification one, where discrete emotions are arranged in a sequential order according to their valence levels. Finally, we propose a method that performs ordinal classification in the two-dimensional emotion space, considering both valence and arousal scales. The results show that our approach not only preserves high accuracy in emotion prediction but also significantly reduces the magnitude of errors in cases of misclassification.
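
The point that misclassifications differ in importance can be illustrated with a minimal sketch. It assumes a hypothetical valence ordering of five emotion labels (the ordering, the labels, and the function names are illustrative and are not taken from the paper) and compares plain accuracy with a rank-distance error on the valence axis only.

```python
# Hypothetical valence ordering, most negative to most positive.
# This ordering is an assumption for illustration, not the paper's.
VALENCE_ORDER = ["sadness", "anger", "fear", "surprise", "joy"]
RANK = {label: i for i, label in enumerate(VALENCE_ORDER)}


def accuracy(y_true, y_pred):
    """Fraction of exact label matches; every miss counts the same."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_ordinal_error(y_true, y_pred):
    """Mean absolute distance between true and predicted valence ranks."""
    return sum(abs(RANK[t] - RANK[p]) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = ["joy", "joy", "sadness", "fear"]
pred_near = ["surprise", "joy", "sadness", "fear"]  # one miss, adjacent in valence
pred_far = ["sadness", "joy", "sadness", "fear"]    # one miss, opposite end of the scale

print(accuracy(y_true, pred_near), mean_ordinal_error(y_true, pred_near))
print(accuracy(y_true, pred_far), mean_ordinal_error(y_true, pred_far))
```

Both prediction sets have the same accuracy, but the rank-distance error separates the near miss from the far one, which is the kind of error magnitude an ordinal formulation can target.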

    Fine-grained Noise Control for Multispeaker Speech Synthesis

    A text-to-speech (TTS) model typically factorizes speech attributes such as content, speaker and prosody into disentangled representations. Recent works aim to additionally model the acoustic conditions explicitly, in order to disentangle the primary speech factors, i.e. linguistic content, prosody and timbre, from any residual factors, such as recording conditions and background noise. This paper proposes unsupervised, interpretable and fine-grained noise and prosody modeling. We incorporate adversarial training, representation bottleneck and utterance-to-frame modeling in order to learn frame-level noise representations. To the same end, we perform fine-grained prosody modeling via a Fully Hierarchical Variational AutoEncoder (FVAE) which additionally results in more expressive speech synthesis. Comment: Accepted to INTERSPEECH 202

    Self-supervised learning for robust voice cloning

    Voice cloning is a difficult task which requires robust and informative features incorporated in a high quality TTS system in order to effectively copy an unseen speaker's voice. In our work, we utilize features learned in a self-supervised framework via the Bootstrap Your Own Latent (BYOL) method, which is shown to produce high quality speech representations when specific audio augmentations are applied to the vanilla algorithm. We further extend the augmentations in the training procedure to help the resulting features capture the speaker identity and to make them robust to noise and acoustic conditions. The learned features are used as pre-trained utterance-level embeddings and as inputs to a Non-Attentive Tacotron based architecture, aiming to achieve multispeaker speech synthesis without utilizing additional speaker features. This method enables us to train our model on an unlabeled multispeaker dataset as well as use unseen speaker embeddings to copy a speaker's voice. Subjective and objective evaluations are used to validate the proposed model, as well as the robustness to the acoustic conditions of the target utterance. Comment: Accepted to INTERSPEECH 202

    EVIDENCE-BASED HEALTH PROMOTION: EXPLORING THE EVOLUTION OF THE EFFECTIVENESS OF SCHOOL-BASED ANTI-BULLYING INTERVENTIONS OVER TIME

    The objectives of this thesis were to explore how the effectiveness of school-based anti-bullying interventions (SBABI) evolves over time and to assess the possibility of predicting the medium-term or long-term effectiveness of SBABIs on the basis of their short-term effectiveness. The first step included a literature review in order to understand the study designs and evaluation techniques that researchers used to assess the effectiveness. This literature review described the methodologies based on which researchers collected evidence and drew conclusions on the effectiveness of their SBABIs. In order to address the thesis objectives, a collaborative project was established, named SET-Bullying (“Statistical modelling of the Effectiveness of school based anti-bullying interventions and Time”). The above-mentioned literature review was used to identify potentially eligible studies. After addressing a call for collaboration to the corresponding authors of these studies, this project included data from two of them, the DFE-SHEFFIELD study from the United Kingdom and the RESPEKT study from Norway. Both of these studies used pupil self-reported frequencies of being bullied and bullying others as an effectiveness measure, but with different instruments to elicit this information. Thus, the subsequent step of this thesis was to harmonize the data from these studies using polychoric principal components analysis, in order to be able to perform the same analysis with the data from both studies. The data from both studies were analysed using mixed effect models in order to take into account the hierarchical structure (i.e. the responses of pupils from the same school may be more correlated with each other than the responses of pupils from different schools) and the longitudinal structure (i.e. the same pupils are more likely to respond in a similar way in the repeated measurements of each study) of the data.
With regard to the primary objective of the thesis, it was observed that effectiveness (where it is observed) may evolve either in a linear fashion or with a “delayed effect”. The latter refers to a minimal evolution of effectiveness over the first study measurements and a sharper evolution at the later study measurements. This finding is only hypothesis generating at this point. Should it be confirmed in future studies, it would have important implications for the design, implementation and evaluation of SBABIs. Regarding the secondary objective of this thesis, there were some preliminary findings on the possibility of predicting the medium-term or long-term effectiveness based on the short-term effectiveness. However, these predictions in some cases seemed to be very variable. Future research should focus on how to make these predictions more accurate, so as to allow for dynamic and adaptable delivery of SBABIs. Doctorate in Public Health (Doctorat en Santé Publique)

    The correlation structure of FX option markets before and since the financial crisis

    The liquidity crunch and the ensuing financial crisis have unambiguously affected all national economies and global currency exchange rates. In this article we ask whether the cross-currency correlation structure has changed since 2007. Using an extensive set of volatility surfaces implied from over-the-counter options on 11 different exchange rates, as well as recent advances in static and dynamic factor models, we are able to show that the number of factors that innovate the correlation structure has not changed in the last two and a half years. It is the volatility, the persistence and the significance of global systematic factors, vis-a-vis regional or economy-specific ones, that appear to have changed dramatically. The implications for the risk management of currency exposures and for the predictability of exchange rate volatility are also outlined.

    Predictability in implied volatility surfaces: evidence from the Euro OTC FX market

    Recent general equilibrium models prescribe predictable dynamics in the volatility surfaces that are implied by observed option prices. In this paper, we investigate the predictability of surfaces, using extensive time series of implied volatilities from over-the-counter options on eight different currencies, quoted against the Euro. We examine implied volatility surfaces in the context of predictability through three different models, two that employ parametric specifications to describe the surface and one that decomposes it into latent statistical factors. All examined models are shown to (a) accurately describe the surfaces in-sample, and (b) produce forecasts that are superior to hard-to-beat benchmarks that ignore information about the shape of the surface, in medium- to long-term horizons. We show that these forecasts can support profitable volatility trading strategies in the absence of transaction costs. Comparing across competing models, our results suggest that parametric models, which allow for a more structured description of the surface, are more successful in terms of forecasts’ accuracy and significance of trading profits.

    How important is the term structure in implied volatility modelling: evidence from foreign exchange options

    We claim that previously proposed parametric specifications that linearly approximate the term structure of the implied volatility surface (IVS) in option prices fail to capture important information regarding the expectations of market participants. This paper proposes a parametric specification for describing the IVS that allows flexible modeling of the term structure through a Nelson and Siegel (1987) factorization, recently proposed by Diebold and Li (2006) in the context of yield curve modeling. The specification is tested on implied volatilities from the over-the-counter foreign exchange options market, where contracts with long expiries are actively traded and thus the term structure dimension of the surface should be very important. We first show that the proposed volatility specification can consistently and substantially improve our ability to describe the surface on any given day. We then establish the economic relevance of the incremental information captured by our proposed specification by showing that it can produce more accurate forecasts of implied volatility that can support long-term profitable trading strategies in the absence of transaction costs.
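
For readers unfamiliar with the Nelson and Siegel factorization mentioned above, the following is a minimal sketch of the standard three-factor form as written by Diebold and Li (level, slope and curvature loadings with a decay parameter). It is a generic illustration applied to a term structure of implied volatility, not the paper's exact specification; the function names and the parameter values are hypothetical.

```python
import math


def nelson_siegel_loadings(tau, lam):
    """Level, slope and curvature loadings at maturity tau > 0 (decay lam > 0)."""
    x = lam * tau
    slope = -math.expm1(-x) / x          # (1 - exp(-x)) / x, computed stably
    curvature = slope - math.exp(-x)
    return 1.0, slope, curvature


def nelson_siegel_curve(tau, beta0, beta1, beta2, lam):
    """Value of the term structure at maturity tau for given factor values."""
    level, slope, curvature = nelson_siegel_loadings(tau, lam)
    return beta0 * level + beta1 * slope + beta2 * curvature


# Hypothetical factor values: long-run level 10%, short-end premium 3%, hump 2%.
betas = (0.10, 0.03, 0.02)
lam = 1.5  # hypothetical decay rate, with maturities measured in years

for tau in (1 / 12, 0.5, 1.0, 2.0, 5.0):
    print(f"tau={tau:.3f}  vol={nelson_siegel_curve(tau, *betas, lam):.4f}")
```

At very short maturities the curve approaches beta0 + beta1, and at very long maturities it approaches beta0, so the three factors can be read as long-end level, short-end slope and medium-term curvature. That non-linear shape is what a linear approximation of the term structure cannot reproduce.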

    School-based anti-bullying interventions: Systematic review of the methodology to assess their effectiveness

    Background: Knowledge and collective experience on the evaluation of anti-bullying interventions are spread across the literature. Gathering it would contribute toward evidence-based anti-bullying interventions. This paper presents the results of a systematic literature review of the research methodology of school-based anti-bullying interventions (SBABIs). Methods: Articles were identified using the word "bullying" either as keyword or subject heading on MEDLINE, PSYCINFO and ERIC. Search engine limits were also used in order to identify eligible articles evaluating SBABIs in children and adolescents. Further selection was based on information in titles and abstracts and, for some articles, the full text. Content analysis of words, phrases or extracts according to pre-specified criteria was used. Results: Results present the research methodologies used in terms of evaluation research designs, number of study groups, collected information and the way information was collected, methodology used for analysis, and strengths and limitations identified by researchers concerning their research methodology. Conclusion: A great variability of research methodologies was observed. We suggest the adoption of a framework of research phases, proposed by other authors, to frame this variability on a continuum toward building evidence. Additionally, based on recommendations made by others, we discuss issues of internal and external validity of the evaluation methodologies. These three suggestions help to frame and enhance evaluation practices in bullying research.