Search CORE

64 research outputs found

Karaoker: Alignment-free singing voice synthesis with speech training data

Author: Chalamandaris Aimilios
Ellinas Nikolaos
Jho Gunu
Kakoulidis Panos
Markopoulos Konstantinos
Sung June Sig
Tsiakoulis Pirros
Vamvoukakis Georgios
Publication date: 08/04/2022

Existing singing voice synthesis (SVS) models are usually trained on singing data and depend on either error-prone time-alignment and duration features or explicit music score information. In this paper, we propose Karaoker, a multispeaker Tacotron-based model conditioned on voice characteristic features that is trained exclusively on spoken data without requiring time-alignments. Karaoker synthesizes singing voice following a multi-dimensional template extracted from a source waveform of an unseen speaker/singer. The model is jointly conditioned with a single deep convolutional encoder on continuous data including pitch, intensity, harmonicity, formants, cepstral peak prominence and octaves. We extend the text-to-speech training objective with feature reconstruction, classification and speaker identification tasks that guide the model to an accurate result. In addition to multi-tasking, we employ a Wasserstein GAN training scheme, as well as new losses on the acoustic model's output, to further refine the quality of the model. Comment: Submitted to INTERSPEECH 202
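
The abstract names the ingredients of the training objective but not their weights or exact form. As a rough, hypothetical sketch only, a multi-task generator loss with a Wasserstein GAN term can be written as a weighted sum. The weights, the function names and the use of plain floats in place of tensors are all assumptions, and the Lipschitz constraint a WGAN critic needs (weight clipping or a gradient penalty) is omitted.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical weights: the abstract does not report any values.
    reconstruction: float = 1.0
    classification: float = 1.0
    speaker_id: float = 1.0
    adversarial: float = 0.1


def wgan_critic_loss(critic_real, critic_fake):
    """Critic loss E[D(fake)] - E[D(real)]; minimising it widens the real/fake score gap."""
    return fmean(critic_fake) - fmean(critic_real)


def wgan_generator_loss(critic_fake):
    """Generator (acoustic model) loss -E[D(fake)]: pushes critic scores of outputs up."""
    return -fmean(critic_fake)


def total_generator_loss(tts, reconstruction, classification, speaker_id,
                         critic_fake, weights=LossWeights()):
    """Weighted sum of the TTS loss, the auxiliary task losses and the adversarial term."""
    return (tts
            + weights.reconstruction * reconstruction
            + weights.classification * classification
            + weights.speaker_id * speaker_id
            + weights.adversarial * wgan_generator_loss(critic_fake))


# Usage with made-up scalar loss values and critic scores.
print(total_generator_loss(1.0, 0.5, 0.2, 0.3, critic_fake=[0.4, 0.6]))
print(wgan_critic_loss(critic_real=[1.0, 3.0], critic_fake=[0.0, 1.0]))
```

In a real model each term would be a tensor averaged over a batch, and the critic and the acoustic model would be updated in alternating steps.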

arXiv.org e-Print Archive

Predicting phoneme-level prosody latents using AR and flow-based Prior Networks for expressive speech synthesis

Author: Chalamandaris Aimilios
Ellinas Nikolaos
Hwang Inchul
Klapsas Konstantinos
Nikitaras Karolos
Raptis Spyros
Sung June Sig
Tsiakoulis Pirros
Publication date: 02/11/2022

A large part of the expressive speech synthesis literature focuses on learning prosodic representations of the speech signal, which are then modeled by a prior distribution during inference. In this paper, we compare different prior architectures on the task of predicting phoneme-level prosodic representations extracted with an unsupervised FVAE model. We use both subjective and objective metrics to show that normalizing-flow-based prior networks can result in more expressive speech at the cost of a slight drop in quality. Furthermore, we show that the synthesized speech has higher variability for a given text, due to the nature of normalizing flows. We also propose a Dynamical VAE model that can generate higher-quality speech, although with decreased expressiveness and variability compared to the flow-based models. Comment: Submitted to ICASSP 202
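
To make "the nature of normalizing flows" concrete: a flow prior maps a simple base sample through invertible transformations, so it can both score a latent exactly (change of variables) and draw many different latents for the same text. The sketch below is a single affine coupling layer on a 2-D latent with a scalar text context. The scale and shift functions stand in for neural networks and are hypothetical, and this is not the architecture evaluated in the paper.

```python
import math
import random


def log_std_normal(v):
    """Log density of a standard normal at v."""
    return -0.5 * (v * v + math.log(2.0 * math.pi))


class AffineCoupling2D:
    """A single affine coupling layer on a 2-D latent, conditioned on a context value.

    scale_fn(a, c) and shift_fn(a, c) stand in for small neural networks
    (hypothetical here); a real prior would stack many such layers.
    """

    def __init__(self, scale_fn, shift_fn):
        self.scale_fn = scale_fn
        self.shift_fn = shift_fn

    def forward(self, u, context):
        """Map a base sample u ~ N(0, I) to a latent x (used for sampling)."""
        u1, u2 = u
        s = self.scale_fn(u1, context)
        t = self.shift_fn(u1, context)
        return (u1, u2 * math.exp(s) + t)

    def inverse(self, x, context):
        """Map a latent x back to the base space; also return log|det du/dx|."""
        x1, x2 = x
        s = self.scale_fn(x1, context)
        t = self.shift_fn(x1, context)
        return (x1, (x2 - t) * math.exp(-s)), -s

    def log_prob(self, x, context):
        """Exact log-likelihood via the change-of-variables formula."""
        (u1, u2), log_det = self.inverse(x, context)
        return log_std_normal(u1) + log_std_normal(u2) + log_det


# Usage: several different latents for the same (hypothetical) phoneme context.
flow = AffineCoupling2D(
    scale_fn=lambda a, c: 0.5 * math.tanh(a + c),
    shift_fn=lambda a, c: a * c,
)
rng = random.Random(0)
context = 0.3
samples = [flow.forward((rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)), context) for _ in range(3)]
for x in samples:
    print(x, flow.log_prob(x, context))
```

Training such a prior would maximize log_prob of the latents extracted by the posterior encoder; at inference, fresh base samples give varied latents for one input text.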

arXiv.org e-Print Archive

Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

Author: Chalamandaris Aimilios
Ellinas Nikolaos
Hwang Inchul
Klapsas Konstantinos
Maniati Georgia
Nikitaras Karolos
Raptis Spyros
Sung June Sig
Tsiakoulis Pirros
Publication date: 01/11/2022

This paper proposes an Expressive Speech Synthesis model that utilizes token-level latent prosodic variables in order to capture and control utterance-level attributes, such as character acting voice and speaking style. Current works aim to explicitly factorize such fine-grained and utterance-level speech attributes into different representations extracted by modules that operate at the corresponding level. We show that the fine-grained latent space also captures coarse-grained information, which becomes more evident as the dimension of the latent space increases in order to capture diverse prosodic representations. Therefore, a trade-off arises between the diversity of the token-level and utterance-level representations and their disentanglement. We alleviate this issue by first capturing rich speech attributes in a token-level latent space and then separately training a prior network that, given the input text, learns utterance-level representations in order to predict the phoneme-level posterior latents extracted in the previous step. Both qualitative and quantitative evaluations are used to demonstrate the effectiveness of the proposed approach. Audio samples are available on our demo page. Comment: Submitted to ICASSP 202
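
The second training stage can be pictured as a regression problem: the token-level posterior latents from the first stage are frozen targets, and the prior network sees only text. The toy sketch below mean-pools the text embeddings into an utterance-level vector and combines it with each token's own embedding through fixed linear weights to predict one scalar latent per token. The pooling choice, the linear predictor and the scalar latents are simplifying assumptions, not details from the paper, and no training loop is shown.

```python
from statistics import fmean


def utterance_vector(text_embeddings):
    """Mean-pool token embeddings into one utterance-level vector (assumed bottleneck)."""
    dim = len(text_embeddings[0])
    return [fmean(token[d] for token in text_embeddings) for d in range(dim)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def predict_token_latents(text_embeddings, w_token, w_utterance):
    """Predict one latent per token from its own embedding plus the shared utterance vector."""
    utt = utterance_vector(text_embeddings)
    shared = dot(w_utterance, utt)  # the same utterance-level contribution for every token
    return [dot(w_token, emb) + shared for emb in text_embeddings]


def prior_loss(predicted, posterior_targets):
    """Mean squared error against the frozen stage-one posterior latents."""
    return fmean((p - t) ** 2 for p, t in zip(predicted, posterior_targets))


# Usage with two tokens and 2-D embeddings (all numbers are illustrative).
text = [[1.0, 0.0], [0.0, 1.0]]
pred = predict_token_latents(text, w_token=[1.0, 2.0], w_utterance=[1.0, 1.0])
print(pred, prior_loss(pred, posterior_targets=[2.0, 2.0]))
```

Because the stage-one encoder is frozen, the utterance-level vector is shaped only by how well it helps predict the token-level targets.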

arXiv.org e-Print Archive

Fine-grained Noise Control for Multispeaker Speech Synthesis

Author: Chalamandaris Aimilios
Ellinas Nikolaos
Jho Gunu
Klapsas Konstantinos
Markopoulos Konstantinos
Nikitaras Karolos
Raptis Spyros
Sung June Sig
Tsiakoulis Pirros
Vamvoukakis Georgios
Publication venue: 'International Speech Communication Association'
Publication date: 27/10/2022

A text-to-speech (TTS) model typically factorizes speech attributes such as content, speaker and prosody into disentangled representations. Recent works aim to additionally model the acoustic conditions explicitly, in order to disentangle the primary speech factors, i.e., linguistic content, prosody and timbre, from any residual factors, such as recording conditions and background noise. This paper proposes unsupervised, interpretable and fine-grained noise and prosody modeling. We incorporate adversarial training, a representation bottleneck and utterance-to-frame modeling in order to learn frame-level noise representations. To the same end, we perform fine-grained prosody modeling via a Fully Hierarchical Variational AutoEncoder (FVAE), which additionally results in more expressive speech synthesis. Comment: Accepted to INTERSPEECH 202
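
The abstract does not spell out the FVAE objective. VAE-style prosody and noise encoders commonly regularize each frame's latent with a KL divergence to a standard normal prior, which for a diagonal Gaussian posterior has the closed form KL(N(mu, sigma^2) || N(0, 1)) = 0.5 * (mu^2 + sigma^2 - 1 - log sigma^2). The sketch below computes that generic term only; the hierarchical structure of the FVAE, the adversarial branch and the bottleneck are not modeled, and the function names are hypothetical.

```python
import math


def gaussian_kl(mu, logvar):
    """KL(N(mu, exp(logvar)) || N(0, 1)) for a single latent dimension."""
    return 0.5 * (mu * mu + math.exp(logvar) - 1.0 - logvar)


def frame_level_kl(mus, logvars):
    """Sum the KL term over frames and latent dimensions.

    mus and logvars are nested lists shaped [frames][latent_dims].
    """
    return sum(
        gaussian_kl(m, lv)
        for frame_mu, frame_lv in zip(mus, logvars)
        for m, lv in zip(frame_mu, frame_lv)
    )


# Usage: three frames of a 2-D latent (illustrative values).
mus = [[0.0, 0.5], [1.0, -0.2], [0.1, 0.0]]
logvars = [[0.0, -0.5], [0.2, 0.0], [-1.0, 0.0]]
print(frame_level_kl(mus, logvars))
```

Parameterizing the variance by its logarithm keeps it positive without an explicit constraint, which is the usual reason encoders output logvar rather than sigma.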

arXiv.org e-Print Archive

Self-supervised learning for robust voice cloning

Author: Chalamandaris Aimilios
Ellinas Nikolaos
Jho Gunu
Kakoulidis Panos
Klapsas Konstantinos
Markopoulos Konstantinos
Nikitaras Karolos
Raptis Spyros
Sung June Sig
Tsiakoulis Pirros
Vamvoukakis Georgios
Publication date: 02/11/2022

Voice cloning is a difficult task that requires robust and informative features incorporated into a high-quality TTS system in order to effectively copy an unseen speaker's voice. In our work, we utilize features learned in a self-supervised framework via the Bootstrap Your Own Latent (BYOL) method, which is shown to produce high-quality speech representations when specific audio augmentations are applied to the vanilla algorithm. We further extend the augmentations in the training procedure to help the resulting features capture the speaker identity and to make them robust to noise and acoustic conditions. The learned features are used as pre-trained utterance-level embeddings and as inputs to a Non-Attentive Tacotron based architecture, aiming to achieve multispeaker speech synthesis without utilizing additional speaker features. This method enables us to train our model on an unlabeled multispeaker dataset, as well as to use unseen speaker embeddings to copy a speaker's voice. Subjective and objective evaluations are used to validate the proposed model, as well as its robustness to the acoustic conditions of the target utterance. Comment: Accepted to INTERSPEECH 202
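
For readers unfamiliar with BYOL: an online network predicts the target network's projection of a second augmented view, the loss is the squared distance between the two L2-normalized vectors (equivalently 2 - 2 * cosine similarity), and the target weights track the online weights by an exponential moving average rather than by gradient descent. The sketch below shows only those two pieces on plain Python lists; the encoders, the audio augmentations and the decay value tau=0.99 are assumptions, not the paper's settings.

```python
import math


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def byol_loss(online_prediction, target_projection):
    """Squared distance of normalized vectors, i.e. 2 - 2 * cosine similarity."""
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(p, z))


def symmetric_byol_loss(pred_view1, proj_view2, pred_view2, proj_view1):
    """BYOL sums the loss over both orderings of the two augmented views."""
    return byol_loss(pred_view1, proj_view2) + byol_loss(pred_view2, proj_view1)


def ema_update(target_params, online_params, tau=0.99):
    """Target weights follow the online weights; no gradients flow into the target."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]


# Usage with toy 3-D vectors standing in for embeddings of two augmented utterances.
print(symmetric_byol_loss([1.0, 0.2, 0.0], [0.9, 0.1, 0.1],
                          [0.8, 0.3, 0.1], [1.0, 0.0, 0.2]))
print(ema_update([0.0, 1.0], [1.0, 1.0], tau=0.9))
```

Since no negative pairs are involved, what the features become invariant to is decided largely by the augmentations, which is why extending them toward noise and channel effects can steer the embeddings toward speaker identity.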

arXiv.org e-Print Archive

Will the US Economy Recover in 2010? A Minimal Spanning Tree Study

Author: Gladys Hui Ting Lee
Jian Cheng Wong
Jun Liang Kok
Manamohan Prusty
Siew Ann Cheong
Yiting Zhang
Publication venue: 'Elsevier BV'
Publication date: 20/12/2010

We calculated the cross correlations between the half-hourly time series of the ten Dow Jones US economic sectors over the period February 2000 to August 2008, the two-year intervals 2002–2003, 2004–2005, 2008–2009, and also over 11 segments within the present financial crisis, to construct minimal spanning trees (MSTs) of the US economy at the sector level. In all MSTs, a core-fringe structure is found, with consumer goods, consumer services, and the industrials consistently making up the core, and basic materials, oil and gas, healthcare, telecommunications, and utilities residing predominantly on the fringe. More importantly, we find that the MSTs can be classified into two distinct, statistically robust topologies: (i) star-like, with the industrials at the center, associated with low-volatility economic growth; and (ii) chain-like, associated with high-volatility economic crisis. Finally, we present statistical evidence, based on the emergence of a star-like MST in September 2009 and the MST staying robustly star-like throughout the Greek Debt Crisis, that the US economy is on track to a recovery. Comment: elsarticle class, includes amsmath.sty, graphicx.sty and url.sty. 68 pages, 16 figures, 8 tables. Abridged version of the manuscript presented at the Econophysics Colloquium 2010, incorporating reviewer comment
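
The abstract does not state which correlation-to-distance map was used; a common choice in this literature is d_ij = sqrt(2 * (1 - rho_ij)), which turns strong positive correlation into short distance. Under that assumption, the sketch below builds an MST with Kruskal's algorithm and scores its shape by the largest node degree divided by the number of edges (1.0 for a perfect star, 2/(n - 1) for a chain). That shape score is a hypothetical heuristic for illustration, not the statistical test used by the authors, and the four synthetic series (named hub and sector_a to sector_c) are made up so that the hub correlates most strongly with every other series.

```python
import math
from itertools import combinations
from statistics import fmean


def pearson(x, y):
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def distance(rho):
    # Clamp tiny negative values caused by rounding when rho is close to 1.
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


def minimal_spanning_tree(series):
    """Kruskal's algorithm on the complete graph of correlation distances.

    series maps a sector name to its return series (all the same length).
    Returns the accepted edges as (distance, sector_a, sector_b).
    """
    names = sorted(series)
    edges = sorted(
        (distance(pearson(series[a], series[b])), a, b)
        for a, b in combinations(names, 2)
    )
    parent = {name: name for name in names}

    def find(node):  # union-find root lookup with path halving
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    tree = []
    for d, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two components, so it cannot form a cycle
            parent[root_a] = root_b
            tree.append((d, a, b))
    return tree


def star_score(tree):
    """Largest degree / number of edges: 1.0 for a star, 2/(n-1) for a chain."""
    degree = {}
    for _, a, b in tree:
        degree[a] = degree.get(a, 0) + 1
        degree[b] = degree.get(b, 0) + 1
    return max(degree.values()) / len(tree)


# Synthetic data: each sector = common factor + its own orthogonal noise;
# the hub is the common factor alone.
factor = [2, 2, 2, 2, -2, -2, -2, -2]
noise = {
    "sector_a": [1, -1, 1, -1, 1, -1, 1, -1],
    "sector_b": [1, 1, -1, -1, 1, 1, -1, -1],
    "sector_c": [1, -1, -1, 1, 1, -1, -1, 1],
}
series = {"hub": factor}
series.update({k: [f + e for f, e in zip(factor, v)] for k, v in noise.items()})

tree = minimal_spanning_tree(series)
for d, a, b in tree:
    print(f"{a} -- {b}: {d:.3f}")
print("star score:", star_score(tree))
```

An MST of n sectors always has n - 1 edges, so with ten Dow Jones sectors each tree keeps 9 of the 45 pairwise links, which is what makes the star versus chain distinction visible.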

arXiv.org e-Print Archive

CiteSeerX

Crossref

Predictable dynamics in implied volatility surfaces from OTC currency options

Author: Andrianos E. Tsekrekos
Georgios Chalamandaris
Publication venue: 'Elsevier BV'

Crossref

Radiotherapy combined with nivolumab or temozolomide for newly diagnosed glioblastoma with unmethylated MGMT promoter: An international randomized phase III trial

Author: Baehring J.
Bahr O.
Brandes A. A.
Butowski N.
Carpentier A. F.
Chalamandaris A.-G.
Cloughesy T.
Di Giacomo A. M.
Fu A. Z.
Idbaih A.
Khasraw M.
Lassen U.
Lim M.
Liu Y.
Lombardi G.
Mulholland P.
Muragaki Y.
Omuro A.
Potter V.
Qian X.
Reardon D. A.
Roth P.
Sepulveda J. M.
Sumrall A.
Tabatabai G.
Tatsuoka K.
van den Bent M.
Vauleon E.
Weller M.
Publication date: 01/01/2023

BACKGROUND: Addition of temozolomide (TMZ) to radiotherapy (RT) improves overall survival (OS) in patients with glioblastoma (GBM), but previous studies suggest that patients with tumors harboring an unmethylated MGMT promoter derive minimal benefit. The aim of this open-label, phase III CheckMate 498 study was to evaluate the efficacy of nivolumab (NIVO) + RT compared with TMZ + RT in newly diagnosed GBM with unmethylated MGMT promoter. METHODS: Patients were randomized 1:1 to standard RT (60 Gy) + NIVO (240 mg every 2 weeks for eight cycles, then 480 mg every 4 weeks) or RT + TMZ (75 mg/m² daily during RT and 150–200 mg/m²/day for 5 days of each 28-day cycle during maintenance). The primary endpoint was OS. RESULTS: A total of 560 patients were randomized, 280 to each arm. Median OS (mOS) was 13.4 months (95% CI, 12.6 to 14.3) with NIVO + RT and 14.9 months (95% CI, 13.3 to 16.1) with TMZ + RT (hazard ratio [HR], 1.31; 95% CI, 1.09 to 1.58; P = .0037). Median progression-free survival was 6.0 months (95% CI, 5.7 to 6.2) with NIVO + RT and 6.2 months (95% CI, 5.9 to 6.7) with TMZ + RT (HR, 1.38; 95% CI, 1.15 to 1.65). Response rates were 7.8% (9/116) with NIVO + RT and 7.2% (8/111) with TMZ + RT; grade 3/4 treatment-related adverse event (TRAE) rates were 21.9% and 25.1%, and any-grade serious TRAE rates were 17.3% and 7.6%, respectively. CONCLUSIONS: The study did not meet the primary endpoint of improved OS; TMZ + RT demonstrated a longer mOS than NIVO + RT. No new safety signals were detected with NIVO in this study. The difference between the study treatment arms is consistent with the use of TMZ + RT as the standard of care for GBM. ClinicalTrials.gov NCT02617589

Archivio della Ricerca - Università degli Studi di Siena

Liquidity risk in spot foreign exchange markets

Author: Chalamandaris G.
Publication date: 01/01/2000

SIGLE. Available from British Library Document Supply Centre-DSC:DXN038509 / BLDSC - British Library Document Supply Centre. GB, United Kingdom

OpenGrey Repository

EVIDENCE-BASED HEALTH PROMOTION: EXPLORING THE EVOLUTION OF THE EFFECTIVENESS OF SCHOOL-BASED ANTI-BULLYING INTERVENTIONS OVER TIME

Author: Chalamandaris Alexandros-Georgios
Publication venue: Université libre de Bruxelles, Ecole de Santé publique, Bruxelles
Publication date: 09/05/2018

The objectives of this thesis were to explore how the effectiveness of school-based anti-bullying interventions (SBABI) evolves over time and to assess the possibility of predicting the medium-term or long-term effectiveness of SBABIs on the basis of their short-term effectiveness. The first step was a literature review to understand the study designs and evaluation techniques that researchers used to assess effectiveness. This review described the methodologies by which researchers collected evidence and drew conclusions about the effectiveness of their SBABIs. To address the thesis objectives, a collaborative project named SET-Bullying (“Statistical modelling of the Effectiveness of school based anti-bullying interventions and Time”) was established. The literature review was used to identify potentially eligible studies. After a call for collaboration was sent to the corresponding authors of these studies, the project included data from two of them: the DFE-SHEFFIELD study from the United Kingdom and the RESPEKT study from Norway. Both studies used pupils' self-reported frequencies of being bullied and of bullying others as an effectiveness measure, but they used different instruments to elicit this information. The next step of the thesis was therefore to harmonize the data from these studies using polychoric principal components analysis, so that the same analysis could be performed on the data from both studies. The data from both studies were analysed using mixed-effects models in order to take into account the hierarchical structure (i.e. the responses of pupils from the same school may be more correlated with each other than with the responses of pupils from different schools) and the longitudinal structure (i.e. the same pupils are more likely to respond in a similar way across the repeated measurements of each study) of the data.
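
One way to see why the hierarchical structure matters is the intraclass correlation (ICC): the share of response variance that lies between schools rather than between pupils within a school. A positive ICC means pupils from the same school are not independent observations, which is what a mixed-effects model with a school-level random effect accounts for. The sketch below computes the one-way ANOVA estimator, ICC(1), for equal-sized groups; it is a generic illustration with made-up scores, not the model fitted in the thesis.

```python
from statistics import fmean


def icc1(groups):
    """One-way ANOVA intraclass correlation for equal-sized groups.

    groups: one list of pupil scores per school; at least two schools,
    each with the same number k >= 2 of pupils.
    """
    n_groups = len(groups)
    k = len(groups[0])
    if n_groups < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need at least two groups of equal size k >= 2")
    grand_mean = fmean(x for g in groups for x in g)
    means = [fmean(g) for g in groups]
    ss_between = k * sum((m - grand_mean) ** 2 for m in means)
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_groups * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Usage: made-up "frequency of being bullied" scores for three schools of four pupils.
schools = [[3, 4, 3, 5], [1, 2, 1, 2], [4, 5, 5, 4]]
print(round(icc1(schools), 3))
```

A value near 0 would mean school membership carries little information; a large value, as in this toy data, means treating every pupil as an independent observation would overstate the effective sample size.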
With regard to the primary objective of the thesis, it was observed that effectiveness (where it is observed) may evolve either in a linear fashion or with a “delayed effect”, i.e. minimal evolution of effectiveness over the first study measurements followed by a sharper evolution at the later measurements. This finding is only hypothesis-generating at this point. Were it confirmed in future studies, it would have important implications for the design, implementation and evaluation of SBABIs. Regarding the secondary objective of this thesis, there were some preliminary findings suggesting that medium-term or long-term effectiveness could be predicted from short-term effectiveness. However, these predictions were in some cases highly variable. Future research should focus on making these predictions more accurate, so as to allow for dynamic and adaptable delivery of SBABIs. Doctorate in Public Health. info:eu-repo/semantics/nonPublishe

DI-fusion