    The limits of the Mean Opinion Score for speech synthesis evaluation

    The release of WaveNet and Tacotron has forever transformed the speech synthesis landscape. Thanks to these game-changing innovations, the quality of synthetic speech has reached unprecedented levels. However, to measure this leap in quality, the overwhelming majority of studies still rely on the Absolute Category Rating (ACR) protocol and compare systems using its output: the Mean Opinion Score (MOS). This protocol is not without controversy, and as current state-of-the-art synthesis systems now produce outputs remarkably close to human speech, it is vital to determine how reliable this score is. To do so, we conducted a series of four experiments replicating and following the 2013 edition of the Blizzard Challenge. With these experiments, we asked four questions about the MOS: How stable is the MOS of a system across time? How do the scores of lower-quality systems influence the MOS of higher-quality systems? How does the introduction of modern technologies influence the scores of past systems? How does the MOS of modern technologies evolve in isolation? The results of our experiments are manifold. First, we verify the superiority of modern technologies over historical synthesis. Then, we show that despite its origin as an absolute category rating, MOS is a relative score. While only minimal variations are observed when replicating the 2013-EH2 task, these variations can still lead to different conclusions for the intermediate systems. Our experiments also illustrate the sensitivity of MOS to the presence or absence of lower and higher anchors. Overall, our experiments suggest that by evaluating only overall quality with MOS, we may have reached the end of a cul-de-sac. We must embark on a new road and develop evaluation protocols better suited to the analysis of modern speech synthesis technologies.
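As a rough illustration of what a MOS is, the sketch below averages hypothetical 1-5 ACR ratings per system and attaches a normal-approximation 95% confidence interval. The ratings, the function name `mos_with_ci`, and the interval method are assumptions for illustration, not the protocol used in these experiments. Because the abstract finds that a system's MOS depends on which other systems are in the test, numbers like these are best compared only within a single listening test.

```python
import math
from statistics import mean, stdev


def mos_with_ci(ratings, z=1.96):
    """Return (MOS, CI half-width) for ACR ratings on the 1-5 scale.

    The half-width uses a normal approximation (z * s / sqrt(n)); this is an
    illustrative assumption, not a prescribed part of the ACR protocol.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate a spread")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    mos = mean(ratings)
    half_width = z * stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings for two systems heard in the same listening test.
ratings = {
    "A": [4, 5, 4, 4, 3, 5, 4, 4],
    "B": [3, 4, 3, 3, 4, 2, 3, 3],
}
for system, scores in ratings.items():
    mos, hw = mos_with_ci(scores)
    print(f"system {system}: MOS = {mos:.3f} +/- {hw:.2f}")
```

The mean alone hides how many listeners and ratings stand behind it; reporting the interval alongside the MOS makes small differences between intermediate systems easier to judge.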

    Liaison and pronunciation learning in end-to-end text-to-speech in French

    Valeria Piacentini Fiorani. « A Sasanian fleet or a maritime system? »

    The author questions the presence of a Sasanian fleet in the Persian Gulf in the second half of the 7th century. To do so, she draws on Arabic sources from the 3rd/9th to 7th/13th centuries, in particular the chronicles of the Arab conquest (Kutub al-Futūḥ), and on a chronicle written in Persian recounting the conquests further east, as far as Makran (Fatḥnāmah-i Sind). By examining the silences and implications of these accounts, the author shows that the waters of the Persian Gulf are still..

    Recherche d'information médicale pour le patient : impact de ressources terminologiques

    National audience. The right of patients to access their clinical health record is granted by the French Code de la Santé Publique (Public Health Code). Yet this content remains difficult to understand. We propose an experiment in which queries defined by patients are used to find relevant documents. We use the Indri search engine, based on statistical language modeling, together with semantic resources. We focus on terminological variation (e.g., synonyms, abbreviations) to bridge expert and patient language. Various combinations of resources and Indri settings are explored, mostly based on query expansion. Our system achieves up to 0.7660 P@10 and up to 0.6793 NDCG@10.
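The two reported metrics can be made concrete with a short sketch. It uses one common formulation of each (P@k counts missing positions as non-relevant; NDCG@k uses linear gains and a log2(rank + 1) discount); the evaluation tool used for these results may differ in details such as the gain function. The run, the relevance judgments, and the function names are hypothetical.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the top-k positions filled by relevant documents.

    Always divides by k, so a run with fewer than k results is penalized.
    """
    return sum(1 for doc in ranked_ids[:k] if doc in relevant_ids) / k


def ndcg_at_k(ranked_ids, gains, k=10):
    """NDCG@k with linear gains and a log2(rank + 1) discount.

    gains maps doc id -> graded relevance judgment; unjudged docs count as 0.
    """
    def dcg(values):
        # enumerate starts at 0, so rank = i + 1 and the discount is log2(i + 2)
        return sum(g / math.log2(i + 2) for i, g in enumerate(values[:k]))

    actual = dcg([gains.get(doc, 0) for doc in ranked_ids])
    ideal = dcg(sorted(gains.values(), reverse=True))
    return actual / ideal if ideal > 0 else 0.0


# Hypothetical run and relevance judgments for one patient query.
run = ["d3", "d1", "d7", "d2"]
qrels = {"d1": 2, "d2": 1, "d5": 1}
relevant = {doc for doc, g in qrels.items() if g > 0}
print(precision_at_k(run, relevant))  # 2 relevant docs in the top 10 -> 0.2
print(round(ndcg_at_k(run, qrels), 4))
```

P@10 only asks whether relevant documents appear in the top ten; NDCG@10 also rewards placing the most relevant ones first, which is why the two scores can move differently under query expansion.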

    Claire Hardy-Guilbert, Hélène Renel, Axelle Rougeulle, Eric Vallet (eds.). Sur les chemins d’Onagre : Histoire et archéologie orientales : Hommage à Monik Kervran

    Sur les chemins d’Onagre pays tribute to Monik Kervran, archaeologist and emeritus research director at the CNRS, who contributed greatly to advancing knowledge of the history of eastern Islam, particularly in Iran and the Gulf. The nineteen articles collected here reflect the diversity of the research topics she worked on. They cover both the Sasanian and the Islamic periods and address subjects as varied as architecture, ceramics, ..

    Phonetic accommodation in interaction with a virtual language learning tutor: A Wizard-of-Oz study

    We present a Wizard-of-Oz experiment examining phonetic accommodation of human interlocutors in the context of human-computer interaction. Forty-two native speakers of German engaged in dynamic spoken interaction with Mirabella, a simulated virtual tutor for learning German. Mirabella was controlled by the experimenter and used either natural or hidden Markov model-based synthetic speech to communicate with the participants. In the course of four tasks, the participants’ accommodating behavior with respect to wh-question realization and allophonic variation in German was tested. The participants converged toward Mirabella with respect to modified wh-question intonation, i.e., a rising F0 contour and a nuclear pitch accent on the interrogative pronoun, and the allophonic contrast [ɪç] vs. [ɪk] occurring in the word ending -ig. They did not accommodate to the allophonic contrast [ɛː] vs. [eː] as a realization of the long vowel -ä-. The results did not differ between the experimental groups that communicated with the natural and the synthetic speech versions of Mirabella. Testing the influence of the “Big Five” personality traits on the accommodating behavior revealed a tendency for neuroticism to influence the convergence of question intonation. At the level of individual speakers, we found considerable variation in the degree and direction of accommodation. We conclude that phonetic accommodation at the level of local prosody and segmental pronunciation occurs in users of spoken dialog systems, which could be exploited in computer-assisted language learning.

    Évaluation expérimentale d'un système statistique de synthèse de la parole, HTS, pour la langue française

    The work presented in this thesis concerns text-to-speech (TTS) synthesis and, more specifically, statistical parametric speech synthesis for French. We analyze the impact of the linguistic contextual factors used to characterize a speech signal on the modeling performed by the HTS statistical speech synthesis system. To conduct the experiments, two objective evaluation protocols are proposed. The first uses Gaussian mixture models (GMMs) to represent the acoustic space produced by HTS for a given contextual feature set. Using a constant reference set of natural speech stimuli, these GMMs can be compared with one another, and thus so can the acoustic spaces generated by the different HTS configurations. The second protocol is based on pairwise distances between paired frames of natural speech and of synthetic speech generated by HTS, which evaluates the modeling more locally. It complements the other analyses, notably by controlling the sets of data that are generated and evaluated. Results obtained with both protocols, and confirmed by subjective evaluations, show that using a large, complex set of contextual factors does not necessarily improve the modeling and can be counter-productive for the quality of the synthesized speech.
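The second protocol can be pictured with a small sketch. It assumes the natural and synthetic frames are already paired one-to-one (for instance, by synthesizing with the natural durations, which is an assumption here) and uses plain Euclidean distance over feature vectors such as mel-cepstral coefficients as a stand-in; the exact distance used in the thesis is not given in this summary, and mel-cepstral distortion is a common scaled variant. The frame values, configuration labels, and function names are hypothetical and carry no information about the thesis results.

```python
import math


def frame_distance(natural, synthetic):
    """Euclidean distance between two acoustic feature vectors of equal dimension."""
    if len(natural) != len(synthetic):
        raise ValueError("frames must have the same dimension")
    return math.dist(natural, synthetic)


def mean_paired_distance(natural_frames, synthetic_frames):
    """Average frame distance, assuming frame i of one sequence pairs with frame i of the other."""
    if not natural_frames or len(natural_frames) != len(synthetic_frames):
        raise ValueError("need two non-empty frame sequences of equal length")
    total = sum(frame_distance(n, s) for n, s in zip(natural_frames, synthetic_frames))
    return total / len(natural_frames)


# Hypothetical 2-dimensional frames: one natural utterance and the output of
# two HTS configurations trained with different contextual feature sets.
natural = [[1.0, 0.5], [0.8, 0.2], [0.6, 0.1]]
outputs = {
    "configuration A": [[1.0, 0.4], [0.8, 0.3], [0.6, 0.1]],
    "configuration B": [[1.2, 0.5], [0.8, 0.6], [0.5, 0.1]],
}
for config, frames in outputs.items():
    print(f"{config}: mean distance = {mean_paired_distance(natural, frames):.3f}")
```

Because each synthetic frame is compared with its own natural counterpart, a lower mean distance points to locally closer modeling, which complements a global comparison of acoustic spaces.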

    RePaLi participation to CLEF eHealth IR challenge 2014: leveraging term variation

    International audience. This paper describes the participation of RePaLi, a team composed of members of IRISA, LIMSI and STL, in the biomedical information retrieval challenge proposed within the framework of CLEF eHealth. For this first participation, our approach relies on a state-of-the-art IR system called Indri, based on statistical language modeling, and on semantic resources. The purpose of the semantic resources and methods is to handle term variation such as synonyms, morpho-syntactic variants, abbreviations or nested terms. Different combinations of resources and Indri settings are explored, mostly based on query expansion. For the submitted runs, our system achieves up to 67.40 P@10 and up to 67.93 NDCG@10.
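Query expansion of this kind can be sketched with Indri's query language, where `#combine` combines the beliefs of its arguments, `#syn` treats its arguments as synonyms, and `#1(...)` matches an exact phrase. The variant dictionary and the helpers `indri_term` and `expand_query` are hypothetical; the team's actual resources, weights, and Indri settings are not described here, and the sketch does not handle punctuation or other special characters inside terms.

```python
def indri_term(term):
    """Render one term; multi-word terms become #1(...), Indri's exact-phrase operator."""
    words = term.split()
    if not words:
        raise ValueError("empty term")
    return words[0] if len(words) == 1 else "#1(" + " ".join(words) + ")"


def expand_query(query_terms, variants):
    """Build an Indri #combine query, grouping each term with its known variants under #syn."""
    parts = []
    for term in query_terms:
        alternatives = [term] + [v for v in variants.get(term, []) if v != term]
        if len(alternatives) == 1:
            parts.append(indri_term(term))
        else:
            parts.append("#syn(" + " ".join(indri_term(a) for a in alternatives) + ")")
    return "#combine(" + " ".join(parts) + ")"


# Hypothetical expert/patient variant dictionary (synonyms, abbreviations, etc.).
variants = {
    "heart attack": ["myocardial infarction"],
    "kidney": ["renal"],
}
print(expand_query(["heart attack", "kidney", "treatment"], variants))
# -> #combine(#syn(#1(heart attack) #1(myocardial infarction)) #syn(kidney renal) treatment)
```

Grouping variants under `#syn` counts them as occurrences of a single query concept, so expansion does not add extra weight to expanded terms; Indri's `#weight` operator would be one way to weight original and expanded terms differently.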
