    The SP2 SCOPES Project on Speech Prosody

    This is an overview of a Joint Research Project within the Scientific Co-operation between Eastern Europe and Switzerland (SCOPES) Program of the Swiss National Science Foundation (SNSF) and the Swiss Agency for Development and Cooperation (SDC). Within the SP2 SCOPES Project on Speech Prosody, the four partners aim to collaborate over the following two years on the subject of speech prosody and to advance the extraction, processing, modeling and transfer of prosody for a large portfolio of European languages: French, German, Italian, English, Hungarian, Serbian, Croatian, Bosnian, Montenegrin, and Macedonian. The four intertwined research plans are expected to produce synergies that will build a foundation for submitting strong joint proposals for EU funding.

    Meta Learning Approach to Phone Duration Modeling

    Automatic prediction of phone duration is an essential prerequisite for natural-sounding synthesized speech, because segmental duration is highly important in speech perception. In this paper we present a new phone duration prediction model for the Serbian language using a meta-learning approach. Based on data obtained from the analysis of a large speech database, we used a feature set of 21 parameters describing phones and their contexts. These include attributes related to segmental identity and manner of articulation (for consonants); attributes related to phonological context, such as segment types and voicing values of neighboring phones; the presence or absence of lexical stress; morphological attributes, such as part of speech; and prosodic attributes, such as phonological word length, the position of the segment in the syllable, the position of the syllable in the word, the position of the word in the phrase, and phrase break level, among others. The phone duration model obtained using the meta-learning algorithm outperformed the best individual model by approximately 2.0% and 1.7% in terms of the relative reduction of the root-mean-squared error and the mean absolute error, respectively.
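The relative error reductions reported above can be illustrated with a short Python sketch. The duration values below are hypothetical and are not taken from the paper; the helper names `rmse`, `mae`, and `relative_reduction` are introduced here purely for illustration of how such figures are typically computed.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between observed and predicted durations."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error between observed and predicted durations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_reduction(baseline_error, new_error):
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error


# Hypothetical phone durations in milliseconds (assumed, not from the paper).
observed = [60.0, 85.0, 110.0, 45.0]
best_single = [70.0, 80.0, 100.0, 50.0]   # best individual model (assumed)
meta_model = [65.0, 82.0, 104.0, 48.0]    # meta-learning model (assumed)

rmse_gain = relative_reduction(rmse(observed, best_single),
                               rmse(observed, meta_model))
mae_gain = relative_reduction(mae(observed, best_single),
                              mae(observed, meta_model))
print(f"Relative RMSE reduction: {rmse_gain:.1f}%")
print(f"Relative MAE reduction:  {mae_gain:.1f}%")
```

With real data, `best_single` and `meta_model` would hold the predictions of the two models on the same held-out phones, so that both errors are measured against identical reference durations.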

    Macedonian Speech Synthesis for Assistive Technology Applications

    Speech technology is becoming ever more ubiquitous with the advance of speech-enabled devices and services. The use of speech synthesis in Augmentative and Alternative Communication (AAC) tools has facilitated the inclusion of individuals with speech impediments, allowing them to communicate with their surroundings using speech. Although there are numerous speech synthesis systems for the most widely spoken world languages, the offer for smaller languages is still limited. We propose and compare three models for Macedonian, built using parametric and deep learning techniques and trained on a newly recorded corpus. We target low-resource edge deployment for AAC and assistive technologies, such as communication boards and screen readers. The listening test results show that parametric speech synthesis performs as well as the more advanced deep learning models. Since it also requires fewer resources and offers full control of speech rate and pitch, it is the preferred choice for building a Macedonian TTS system for this application scenario.
    Comment: 5 pages, 1 figure, EUSIPCO conference 202


    This paper considers the research question of developing user-aware and adaptive conversational agents. A conversational agent is user-aware to the extent that it recognizes the user's identity and those emotional states of the user that are relevant in a given interaction domain. It is user-adaptive to the extent that it dynamically adapts its dialogue behavior to the user and his or her emotional state. The paper summarizes some aspects of our previous work and presents work in progress in the field of speech-based human-machine interaction. It focuses particularly on the development of speech recognition modules in cooperation with the modules for emotion recognition and speaker recognition, as well as with the dialogue management module. Finally, it proposes an architecture of a conversational agent that integrates those modules and improves each of them by exploiting the synergies among them.

    Context-Dependent Speech Recognition in Human-Machine Interaction

    Despite the great importance of contextual information in speech understanding, its processing and use in contemporary automatic speech recognition systems is very limited, which considerably degrades recognition performance under real conditions of use. Therefore, if the characteristics of these systems are to approach those of humans, context must be included to an adequate extent. This thesis presents a new methodological approach to context-dependent speech recognition in human-machine interaction. At the methodological level, the approach is hybrid, because it integrates statistical and symbolic methods, and cognitively inspired, because it takes into account insights from research in the neurocognitive sciences. The basic principle is that the hypotheses of the recognition system are evaluated on the basis of their contextual fit, information content, and semantic correctness. The approach is illustrated with prototype implementations for concrete interaction domains.

    Although the importance of contextual information in speech recognition has long been acknowledged, it has remained clearly underutilized even in state-of-the-art speech recognition systems. This thesis introduces a novel, methodologically hybrid approach to the research question of context-dependent speech recognition in human-machine interaction. To the extent that it is hybrid, the approach integrates aspects of both statistical and representational paradigms. The aim of this thesis is to extend the standard statistical pattern matching approach with a cognitively inspired and analytically tractable model with explanatory power. This methodological extension makes it possible to account for contextual information that is otherwise unavailable in speech recognition systems, and to use it to improve the postprocessing of recognition hypotheses. The thesis introduces an algorithm for the evaluation of recognition hypotheses, illustrates it for concrete interaction domains, and discusses its implementation within two prototype conversational agents.
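The idea of evaluating recognition hypotheses by contextual fit, information content, and semantic correctness can be sketched in Python as a reranking of an N-best list. This is not the algorithm from the thesis: the weighted sum, the weights, the field names, and the example scores are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One recognition hypothesis with three hypothetical scores in [0, 1]."""
    text: str
    context_fit: float   # how well the hypothesis fits the dialogue context
    information: float   # how informative the hypothesis is
    semantic_ok: float   # how semantically well-formed the hypothesis is


def score(h, weights=(0.4, 0.3, 0.3)):
    """Combine the three criteria with a weighted sum (an assumed scheme)."""
    w_ctx, w_info, w_sem = weights
    return (w_ctx * h.context_fit
            + w_info * h.information
            + w_sem * h.semantic_ok)


def rerank(hypotheses, weights=(0.4, 0.3, 0.3)):
    """Return the hypotheses sorted from best to worst combined score."""
    return sorted(hypotheses, key=lambda h: score(h, weights), reverse=True)


# Hypothetical N-best list for a spoken command (all values are made up).
n_best = [
    Hypothesis("move the read block left", 0.4, 0.7, 0.2),
    Hypothesis("move the red block left", 0.9, 0.7, 1.0),
    Hypothesis("prove the red block left", 0.3, 0.6, 0.1),
]

best = rerank(n_best)[0]
print(best.text)
```

In a real system, the three scores would come from dedicated components (for example, a dialogue-state tracker and a semantic parser) rather than being set by hand, and the combination rule would follow the thesis rather than this fixed weighting.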

    Intra- and Interlingual Translation Through the Prism of Linguistic Fluidity and Literary Circulation

    This dissertation concentrates on the widely accepted classification of translational relations proposed by the linguist Roman Jakobson, which distinguishes intra-, interlingual, and intersemiotic translation. Although the classification is tripartite, it is the distinction between intra- and interlingual translation that is central to this investigation. Inspired by the case of the administrative replacement of Serbo-Croatian with a greater number of individual languages, this dissertation argues that intra- and interlingual translation are not stable relations, and further asserts that they are parasitic primarily on the definition and delimitation of language. Jakobson's notions of intra- and interlingual translation are investigated through a twofold prism: linguistic fluidity and literary circulation. On the one hand, linguistic fluidity serves as a basis for exploring the causes of the concepts' instability. The term collectively denotes a series of manifestations in which linguistic borders are challenged, either on a macro level, when the whole language undergoes a change in its unity and identity, or on a micro level, when a writer deliberately shifts the boundaries within a multilingual text. On the other hand, literary circulation is selected as a means of measuring the effects of these inconsistencies, particularly in cultural terms...