
Macedonian Speech Synthesis for Assistive Technology Applications

Author: Argirova Violeta
Bachvarovski Toni
Chavdarov Risto
Gerazov Branislav
Ivanovski Zoran
Janev Stefan
Lazarev Kristijan
Sofronievski Bojan
Tashkovski Dimitar
Velichkovski Martin
Veljkovikj Tea
Velovska Elena
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 18/06/2022

Speech technology is becoming ever more ubiquitous with the advance of speech-enabled devices and services. The use of speech synthesis in Augmentative and Alternative Communication tools has facilitated the inclusion of individuals with speech impediments, allowing them to communicate with their surroundings using speech. Although there are numerous speech synthesis systems for the world's most widely spoken languages, the offer for smaller languages is still limited. We propose and compare three models for Macedonian, built using parametric and deep learning techniques and trained on a newly recorded corpus. We target low-resource edge deployment for Augmentative and Alternative Communication and assistive technologies, such as communication boards and screen readers. The listening test results show that parametric speech synthesis performs as well as the more advanced deep learning models. Since it also requires fewer resources and offers full control over speech rate and pitch, it is the preferred choice for building a Macedonian TTS system for this application scenario.
Comment: 5 pages, 1 figure, EUSIPCO conference 202

arXiv.org e-Print Archive

Natural language processing for similar languages, varieties, and dialects: A survey

Author: Nakov Preslav
Scherrer Yves
Zampieri Marcos
Publication date: 20/11/2020

There has been much recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim of improving the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we attempt to survey this growing field of research, focusing on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges when dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common data collection strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing of similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects.
Non peer reviewed

Helsingin yliopiston digitaalinen arkisto

Spraying Religion: (Anti-)Religious Graffiti of the Post-Socialist Transition

Author: Velikonja Mitja
Publication venue: Digital Commons @ George Fox University
Publication date: 01/05/2020

This article discusses graffiti and street art concerning religion, part of the author's much broader and continuous research on contemporary political graffiti and street art in post-socialist Central and Eastern Europe, from the Baltics to the Balkans and from Prague to Moscow, comprising over 20 years of systematic fieldwork.

MaCoCu: Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages

Author: Bañón Marta
Esplà-Gomis Miquel
Forcada Mikel L.
García-Romero Cristian
Kuzman Taja
Ljubešić Nikola
Ramírez-Sánchez Gema
Rupnik Peter
Sempere Leopoldo Pla
Suchomel Vít
Toral Antonio
van der Werff Tobias
van Noord Rik
Zaragoza Jaume
Publication venue: European Association for Machine Translation
Publication date: 01/01/2022

We introduce the project MaCoCu (Massive collection and curation of monolingual and bilingual data: focus on under-resourced languages), funded by the Connecting Europe Facility, which aims to build monolingual and parallel corpora for under-resourced European languages. The approach consists of crawling large amounts of textual data from selected top-level domains of the Internet and then applying a curation and enrichment pipeline. In addition to the corpora, the project will release the free/open-source web crawling and curation software it uses.