
14 research outputs found

Nova metoda adaptacije na govornika u parametarskoj sintezi govora [A new method of speaker adaptation in parametric speech synthesis]

Author: Pekar Darko
Publication venue: Универзитет у Новом Саду, Факултет техничких наука
Publication date: 03/09/2021

The thesis describes and compares several methods of speaker adaptation using deep neural networks: a simple method of system adaptation by additional training, a method using shared and separate layers for different speakers, and adaptation in two phases. The last method starts from a multi-speaker model and a trained speaker space. Adaptation to a new speaker takes place in two phases: 1) searching for the optimal point in the speaker embedding space; 2) adapting the parameters of the rest of the network. Comparison of objective measures, as well as listening tests, shows that the last approach yields the best results.
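
The two-phase adaptation named in the abstract above can be illustrated with a deliberately tiny sketch. It is not the thesis's implementation: the one-weight model, the scalar embedding, the learning rate, and all data values are assumptions chosen only to show the order of the two phases (first move the speaker embedding with the network frozen, then adapt the rest with the embedding frozen).

```python
# Toy illustration only: a one-weight "network" with a scalar speaker embedding.
# Model: prediction = w * x + e, where w stands in for the shared network
# parameters and e for the new speaker's point in the speaker space.

def mse(w, e, data):
    """Mean squared error of the toy model on (x, y) pairs."""
    return sum((w * x + e - y) ** 2 for x, y in data) / len(data)


def adapt_two_phase(w, e, data, lr=0.05, steps=200):
    """Return (w, e, loss_after_phase_1, loss_after_phase_2)."""
    n = len(data)

    # Phase 1: keep w frozen and search for the best embedding e.
    for _ in range(steps):
        grad_e = sum(2 * (w * x + e - y) for x, y in data) / n
        e -= lr * grad_e
    loss_phase_1 = mse(w, e, data)

    # Phase 2: keep e frozen and adapt the remaining parameter w.
    for _ in range(steps):
        grad_w = sum(2 * (w * x + e - y) * x for x, y in data) / n
        w -= lr * grad_w
    return w, e, loss_phase_1, mse(w, e, data)


# Hypothetical adaptation data for a new speaker (made-up numbers).
new_speaker_data = [(x, 2.3 * x + 0.7) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
w0, e0 = 2.0, 0.0  # assumed multi-speaker starting point
w, e, loss_1, loss_2 = adapt_two_phase(w0, e0, new_speaker_data)
print(mse(w0, e0, new_speaker_data), loss_1, loss_2)
```

In this toy setting the error drops after each phase; whether that carries over to a real synthesis network is exactly what the thesis evaluates with objective measures and listening tests.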

National Repository of Dissertations in Serbia (NaRDuS)

Nardus

Speech Technologies for Serbian and Kindred South Slavic Languages

Author: Darko Pekar
Marko Janev
Milan Secujski
Niksa Jakovljevic
Radovan Obradovic
Vlado Delic
Publication venue: 'IntechOpen'
Publication date: 16/08/2010

IntechOpen

Applications of Speech Technologies in Western Balkan Countries

Author: Darko Pekar
Dragan Knezevic
Dragisa Miskovic
Milan Secujski
Natasa Vujnovic Sedlar
Vlado Delic
Publication venue: 'IntechOpen'
Publication date: 16/08/2010

IntechOpen

AUTOMATIC PROSODY GENERATION IN A TEXT-TO-SPEECH SYSTEM FOR HEBREW

Author: Knežević Dragan
Pekar Darko
Popović Branislav
Sečujski Milan
Publication venue: Published by the University of Niš, Serbia
Publication date: 13/06/2014

The paper presents the module for automatic prosody generation within a system for automatic synthesis of high-quality speech based on arbitrary text in Hebrew. The high quality of synthesis is due to the high accuracy of automatic prosody generation, enabling the introduction of elements of natural sentence prosody of Hebrew. Automatic morphological annotation of text is based on the application of an expert algorithm relying on transformational rules. Syntactic-prosodic parsing is also rule-based, while the generation of the acoustic representation of prosodic features is based on classification and regression trees. A tree structure generated during the training phase enables accurate prediction of the acoustic representatives of prosody, namely, durations of phonetic segments as well as temporal evolution of fundamental frequency and energy. Such an approach to automatic prosody generation has led to an improvement in the quality of synthesized speech, as confirmed by listening tests.

University of Niš: Facta Universitatis (E-Journals) / Универзитет у Нишу

Cross-Lingual Neural Network Speech Synthesis Based on Multiple Embeddings

Author: Delić Vlado D.
Nosek Tijana V.
Obradović Radovan J.
Pekar Darko J.
Sečujski Milan S.
Suzić Siniša B.
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 11/05/2022

The paper presents a novel architecture and method for speech synthesis in multiple languages, in voices of multiple speakers and in multiple speaking styles, even in cases when speech from a particular speaker in the target language was not present in the training data. The method is based on the application of neural network embedding to combinations of speaker and style IDs, but also to phones in particular phonetic contexts, without any prior linguistic knowledge of their phonetic properties. This enables the network not only to efficiently capture similarities and differences between speakers and speaking styles, but also to establish appropriate relationships between phones belonging to different languages, and ultimately to produce synthetic speech in the voice of a certain speaker in a language that he/she has never spoken. The validity of the proposed approach has been confirmed through experiments with models trained on speech corpora of American English and Mexican Spanish. It has also been shown that the proposed approach supports the use of neural vocoders, i.e. that they are able to produce synthesized speech of good quality even in languages that they were not trained on.

Re-UNIR

Using Morphological Data in Language Modeling for Serbian Large Vocabulary Speech Recognition

Author: Branislav Popović
Darko Pekar
Edvin Pakoci
Publication venue: 'Hindawi Limited'
Publication date

Crossref

Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding

Author: Nosek Tijana
Pekar Darko
Sečujski Milan
Smirnov Anton
Suzić Siniša
Publication venue: Journal of Universal Computer Science
Publication date: 01/01/2020

The paper presents a novel architecture and method for training neural networks to produce synthesized speech in a particular voice and speaking style, based on a small quantity of target speaker/style training data. The method is based on neural network embedding, i.e. mapping of discrete variables into continuous vectors in a low-dimensional space, which has been shown to be a very successful universal deep learning technique. In this particular case, different speaker/style combinations are mapped into different points in a low-dimensional space, which enables the network to capture the similarities and differences between speakers and speaking styles more efficiently. The initial model from which speaker/style adaptation was carried out was a multi-speaker/multi-style model based on 8.5 hours of American English speech data, which corresponds to 16 different speaker/style combinations. The results of the experiments show that both versions of the obtained system, one using 10 minutes and the other as little as 30 seconds of target data, outperform the state of the art in parametric speaker/style-dependent speech synthesis. This opens a wide range of applications of speaker/style-dependent speech synthesis based on small quantities of training data, in domains ranging from customer interaction in call centers to robot-assisted medical therapy.
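
The embedding idea in the abstract above (each discrete speaker/style combination mapped to a point in a low-dimensional continuous space) can be sketched as a plain lookup table. Everything named here is an assumption for illustration: the speaker and style labels, the 3-dimensional space, the random initial values, and the choice to append the vector to the linguistic features; the paper's actual architecture is not reproduced.

```python
import random


def make_embedding_table(keys, dim, seed=0):
    """Assign each discrete key a small random vector (the trainable embedding)."""
    rng = random.Random(seed)
    return {key: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for key in keys}


def network_input(linguistic_features, table, speaker, style):
    """Append the speaker/style embedding to the per-frame linguistic features."""
    return list(linguistic_features) + table[(speaker, style)]


# Hypothetical speaker/style combinations (labels invented for this sketch).
combos = [("spk1", "neutral"), ("spk1", "happy"), ("spk2", "neutral")]
table = make_embedding_table(combos, dim=3)

frame = [0.0, 1.0, 0.0, 0.5]  # made-up linguistic feature vector
x = network_input(frame, table, "spk1", "happy")
print(len(x))
```

During training the vectors in the table would be updated together with the network weights, so that similar voices and styles end up close to each other; here they stay at their random initial values.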

ZENODO

Directory of Open Access Journals

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

ARPHA OAI-PMH Endpoint

ARPHA Preprints