10 research outputs found

Speech Recognition System of Slovenian Broadcast News

Author: Sepesy Maučec Mirjam
Žgank Andrej
Publication venue: IntechOpen
Publication date: 13/06/2011

IntechOpen

Digital library of University of Maribor

Modeling of Filled Pauses and Onomatopoeias for Spontaneous Speech Recognition

Author: Andrej Zgank
Mirjam Sepesy Maucec
Publication venue: IntechOpen
Publication date: 16/08/2010

IntechOpen

Automatic Recognition of Slovenian Speech for Daily News Broadcasts (Avtomatsko razpoznavanja slovenskega govora za dnevnoinformativne oddaje)

Author: Andrej Žgank
Gregor Donaj
Lucija Gril
Mirjam Sepesy Maučec
Publication venue: University of Ljubljana
Publication date: 01/07/2021

Automatic speech recognition is one of the key building blocks of speech and language technologies. This paper presents the development of an automatic Slovenian speech recognizer for the domain of daily news broadcasts. The system architecture is based on deep neural networks. Taking the available speech resources into account, we carried out modeling with different activation functions. During development we also examined how lossy speech codecs affect recognition results. The recognizer was trained on the UMB BNSI Broadcast News and IETK-TV databases, with a total of 66 hours of speech recordings. Alongside the deep neural networks, we enlarged the recognition vocabulary to 250,000 words, which reduced the out-of-vocabulary rate to 1.33%. On the test set, the best word error rate (WER) achieved was 15.17%. During evaluation we also carried out a more detailed analysis of recognition errors based on lemmas and F-classes, which to some extent shows how demanding the Slovenian language is for such usage scenarios.

Directory of Open Access Journals

The Slovene BNSI Broadcast News database and reference speech corpus GOS: Towards the uniform guidelines for future work

Author: Ana Zwitter Vitez
Andrej Žgank
Darinka Verdonik
Publication date: 11/04/2020

The aim of the paper is to identify common guidelines for the future development of speech databases for less-resourced languages, so that they are as useful as possible for both of their main fields of use: linguistic research and speech technologies. We compare two standards for creating speech databases, one followed when developing the Slovene speech database for automatic speech recognition, BNSI Broadcast News, and the other followed when developing the Slovene reference speech corpus GOS, and outline possible common guidelines for future work. We also present an add-on for the GOS corpus that enables its use for automatic speech recognition.

CiteSeerX

Context-dependent factored language models

Author: D Klakow
EM de Novais
Gregor Donaj
H Adel
K Kirchhoff
S Katz
SF Chen
T Hirsimaki
T Rotovnik
Zdravko Kačič
Publication venue: Springer Science and Business Media LLC

Crossref

Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications

Publication venue: MDPI AG
Publication date: 21/06/2022

The present book contains all of the articles that were accepted and published in the Special Issue of MDPI’s journal Mathematics titled "Computational Intelligence and Human–Computer Interaction: Modern Methods and Applications". This Special Issue covered a wide range of topics connected to the theory and application of different computational intelligence techniques to the domain of human–computer interaction, such as automatic speech recognition, speech processing and analysis, virtual reality, emotion-aware applications, digital storytelling, natural language processing, smart cars and devices, and online learning. We hope that this book will be interesting and useful for those working in various areas of artificial intelligence, human–computer interaction, and software engineering as well as for those who are interested in how these domains are connected in real-life situations

Directory of Open Access Books (DOAB)

Discourse markers in Slovenian and their applicability for developing speech-to-speech translation technologies

Author: Verdonik Darinka
Publication venue: EUT Edizioni Università di Trieste
Publication date: 01/01/2010

OpenstarTs

Key word analysis of discourses in Slovene speech : differences and similarities

Author: Darinka Verdonik
Iztok Kosem
Publication venue: University of Ljubljana
Publication date: 01/12/2012

One of the under-researched aspects of speech is its internal variety, i.e. the differences and similarities between different types of speech. This paper aims to contribute to this research by comparing different discourses of Slovene spontaneous speech, focusing on the use of vocabulary. The key word analysis (Scott, 1997), conducted on a million‑word corpus of spoken Slovene, was used to identify lexical items and groups of lexical items typical of a particular spoken discourse, or common to different types of spoken discourse. The results indicate that the presence or absence of a particular word class in the key word list can be a good indicator of a type of spoken discourse, or discourses.

Directory of Open Access Journals

Journals of Faculty of Arts, University of Ljubljana

Contemporary Language Corpora in the Western Balkans: History, Current State and Future (Savremeni jezički korpusi na zapadnom Balkanu – istorijat, trenutno stanje i budučnost)

Author: Nikola Dobrić
Publication venue: Slavistično društvo Slovenije
Publication date: 01/04/2012

Directory of Open Access Journals

Speaker Diarization

Author: Kunešová Marie
Publication venue: Západočeská univerzita v Plzni
Publication date: 25/06/2021

The thesis focuses on the topic of speaker diarization, a speech processing task that is commonly characterized by the question "Who speaks when?". It also addresses the related task of overlapping speech detection, which is very relevant for diarization. The theoretical part of the thesis provides an overview of existing diarization approaches, both offline and online, and discusses some of the problematic areas that were identified in the early stages of the author's research. The thesis also includes an extensive comparison of existing diarization systems, with a focus on their reported performance. One chapter is dedicated to the topic of overlapping speech and the methods of its detection. The experimental part of the thesis presents the work done on speaker diarization, which focused mostly on a GMM-based online diarization system and an i-vector-based system with both offline and online variants. The final section details a newly proposed approach for detecting overlapping speech using a convolutional neural network.

DSpace at University of West Bohemia