48 research outputs found

    Machine Translation and the Evaluation of Its Quality

    Machine translation has already become part of our everyday life. This chapter gives an overview of machine translation approaches. Statistical machine translation, described here in more detail, was the dominant approach over the past 20 years and has led to many practical applications. It is not equally successful for all language pairs: highly inflectional languages are hard to process, especially as target languages. As statistical machine translation has almost reached the limits of its capacity, neural machine translation is becoming the technology of the future. This chapter also describes the evaluation of machine translation quality, covering both manual and automatic evaluation. Traditional and recently proposed metrics for automatic machine translation evaluation are described. Human translation still provides the best quality, but it is, in general, time-consuming and expensive. Integration of human and machine translation is therefore a promising workflow for the future: machine translation will not replace human translation, but it can serve as a tool to increase productivity in the translation process.
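    As a minimal, stand-alone illustration of the automatic evaluation discussed in the chapter, the sketch below scores two invented hypothesis sentences against reference translations with a traditional metric (BLEU) and a more recent character-level one (chrF); it assumes the sacrebleu package and is not taken from the chapter itself.

        # Automatic MT evaluation with sacrebleu (assumed installed); the
        # hypothesis and reference sentences are invented examples.
        import sacrebleu

        hypotheses = ["The cat sat on the mat.", "He plays piano very well."]
        references = [["The cat sits on the mat.", "He plays the piano very well."]]  # one reference per hypothesis

        print("BLEU:", sacrebleu.corpus_bleu(hypotheses, references).score)
        print("chrF:", sacrebleu.corpus_chrf(hypotheses, references).score)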

    Statistical Machine Translation from Slovenian to English

    In this paper, we analyse three statistical models for the machine translation of Slovenian into English. All of them are based on the IBM Model 4, but differ in the type of linguistic knowledge they use. Model 4a uses only basic linguistic units of the text, i.e., words and sentences. In Model 4b, lemmatisation is used as a preprocessing step of the translation task. Lemmatisation also makes it possible to add a Slovenian-English dictionary as an additional knowledge source. Model 4c takes advantage of the morpho-syntactic descriptions (MSD) of words. In Model 4c, MSD codes replace the automatic word classes used in Models 4a and 4b. The models are experimentally evaluated using the IJS-ELAN parallel corpus.
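    As a rough, self-contained illustration of the family of alignment models behind Models 4a, 4b and 4c, the sketch below trains the much simpler IBM Model 1 with expectation-maximisation on a tiny invented Slovenian-English corpus; fertility and distortion, which Model 4 adds, are deliberately left out.

        # IBM Model 1 EM training on a toy parallel corpus (invented sentences).
        # This is a simplified relative of the IBM Model 4 used in the paper.
        from collections import defaultdict

        corpus = [
            (["hiša", "je", "velika"], ["the", "house", "is", "big"]),
            (["hiša", "je", "stara"], ["the", "house", "is", "old"]),
        ]

        sl_vocab = {f for sl, _ in corpus for f in sl}
        en_vocab = {e for _, en in corpus for e in en}
        # Uniform initialisation of the lexical translation probabilities t(e | f).
        t = {(e, f): 1.0 / len(en_vocab) for f in sl_vocab for e in en_vocab}

        for _ in range(20):                      # EM iterations
            count = defaultdict(float)           # expected counts c(e, f)
            total = defaultdict(float)           # expected counts c(f)
            for sl, en in corpus:
                for e in en:
                    norm = sum(t[(e, f)] for f in sl)
                    for f in sl:
                        delta = t[(e, f)] / norm
                        count[(e, f)] += delta
                        total[f] += delta
            for e, f in t:                       # M-step: renormalise
                t[(e, f)] = count[(e, f)] / total[f]

        # Print the most confident word translations learned from the toy corpus.
        print(sorted(((round(p, 2), f, e) for (e, f), p in t.items() if p > 0.3), reverse=True))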

    On the Automatic Evaluation of Machine Translation

    Evaluation of translations is a constant part of machine translation development, and it mostly relies on automatic procedures. These procedures always depend on a reference translation. In this paper, we show how widely reference translations can differ in the subtitling domain and how this can affect the score: the same metric can rate the same MT system as useless or as very successful purely on the basis of using reference translations that were obtained by different procedures, yet are all linguistically and semantically fully adequate.
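    As a small illustration of this point (not taken from the paper), the sketch below scores the same invented hypothesis against two references that are both acceptable translations of the same line; the score collapses when the freer reference is used. It assumes NLTK's BLEU implementation.

        # Scoring one hypothesis against two different, equally valid references
        # (assumes NLTK is installed; all sentences are invented examples).
        from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

        hypothesis = "we have to leave before it gets dark".split()
        ref_close = ["we have to leave before it gets dark".split()]
        ref_free = ["let us go while there is still daylight".split()]

        smooth = SmoothingFunction().method1
        print("BLEU vs. close reference:", sentence_bleu(ref_close, hypothesis, smoothing_function=smooth))
        print("BLEU vs. free reference: ", sentence_bleu(ref_free, hypothesis, smoothing_function=smooth))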

    Automatic Slovenian Speech Recognition for Daily News Broadcasts

    Automatic speech recognition is one of the key building blocks in the field of speech and language technologies. In this paper, we present the development of an automatic Slovenian speech recogniser for the domain of daily news broadcasts. The system architecture is based on deep neural networks. Taking the available speech resources into account, we performed modelling with different activation functions. During the development of the recogniser, we also examined the influence of lossy speech codecs on recognition results. The recogniser was trained on the UMB BNSI Broadcast News and IETK-TV databases, amounting to 66 hours of speech recordings in total. In parallel with the deep neural networks, we enlarged the recognition vocabulary to 250,000 words, which reduced the out-of-vocabulary rate to 1.33%. On the test set, the best achieved word error rate (WER) was 15.17%. During the evaluation of the results, we also carried out a more detailed analysis of recognition errors based on lemmas and F-classes, which to some extent reveals how challenging the Slovenian language is for such usage scenarios of the technology.
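    The word error rate reported above is the standard ASR evaluation measure; a minimal sketch of how it is computed (word-level Levenshtein distance divided by the reference length) is given below, with invented example transcripts.

        # A minimal sketch of word error rate (WER): word-level edit distance
        # divided by the number of reference words. The transcripts are invented
        # examples, not output of the system described above.
        def wer(reference: str, hypothesis: str) -> float:
            ref, hyp = reference.split(), hypothesis.split()
            # d[i][j] = edit distance between ref[:i] and hyp[:j]
            d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
            for i in range(len(ref) + 1):
                d[i][0] = i
            for j in range(len(hyp) + 1):
                d[0][j] = j
            for i in range(1, len(ref) + 1):
                for j in range(1, len(hyp) + 1):
                    d[i][j] = min(d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # substitution
                                  d[i - 1][j] + 1,                               # deletion
                                  d[i][j - 1] + 1)                               # insertion
            return d[len(ref)][len(hyp)] / len(ref)

        print(wer("the broadcast starts at eight", "the broadcast start at eight"))  # 0.2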

    Programming for Telecommunications II: Selected Chapters on Data Structures


    Discovering Daily Activity Patterns from Sensor Data Sequences and Activity Sequences

    The need to care for elderly people is increasing, and great efforts are being made to enable the elderly population to remain independent for as long as possible. Technologies are being developed to monitor the daily activities of a person in order to detect their state. Approaches that recognize activities from simple environment sensors have been shown to perform well. It is also important to know the habits of a resident in order to distinguish between common and uncommon behavior. In this paper, we propose a novel approach to discovering a person's common daily routines. The approach combines sequence comparison with a clustering method to obtain partitions of daily routines. Such partitions form the basis for detecting unusual sequences of activities in a person's day. Two types of partitions are examined: the first is based on daily activity vectors, and the second on sensor data. We show that daily activity vectors are needed to obtain reasonable results. We also show that partitions obtained with the generalized Hamming distance for sequence comparison are better than partitions obtained with the Levenshtein distance. Experiments are performed on two publicly available datasets.
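    A simplified sketch of the two sequence comparisons contrasted above is given below (not the paper's exact formulation): a positionwise, Hamming-style distance over fixed-length daily activity vectors, and the Levenshtein distance over activity sequences of possibly different length. The activity labels and example days are invented.

        # Hamming-style vs. Levenshtein comparison of daily activity sequences
        # (a simplified sketch; labels and example days are invented).
        def hamming(day_a, day_b):
            """Positionwise mismatches between two equal-length daily activity vectors."""
            assert len(day_a) == len(day_b)
            return sum(a != b for a, b in zip(day_a, day_b))

        def levenshtein(seq_a, seq_b):
            """Edit distance between two activity sequences of possibly different length."""
            prev = list(range(len(seq_b) + 1))
            for i, a in enumerate(seq_a, 1):
                cur = [i]
                for j, b in enumerate(seq_b, 1):
                    cur.append(min(prev[j] + 1,                # deletion
                                   cur[j - 1] + 1,             # insertion
                                   prev[j - 1] + (a != b)))    # substitution
                prev = cur
            return prev[-1]

        # Two days described by one activity label per time slot (six slots shown).
        day1 = ["sleep", "sleep", "cook", "eat", "tv", "sleep"]
        day2 = ["sleep", "cook", "cook", "eat", "tv", "sleep"]
        print(hamming(day1, day2))                                  # 1
        print(levenshtein(["cook", "eat", "tv"], ["cook", "tv"]))   # 1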