9 research outputs found

Speech Recognition System of Slovenian Broadcast News

Authors: Sepesy Maučec Mirjam; Žgank Andrej
Publication venue: 'IntechOpen'
Publication date: 13/06/2011

IntechOpen

Digital library of University of Maribor

Trainee Translators’ Perceptions of the Role of Pronunciation and Speech Technologies in the Technology-Driven Translation Profession

Author: Nataša Hirci
Publication venue: 'University of Ljubljana'
Publication date: 01/06/2019

We live in a world of rapid technological advances which constantly affect the work of professional translators. Suitable training is therefore required for future translators to be able to compete on the translation market. With the rise of translation technologies, new ideas have been put forward on how to make translators faster and more efficient. Among the technologies that future translators may not be adequately familiar with are speech recognition tools; these enable translators to dictate their sight translation and have it typed out, allowing more time to focus on the content. However, as with all digital tools, the quality of input is important; a question thus arises on the role pronunciation assumes in such work. The present study aimed to establish how much awareness there is amongst the trainee translators of the possibilities afforded by speech technologies and to explore their perceptions of the role played by pronunciation

Directory of Open Access Journals

Journals of Faculty of Arts, University of Ljubljana

Universal Lemmatizer: A sequence-to-sequence model for lemmatizing Universal Dependencies treebanks

Authors: Ginter Filip; Kanerva Jenna; Salakoski Tapio
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 27/10/2022

In this paper, we present a novel lemmatization method based on a sequence-to-sequence neural network architecture and morphosyntactic context representation. In the proposed method, our context-sensitive lemmatizer generates the lemma one character at a time based on the surface form characters and its morphosyntactic features obtained from a morphological tagger. We argue that a sliding window context representation suffers from sparseness, while in majority of cases the morphosyntactic features of a word bring enough information to resolve lemma ambiguities while keeping the context representation dense and more practical for machine learning systems. Additionally, we study two different data augmentation methods utilizing autoencoder training and morphological transducers especially beneficial for low-resource languages. We evaluate our lemmatizer on 52 different languages and 76 different treebanks, showing that our system outperforms all latest baseline systems. Compared to the best overall baseline, UDPipe Future, our system outperforms it on 62 out of 76 treebanks reducing errors on average by 19% relative. The lemmatizer together with all trained models is made available as a part of the Turku-neural-parsing-pipeline under the Apache 2.0 license.

UTUPub

CLARIN

Publication venue: 'Walter de Gruyter GmbH'
Publication date: 30/01/2023

The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure – CLARIN – for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and challenges that CLARIN will tackle in the future. The book is published 10 years after establishing CLARIN as a European Research Infrastructure Consortium.

Directory of Open Access Books (DOAB)

CLARIN. The infrastructure for language resources

Authors: Fišer Darja; Witt Andreas
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 17/10/2022

CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future. The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)

Publikationsserver des Instituts für Deutsche Sprache

CLARIN

Publication venue: 'Walter de Gruyter GmbH'

OAPEN Library

The Palgrave Handbook of Digital Russia Studies

Publication venue: 'Springer Science and Business Media LLC'

This open access handbook presents a multidisciplinary and multifaceted perspective on how the ‘digital’ is simultaneously changing Russia and the research methods scholars use to study Russia. It provides a critical update on how Russian society, politics, economy, and culture are reconfigured in the context of ubiquitous connectivity and accounts for the political and societal responses to digitalization. In addition, it answers practical and methodological questions in handling Russian data and a wide array of digital methods. The volume makes a timely intervention in our understanding of the changing field of Russian Studies and is an essential guide for scholars, advanced undergraduate and graduate students studying Russia today

OAPEN Library

The Palgrave Handbook of Digital Russia Studies

Publication venue: 'Springer Science and Business Media LLC'

OAPEN Library