Easily accessible language technologies for Slovene, Croatian and Serbian

Darja Fišer; Filip Klubička; Filip Petkovski; Maja Miličević; Nikola Ljubešić; Tanja Samardžić; Tomaž Erjavec

Publication date: 1 January 2016
Publisher country: Slovenia (SVN)

Abstract

In this paper we present a pipeline of recently developed language technology tools for Slovene, Croatian and Serbian. The tools currently cover text segmentation, text normalisation, part-of-speech tagging, lemmatisation and inflectional lexicon lookup. Most of them rely on machine learning approaches, such as statistical machine translation and conditional random fields, which are capable of producing high-quality models for the phenomena covered. Special emphasis is put on the easy accessibility of these tools: they and the trained models for all three languages are offered (1) as open source via public git repositories and (2) online in the form of web applications and web services.
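To make the order of the pipeline stages concrete, here is a minimal sketch in Python. It is not the authors' implementation: the functions `segment`, `normalise`, `analyse` and `pipeline`, the toy tables `NORMALISATION` and `LEXICON`, and the coarse tags (`NOUN`, `AUX`, `ADJ`, `PUNCT`, `X`) are all hypothetical. A dictionary lookup stands in for the statistical normalisation models, and picking the first lexicon reading stands in for the trained tagger and lemmatiser, which would disambiguate from context.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical normalisation table (non-standard spelling -> standard form).
# A stand-in for the statistical normalisation models, not a reimplementation.
NORMALISATION: Dict[str, str] = {"kuca": "kuća", "jee": "je"}

# Hypothetical toy inflectional lexicon: word form -> list of (lemma, coarse tag).
# Real lexicons list many readings per form; these entries are illustrative only.
LEXICON: Dict[str, List[Tuple[str, str]]] = {
    "kuća": [("kuća", "NOUN")],
    "kuće": [("kuća", "NOUN")],
    "je": [("biti", "AUX")],
    "velika": [("velik", "ADJ")],
}

# Words are runs of Unicode word characters; any other non-space char is a token.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def segment(text: str) -> List[str]:
    """Split raw text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def normalise(tokens: List[str]) -> List[str]:
    """Replace known non-standard spellings; leave other tokens unchanged."""
    return [NORMALISATION.get(tok.lower(), tok) for tok in tokens]


def analyse(tokens: List[str]) -> List[Tuple[str, str, str]]:
    """Attach (lemma, tag) to each token via lexicon lookup.

    Takes the first lexicon reading; a trained tagger would instead choose
    among readings using the surrounding context.
    """
    result = []
    for tok in tokens:
        readings = LEXICON.get(tok.lower())
        if readings:
            lemma, tag = readings[0]
        elif re.fullmatch(r"[^\w\s]", tok):
            lemma, tag = tok, "PUNCT"
        else:
            lemma, tag = tok, "X"  # unknown word: echo the form, tag as other
        result.append((tok, lemma, tag))
    return result


def pipeline(text: str) -> List[Tuple[str, str, str]]:
    """Run segmentation -> normalisation -> tagging and lemmatisation."""
    return analyse(normalise(segment(text)))


if __name__ == "__main__":
    for token, lemma, tag in pipeline("Kuca jee velika."):
        print(token, lemma, tag, sep="\t")
```

Keeping each stage a separate function over a token list mirrors the modular design described above: any stage can be swapped for a trained model without touching the others.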


Available Versions

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

oai:cris.unibo.it:11585/775531
