34 research outputs found

Mapping Topic Evolution Across Poetic Traditions

Author: Haider Thomas N.
Plechac Petr
Publication date: 25/07/2020

Poetic traditions across languages evolved differently, but we find that certain semantic topics occur in several of them, albeit sometimes with temporal delay, or with diverging trajectories over time. We apply Latent Dirichlet Allocation (LDA) to poetry corpora of four languages, i.e. German (52k poems), English (85k poems), Russian (18k poems), and Czech (80k poems). We align and interpret salient topics and their trends over time (1600-1925 A.D.), showing similarities and disparities across poetic traditions with a few select topics, and use their trajectories over time to pinpoint specific literary epochs.

arXiv.org e-Print Archive

MPG.PuRe

Yet Another Format of Universal Dependencies for Korean

Author: Chen Yige
Jo Eunkyul Leah
Lim KyungTae
Park Jungyeul
Silfverberg Miikka
Tyers Francis M.
Yao Yundong
Publication date: 20/09/2022

In this study, we propose a morpheme-based scheme for Korean dependency parsing and adopt the proposed scheme to Universal Dependencies. We present the linguistic rationale that illustrates the motivation and the necessity of adopting the morpheme-based format, and develop scripts that convert between the original format used by Universal Dependencies and the proposed morpheme-based format automatically. The effectiveness of the proposed format for Korean dependency parsing is then verified with both statistical and neural models, including UDPipe and Stanza, with our carefully constructed morpheme-based word embedding for Korean. morphUD outperforms parsing results for all Korean UD treebanks, and we also present detailed error analyses. Comment: COLING2022, Poste

arXiv.org e-Print Archive

Koditex — korpus diverzifikovaných textů

Author: Komrsková Zuzana
Zasina Adrian Jan
Publication venue: Univerzita Karlova, Filozofická fakulta
Publication date: 01/01/2019

12713

CU Digital Repository

Metre and Semantics in the Poetry of Czech Post-Symbolists Accessed via LDA Topic Modelling

Author: Kolár Robert
Plecháč Petr
Publication venue: University of Tartu Press
Publication date: 01/09/2022

The article deals with the relationship between semantics and poetic metre in the works of Czech post-symbolist poets and their predecessors. We access the phenomena by means of machine-driven metre recognition on the one hand and LDA topic modelling on the other. We first show how the poetic groups differ in their general preferences for particular topics. Next we analyze the topic distributions in two dominant metres (i.e. iamb and trochee) across the poetic groups.

Journals from University of Tartu

SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task

Author: Agirre Eneko
Branco António
Gaudio Rosa
Gomes Luís
Labaka Gorka
Neale Steven
Oele Dieke
Osenova Petya
Popel Martin
Querido Andreia
Rendeiro Nuno
Rodrigues João
Silva João
Simov Kiril
van Noord Gertjan
Publication date: 01/01/2016

This paper presents the description of 12 systems submitted to the WMT16 IT-task, covering six different languages, namely Basque, Bulgarian, Dutch, Czech, Portuguese and Spanish. All these systems were developed under the scope of the QTLeap project, presenting a common strategy. For each language two different systems were submitted, namely a phrase-based MT system built using Moses, and a system exploiting deep language engineering approaches, that in all the languages but Bulgarian was implemented using TectoMT. For 4 of the 6 languages, the TectoMT-based system performs better than the Moses-based one.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Biblio at Institute of Formal and Applied Linguistics

Dissertations of the University of Groningen

ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation

Author: Hu J. Edward
Post Matt
Rudinger Rachel
Van Durme Benjamin
Publication date: 11/01/2019

We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality. Following the approach of ParaNMT, we train a Czech-English neural machine translation (NMT) system to generate novel paraphrases of English reference sentences. By adding lexical constraints to the NMT decoding procedure, however, we are able to produce multiple high-quality sentential paraphrases per source sentence, yielding an English paraphrase resource with more than 4 billion generated tokens and exhibiting greater lexical diversity. Using human judgments, we also demonstrate that ParaBank's paraphrases improve over ParaNMT on both semantic similarity and fluency. Finally, we use ParaBank to train a monolingual NMT model with the same support for lexically-constrained decoding for sentence rewriting tasks. Comment: To be presented at AAAI 2019. 8 page

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

A Latent Morphology Model for Open-Vocabulary Neural Machine Translation

Author: Ataman Duygu
Aziz Wilker
Birch Alexandra
Publication date: 26/02/2020

Translation into morphologically-rich languages challenges neural machine translation (NMT) models with extremely sparse vocabularies where atomic treatment of surface forms is unrealistic. This problem is typically addressed by either pre-processing words into subword units or performing translation directly at the level of characters. The former is based on word segmentation algorithms optimized using corpus-level statistics with no regard to the translation task. The latter learns directly from translation data but requires rather deep architectures. In this paper, we propose to translate words by modeling word formation through a hierarchical latent variable model which mimics the process of morphological inflection. Our model generates words one character at a time by composing two latent representations: a continuous one, aimed at capturing the lexical semantics, and a set of (approximately) discrete features, aimed at capturing the morphosyntactic function, which are shared among different surface forms. Our model achieves better accuracy in translation into three morphologically-rich languages than conventional open-vocabulary NMT methods, while also demonstrating a better generalization capacity under low to mid-resource settings. Comment: Published at ICLR 202

arXiv.org e-Print Archive

Edinburgh Research Explorer

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

System for Interlinking Texts of State Exam Topics, Learning Support- and Other Supplementary Materials

Author: Hradílek Jakub
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2016

The main goal of this thesis is to survey methods for extracting keywords from articles and text documents. The next step is to design and build a system that can interlink texts of state exam topics, learning supports, and other supplementary materials. The final step is to evaluate the system on materials from VUT FIT in Brno and to assess how usable its output is for preparing students for final exams.

Digital library of Brno University of Technology

National Repository of Grey Literature

The WMT'18 Morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English

Author: Bojar Ondrej
Burlot Franck
Grönroos Stig-Arne
Koponen Maarit
Nieminen Tommi
Ravishankar Vinit
Scherrer Yves
Yvon François
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2018

Peer reviewed

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto