Data augmentation for machine translation via dependency subtree swapping
We present a generic framework for data augmentation via dependency subtree swapping that is applicable to machine translation. We extract corresponding subtrees from the dependency parse trees of the source and target sentences and swap these across bisentences to create augmented samples. We perform thorough filtering based on graph-based similarities of the dependency trees and additional heuristics to ensure that extracted subtrees correspond to the same meaning. We conduct resource-constrained experiments on 4 language pairs in both directions using the IWSLT text translation datasets and the Hunglish2 corpus. The results demonstrate consistent improvements in BLEU score over our baseline models in 3 out of 4 language pairs. Our code is available on GitHub.
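The core idea can be illustrated with a minimal, self-contained sketch (not the authors' implementation): given dependency heads for two sentences, collect the token span dominated by a node and splice one sentence's subtree into the other to form an augmented sample. The tree representation and helper names here are illustrative assumptions; in the actual method the swap is applied to aligned subtrees on both the source and target side of a bisentence.

```python
# Hypothetical sketch of dependency subtree swapping for data augmentation.
# A sentence is a list of tokens plus a head array: head[i] is the index of
# token i's parent, and -1 marks the root.

def subtree(heads, root):
    """Return the sorted token indices dominated by `root` (inclusive)."""
    nodes = {root}
    changed = True
    while changed:
        changed = False
        for i, h in enumerate(heads):
            if h in nodes and i not in nodes:
                nodes.add(i)
                changed = True
    return sorted(nodes)

def swap_subtrees(sent_a, span_a, sent_b, span_b):
    """Replace the contiguous span `span_a` in `sent_a` with the tokens of
    `span_b` taken from `sent_b`, yielding one augmented sentence."""
    lo, hi = span_a[0], span_a[-1]
    donor = [sent_b[i] for i in span_b]
    return sent_a[:lo] + donor + sent_a[hi + 1:]

# Two toy sentences (one language side only, for brevity).
s1 = ["the", "cat", "chased", "a", "mouse"]
h1 = [1, 2, -1, 4, 2]           # "a mouse" is the object subtree
s2 = ["my", "dog", "ate", "the", "bone"]
h2 = [1, 2, -1, 4, 2]           # "the bone" is the object subtree

obj1 = subtree(h1, 4)           # [3, 4]
obj2 = subtree(h2, 4)           # [3, 4]
aug = swap_subtrees(s1, obj1, s2, obj2)
print(" ".join(aug))            # → "the cat chased the bone"
```

The filtering step described above would then discard swaps whose subtrees are structurally or semantically incompatible; that logic is omitted here.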
HunSum-1 : an abstractive summarization dataset for Hungarian
We introduce HunSum-1: a dataset for Hungarian abstractive summarization, consisting of 1.14M news articles. The dataset is built by collecting, cleaning and deduplicating data from 9 major Hungarian news sites through CommonCrawl. Using this dataset, we build abstractive summarizer models based on huBERT and mT5. We demonstrate the value of the created dataset by performing a quantitative and qualitative analysis on the models' results. The HunSum-1 dataset, all models used in our experiments and our code are available open source.