Improving Polish to English Neural Machine Translation with Transfer Learning: Effects of Data Volume and Language Similarity
This paper investigates the impact of data volume and the use of similar
languages on transfer learning in a machine translation task. We find that
having more data generally leads to better performance, as it allows the model
to learn more patterns and generalizations from the data. However, related
languages can also be particularly effective when there is limited data
available for a specific language pair, as the model can leverage the
similarities between the languages to improve performance. To demonstrate, we
fine-tune the mBART model for a Polish-English translation task using the OPUS-100
dataset. We evaluate the performance of the model under various transfer
learning configurations, including different transfer source languages and
different shot levels for Polish, and report the results. Our experiments show
that combining related languages with larger amounts of data outperforms
models trained with related languages alone or with larger amounts of data alone.
Additionally, we show the importance of related languages in zero-shot and
few-shot configurations.
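The zero-shot and few-shot configurations described in this abstract can be pictured as mixing all of the transfer-source data with a fixed number of Polish-English pairs. The following is a minimal sketch of that data-mixing step only, not the authors' pipeline: the function name `build_training_set`, its parameters, and the toy sentence pairs are hypothetical, and the abstract does not say how the Polish examples were actually sampled.

```python
import random


def build_training_set(source_pairs, polish_pairs, shots, seed=0):
    """Mix transfer-source pairs with a k-shot sample of Polish-English pairs.

    Hypothetical helper for illustration; shots=0 gives a zero-shot set
    that contains no Polish examples at all.
    """
    if shots < 0:
        raise ValueError("shots must be non-negative")
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = min(shots, len(polish_pairs))
    few_shot = rng.sample(polish_pairs, k)
    combined = list(source_pairs) + few_shot
    rng.shuffle(combined)  # interleave source-language and Polish examples
    return combined


# Toy data (made up): a related source language plus a small Polish pool.
czech_pairs = [("cs sentence 1", "en sentence 1"), ("cs sentence 2", "en sentence 2")]
polish_pool = [(f"pl sentence {i}", f"en sentence {i}") for i in range(10)]

zero_shot = build_training_set(czech_pairs, polish_pool, shots=0)
few_shot = build_training_set(czech_pairs, polish_pool, shots=3)
```

Varying `shots` and swapping the source-language list would give one training set per configuration, each of which could then be used to fine-tune the translation model.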
Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure
It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%.
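A relative error reduction, as reported for the parsers in this abstract, is measured against the baseline error rate rather than in absolute accuracy points. The sketch below shows the arithmetic under the usual definition, with error taken as one minus accuracy; the function name and the accuracy figures are hypothetical and are not results from the paper.

```python
def relative_error_reduction(baseline_accuracy, improved_accuracy):
    """Return the fraction of the baseline error removed by the improved system.

    Both arguments are accuracies in [0, 1]; error is taken as 1 - accuracy.
    """
    baseline_error = 1.0 - baseline_accuracy
    improved_error = 1.0 - improved_accuracy
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error to reduce")
    return (baseline_error - improved_error) / baseline_error


# Made-up numbers: a parser that goes from 60.0% to 65.2% accuracy cuts its
# error from 40.0% to 34.8%, which is a relative reduction of about 13%.
reduction = relative_error_reduction(0.60, 0.652)
print(f"{reduction:.1%}")
```

Under this definition, the same relative reduction corresponds to a smaller absolute gain when the baseline is already accurate, which is why the two kinds of figure should not be compared directly.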