Machine Translation of Low-Resource Spoken Dialects: Strategies for Normalizing Swiss German
The goal of this work is to design a machine translation (MT) system for a
low-resource family of dialects, collectively known as Swiss German, which are
widely spoken in Switzerland but seldom written. We collected a significant
number of parallel written resources to start with, up to a total of about 60k
words. Moreover, we identified several other promising data sources for Swiss
German. Then, we designed and compared three strategies for normalizing Swiss
German input in order to address the regional diversity. We found that
character-based neural MT was the best solution for text normalization. In
combination with phrase-based statistical MT, our solution reached 36% BLEU
score when translating from the Bernese dialect. This score, however, decreases
as the test data becomes more remote from the training data, geographically
and topically. These resources and normalization techniques are a first step
towards full MT of Swiss German dialects.
Comment: 11th Language Resources and Evaluation Conference (LREC), 7-12 May
2018, Miyazaki (Japan)
Multi-Dimensional Explanation of Target Variables from Documents
Automated predictions require explanations to be interpretable by humans.
Past work used attention and rationale mechanisms to find words that predict
the target variable of a document. Often, though, they yield either noisy
explanations or a drop in accuracy. Furthermore, rationale
methods cannot capture the multi-faceted nature of justifications for multiple
targets, because of the non-probabilistic nature of the mask. In this paper, we
propose the Multi-Target Masker (MTM) to address these shortcomings. The
novelty lies in the soft multi-dimensional mask that models a relevance
probability distribution over the set of target variables to handle
ambiguities. Additionally, two regularizers guide MTM to induce long,
meaningful explanations. We evaluate MTM on two datasets and show, using
standard metrics and human annotations, that the resulting masks are more
accurate and coherent than those generated by the state-of-the-art methods.
Moreover, MTM is the first to also achieve the highest F1 scores for all the
target variables simultaneously.
Comment: Accepted in AAAI 2021. 18 pages, 14 figures, 9 tables