13 research outputs found
SMRT Chatbots: Improving Non-Task-Oriented Dialog with Simulated Multiple Reference Training
Non-task-oriented dialog models suffer from poor-quality and non-diverse
responses. To overcome limited conversational data, we apply Simulated Multiple
Reference Training (SMRT; Khayrallah et al., 2020), and use a paraphraser to
simulate multiple responses per training prompt. We find SMRT improves over a
strong Transformer baseline as measured by human and automatic quality scores
and lexical diversity. We also find SMRT is comparable to pretraining in human
evaluation quality, and outperforms pretraining on automatic quality and
lexical diversity, without requiring related-domain dialog data.
Comment: EMNLP 2020 Camera Ready
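A minimal sketch of the core idea, viewed as offline data augmentation: a paraphraser stands in for multiple human-written reference responses. The `paraphrase` function is a hypothetical placeholder, and the actual SMRT training procedure (e.g., sampling paraphrases on the fly during training) may differ from this simplification.

```python
import random

def paraphrase(response, n):
    """Placeholder for a neural paraphraser returning n paraphrases of
    `response`; a hypothetical component, not the authors' code."""
    raise NotImplementedError

def simulate_multiple_references(dialog_pairs, n_paraphrases=5):
    """Expand each (prompt, response) pair into several training pairs,
    one per simulated reference response."""
    augmented = []
    for prompt, response in dialog_pairs:
        for ref in [response] + paraphrase(response, n_paraphrases):
            augmented.append((prompt, ref))
    random.shuffle(augmented)  # mix original and simulated references
    return augmented
```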
Evaluating Paraphrastic Robustness in Textual Entailment Models
We present PaRTE, a collection of 1,126 pairs of Recognizing Textual
Entailment (RTE) examples to evaluate whether models are robust to
paraphrasing. We posit that if RTE models understand language, their
predictions should be consistent across inputs that share the same meaning. We
use the evaluation set to determine if RTE models' predictions change when
examples are paraphrased. In our experiments, contemporary models change their
predictions on 8-16% of paraphrased examples, indicating that there is still
room for improvement.
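A short sketch of the consistency measurement this implies; the field names and the `model.predict(premise, hypothesis)` interface are assumptions for illustration, not the released evaluation code.

```python
def paraphrase_consistency(model, examples):
    """Fraction of RTE examples whose predicted label is unchanged after
    both premise and hypothesis are paraphrased."""
    changed = sum(
        model.predict(ex["premise"], ex["hypothesis"])
        != model.predict(ex["paraphrased_premise"], ex["paraphrased_hypothesis"])
        for ex in examples
    )
    return 1.0 - changed / len(examples)
```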
A Study in Improving BLEU Reference Coverage with Diverse Automatic Paraphrasing
We investigate a long-perceived shortcoming in the typical use of BLEU: its
reliance on a single reference. Using modern neural paraphrasing techniques, we
study whether automatically generating additional diverse references can
provide better coverage of the space of valid translations and thereby improve
its correlation with human judgments. Our experiments on the into-English
language directions of the WMT19 metrics task (at both the system and sentence
level) show that using paraphrased references does generally improve BLEU, and
when it does, the more diverse the better. However, we also show that better
results could be achieved if those paraphrases were to specifically target the
parts of the space most relevant to the MT outputs being evaluated. Moreover,
the gains remain slight even when human paraphrases are used, suggesting
inherent limitations to BLEU's capacity to correctly exploit multiple
references. Surprisingly, we also find that adequacy appears to be less
important, as shown by the high results of a strong sampling approach, which
even beats human paraphrases when used with sentence-level BLEU.
Comment: Accepted in the Findings of EMNLP 2020
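For illustration, multi-reference BLEU can be computed with sacrebleu by passing one reference stream per reference set (the original references plus each paraphrased set). The sentences below are toy placeholders, not WMT19 data.

```python
import sacrebleu

hypotheses = ["the cat sat on the mat", "he went home early"]

# Each inner list is one complete reference stream aligned with the
# hypotheses: the original references plus two paraphrased streams.
reference_streams = [
    ["the cat sat on the mat", "he returned home early"],           # original
    ["the cat was sitting on the mat", "he left for home early"],   # paraphrase 1
    ["a cat sat on the mat", "he headed home early"],                # paraphrase 2
]

# Corpus-level BLEU over all reference streams.
print(sacrebleu.corpus_bleu(hypotheses, reference_streams).score)

# Sentence-level BLEU for the first hypothesis against all its references.
print(sacrebleu.sentence_bleu(hypotheses[0],
                              [refs[0] for refs in reference_streams]).score)
```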
Active Learning for Natural Language Generation
The field of text generation suffers from a severe shortage of labeled data
due to the extremely expensive and time-consuming process involved in manual
annotation. A natural approach for coping with this problem is active learning
(AL), a well-known machine learning technique for improving annotation
efficiency by selectively choosing the most informative examples to label.
However, while AL has been well-researched in the context of text
classification, its application to text generation has remained largely unexplored.
In this paper, we present a first systematic study of active learning for text
generation, considering a diverse set of tasks and multiple leading AL
strategies. Our results indicate that existing AL strategies, despite their
success in classification, are largely ineffective for the text generation
scenario, and fail to consistently surpass the baseline of random example
selection. We highlight some notable differences between the classification and
generation scenarios, and analyze the selection behaviors of existing AL
strategies. Our findings motivate exploring novel approaches for applying AL to
NLG tasks.
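A generic pool-based active learning loop of the kind such a study compares, with `annotate`, `train`, and `select` left as assumed callables (human labeling, model training, and an acquisition strategy); random selection is the baseline the paper reports as hard to beat.

```python
import random

def active_learning_loop(unlabeled_pool, annotate, train, select,
                         rounds=5, batch_size=100):
    """Pool-based AL sketch: repeatedly pick examples to label, label them,
    and retrain. The three callables are assumptions, not the paper's code."""
    labeled, model = [], None
    for _ in range(rounds):
        if model is None:
            batch = random.sample(unlabeled_pool, batch_size)  # cold start
        else:
            batch = select(model, unlabeled_pool, batch_size)  # AL strategy
        for example in batch:
            unlabeled_pool.remove(example)
        labeled.extend(annotate(batch))  # obtain target texts for the batch
        model = train(labeled)           # retrain the generator
    return model
```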
Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models
This dissertation focuses on effective combination of data-driven natural language processing (NLP) approaches with linguistic knowledge sources that are based on manual text annotation or word grouping according to semantic commonalities. I gainfully apply fine-grained linguistic soft constraints -- of syntactic or semantic nature -- on statistical NLP models, evaluated in end-to-end state-of-the-art statistical machine translation (SMT) systems. The introduction of semantic soft constraints involves intrinsic evaluation on word-pair similarity ranking tasks, extension from words to phrases, application in a novel distributional paraphrase generation technique, and an introduction of a generalized framework of which these soft semantic and syntactic constraints can be viewed as instances, and in which they can be potentially combined.
Fine granularity is key in the successful combination of these soft constraints, in many cases. I show how to softly constrain SMT models by adding fine-grained weighted features, each preferring translation of only a specific syntactic constituent. Previous attempts using coarse-grained features yielded negative results. I also show how to softly constrain corpus-based semantic models of words (“distributional profiles”) to effectively create word-sense-aware models, by using semantic word grouping information found in a manually compiled thesaurus. Previous attempts, using hard constraints and resulting in aggregated, coarse-grained models, yielded lower gains.
A novel paraphrase generation technique incorporating these soft semantic constraints is then also evaluated in an SMT system. This paraphrasing technique is based on the Distributional Hypothesis. The main advantage of this novel technique over current “pivoting” techniques for paraphrasing is the independence from parallel texts, which are a limited resource. The evaluation is done by augmenting translation models with paraphrase-based translation rules, where fine-grained scoring of paraphrase-based rules yields significantly higher gains.
The model augmentation includes a novel semantic reinforcement component:
In many cases there are alternative paths of generating a paraphrase-based translation rule. Each of these paths reinforces a dedicated score for the “goodness” of the new translation rule. This augmented score is then used as a soft constraint, in a weighted log-linear feature, letting the translation model learn how much to “trust” the paraphrase-based translation rules.
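Schematically, this fits the standard log-linear decoding objective, with the paraphrase-trust score entering as one weighted feature; the exact feature definition in the dissertation may differ from this sketch.

```latex
% Log-linear SMT decoding with a paraphrase-trust feature (schematic):
% each alternative derivation path p of a paraphrase-based rule r
% reinforces the rule's goodness score, which enters the model as one
% weighted feature h_para alongside the usual features h_i.
\hat{e} = \arg\max_{e} \sum_{i} \lambda_i \, h_i(e, f),
\qquad
h_{\mathrm{para}}(r) = \sum_{p \in \mathrm{paths}(r)} \mathrm{score}(p)
```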
The work reported here is the first to use distributional semantic similarity measures to improve performance of an end-to-end phrase-based SMT system. The unified framework for statistical NLP models with soft linguistic constraints enables, in principle, the combination of both semantic and syntactic constraints -- and potentially other constraints, too -- in a single SMT model.
Sentence Similarity and Machine Translation
Neural machine translation (NMT) systems encode an input sentence into an intermediate representation and then decode that representation into the output sentence. Translation requires deep understanding of language; as a result, NMT models trained on large amounts of data develop a semantically rich intermediate representation.
We leverage this rich intermediate representation of NMT systems—in particular, multilingual NMT systems, which learn to map many languages into and out of a joint space—for bitext curation, paraphrasing, and automatic machine translation (MT) evaluation. At a high level, all of these tasks are rooted in similarity: sentence and document alignment requires measuring similarity of sentences and documents, respectively; paraphrasing requires producing output which is similar to an input; and automatic MT evaluation requires measuring the similarity between MT system outputs and corresponding human reference translations.
We use multilingual NMT for similarity in two ways: First, we use a multilingual NMT model with a fixed-size intermediate representation (Artetxe and Schwenk, 2018) to produce multilingual sentence embeddings, which we use in both sentence and document alignment. Second, we train a multilingual NMT model and show that it generalizes to the task of generative paraphrasing (i.e., “translating” from Russian to Russian), when used in conjunction with a simple generation algorithm to discourage copying from the input to the output. We also use this model for automatic MT evaluation, to force decode and score MT system outputs conditioned on their respective human reference translations. Since we leverage multilingual NMT models, each method works in many languages using a single model.
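A small sketch of the embedding-similarity piece, assuming precomputed fixed-size sentence embeddings (how the multilingual encoder produces them, and the margin-based scoring typically used for bitext mining, are outside this sketch).

```python
import numpy as np

def cosine_similarity_matrix(src_embeddings, tgt_embeddings):
    """Pairwise cosine similarities between two sets of fixed-size sentence
    embeddings (e.g., from a multilingual NMT encoder)."""
    src = src_embeddings / np.linalg.norm(src_embeddings, axis=1, keepdims=True)
    tgt = tgt_embeddings / np.linalg.norm(tgt_embeddings, axis=1, keepdims=True)
    return src @ tgt.T

def greedy_sentence_alignment(similarity):
    """Align each source sentence to its most similar target sentence; a much
    simpler heuristic than the scoring used in practice for alignment."""
    return similarity.argmax(axis=1)
```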
We show that simple methods, which leverage the intermediate representation of multilingual NMT models trained on large amounts of bitext, outperform prior work in paraphrasing, sentence alignment, document alignment, and automatic MT evaluation. This finding is consistent with recent trends in the natural language processing community, where large language models trained on huge amounts of unlabeled text have achieved state-of-the-art results on tasks such as question answering, named entity recognition, and parsing.