494 research outputs found
Learning to Write with Coherence From Negative Examples
Coherence is one of the critical factors that determine the quality of
writing. We propose writing relevance (WR) training method for neural
encoder-decoder natural language generation (NLG) models which improves
coherence of the continuation by leveraging negative examples. WR loss
regresses the vector representation of the context and generated sentence
toward positive continuation by contrasting it with the negatives. We compare
our approach with Unlikelihood (UL) training in a text continuation task on
commonsense natural language inference (NLI) corpora to show which method
better models the coherence by avoiding unlikely continuations. The preference
of our approach in human evaluation shows the efficacy of our method in
improving coherence.Comment: 4+1 pages, 4 figures, 2 tables. ICASSP 2022 rejecte
Multi-aspect Repetition Suppression and Content Moderation of Large Language Models
Natural language generation is one of the most impactful fields in NLP, and
recent years have witnessed its evolution brought about by large language
models (LLMs). As the key instrument for writing assistance applications, they
are generally prone to replicating or extending offensive content provided in
the input. In low-resource data regime, they can also lead to repetitive
outputs (Holtzman et al., 2019) [1]. Usually, offensive content and repetitions
are mitigated with post-hoc methods, including n-gram level blocklists, top-k
and nucleus sampling. In this paper, we introduce a combination of exact and
non-exact repetition suppression using token and sequence level unlikelihood
loss, repetition penalty during training, inference, and post-processing
respectively. We further explore multi-level unlikelihood loss to the extent
that it endows the model with abilities to avoid generating offensive words and
phrases from the beginning. Finally, with comprehensive experiments, we
demonstrate that our proposed methods work exceptionally in controlling the
repetition and content quality of LLM outputs
Token Imbalance Adaptation for Radiology Report Generation
Imbalanced token distributions naturally exist in text documents, leading
neural language models to overfit on frequent tokens. The token imbalance may
dampen the robustness of radiology report generators, as complex medical terms
appear less frequently but reflect more medical information. In this study, we
demonstrate how current state-of-the-art models fail to generate infrequent
tokens on two standard benchmark datasets (IU X-RAY and MIMIC-CXR) of radiology
report generation. % However, no prior study has proposed methods to adapt
infrequent tokens for text generators feeding with medical images. To solve the
challenge, we propose the \textbf{T}oken \textbf{Im}balance Adapt\textbf{er}
(\textit{TIMER}), aiming to improve generation robustness on infrequent tokens.
The model automatically leverages token imbalance by an unlikelihood loss and
dynamically optimizes generation processes to augment infrequent tokens. We
compare our approach with multiple state-of-the-art methods on the two
benchmarks. Experiments demonstrate the effectiveness of our approach in
enhancing model robustness overall and infrequent tokens. Our ablation analysis
shows that our reinforcement learning method has a major effect in adapting
token imbalance for radiology report generation.Comment: Accepted by CHIL202
Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation
Zero-shot translation (ZST), which is generally based on a multilingual
neural machine translation model, aims to translate between unseen language
pairs in training data. The common practice to guide the zero-shot language
mapping during inference is to deliberately insert the source and target
language IDs, e.g., for English and for German. Recent studies have
shown that language IDs sometimes fail to navigate the ZST task, making them
suffer from the off-target problem (non-target language words exist in the
generated translation) and, therefore, difficult to apply the current
multilingual translation model to a broad range of zero-shot language
scenarios. To understand when and why the navigation capabilities of language
IDs are weakened, we compare two extreme decoder input cases in the ZST
directions: Off-Target (OFF) and On-Target (ON) cases. By contrastively
visualizing the contextual word representations (CWRs) of these cases with
teacher forcing, we show that 1) the CWRs of different languages are
effectively distributed in separate regions when the sentence and ID are
matched (ON setting), and 2) if the sentence and ID are unmatched (OFF
setting), the CWRs of different languages are chaotically distributed. Our
analyses suggest that although they work well in ideal ON settings, language
IDs become fragile and lose their navigation ability when faced with off-target
tokens, which commonly exist during inference but are rare in training
scenarios. In response, we employ unlikelihood tuning on the negative (OFF)
samples to minimize their probability such that the language IDs can
discriminate between the on- and off-target tokens during training. Experiments
spanning 40 ZST directions show that our method reduces the off-target ratio by
-48.0% on average, leading to a +9.1 BLEU improvement with only an extra +0.3%
tuning cost
- …