Multi-aspect Repetition Suppression and Content Moderation of Large Language Models
Natural language generation is one of the most impactful fields in NLP, and
recent years have witnessed its evolution brought about by large language
models (LLMs). As the key instrument for writing assistance applications, they
are generally prone to replicating or extending offensive content provided in
the input. In low-resource data regimes, they can also produce repetitive
outputs (Holtzman et al., 2019) [1]. Usually, offensive content and repetitions
are mitigated with post-hoc methods, including n-gram level blocklists, top-k
and nucleus sampling. In this paper, we introduce a combination of exact and
non-exact repetition suppression using token- and sequence-level unlikelihood
loss, repetition penalty during training, inference, and post-processing,
respectively. We further explore multi-level unlikelihood loss to the extent
that it endows the model with abilities to avoid generating offensive words and
phrases from the beginning. Finally, with comprehensive experiments, we
demonstrate that our proposed methods work exceptionally well in controlling
the repetition and content quality of LLM outputs.
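
The token-level unlikelihood term mentioned in this abstract can be illustrated with a minimal sketch. It follows the general form of unlikelihood training (a likelihood term for the target plus a penalty of -log(1 - p) for each unwanted token) and is not the paper's implementation. The function names, the `alpha` weight, and the choice of negative candidates (for example, tokens already generated or blocklisted words) are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def token_unlikelihood_loss(logits, target, negatives, alpha=1.0, eps=1e-12):
    """Likelihood term for `target` plus an unlikelihood penalty on `negatives`.

    logits    : unnormalized scores over the vocabulary at one time step
    target    : index of the reference token
    negatives : token indices to discourage (assumed here to be previously
                generated tokens or blocklisted words)
    alpha     : hypothetical weight on the unlikelihood term
    """
    probs = softmax(logits)
    # Standard negative log-likelihood of the reference token.
    mle = -math.log(probs[target] + eps)
    # Penalize probability mass placed on each unwanted token,
    # never penalizing the reference token itself.
    ul = -sum(math.log(1.0 - probs[c] + eps)
              for c in set(negatives) if c != target)
    return mle + alpha * ul
```

With an empty `negatives` list the function reduces to the ordinary cross-entropy term, so the penalty can be switched on per time step without changing the rest of a training loop.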
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
In this work, we propose to study the performance of a model trained with a
sentence embedding regression loss component for the Automated Audio Captioning
task. This task aims to build systems that can describe audio content with a
single sentence written in natural language. Most systems are trained with the
standard Cross-Entropy loss, which does not take into account the semantic
closeness of the sentence. We found that adding a sentence embedding loss term
not only reduces overfitting but also increases SPIDEr from 0.397 to 0.418 in our first
setting on the AudioCaps corpus. When we increased the weight decay value, we
found our model to be much closer to the current state-of-the-art methods, with
a SPIDEr score of up to 0.444, compared with 0.475. Moreover, this model uses
eight times fewer trainable parameters. In this training setting, the sentence
embedding loss no longer has an impact on model performance.
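
The idea of adding a sentence embedding regression term to the Cross-Entropy loss can be sketched as follows. The abstract does not state which regression loss or weighting is used, so the mean squared error and the `weight` parameter below are assumptions, and all function names are hypothetical.

```python
def mean_squared_error(pred, ref):
    """Mean squared error between two equal-length embedding vectors."""
    if len(pred) != len(ref):
        raise ValueError("embeddings must have the same dimension")
    return sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred)

def combined_caption_loss(ce_loss, pred_embedding, ref_embedding, weight=1.0):
    """Cross-entropy loss plus a weighted sentence embedding regression term.

    ce_loss        : cross-entropy value already computed for the caption
    pred_embedding : sentence embedding predicted by the captioning model
    ref_embedding  : sentence embedding of the reference caption
    weight         : hypothetical coefficient balancing the two terms
    """
    return ce_loss + weight * mean_squared_error(pred_embedding, ref_embedding)
```

When the predicted and reference embeddings coincide, the extra term vanishes and only the Cross-Entropy loss remains, which is one way to read its role as a regularizer rather than a replacement objective.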
ColdGANs: Taming Language GANs with Cautious Sampling Strategies
Training regimes based on Maximum Likelihood Estimation (MLE) suffer from
known limitations, often leading to poorly generated text sequences. At the
root of these limitations is the mismatch between training and inference, i.e.
the so-called exposure bias, exacerbated by considering only the reference
texts as correct, while in practice several alternative formulations could be
as good. Generative Adversarial Networks (GANs) can mitigate those limitations
but the discrete nature of text has hindered their application to language
generation: the approaches proposed so far, based on Reinforcement Learning,
have been shown to underperform MLE. Departing from previous works, we analyze
the exploration step in GANs applied to text generation, and show how classical
sampling results in unstable training. We propose to consider alternative
exploration strategies in a GAN framework that we name ColdGANs, where we force
the sampling to be close to the distribution modes to get smoother learning
dynamics. For the first time, to the best of our knowledge, the proposed
language GANs compare favorably to MLE, and obtain improvements over the
state-of-the-art on three generative tasks, namely unconditional text
generation, question generation, and abstractive summarization.
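
One simple way to keep samples close to the modes of a distribution is to sample from a softmax with a temperature below 1. The sketch below illustrates only that general idea and should not be read as the ColdGANs training procedure. The function names and the example temperatures are assumptions.

```python
import math
import random

def temperature_softmax(logits, temperature):
    """Softmax over logits divided by a temperature.

    Temperatures below 1 concentrate probability mass on the highest-scoring
    tokens; a temperature of 1 recovers the ordinary softmax.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = temperature_softmax(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Usage: a colder temperature puts more mass on the most likely token.
rng = random.Random(0)
logits = [2.0, 1.0, 0.0]
cold = temperature_softmax(logits, 0.3)
warm = temperature_softmax(logits, 1.0)
token = sample_token(logits, 0.3, rng)
```

Lowering the temperature trades exploration for stability: fewer low-probability sequences are sampled, which matches the abstract's description of sampling close to the distribution modes.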