Bilevel Scheduled Sampling for Dialogue Generation
Exposure bias poses a common challenge in numerous natural language
processing tasks, particularly in dialogue generation. In response,
researchers have devised various techniques, among which scheduled sampling
has proven to be an effective method for mitigating exposure bias. However,
existing state-of-the-art scheduled sampling methods consider only the
quality of the currently sampled words for threshold-truncation sampling,
overlooking sentence-level information; the threshold-truncation mechanism
itself also warrants further discussion. In this paper, we propose a
bilevel scheduled sampling model that takes the sentence-level information into
account and incorporates it with word-level quality. To enhance sampling
diversity and improve the model's adaptability, we propose a smooth function
that maps the combined result of sentence-level and word-level information to
an appropriate range, and employ probabilistic sampling based on the mapped
values instead of threshold truncation. Experiments conducted on the
DailyDialog and PersonaChat datasets demonstrate the effectiveness of our
proposed methods, which significantly alleviate the exposure bias problem and
outperform state-of-the-art scheduled sampling methods.
Comment: 13 pages, 4 figures, accepted at Natural Language Processing and
Chinese Computing (NLPCC 2023)
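The sampling scheme described above can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the sigmoid used as the smooth mapping, and the weighting parameter `alpha` are all illustrative assumptions.

```python
import math
import random

def smooth_map(score, low=0.2, high=0.8, steepness=5.0):
    """Map a combined quality score in [0, 1] to a sampling
    probability within [low, high] via a shifted sigmoid."""
    s = 1.0 / (1.0 + math.exp(-steepness * (score - 0.5)))
    return low + (high - low) * s

def choose_next_input(gold_token, model_token, word_quality, sent_quality,
                      alpha=0.5):
    """Bilevel scheduled sampling: combine word-level and sentence-level
    quality, map the result to a probability, and sample the model's own
    token with that probability instead of hard threshold truncation."""
    combined = alpha * word_quality + (1 - alpha) * sent_quality
    p_model = smooth_map(combined)
    return model_token if random.random() < p_model else gold_token
```

Because the mapped value stays strictly inside (low, high), both the gold token and the model token always retain some probability of being chosen, which is the source of the added sampling diversity.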
Multimodal Representations for Teacher-Guided Compositional Visual Reasoning
Neural Module Networks (NMN) are a compelling method for visual question
answering, enabling the translation of a question into a program consisting of
a series of reasoning sub-tasks that are sequentially executed on the image to
produce an answer. NMNs provide enhanced explainability compared to integrated
models, allowing for a better understanding of the underlying reasoning
process. To improve the effectiveness of NMNs, we propose to exploit features
obtained by a large-scale cross-modal encoder. In addition, the current training
approach of NMNs relies on the propagation of module outputs to subsequent
modules, leading to the accumulation of prediction errors and the generation of
false answers. To mitigate this, we introduce an NMN learning strategy
involving scheduled teacher guidance. Initially, the model is fully guided by
the ground-truth intermediate outputs, but gradually transitions to an
autonomous behavior as training progresses. This reduces error accumulation,
thus improving training efficiency and final performance. We demonstrate that by
incorporating cross-modal features and employing more effective training
techniques for NMN, we achieve a favorable balance between performance and
transparency in the reasoning process.
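The scheduled transition from full guidance to autonomy can be sketched as below. The exponential decay schedule and the function names are illustrative assumptions; the paper's actual schedule may differ.

```python
import math
import random

def guidance_prob(step, total_steps, k=3.0):
    """Probability of substituting the ground-truth intermediate output
    for a module's own prediction; decays from 1 toward exp(-k) as
    training progresses."""
    return math.exp(-k * step / max(total_steps, 1))

def module_input(gold_intermediate, predicted_intermediate,
                 step, total_steps, rng=random):
    """Scheduled teacher guidance: early training steps feed the next
    module the ground-truth intermediate result, later steps feed it
    the module's own (possibly erroneous) prediction."""
    if rng.random() < guidance_prob(step, total_steps):
        return gold_intermediate
    return predicted_intermediate
```

Early on, modules train on clean inputs and learn their sub-task in isolation; later, they learn to cope with upstream prediction noise, which is what reduces error accumulation at inference time.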
Auto-regressive Image Synthesis with Integrated Quantization
Deep generative models have achieved conspicuous progress in realistic image
synthesis with multifarious conditional inputs, while generating diverse yet
high-fidelity images remains a grand challenge in conditional image generation.
This paper presents a versatile framework for conditional image generation
which incorporates the inductive bias of CNNs and powerful sequence modeling of
auto-regression that naturally leads to diverse image generation. Instead of
independently quantizing the features of multiple domains as in prior research,
we design an integrated quantization scheme with a variational regularizer that
mingles the feature discretization in multiple domains, and markedly boosts the
auto-regressive modeling performance. Notably, the variational regularizer
makes it possible to regularize feature distributions in incomparable latent
spaces by penalizing the intra-domain variations of distributions. In addition,
we design a Gumbel sampling strategy that allows distribution uncertainty to be
incorporated into the auto-regressive training procedure. Gumbel sampling
substantially mitigates the exposure bias that often causes misalignment
between the training and inference stages and severely impairs inference
performance. Extensive
experiments over multiple conditional image generation tasks show that our
method achieves superior diverse image generation performance qualitatively and
quantitatively as compared with the state-of-the-art.
Comment: Accepted to ECCV 2022 as Oral Presentation
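The standard Gumbel-max trick underlying such a strategy can be sketched as follows; this is a generic illustration of Gumbel sampling, not the paper's specific integration into auto-regressive training.

```python
import math
import random

def gumbel_sample(logits, tau=1.0, rng=random):
    """Gumbel-max trick: perturb each (temperature-scaled) logit with
    Gumbel(0, 1) noise and take the argmax. The resulting index is an
    exact sample from softmax(logits / tau), so training can use
    stochastic draws instead of a deterministic argmax."""
    noisy = [l / tau - math.log(-math.log(rng.random()))
             for l in logits]
    return max(range(len(noisy)), key=noisy.__getitem__)
```

Feeding such sampled tokens back in during training exposes the model to its own uncertainty, which is how this style of sampling narrows the train/inference mismatch behind exposure bias.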
Quick Dense Retrievers Consume KALE: Post Training Kullback Leibler Alignment of Embeddings for Asymmetrical dual encoders
In this paper, we consider the problem of improving the inference latency of
language model-based dense retrieval systems by introducing structural
compression and model size asymmetry between the context and query encoders.
First, we investigate the impact of pre- and post-training compression on
MSMARCO, Natural Questions, TriviaQA, SQuAD, and SciFact, finding that
asymmetry in the dual encoders of dense retrieval can lead to improved
inference efficiency. Knowing this, we introduce Kullback-Leibler Alignment of
Embeddings (KALE), an efficient and accurate method for increasing the
inference efficiency of dense retrieval methods by pruning and aligning the
query encoder after training. Specifically, KALE extends traditional Knowledge
Distillation after bi-encoder training, allowing for effective query encoder
compression without full retraining or index generation. Using KALE and
asymmetric training, we can generate models which exceed the performance of
DistilBERT despite having 3x faster inference.
Comment: 8 pages, 4 figures, 30 tables
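A minimal sketch of the post-training alignment idea: distill the frozen full query encoder into a pruned one by matching softmax-normalized embeddings under a KL objective. The normalization choice and function names here are illustrative assumptions and not KALE's exact objective.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def kl_alignment_loss(student_emb, teacher_emb):
    """KL divergence KL(teacher || student) between softmax-normalized
    embeddings of the frozen full query encoder (teacher) and the
    pruned query encoder (student). Minimizing it pulls the student's
    embedding distribution toward the teacher's without retraining the
    context encoder or regenerating the index."""
    p = softmax(teacher_emb)
    q = softmax(student_emb)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

Because only the query encoder is updated, the document index built with the full context encoder remains valid, which is what makes the compression cheap.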
Translating away Translationese without Parallel Data
Translated texts exhibit systematic linguistic differences compared to
original texts in the same language, and these differences are referred to as
translationese. Translationese has effects on various cross-lingual natural
language processing tasks, potentially leading to biased results. In this
paper, we explore a novel approach to reduce translationese in translated
texts: translation-based style transfer. As there are no parallel
human-translated and original data in the same language, we use a
self-supervised approach that can learn from comparable (rather than parallel)
mono-lingual original and translated data. However, even this self-supervised
approach requires some parallel data for validation. We show how we can
eliminate the need for parallel validation data by combining the
self-supervised loss with an unsupervised loss. This unsupervised loss
leverages the original language model loss over the style-transferred output
and a semantic similarity loss between the input and style-transferred output.
We evaluate our approach in terms of original vs. translationese binary
classification in addition to measuring content preservation and target-style
fluency. The results show that our approach reduces the accuracy of a
translationese classifier to that of a random classifier after style transfer
while adequately preserving the content and fluency in the target original
style.
Comment: Accepted at EMNLP 2023, Main Conference
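The combined unsupervised objective described above can be sketched as below; the specific weighting, cosine similarity as the semantic measure, and all function names are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def unsupervised_loss(lm_nll, input_emb, output_emb, lam=1.0):
    """Validation-free unsupervised loss: the original-language-model
    negative log-likelihood of the style-transferred output (fluency in
    the original style) plus a penalty for semantic drift between input
    and output embeddings (content preservation)."""
    return lm_nll + lam * (1.0 - cosine(input_emb, output_emb))
```

The two terms pull in opposite directions — a fluent but unrelated output scores well on the first term and poorly on the second — so their sum can stand in for parallel validation data when selecting checkpoints.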
To Asymmetry and Beyond: Structured Pruning of Sequence to Sequence Models for Improved Inference Efficiency
Sequence-to-sequence language models can be used to produce abstractive
summaries which are coherent, relevant, and concise. Still, model sizes can
make deployment in latency-sensitive or web-scale implementations difficult.
This paper studies the relationship between model size, structured pruning,
inference efficiency, and summarization accuracy on widely used summarization
datasets. We show that model accuracy is tied to the encoder size while
inference efficiency is connected to the decoder. Using asymmetric pruning can
lead to nearly a 3x improvement in inference latency with ~1 point loss in
ROUGE-2. Moreover, we find both the average degradation and the role of
asymmetry to be consistent across model sizes and variations in datasets.
Comment: SustaiNLP 2023 @ ACL 2023, 9 pages, 6 figures, 33 tables
Reinforcement Replaces Supervision: Query focused Summarization using Deep Reinforcement Learning
Query-focused Summarization (QfS) deals with systems that generate summaries
from document(s) based on a query. Motivated by the insight that Reinforcement
Learning (RL) provides a generalization to Supervised Learning (SL) for Natural
Language Generation and thereby empirically outperforms SL, we use an
RL-based approach for this task of QfS. We also resolve the conflict between
employing RL and Teacher Forcing in Transformers. We develop
multiple Policy Gradient networks, trained on various reward signals: ROUGE,
BLEU, and Semantic Similarity, which lead to a 10-point improvement over the
State-of-the-Art approach on the ROUGE-L metric for a benchmark dataset (ELI5).
We also show performance of our approach in zero-shot setting for another
benchmark dataset (DebatePedia) -- our approach leads to results comparable to
baselines, which were specifically trained on DebatePedia. To aid the RL
training, we propose a better semantic similarity reward, enabled by a novel
Passage Embedding scheme developed using the Cluster Hypothesis. Lastly, we
contribute a gold-standard test dataset to further research in QfS and
Long-form Question Answering (LfQA).
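The policy-gradient core of such a setup can be sketched as follows. The toy unigram-overlap reward stands in for ROUGE, and all names here are illustrative assumptions rather than the paper's implementation.

```python
def unigram_overlap_reward(candidate, reference):
    """Toy unigram-recall reward standing in for ROUGE: the fraction of
    reference tokens that appear in the sampled summary. Real systems
    would use a proper ROUGE implementation."""
    cand_tokens = set(candidate.split())
    ref_tokens = reference.split()
    hits = sum(1 for tok in ref_tokens if tok in cand_tokens)
    return hits / max(len(ref_tokens), 1)

def reinforce_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE surrogate loss for one sampled summary: negative
    (reward - baseline) times the sum of token log-probabilities.
    Minimizing it increases the likelihood of summaries whose reward
    exceeds the baseline."""
    return -(reward - baseline) * sum(token_log_probs)
```

Because the reward is computed only after a full summary is sampled, this objective sidesteps the token-by-token Teacher Forcing signal and directly optimizes sequence-level metrics such as ROUGE.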