27 research outputs found
Regularizing Neural Machine Translation by Target-bidirectional Agreement
Although Neural Machine Translation (NMT) has achieved remarkable progress in
the past several years, most NMT systems still suffer from a fundamental
shortcoming as in other sequence generation tasks: errors made early in
the generation process are fed as inputs to the model and can be quickly amplified,
harming subsequent sequence generation. To address this issue, we propose a
novel model regularization method for NMT training, which aims to improve the
agreement between translations generated by left-to-right (L2R) and
right-to-left (R2L) NMT decoders. This goal is achieved by introducing two
Kullback-Leibler divergence regularization terms into the NMT training
objective to reduce the mismatch between output probabilities of L2R and R2L
models. In addition, we also employ a joint training strategy to allow L2R and
R2L models to improve each other in an interactive update process. Experimental
results show that our proposed method significantly outperforms
state-of-the-art baselines on Chinese-English and English-German translation
tasks.
Comment: Accepted by AAAI 201
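The agreement idea described in this abstract can be illustrated with a small sketch. It assumes a token-level simplification: each decoder is represented by a list of per-position probability distributions over the same vocabulary, and the penalty is the sum of the two directed Kullback-Leibler terms averaged over positions. The function names `kl_divergence` and `agreement_penalty` are hypothetical, and the abstract does not state how the paper actually computes or approximates these terms.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    # eps guards against log(0); terms with p_i == 0 contribute nothing.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def agreement_penalty(p_l2r, p_r2l):
    """Average over positions of KL(L2R || R2L) + KL(R2L || L2R).

    Assumption: both arguments are lists of per-position distributions
    that have already been aligned to the same target positions.
    """
    total = 0.0
    for p, q in zip(p_l2r, p_r2l):
        total += kl_divergence(p, q) + kl_divergence(q, p)
    return total / len(p_l2r)

# Toy distributions over a two-word vocabulary at two target positions.
l2r = [[0.9, 0.1], [0.5, 0.5]]
r2l = [[0.5, 0.5], [0.5, 0.5]]
penalty = agreement_penalty(l2r, r2l)
```

In a training objective of this kind, a term like `penalty` would be added to the usual likelihood loss with a weighting coefficient, so that the two decoders are pushed toward similar output probabilities.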
Modeling Paragraph-Level Vision-Language Semantic Alignment for Multi-Modal Summarization
Most current multi-modal summarization methods follow a cascaded manner,
where an off-the-shelf object detector is first used to extract visual
features, then these features are fused with language representations to
generate the summary with an encoder-decoder model. This cascaded approach cannot
capture the semantic alignments between images and paragraphs, which are
crucial to a precise summary. In this paper, we propose ViL-Sum to jointly
model paragraph-level Vision-Language Semantic Alignment and
Multi-Modal Summarization. The core of ViL-Sum is a joint multi-modal
encoder with two well-designed tasks, image reordering and image selection. The
joint multi-modal encoder captures the interactions between modalities, where
the reordering task guides the model to learn paragraph-level semantic
alignment and the selection task guides the model to select summary-related
images for the final summary. Experimental results show that our proposed
ViL-Sum significantly outperforms current state-of-the-art methods. In further
analysis, we find that the two well-designed tasks and the joint multi-modal
encoder can effectively guide the model to learn reasonable paragraph-image and
summary-image relations.
Contrastive Learning with Prompt-derived Virtual Semantic Prototypes for Unsupervised Sentence Embedding
Contrastive learning has become a new paradigm for unsupervised sentence
embeddings. Previous studies focus on instance-wise contrastive learning,
attempting to construct positive pairs with textual data augmentation. In this
paper, we propose a novel Contrastive learning method with Prompt-derived
Virtual semantic Prototypes (ConPVP). Specifically, with the help of prompts,
we construct virtual semantic prototypes for each instance, and derive negative
prototypes by using the negative form of the prompts. Using a prototypical
contrastive loss, we enforce the anchor sentence embedding to be close to its
corresponding semantic prototypes, and far apart from the negative prototypes
as well as the prototypes of other sentences. Extensive experimental results on
semantic textual similarity, transfer, and clustering tasks demonstrate the
effectiveness of our proposed model compared to strong baselines. Code is
available at https://github.com/lemon0830/promptCSE.
Comment: Findings of EMNLP 202
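A prototypical contrastive loss of the kind this abstract describes can be sketched as an InfoNCE-style objective. The sketch below is an assumption, not the released ConPVP code: it uses cosine similarity, a hypothetical temperature of 0.05, and plain Python lists as embeddings. The `negatives` argument stands in for both the negative prototypes and the prototypes of other sentences.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def prototypical_contrastive_loss(anchor, positive, negatives, temperature=0.05):
    """-log softmax score of the positive prototype among all prototypes."""
    pos = cosine(anchor, positive) / temperature
    logits = [pos] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos

# The anchor coincides with its prototype: the loss is close to zero.
low = prototypical_contrastive_loss([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
# The anchor coincides with a negative prototype: the loss is large.
high = prototypical_contrastive_loss([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

Minimizing this quantity pulls the anchor embedding toward its own prototype and pushes it away from every prototype passed in `negatives`.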
Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System
Large-scale Language Models (LLMs) are constrained by their inability to
process lengthy inputs. To address this limitation, we propose the
Self-Controlled Memory (SCM) system to unleash infinite-length input capacity
for large-scale language models. Our SCM system is composed of three key
modules: the language model agent, the memory stream, and the memory
controller. The language model agent iteratively processes ultra-long inputs
and stores all historical information in the memory stream. The memory
controller provides the agent with both long-term memory (archived memory) and
short-term memory (flash memory) to generate precise and coherent responses.
The controller determines which memories from archived memory should be
activated and how to incorporate them into the model input. Our SCM system can
be integrated with any LLM to enable it to process ultra-long texts without
any modification or fine-tuning. Experimental results show that our SCM system
enables LLMs, which are not optimized for multi-turn dialogue, to achieve
multi-turn dialogue capabilities that are comparable to ChatGPT, and to
outperform ChatGPT in scenarios involving ultra-long document summarization or
long-term conversations. Additionally, we will supply a test set, which covers
common long-text input scenarios, for evaluating the abilities of LLMs in
processing long documents (https://github.com/wbbeyourself/SCM4LLMs).
Comment: Working in progress
Retrieval-Augmented Classification with Decoupled Representation
Retrieval augmented methods have shown promising results in various
classification tasks. However, existing methods focus on retrieving extra
context to enrich the input, which is noise-sensitive and non-expandable. In
this paper, following this line, we propose a k-nearest-neighbor (KNN)-based
method for retrieval-augmented classification, which interpolates the
predicted label distribution with retrieved instances' label distributions.
Different from the standard KNN process, we propose a decoupling mechanism as
we find that shared representation for classification and retrieval hurts
performance and leads to training instability. We evaluate our method on a wide
range of classification datasets. Experimental results demonstrate the
effectiveness and robustness of our proposed method. We also conduct extra
experiments to analyze the contributions of different components in our
model (https://github.com/xnliang98/knn-cls-w-decoupling).
Comment: preprint
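The interpolation step mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions: retrieved instances carry hard labels, neighbors are weighted uniformly, distance is squared Euclidean, and the mixing weight `lam` is a hypothetical hyperparameter. The query vector is assumed to come from a retrieval representation kept separate from the classifier's, which is only a loose stand-in for the decoupling mechanism the abstract refers to.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def knn_label_distribution(query, datastore, k, num_labels):
    """Label distribution of the k nearest stored instances.

    datastore: list of (vector, label) pairs; labels are ints in [0, num_labels).
    """
    nearest = sorted(datastore, key=lambda item: squared_distance(query, item[0]))[:k]
    counts = [0.0] * num_labels
    for _, label in nearest:
        counts[label] += 1.0
    return [c / len(nearest) for c in counts]

def interpolate(p_model, p_knn, lam=0.5):
    """Mix the classifier's distribution with the retrieved one."""
    return [lam * pk + (1.0 - lam) * pm for pm, pk in zip(p_model, p_knn)]

# Toy datastore: two instances of label 0 near the origin, one of label 1 far away.
datastore = [([0.0, 0.0], 0), ([0.1, 0.0], 0), ([5.0, 5.0], 1)]
p_knn = knn_label_distribution([0.0, 0.0], datastore, k=2, num_labels=2)
p_final = interpolate([0.4, 0.6], p_knn, lam=0.5)
```

Here the classifier alone would predict label 1, while the interpolated distribution favors label 0 because both nearest neighbors carry it.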