233 research outputs found
Improving Background Based Conversation with Context-aware Knowledge Pre-selection
Background Based Conversations (BBCs) have been developed to make dialogue
systems generate more informative and natural responses by leveraging
background knowledge. Existing methods for BBCs can be grouped into two
categories: extraction-based methods and generation-based methods. The former
extract spans from background material as responses, which are not necessarily
natural. The latter generate responses that are natural but not necessarily
effective in leveraging background knowledge. In this paper, we focus on
generation-based methods and propose a model, namely Context-aware Knowledge
Pre-selection (CaKe), which introduces a pre-selection process that uses
dynamic bi-directional attention to improve knowledge selection by using the
utterance history context as prior information to select the most relevant
background material. Experimental results show that our model is superior to
current state-of-the-art baselines, indicating that it benefits from the
pre-selection process, thus improving informativeness and fluency.
Comment: SCAI 2019 workshop paper
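The abstract does not spell out CaKe's dynamic bi-directional attention, but the pre-selection idea itself — score candidate background sentences against the utterance history and keep only the most relevant ones before generation — can be sketched with a simple dot-product relevance score. The pooling and scoring choices below are illustrative assumptions, not the paper's method:

```python
import numpy as np

def preselect_background(context_vecs, background_vecs, top_k=2):
    """Score each background sentence against the utterance-history
    context and keep the top_k most relevant ones (a stand-in for
    CaKe's dynamic bi-directional attention)."""
    # Mean-pool the history utterances into one query vector.
    query = context_vecs.mean(axis=0)
    # Relevance of each background sentence = dot product with the query.
    scores = background_vecs @ query
    # Indices of the top_k highest-scoring background sentences.
    top = np.argsort(scores)[::-1][:top_k]
    return sorted(top.tolist())

rng = np.random.default_rng(0)
context = rng.normal(size=(3, 8))      # 3 history utterances, dim 8
background = rng.normal(size=(5, 8))   # 5 candidate background sentences
selected = preselect_background(context, background, top_k=2)
print(selected)
```

In a full model, only the selected sentences would be fed to the generator, which is what lets the context act as prior information for knowledge selection.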
Detecting and Classifying Malevolent Dialogue Responses: Taxonomy, Data and Methodology
Conversational interfaces are increasingly popular as a way of connecting
people to information. Corpus-based conversational interfaces are able to
generate more diverse and natural responses than template-based or
retrieval-based agents. With the increased generative capacity of corpus-based
conversational agents comes the need to classify and filter out malevolent
responses that are inappropriate in terms of content and dialogue acts.
Previous studies on the topic of recognizing and classifying inappropriate
content are mostly focused on a certain category of malevolence or on single
sentences instead of an entire dialogue. In this paper, we define the task of
Malevolent Dialogue Response Detection and Classification (MDRDC). We make
three contributions to advance research on this task. First, we present a
Hierarchical Malevolent Dialogue Taxonomy (HMDT). Second, we create a labelled
multi-turn dialogue dataset and formulate the MDRDC task as a hierarchical
classification task over this taxonomy. Third, we apply state-of-the-art text
classification methods to the MDRDC task and report on extensive experiments
aimed at assessing the performance of these approaches.
Comment: under review at JASIST
FF2: A Feature Fusion Two-Stream Framework for Punctuation Restoration
To accomplish punctuation restoration, most existing methods focus on
introducing extra information (e.g., part-of-speech) or addressing the class
imbalance problem. Recently, large-scale transformer-based pre-trained language
models (PLMs) have been widely utilized and have achieved remarkable success.
However, PLMs are pre-trained on large corpora with punctuation marks, which
may not fit well with small punctuation-free datasets, leading to suboptimal
convergence. In this study, we propose a Feature Fusion two-stream framework
(FF2) to bridge the gap. Specifically, one stream leverages a pre-trained
language model to capture the semantic feature, while another auxiliary module
captures the feature at hand. We also modify the computation of multi-head
attention to encourage communication among heads. Then, two features with
different perspectives are aggregated to fuse information and enhance context
awareness. Without additional data, the experimental results on the popular
benchmark IWSLT demonstrate that FF2 achieves new SOTA performance, which
verifies that our approach is effective.
Comment: 5 pages. arXiv admin note: substantial text overlap with
arXiv:2203.1248
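The abstract describes FF2's core operation only at a high level: one stream produces PLM features, an auxiliary module produces features "at hand", and the two are aggregated. One plausible fusion operator — project both streams into a shared space and combine them with a nonlinearity — can be sketched as follows; the projection-and-sum choice is an assumption, not the paper's exact aggregation:

```python
import numpy as np

def fuse_streams(plm_feat, aux_feat, w_plm, w_aux):
    """Fuse a pre-trained-LM feature with an auxiliary at-hand feature
    by projecting both into a shared space and summing, then applying
    tanh. Illustrative only: FF2's actual fusion is not specified in
    the abstract."""
    return np.tanh(plm_feat @ w_plm + aux_feat @ w_aux)

rng = np.random.default_rng(1)
plm_feat = rng.normal(size=(4, 16))   # 4 tokens, PLM-stream dim 16
aux_feat = rng.normal(size=(4, 8))    # same 4 tokens, auxiliary dim 8
w_plm = rng.normal(size=(16, 12))     # project both streams to dim 12
w_aux = rng.normal(size=(8, 12))
fused = fuse_streams(plm_feat, aux_feat, w_plm, w_aux)
print(fused.shape)
```

The point of the two-stream design is that the fused representation sees both the semantic view from the PLM and the task-specific view from the auxiliary module, which is what the abstract credits for the improved context awareness.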
TLM: Token-Level Masking for Transformers
Structured dropout approaches, such as attention dropout and DropHead, have
been investigated to regularize the multi-head attention mechanism in
Transformers. In this paper, we propose a new regularization scheme that
operates at the token level rather than the structure level to reduce
overfitting. Specifically, we
devise a novel Token-Level Masking (TLM) training strategy for Transformers to
regularize the connections of self-attention, which consists of two masking
techniques that are effective and easy to implement. The underlying idea is to
manipulate the connections between tokens in the multi-head attention via
masking, where the networks are forced to exploit partial neighbors'
information to produce a meaningful representation. The generality and
effectiveness of TLM are thoroughly evaluated via extensive experiments on 4
diversified NLP tasks across 18 datasets, including natural language
understanding benchmark GLUE, ChineseGLUE, Chinese Grammatical Error
Correction, and data-to-text generation. The results indicate that TLM can
consistently outperform attention dropout and DropHead, e.g., it increases by
0.5 points relative to DropHead with BERT-large on GLUE. Moreover, TLM can
establish a new record on the data-to-text benchmark Rotowire (18.93 BLEU). Our
code will be publicly available at https://github.com/Young1993/tlm.
Comment: 13 pages. Accepted by the EMNLP 2023 main conference
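The abstract's core idea — severing some token-to-token connections in self-attention during training so that each token must build its representation from partial neighbour information — can be sketched in a few lines. This randomly masks individual attention connections, which is a rough illustration of the principle rather than the paper's two specific masking techniques:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_with_token_mask(q, k, v, mask_prob=0.2, rng=None):
    """Single-head self-attention where a random subset of token-to-token
    connections is masked during training, forcing tokens to rely on
    their remaining (partial) neighbours. A sketch of the token-level
    masking principle, not TLM's exact recipe."""
    if rng is None:
        rng = np.random.default_rng()
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)        # (T, T) connection scores
    drop = rng.random(scores.shape) < mask_prob
    np.fill_diagonal(drop, False)        # keep each token's self-connection
    scores[drop] = -1e9                  # sever the masked connections
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(5, 4))
out = attention_with_token_mask(q, q, q, mask_prob=0.2, rng=rng)
print(out.shape)
```

With `mask_prob=0.0` this reduces to plain scaled dot-product attention; at inference time one would likewise disable the masking, as with dropout.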
Rethinking the Reference-based Distinctive Image Captioning
Distinctive Image Captioning (DIC) -- generating distinctive captions that
describe the unique details of a target image -- has received considerable
attention over the last few years. A recent DIC work proposes to generate
distinctive captions by comparing the target image with a set of
semantically similar reference images, i.e., reference-based DIC (Ref-DIC). It
aims to make the generated captions distinguish the target image from the
reference images.
Unfortunately, reference images used by existing Ref-DIC works are easy to
distinguish: these reference images only resemble the target image at
scene-level and have few common objects, such that a Ref-DIC model can
trivially generate distinctive captions even without considering the reference
images. To ensure Ref-DIC models really perceive the unique objects (or
attributes) in target images, we first propose two new Ref-DIC benchmarks.
Specifically, we design a two-stage matching mechanism, which strictly controls
the similarity between the target and reference images at the object/attribute
level (vs. the scene level). Second, to generate distinctive captions, we
develop a strong Transformer-based Ref-DIC baseline, dubbed TransDIC. It not only
extracts visual features from the target image, but also encodes the
differences between objects in the target and reference images. Finally, for
more trustworthy benchmarking, we propose a new evaluation metric named
DisCIDEr for Ref-DIC, which evaluates both the accuracy and distinctiveness of
the generated captions. Experimental results demonstrate that our TransDIC can
generate distinctive captions. Besides, it outperforms several state-of-the-art
models on the two new benchmarks over different metrics.
Comment: ACM MM 2022