SMRT Chatbots: Improving Non-Task-Oriented Dialog with Simulated Multiple Reference Training
Non-task-oriented dialog models suffer from poor-quality, non-diverse
responses. To overcome limited conversational data, we apply Simulated Multiple
Reference Training (SMRT; Khayrallah et al., 2020), and use a paraphraser to
simulate multiple responses per training prompt. We find SMRT improves over a
strong Transformer baseline as measured by human and automatic quality scores
and lexical diversity. We also find SMRT is comparable to pretraining in human
evaluation quality, and outperforms pretraining on automatic quality and
lexical diversity, without requiring related-domain dialog data.
Comment: EMNLP 2020 Camera Ready
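The core SMRT idea described above, expanding each single gold response into several simulated references and sampling one per prompt each epoch, can be sketched as follows. The `paraphrase` function here is a stand-in for the trained paraphraser the paper uses, and the sampling scheme is an illustrative assumption, not the paper's exact training loop:

```python
import random

def paraphrase(response, n):
    """Stand-in for a real paraphrase model: here we just tag copies.
    In SMRT this would be a trained sentential paraphraser."""
    return [f"{response} (variant {i})" for i in range(n)]

def build_smrt_data(pairs, n_refs=5, seed=0):
    """Expand each (prompt, response) pair into multiple simulated
    references, then sample one reference per prompt per epoch."""
    rng = random.Random(seed)
    expanded = {p: paraphrase(r, n_refs) for p, r in pairs}
    def epoch():
        # Each epoch pairs the prompt with one randomly chosen simulated
        # reference, exposing the model to diverse targets over training.
        return [(p, rng.choice(refs)) for p, refs in expanded.items()]
    return epoch

pairs = [("how are you?", "i am fine"), ("what's up?", "not much")]
epoch = build_smrt_data(pairs)
batch = epoch()
```

Because the sampled reference changes across epochs, the model sees a different plausible target for the same prompt over time, which is what drives the diversity gains the abstract reports.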
Modular Mechanistic Networks: On Bridging Mechanistic and Phenomenological Models with Deep Neural Networks in Natural Language Processing
Natural language processing (NLP) can be done using either top-down (theory-driven) or bottom-up (data-driven) approaches, which we call mechanistic and phenomenological, respectively. The two approaches are frequently considered to stand in opposition to each other. Examining some recent approaches in deep learning, we argue that deep neural networks incorporate both perspectives and, furthermore, that leveraging this aspect of deep learning may help in solving complex problems within language technology, such as modelling language and perception in the domain of spatial cognition.
CoRec: An Easy Approach for Coordination Recognition
In this paper, we observe and address the challenges of the coordination
recognition task. Most existing methods rely on syntactic parsers to identify
the coordinators in a sentence and detect the coordination boundaries. However,
state-of-the-art syntactic parsers are slow and suffer from errors, especially
for long and complicated sentences. To address these problems, we propose a
pipeline model, COordination RECognizer (CoRec), with two components: a
coordinator identifier and a conjunct boundary detector. Experimental results
on datasets from various domains demonstrate the effectiveness and efficiency
of the proposed method. Further experiments show that CoRec positively impacts
downstream tasks, improving the yield of state-of-the-art Open IE models.
Comment: Accepted by EMNLP 2023 Main Conference (oral presentation)
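The two-stage pipeline the abstract describes can be sketched as below. Both stages here are naive rule-based stand-ins (a coordinator word list and fixed one-token conjunct spans) for the learned components in CoRec; they only illustrate how the identifier's output feeds the boundary detector:

```python
COORDINATORS = {"and", "or", "but"}

def identify_coordinators(tokens):
    """Stage 1: flag candidate coordinator positions (stand-in for
    CoRec's learned coordinator identifier)."""
    return [i for i, t in enumerate(tokens) if t.lower() in COORDINATORS]

def detect_conjunct_boundaries(tokens, coord_idx):
    """Stage 2: stand-in boundary detector returning (start, end) token
    spans on each side. Here we naively take one token per conjunct;
    CoRec learns these span boundaries."""
    left = (max(coord_idx - 1, 0), coord_idx - 1)
    right = (coord_idx + 1, min(coord_idx + 1, len(tokens) - 1))
    return left, right

def corec_pipeline(sentence):
    tokens = sentence.split()
    results = []
    for i in identify_coordinators(tokens):
        left, right = detect_conjunct_boundaries(tokens, i)
        results.append((tokens[i], left, right))
    return results

spans = corec_pipeline("apples and oranges")  # [("and", (0, 0), (2, 2))]
```

The pipeline shape explains the speed claim: each stage is a lightweight local decision rather than a full syntactic parse of the sentence.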
Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation
Existing work on controllable dialogue generation focuses on single-attribute
control and cannot generalize to out-of-distribution combinations of multiple
attributes. In this paper, we explore
the compositional generalization for multi-attribute controllable dialogue
generation where a model can learn from seen attribute values and generalize to
unseen combinations. We propose a prompt-based disentangled controllable
dialogue generation model, DCG. It learns attribute concept composition by
generating attribute-oriented prompt vectors and uses a disentanglement loss to
disentangle different attributes for better generalization. In addition, we
design a unified reference-free evaluation framework for multiple attributes
at different levels of granularity. Experimental results on two benchmarks demonstrate
the effectiveness of our method and the evaluation metric.
Comment: ACL 2023 Main Conference
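One common way to realize the disentanglement loss mentioned above is to penalize pairwise similarity between the attribute-oriented prompt vectors, so each attribute occupies its own direction in prompt space. The following NumPy sketch shows that idea under a cosine-similarity formulation; DCG's exact loss may differ:

```python
import numpy as np

def disentanglement_loss(prompts):
    """Penalize pairwise cosine similarity between attribute-oriented
    prompt vectors, encouraging attributes to separate in prompt space.
    `prompts` is an (n_attributes, dim) array."""
    normed = prompts / np.linalg.norm(prompts, axis=1, keepdims=True)
    sim = normed @ normed.T                # pairwise cosine similarities
    n = len(prompts)
    off_diag = sim[~np.eye(n, dtype=bool)]  # drop self-similarities
    return float(np.mean(off_diag ** 2))    # 0 when prompts are orthogonal

orthogonal = np.eye(3)                  # three mutually orthogonal prompts
collapsed = np.ones((3, 4))             # three identical prompts
lo = disentanglement_loss(orthogonal)   # → 0.0
hi = disentanglement_loss(collapsed)    # → 1.0
```

Minimizing such a term pushes the prompt vectors apart, which is what lets the model recombine attributes it never saw together during training.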
Recommended from our members
ANLIzing the Adversarial Natural Language Inference Dataset
We perform an in-depth error analysis of the Adversarial NLI (ANLI) dataset, a recently introduced large-scale human-and-model-in-the-loop natural language inference dataset collected dynamically over multiple rounds. We propose a fine-grained annotation scheme for the different aspects of inference responsible for the gold classification labels, and use it to hand-code the ANLI development sets in their entirety. We use these annotations to answer a variety of important questions: which models have the highest performance on each inference type, which inference types are most common, and which types are the most challenging for state-of-the-art models? We hope our annotations will enable more fine-grained evaluation of NLI models, and provide a deeper understanding of where models fail (and succeed). Both insights can guide us in training stronger models going forward.
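The fine-grained evaluation such annotations enable boils down to grouping examples by annotated inference type and computing per-type accuracy. A minimal sketch (the field names `types`, `gold`, and `pred` are hypothetical, not the dataset's actual schema):

```python
from collections import defaultdict

def accuracy_by_inference_type(examples):
    """Group annotated examples by inference type and compute per-type
    accuracy. Each example is a dict with hypothetical keys:
    'types' (annotated inference types), 'gold', and 'pred'."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        for t in ex["types"]:          # one example may carry several types
            total[t] += 1
            correct[t] += ex["gold"] == ex["pred"]
    return {t: correct[t] / total[t] for t in total}

examples = [
    {"types": ["numerical"], "gold": "contradiction", "pred": "contradiction"},
    {"types": ["numerical", "negation"], "gold": "entailment", "pred": "neutral"},
]
acc = accuracy_by_inference_type(examples)
# → {"numerical": 0.5, "negation": 0.0}
```

Because one example can carry multiple annotated types, it contributes to every matching bucket, which is how the breakdown isolates which inference phenomena drive a model's errors.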
EpiK-Eval: Evaluation for Language Models as Epistemic Models
In the age of artificial intelligence, the role of large language models
(LLMs) is becoming increasingly central. Despite their growing prevalence,
their capacity to consolidate knowledge from different training documents - a
crucial ability in numerous applications - remains unexplored. This paper
presents the first study examining the capability of LLMs to effectively
combine such information within their parameter space. We introduce EpiK-Eval,
a novel question-answering benchmark tailored to evaluate LLMs' proficiency in
formulating a coherent and consistent knowledge representation from segmented
narratives. Evaluations across various LLMs reveal significant weaknesses in
this domain. We contend that these shortcomings stem from the intrinsic nature
of prevailing training objectives. Consequently, we advocate for refining the
approach towards knowledge consolidation, as it harbors the potential to
dramatically improve their overall effectiveness and performance. The findings
from this study offer insights for developing more robust and reliable LLMs.
Our code and benchmark are available at
https://github.com/chandar-lab/EpiK-Eva
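The evaluation setup described, testing whether a model can consolidate a narrative it only ever saw in scattered segments, can be sketched as follows. The segmentation and interleaving scheme here is an illustrative assumption, not EpiK-Eval's exact protocol:

```python
def segment(story, n_parts):
    """Split a story's sentences into n_parts contiguous segments."""
    sents = story.split(". ")
    k, r = divmod(len(sents), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + k + (1 if i < r else 0)
        parts.append(". ".join(sents[start:end]))
        start = end
    return parts

def interleave(stories, n_parts=2):
    """Build a training stream where each story appears only as segments
    scattered among the others, so answering a question about one story
    requires consolidating knowledge across separate documents."""
    segmented = [segment(s, n_parts) for s in stories]
    stream = []
    for i in range(n_parts):
        for segs in segmented:
            stream.append(segs[i])
    return stream

docs = interleave(["A went home. A slept", "B ran. B won"])
# → ["A went home", "B ran", "A slept", "B won"]
```

A question like "what did A do after going home?" can then only be answered correctly if the model has stitched the first and third documents back into one coherent representation.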