Automatic Article Commenting: the Task and Dataset
Comments of online articles provide extended views and improve user
engagement. Automatically generating comments thus becomes a valuable functionality
for online forums, intelligent chatbots, etc. This paper proposes the new task
of automatic article commenting, and introduces a large-scale Chinese dataset
with millions of real comments and a human-annotated subset characterizing the
comments' varying quality. Incorporating the human bias of comment quality, we
further develop automatic metrics that generalize a broad set of popular
reference-based metrics and exhibit greatly improved correlations with human
evaluations.
Comment: ACL2018; with supplements; dataset link available in the paper
Going Beyond Linear Mode Connectivity: The Layerwise Linear Feature Connectivity
Recent work has revealed many intriguing empirical phenomena in neural
network training, despite the poorly understood and highly complex loss
landscapes and training dynamics. One of these phenomena, Linear Mode
Connectivity (LMC), has gained considerable attention due to the intriguing
observation that different solutions can be connected by a linear path in the
parameter space while maintaining near-constant training and test losses. In
this work, we introduce a stronger notion of linear connectivity, Layerwise
Linear Feature Connectivity (LLFC), which says that the feature maps of every
layer in different trained networks are also linearly connected. We provide
comprehensive empirical evidence for LLFC across a wide range of settings,
demonstrating that whenever two trained networks satisfy LMC (via either
spawning or permutation methods), they also satisfy LLFC in nearly all the
layers. Furthermore, we delve deeper into the underlying factors contributing
to LLFC, which reveal new insights into the spawning and permutation
approaches. The study of LLFC transcends and advances our understanding of LMC
by adopting a feature-learning perspective.
Comment: 25 pages, 23 figures
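
To make the notion of linear connectivity in parameter space concrete, here is a minimal sketch that measures how much the loss rises along the straight line between two parameter vectors. It is an illustration only, not the authors' code: the function names `interpolate` and `loss_barrier`, the barrier definition (path loss minus the linearly interpolated endpoint losses), and the toy loss functions are all assumptions made for this example. It does not implement LLFC, which concerns per-layer feature maps.

```python
from typing import Callable, List, Sequence


def interpolate(theta_a: Sequence[float], theta_b: Sequence[float], alpha: float) -> List[float]:
    """Return alpha * theta_a + (1 - alpha) * theta_b, element-wise."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(
    loss_fn: Callable[[Sequence[float]], float],
    theta_a: Sequence[float],
    theta_b: Sequence[float],
    num_points: int = 11,
) -> float:
    """Largest excess of the path loss over the interpolated endpoint losses.

    A value near zero means the two solutions look linearly connected
    under this (assumed) barrier definition.
    """
    loss_a = loss_fn(theta_a)
    loss_b = loss_fn(theta_b)
    barrier = 0.0
    for i in range(num_points):
        alpha = i / (num_points - 1)
        path_loss = loss_fn(interpolate(theta_a, theta_b, alpha))
        baseline = alpha * loss_a + (1.0 - alpha) * loss_b
        barrier = max(barrier, path_loss - baseline)
    return barrier


def convex_loss(theta: Sequence[float]) -> float:
    """Toy convex loss: no barrier can appear between any two points."""
    return sum(x * x for x in theta)


def double_well_loss(theta: Sequence[float]) -> float:
    """Toy non-convex loss with minima at x = 1 and x = -1."""
    return sum((x * x - 1.0) ** 2 for x in theta)


convex_barrier = loss_barrier(convex_loss, [1.0, 0.0], [0.0, 1.0])
double_well_barrier = loss_barrier(double_well_loss, [1.0], [-1.0])
```

In the toy double-well case the two minima are separated by a bump at the midpoint, so the barrier is clearly positive; in the convex case it stays at zero. For real networks, `loss_fn` would evaluate the model on held-out data with the interpolated weights.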
Learning to Break the Loop: Analyzing and Mitigating Repetitions for Neural Text Generation
While large-scale neural language models, such as GPT2 and BART, have
achieved impressive results on various text generation tasks, they tend to get
stuck in undesirable sentence-level loops with maximization-based decoding
algorithms (e.g., greedy search). This phenomenon is counter-intuitive
since there are few consecutive sentence-level repetitions in human corpora
(e.g., 0.02% in Wikitext-103). To investigate the underlying reasons for
generating consecutive sentence-level repetitions, we study the relationship
between the probabilities of the repetitive tokens and their previous
repetitions in the context. Through our quantitative experiments, we find that
1) Language models have a preference to repeat the previous sentence; 2) The
sentence-level repetitions have a self-reinforcement effect: the more
times a sentence is repeated in the context, the higher the probability of
continuing to generate that sentence; 3) The sentences with higher initial
probabilities usually have a stronger self-reinforcement effect. Motivated by
our findings, we propose a simple and effective training method, DITTO
(PseuDo-RepetITion PenalizaTiOn), where the model learns to penalize
probabilities of sentence-level repetitions from pseudo repetitive data.
Although our method is motivated by mitigating repetitions, experiments show
that DITTO not only mitigates the repetition issue without sacrificing
perplexity, but also achieves better generation quality. Extensive experiments
on open-ended text generation (Wikitext-103) and text summarization
(CNN/DailyMail) demonstrate the generality and effectiveness of our method.
Comment: Accepted by NeurIPS 2022. Code is released at
https://github.com/Jxu-Thu/DITT
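
As a rough illustration of what a consecutive sentence-level repetition statistic might look like, the sketch below counts how often a sentence exactly repeats the one immediately before it. This is an assumed definition for illustration, not necessarily the one used in the paper: the function name `consecutive_repetition_rate`, the exact-match criterion, and the lowercase/whitespace normalization are all choices made here.

```python
from typing import List


def consecutive_repetition_rate(sentences: List[str]) -> float:
    """Fraction of adjacent sentence pairs in which the second repeats the first.

    Assumption: sentences are already split; matching is exact after
    stripping whitespace and lowercasing. Empty strings are ignored.
    """
    normalized = [s.strip().lower() for s in sentences if s.strip()]
    if len(normalized) < 2:
        return 0.0
    repeats = sum(
        1 for prev, cur in zip(normalized, normalized[1:]) if prev == cur
    )
    return repeats / (len(normalized) - 1)


# A generation stuck in a loop versus one without repeated neighbours.
looping = ["The cat sat.", "The cat sat.", "It slept.", "It slept.", "It slept."]
varied = ["The cat sat.", "It slept.", "The cat sat."]

looping_rate = consecutive_repetition_rate(looping)
varied_rate = consecutive_repetition_rate(varied)
```

Applied to model output produced with greedy decoding and to human-written text, a statistic of this kind would expose the gap the abstract describes, although the exact numbers depend on the sentence splitter and matching rule chosen.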
Profile Consistency Identification for Open-domain Dialogue Agents
Maintaining a consistent attribute profile is crucial for dialogue agents to
naturally converse with humans. Existing studies on improving attribute
consistency mainly explored how to incorporate attribute information in the
responses, but few efforts have been made to identify the consistency relations
between response and attribute profile. To facilitate the study of profile
consistency identification, we create a large-scale human-annotated dataset
with over 110K single-turn conversations and their key-value attribute
profiles. The explicit relation between each response and its profile is
manually labeled. We also propose a BERT model enriched with key-value
structure information to identify profile consistency, which improves over
strong baselines. Further evaluations on downstream tasks demonstrate that the
profile consistency identification model is conducive to improving dialogue
consistency.
Comment: EMNLP2