SemEval-2020 Task 4: Commonsense Validation and Explanation
In this paper, we present SemEval-2020 Task 4, Commonsense Validation and Explanation (ComVE), which includes three subtasks aiming to evaluate whether a system can distinguish a natural language statement that makes sense to humans from one that does not, and provide the reasons. Specifically, in our first subtask, participating systems are required to decide, given two natural language statements of similar wording, which one makes sense and which one does not. The second subtask additionally asks a system to select, from three options, the key reason why a given statement does not make sense. In the third subtask, a participating system needs to generate that reason. The task attracted 39 teams participating in at least one of the three subtasks. For Subtask A and Subtask B, the performance of the top-ranked systems is close to that of humans; for Subtask C, however, there is still a relatively large gap between system and human performance. The dataset used in our task can be found at https://github.com/wangcunxiang/SemEval2020-Task4-Commonsense-Validation-and-Explanation; the leaderboard can be found at https://competitions.codalab.org/competitions/21080#results.
Comment: Task description paper of SemEval-2020 Task 4: Commonsense Validation and Explanation
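To make the three subtask formats concrete, below is a minimal sketch of what one instance of each might look like, written as plain Python dictionaries. The example sentences and field names are illustrative, not copied from the released dataset; see the GitHub link above for the real files.

    # Illustrative ComVE-style instances (hypothetical data and field names).

    # Subtask A: pick the statement that does NOT make sense.
    subtask_a = {
        "statement1": "He put a turkey into the fridge.",
        "statement2": "He put an elephant into the fridge.",
        "label": "statement2",  # the against-common-sense statement
    }

    # Subtask B: pick the key reason the false statement is nonsensical.
    subtask_b = {
        "false_statement": "He put an elephant into the fridge.",
        "options": [
            "An elephant is much bigger than a fridge.",
            "Elephants are usually gray while fridges are usually white.",
            "An elephant cannot eat a fridge.",
        ],
        "label": 0,
    }

    # Subtask C: generate a free-form reason for the false statement.
    subtask_c = {
        "false_statement": "He put an elephant into the fridge.",
        "references": ["An elephant is too large to fit inside a fridge."],
    }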
KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for Comprehension And Generation
This paper presents our strategies in SemEval-2020 Task 4: Commonsense Validation and Explanation. We propose a novel way to search for evidence and choose different large-scale pre-trained models as the backbones for the three subtasks. The results show that our evidence-searching approach improves model performance on the commonsense explanation task. Our team ranks 2nd in Subtask C according to the human evaluation score.
Comment: 6 pages, 1 figure
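The abstract does not spell out the evidence-searching pipeline, so the following is only a generic sketch of the idea: retrieve evidence sentences and concatenate them to the statement before encoding. Both function names are hypothetical stand-ins, not the KaLM implementation.

    # Hypothetical sketch of evidence-augmented input construction.
    def search_evidence(statement: str, k: int = 3) -> list[str]:
        # Placeholder: a real system would query a search index or knowledge
        # base here; we return a canned sentence so the sketch runs end to end.
        return ["Fridges are much smaller than elephants."][:k]

    def build_input(statement: str, sep_token: str = "[SEP]") -> str:
        # Prepend retrieved evidence so the pre-trained backbone can condition
        # on external knowledge as well as on the statement itself.
        evidence = search_evidence(statement)
        return f" {sep_token} ".join(evidence + [statement])

    print(build_input("He put an elephant into the fridge."))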
LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation using Pretraining Language Model
This paper describes our submissions to Subtasks A and B of SemEval-2020 Task 4. For Subtask A, we use an ALBERT-based model with an improved input form to pick out the commonsense statement from two candidate statements. For Subtask B, we use a multiple-choice model enhanced by a hint-sentence mechanism to select, from the given options, the reason why a statement is against common sense. Besides, we propose a novel transfer learning strategy between the subtasks which helps improve performance. The accuracy scores of our system are 95.6 / 94.9 on the official test sets, ranking 7th / 2nd on the post-evaluation leaderboard.
Comment: Accepted at SemEval-2020. 7 pages, 4 figures
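For readers unfamiliar with the multiple-choice setup, here is a minimal Subtask-B-style sketch using the public Hugging Face transformers API. It omits the paper's hint-sentence mechanism and inter-subtask transfer learning, and the example data is illustrative.

    # Minimal multiple-choice baseline for Subtask B with ALBERT.
    import torch
    from transformers import AutoTokenizer, AlbertForMultipleChoice

    tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
    model = AlbertForMultipleChoice.from_pretrained("albert-base-v2")

    statement = "He put an elephant into the fridge."  # against common sense
    reasons = [
        "An elephant is much bigger than a fridge.",
        "Elephants are usually gray while fridges are usually white.",
        "An elephant cannot eat a fridge.",
    ]

    # Pair the statement with each candidate reason; the model scores each pair.
    enc = tokenizer([statement] * len(reasons), reasons,
                    return_tensors="pt", padding=True)
    inputs = {k: v.unsqueeze(0) for k, v in enc.items()}  # [1, num_choices, seq_len]
    with torch.no_grad():
        logits = model(**inputs).logits                   # [1, num_choices]
    print("predicted reason:", reasons[logits.argmax(-1).item()])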
QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation system based on ensemble of language model
In this paper, we present the language model system we submitted to the SemEval-2020 Task 4 competition, "Commonsense Validation and Explanation". We participated in two subtasks: Subtask A (validation) and Subtask B (explanation). We applied transfer learning, taking pretrained language models (BERT, XLNet, RoBERTa, and ALBERT) and fine-tuning them on this task, then compared their characteristics on the task to help future researchers understand and use these models more properly. An ensemble of the models solves this problem better, reaching an accuracy of 95.9% on Subtask A, only 3% below human accuracy.
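The abstract does not describe the exact ensembling scheme; a common recipe consistent with it is to average the per-model class probabilities, as in this generic sketch.

    # Generic probability-averaging ensemble over fine-tuned classifiers;
    # the paper's exact combination scheme may differ.
    import torch

    def ensemble_predict(models, inputs):
        probs = []
        for model in models:
            with torch.no_grad():
                logits = model(**inputs).logits
            probs.append(torch.softmax(logits, dim=-1))
        # Average class probabilities across models, then take the argmax.
        return torch.stack(probs).mean(dim=0).argmax(dim=-1)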
Is this sentence valid? An Arabic Dataset for Commonsense Validation
Commonsense understanding and validation remains a challenging task in the field of natural language understanding, and several published research papers have studied the ability of proposed models to validate commonsense in text. In this paper, we present a benchmark Arabic dataset for commonsense understanding and validation, together with baseline experiments and models trained on the same dataset. To the best of our knowledge, this is the first dataset in the field of Arabic-text commonsense validation. The dataset is distributed under the Creative Commons BY-SA 4.0 license and can be found on GitHub.
Comment: 4 pages
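As a purely hypothetical illustration of what a validation baseline on such a dataset could look like, the sketch below frames the task as binary sequence classification; the model checkpoint and setup are assumptions, not the paper's published baseline.

    # Hypothetical binary-classification baseline for Arabic commonsense
    # validation; checkpoint choice is an assumption.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "aubmindlab/bert-base-arabertv02"  # an Arabic BERT
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                               num_labels=2)

    enc = tokenizer("...", return_tensors="pt")  # an Arabic statement goes here
    logits = model(**enc).logits                 # [1, 2]: makes sense / does not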
CS-NLP team at SemEval-2020 Task 4: Evaluation of State-of-the-art NLP Deep Learning Architectures on Commonsense Reasoning Task
In this paper, we investigate a commonsense inference task that unifies natural language understanding and commonsense reasoning. We describe our attempt at the SemEval-2020 Task 4 competition, the Commonsense Validation and Explanation (ComVE) challenge, and discuss several state-of-the-art deep learning architectures for it. Our system uses prepared labeled textual datasets that were manually curated for the three natural language inference subtasks. The goal of the first subtask is to test whether a model can distinguish between natural language statements that make sense and those that do not. We compare the performance of several language models and fine-tuned classifiers, then propose a method inspired by question answering tasks that treats the classification problem as a multiple-choice question task, boosting our experimental results (96.06%), significantly better than the baseline. For the second subtask, selecting the reason why a statement does not make sense, we place within the first six teams (93.7%) among 27 participants with very competitive results. Our result for the last subtask, generating a reason against the nonsensical statement, shows much potential for future research: applying the powerful generative language model GPT-2, we achieve a BLEU score of 6.1732, among the first four teams.
Comment: 6 pages, 1 figure, 2 tables, SemEval-2020, Commonsense Reasoning and Natural Language Processing
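As an illustration of the generative subtask, here is a minimal GPT-2 sketch using the transformers pipeline API. The prompt wording is a guess, not the authors' format, and the paper's task-specific fine-tuning is omitted.

    # Minimal Subtask-C-style generation with off-the-shelf GPT-2; the prompt
    # is hypothetical and no fine-tuning is applied.
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")
    false_statement = "He put an elephant into the fridge."
    prompt = f"Statement: {false_statement} This does not make sense because"
    out = generator(prompt, max_new_tokens=20, num_return_sequences=1)
    print(out[0]["generated_text"])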
ANA at SemEval-2020 Task 4: mUlti-task learNIng for cOmmonsense reasoNing (UNION)
In this paper, we describe our mUlti-task learNIng for cOmmonsense reasoNing (UNION) system submitted for Task C of SemEval-2020 Task 4, which is to generate a reason explaining why a given false statement is nonsensical. In early experiments we found that simple adaptations such as fine-tuning GPT-2 often yield dull, non-informative generations (e.g., simple negations). In order to generate more meaningful explanations, we propose UNION, a unified end-to-end framework that leverages several existing commonsense datasets, allowing a model to learn richer dynamics within the scope of commonsense reasoning. In order to perform model selection efficiently, accurately, and promptly, we also propose a couple of auxiliary automatic evaluation metrics, so that we can extensively compare models from different perspectives. Our submitted system not only performs well on the proposed metrics but also outperforms its competitors, achieving the highest human evaluation score of 2.10 while maintaining a BLEU score of 15.7. Our code is publicly available on GitHub.
Comment: 7 pages, 1 figure, 3 tables, SemEval 2020
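The auxiliary metrics themselves are not detailed in the abstract; one plausible ingredient is automatic BLEU-based comparison of candidate checkpoints, sketched below with the sacrebleu package. The function and variable names are assumptions.

    # BLEU-based model selection over candidate checkpoints (a generic sketch,
    # not the paper's actual metrics).
    import sacrebleu

    def select_best(checkpoint_outputs: dict[str, list[str]],
                    references: list[str]) -> str:
        # checkpoint_outputs maps a checkpoint name to its generated
        # explanations, aligned with the reference explanations.
        scores = {
            name: sacrebleu.corpus_bleu(hyps, [references]).score
            for name, hyps in checkpoint_outputs.items()
        }
        return max(scores, key=scores.get)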
IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE
This paper introduces our systems for the first two subtasks of SemEval-2020 Task 4: Commonsense Validation and Explanation. To clarify the intention for judgment and inject contrastive information for selection, we propose an input reconstruction strategy with prompt templates. Specifically, we formalize the subtasks in a multiple-choice question answering format and construct the inputs with prompt templates; the final prediction of the question answering model is then taken as the result of each subtask. Experimental results show that our approaches achieve significant performance gains over the baseline systems, securing third rank on the official test sets of both subtasks with accuracies of 96.4 and 94.3, respectively.
Comment: 8 pages, 1 figure, 5 tables, SemEval-2020
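A minimal sketch of the input-reconstruction idea follows: pair a templated question with each candidate so a multiple-choice QA model can score them. The template wording is hypothetical, not the authors' actual prompt.

    # Sketch of prompt-template input reconstruction for Subtask A; the
    # question template below is an illustrative stand-in.
    def reconstruct_subtask_a(statement1: str, statement2: str):
        question = "Which of the following statements is against common sense?"
        # Each (question, candidate) pair is fed to a multiple-choice QA model,
        # whose highest-scoring option becomes the subtask prediction.
        return [(question, statement1), (question, statement2)]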
Constrained Text Generation with Global Guidance -- Case Study on CommonGen
This paper studies constrained text generation, i.e., generating sentences under certain pre-conditions. We focus on CommonGen, the task of generating text based on a set of concepts, as a representative task of constrained text generation. Traditional methods mainly rely on supervised training to maximize the likelihood of target sentences. However, global constraints such as common sense and coverage cannot be incorporated into the likelihood objective of the autoregressive decoding process. In this paper, we consider using reinforcement learning to address this limitation, measuring global constraints, including fluency, common sense, and concept coverage, with a comprehensive score that serves as the reward for reinforcement learning. Besides, we design a guided decoding method at the word, fragment, and sentence levels. Experiments demonstrate that our method significantly increases concept coverage and outperforms existing models in various automatic evaluations.
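To make the reward idea concrete, here is a toy sketch of the concept-coverage component only; the paper's actual fluency, commonsense, and coverage scorers, and the way they are combined into one score, are not reproduced.

    # Toy concept-coverage reward: the fraction of required concepts that
    # appear in the generated sentence. startswith() loosely tolerates
    # morphological variants like "dogs" for "dog".
    def coverage_reward(sentence: str, concepts: list[str]) -> float:
        tokens = sentence.lower().split()
        covered = sum(any(tok.startswith(c.lower()) for tok in tokens)
                      for c in concepts)
        return covered / len(concepts)

    # Example: all three concepts are covered, so the reward is 1.0.
    print(coverage_reward("A dog jumps to catch a frisbee.",
                          ["dog", "frisbee", "catch"]))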
A Survey of Knowledge-Enhanced Text Generation
The goal of text generation is to make machines express themselves in human language. It is one of the most important yet challenging tasks in natural language processing (NLP). Since 2014, various neural encoder-decoder models pioneered by Seq2Seq have been proposed to achieve this goal by learning to map input text to output text. However, the input text alone often provides limited knowledge from which to generate the desired output, so the performance of text generation is still far from satisfactory in many real-world scenarios. To address this issue, researchers have considered incorporating various forms of knowledge beyond the input text into the generation models. This research direction is known as knowledge-enhanced text generation. In this survey, we present a comprehensive review of the research on knowledge-enhanced text generation over the past five years. The main content includes two parts: (i) general methods and architectures for integrating knowledge into text generation; (ii) specific techniques and applications according to the different forms of knowledge data. This survey is intended for a broad audience of researchers and practitioners in academia and industry.
Comment: 42 pages, 12 tables, 8 figures; under review at ACM CSUR (revised manuscript)