Does It Make Sense? And Why? A Pilot Study for Sense Making and Explanation
Introducing common sense to natural language understanding systems has
received increasing research attention. How to evaluate whether a system has a
sense-making capability remains a fundamental question. Existing benchmarks
measure commonsense knowledge indirectly and without explanation. In this
paper, we release a benchmark to directly test whether a system can
differentiate natural language statements that make sense from those that do
not. In addition, a system is asked to identify the most crucial reason why a
statement does not make sense. We evaluate models trained on large-scale
language modeling tasks as well as human performance, showing that sense
making still poses distinct challenges for systems.
Comment: This paper has been accepted by ACL 2019.
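As a rough illustration of the two settings the benchmark describes (statement discrimination and reason selection), the sketch below shows one possible way such instances could be laid out; the example texts, options, and labels are invented for illustration and are not taken from the released data.

```python
# Hypothetical instance layout for the sense-making and explanation settings.
# Statements, options, and labels are invented for illustration only.

sense_making_instance = {
    "statement_a": "He put an elephant into the fridge.",  # does not make sense
    "statement_b": "He put a turkey into the fridge.",     # makes sense
    "label": "b",  # which statement makes sense
}

explanation_instance = {
    "nonsensical_statement": "He put an elephant into the fridge.",
    "reason_options": [
        "An elephant is much bigger than a fridge.",
        "Elephants are grey and fridges are usually white.",
        "A fridge is used to keep food cold.",
    ],
    "correct_reason": 0,  # index of the most crucial reason
}
```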
KaLM at SemEval-2020 Task 4: Knowledge-aware Language Models for Comprehension And Generation
This paper presents our strategies for SemEval-2020 Task 4: Commonsense
Validation and Explanation. We propose a novel way to search for evidence and
choose different large-scale pre-trained models as the backbones for the three
subtasks. The results show that our evidence-searching approach improves model
performance on the commonsense explanation task. Our team ranks 2nd in subtask
C according to the human evaluation score.
Comment: 6 pages, 1 figure.
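A minimal sketch of the general idea of prepending retrieved evidence to a model's input; the search_evidence helper below is a placeholder stub for illustration, not the retrieval pipeline used in the paper.

```python
# Sketch of evidence-augmented input construction. search_evidence is a
# stand-in stub, not the authors' actual evidence search.

def search_evidence(statement, top_k=2):
    """Placeholder for an external evidence search (e.g. a web or corpus query)."""
    return [
        "Elephants are several metres tall.",
        "A fridge is about two metres tall.",
    ][:top_k]

def build_input(statement):
    evidence = " ".join(search_evidence(statement))
    # Evidence is simply prepended so a pre-trained encoder sees it as context.
    return f"{evidence} [SEP] {statement}"

print(build_input("He put an elephant into the fridge."))
```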
QiaoNing at SemEval-2020 Task 4: Commonsense Validation and Explanation system based on ensemble of language model
In this paper, we present the language model system we submitted to the
SemEval-2020 Task 4 competition: "Commonsense Validation and Explanation". We
participate in two subtasks, subtask A (Validation) and subtask B
(Explanation). We apply transfer learning with pretrained language models
(BERT, XLNet, RoBERTa, and ALBERT) and fine-tune them on this task. We then
compare their characteristics on this task to help future researchers
understand and use these models more appropriately. The ensembled model solves
this problem better, reaching 95.9% accuracy on subtask A, only 3% below human
accuracy.
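A minimal sketch of the ensembling idea with Hugging Face Transformers, averaging softmax probabilities from several sequence classifiers. The checkpoint names and averaging scheme are assumptions for illustration; the submitted system fine-tunes the models first, which is omitted here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Checkpoints chosen for illustration; fine-tuning on the task is assumed
# to have happened before ensembling and is omitted in this sketch.
MODEL_NAMES = ["bert-base-uncased", "roberta-base", "albert-base-v2"]

def ensemble_score(statement):
    """Average the per-class probabilities of several classifiers."""
    probs = []
    for name in MODEL_NAMES:
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
        inputs = tokenizer(statement, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        probs.append(torch.softmax(logits, dim=-1))
    return torch.stack(probs).mean(dim=0)  # shape (1, 2)

print(ensemble_score("He put an elephant into the fridge."))
```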
Is this sentence valid? An Arabic Dataset for Commonsense Validation
Commonsense understanding and validation remain challenging tasks in the field
of natural language understanding. Several research papers have therefore been
published that study the ability of proposed models to validate commonsense in
text. In this paper, we present a benchmark Arabic dataset for commonsense
understanding and validation, together with baseline research and models
trained on the same dataset. To the best of our knowledge, this dataset is the
first in the field of Arabic text commonsense validation. The dataset is
distributed under the Creative Commons BY-SA 4.0 license and can be found on
GitHub.
Comment: 4 pages.
LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation using Pretraining Language Model
This paper describes our submission to subtasks A and B of SemEval-2020 Task
4. For subtask A, we use an ALBERT-based model with an improved input form to
pick out the common-sense statement from two candidate statements. For subtask
B, we use a multiple-choice model enhanced by a hint-sentence mechanism to
select, from the given options, the reason why a statement is against common
sense. In addition, we propose a novel transfer learning strategy between the
subtasks that helps improve performance. The accuracy scores of our system are
95.6 / 94.9 on the official test sets, ranking 7th / 2nd on the
post-evaluation leaderboard.
Comment: Accepted at SemEval-2020. 7 pages, 4 figures.
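A small sketch of the multiple-choice formulation for subtask B using AlbertForMultipleChoice from Transformers. The hint text and the way it is concatenated to the context are assumptions for illustration, not the paper's exact hint-sentence mechanism, and fine-tuning is omitted.

```python
import torch
from transformers import AutoTokenizer, AlbertForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("albert-base-v2")
model = AlbertForMultipleChoice.from_pretrained("albert-base-v2")  # fine-tuning omitted

statement = "He put an elephant into the fridge."
hint = "He put a turkey into the fridge."  # e.g. the sensible counterpart as a hint
reasons = [
    "An elephant is much bigger than a fridge.",
    "Elephants are grey and fridges are usually white.",
    "A fridge is used to keep food cold.",
]

# One (context, reason) pair per option; the hint is appended to the context.
contexts = [f"{statement} {hint}"] * len(reasons)
enc = tokenizer(contexts, reasons, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**{k: v.unsqueeze(0) for k, v in enc.items()}).logits
print("predicted reason:", reasons[logits.argmax(-1).item()])
```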
CS-NLP team at SemEval-2020 Task 4: Evaluation of State-of-the-art NLP Deep Learning Architectures on Commonsense Reasoning Task
In this paper, we investigate a commonsense inference task that unifies
natural language understanding and commonsense reasoning. We describe our
attempt at the SemEval-2020 Task 4 competition: the Commonsense Validation and
Explanation (ComVE) challenge. We discuss several state-of-the-art deep
learning architectures for this challenge. Our system uses prepared labeled
textual datasets that were manually curated for three different natural
language inference subtasks. The goal of the first subtask is to test whether
a model can distinguish between natural language statements that make sense
and those that do not. We compare the performance of several language models
and fine-tuned classifiers. We then propose a method inspired by
question-answering tasks that treats the classification problem as a
multiple-choice question task to boost performance (96.06%), which is
significantly better than the baseline. For the second subtask, which is to
select the reason why a statement does not make sense, we place within the
first six teams (93.7%) among 27 participants with very competitive results.
Our result for the last subtask, generating a reason against the nonsensical
statement, shows much potential for future research: applying the powerful
generative language model GPT-2, we achieve a BLEU score of 6.1732, placing
among the first four teams.
Comment: 6 pages, 1 figure, 2 tables, SemEval-2020, Commonsense Reasoning and
Natural Language Processing.
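Since the generation subtask is scored with BLEU against reference reasons, a minimal sketch of such a computation with NLTK's sentence_bleu is shown below; the reference texts and the smoothing choice are illustrative and do not reproduce the official scorer.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Invented example; the task provides human-written reference reasons per statement.
references = [
    "an elephant is too big to fit in a fridge".split(),
    "fridges are much smaller than elephants".split(),
]
hypothesis = "an elephant cannot fit inside a fridge".split()

score = sentence_bleu(references, hypothesis,
                      smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score:.4f}")
```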
IIE-NLP-NUT at SemEval-2020 Task 4: Guiding PLM with Prompt Template Reconstruction Strategy for ComVE
This paper introduces our systems for the first two subtasks of SemEval-2020
Task 4: Commonsense Validation and Explanation. To clarify the intention for
judgment and inject contrastive information for selection, we propose an input
reconstruction strategy with prompt templates. Specifically, we formalize the
subtasks into a multiple-choice question answering format and construct the
input with prompt templates; the final prediction of question answering is
then taken as the result of the subtasks. Experimental results show that our
approaches achieve significant performance gains over the baseline systems.
Our approaches secure third place on the official test sets of both subtasks,
with accuracies of 96.4 and 94.3, respectively.
Comment: 8 pages, 1 figure, 5 tables, SemEval-2020.
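A toy sketch of the prompt-reconstruction idea for the validation subtask: the statement pair is rewritten as a question with the two statements as answer options, which a multiple-choice PLM head can then score. The template wording below is an assumption for illustration, not the paper's exact template.

```python
# Illustrative prompt reconstruction for subtask A; the template wording is
# an assumption, not the exact template used in the paper.

def build_mcqa_prompt(statement_a, statement_b):
    question = "Which of the following statements makes sense?"
    return [f"{question} {option}" for option in (statement_a, statement_b)]

options = build_mcqa_prompt(
    "He put an elephant into the fridge.",
    "He put a turkey into the fridge.",
)
# Each option string can then be scored by a multiple-choice PLM head.
print(options)
```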
ANA at SemEval-2020 Task 4: mUlti-task learNIng for cOmmonsense reasoNing (UNION)
In this paper, we describe our mUlti-task learNIng for cOmmonsense reasoNing
(UNION) system submitted for Task C of SemEval-2020 Task 4, which is to
generate a reason explaining why a given false statement is nonsensical. In
early experiments, we found that simple adaptations such as fine-tuning GPT-2
often yield dull and uninformative generations (e.g., simple negations). To
generate more meaningful explanations, we propose UNION, a unified end-to-end
framework that leverages several existing commonsense datasets, allowing the
model to learn richer dynamics within the scope of commonsense reasoning. To
perform model selection efficiently, accurately, and promptly, we also propose
several auxiliary automatic evaluation metrics that let us compare models
extensively from different perspectives. Our submitted system not only
performs well on the proposed metrics but also outperforms its competitors,
achieving the highest human evaluation score of 2.10 while maintaining a BLEU
score of 15.7. Our code is publicly available on GitHub.
Comment: 7 pages, 1 figure, 3 tables, SemEval 2020.
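The "simple adaptation" baseline mentioned above can be illustrated with a bare-bones fine-tuning step for GPT-2 on statement-reason pairs; the separator phrase, hyperparameters, and training pair below are assumptions, and the actual UNION framework goes beyond this by training over several datasets with auxiliary objectives.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Invented training pair; the separator wording is an assumption.
statement = "He put an elephant into the fridge."
reason = "An elephant is far too large to fit inside a fridge."
text = f"{statement} The reason this is against common sense is: {reason}"

inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])  # causal LM loss
outputs.loss.backward()
optimizer.step()
print("loss:", outputs.loss.item())
```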
CUHK at SemEval-2020 Task 4: CommonSense Explanation, Reasoning and Prediction with Multi-task Learning
This paper describes our system submitted to Task 4 of SemEval-2020:
Commonsense Validation and Explanation (ComVE), which consists of three
sub-tasks. The task is to directly validate whether a given sentence makes
sense and requires the model to explain it. Based on the BERT architecture
with a multi-task setting, we propose an effective and interpretable "Explain,
Reason and Predict" (ERP) system to solve the three sub-tasks of commonsense:
(a) Validation, (b) Reasoning, and (c) Explanation. Inspired by cognitive
studies of common sense, our system first generates a reason or understanding
of the sentences and then chooses which statement makes sense; this is
achieved via multi-task learning. In the post-evaluation, our system reached
92.9% accuracy on subtask A (rank 11), 89.7% accuracy on subtask B (rank 9),
and a BLEU score of 12.9 on subtask C (rank 8).
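A compact sketch of the multi-task idea with a shared BERT encoder and a validation head whose loss can be combined with a generation loss; the architecture details, hidden size, and loss weighting are assumptions for illustration and do not reproduce the ERP system itself.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class ValidationHead(nn.Module):
    """Scores a statement for 'makes sense' vs 'does not make sense'."""
    def __init__(self, hidden_size=768):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, 2)

    def forward(self, cls_embedding):
        return self.classifier(cls_embedding)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")  # shared across tasks
head = ValidationHead()

inputs = tokenizer("He put an elephant into the fridge.", return_tensors="pt")
cls_embedding = encoder(**inputs).last_hidden_state[:, 0]
validation_logits = head(cls_embedding)

# In a multi-task setting, the validation loss would be combined with an
# explanation-generation loss over the shared encoder, e.g.:
# total_loss = validation_loss + lambda_gen * generation_loss
print(validation_logits)
```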
SemEval-2020 Task 4: Commonsense Validation and Explanation
In this paper, we present SemEval-2020 Task 4, Commonsense Validation and
Explanation (ComVE), which includes three subtasks, aiming to evaluate whether
a system can distinguish a natural language statement that makes sense to
humans from one that does not, and provide the reasons. Specifically, in the
first subtask, participating systems are required to choose, from two natural
language statements of similar wording, the one that makes sense and the one
that does not. The second subtask additionally asks a system to select, from
three options, the key reason why a given statement does not make sense. In
the third subtask, a participating system needs to generate the reason. The
task attracted 39 teams, each participating in at least one of the three
subtasks. For Subtask A and Subtask B, the performance of top-ranked systems
is close to that of humans. However, for Subtask C, there is still a
relatively large gap between system and human performance. The dataset used in
our task can be found at
https://github.com/wangcunxiang/SemEval2020-Task4-Commonsense-Validation-and-Explanation;
the leaderboard can be found at
https://competitions.codalab.org/competitions/21080#results.
Comment: Task description paper of SemEval-2020 Task 4: Commonsense Validation
and Explanation.
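Subtask A and Subtask B are scored by plain accuracy against gold labels; a minimal sketch of such a computation is shown below, assuming simple id-to-label dictionaries. The ids and labels are invented, and the files released on GitHub may use a different layout.

```python
# Toy accuracy computation for Subtask A; ids and labels are invented,
# and the real data files may be formatted differently.
gold = {"0": "1", "1": "0", "2": "1"}         # id -> index of nonsensical statement
predictions = {"0": "1", "1": "1", "2": "1"}  # id -> predicted index

correct = sum(predictions.get(i) == label for i, label in gold.items())
accuracy = correct / len(gold)
print(f"Subtask A accuracy: {accuracy:.2%}")
```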