Report on the First Knowledge Graph Reasoning Challenge 2018 -- Toward the eXplainable AI System
A new challenge for knowledge graph reasoning started in 2018. Deep learning
has promoted the application of artificial intelligence (AI) techniques to a
wide variety of social problems. Accordingly, being able to explain the reason
for an AI decision is becoming important to ensure the secure and safe use of
AI techniques. Thus, we, the Special Interest Group on Semantic Web and
Ontology of the Japanese Society for AI, organized a challenge calling for
techniques that reason and/or estimate which characters are criminals while
providing a reasonable explanation based on an open knowledge graph of a
well-known Sherlock Holmes mystery story. This paper presents a summary report
of the first challenge held in 2018, including the knowledge graph
construction, the techniques proposed for reasoning and/or estimation, the
evaluation metrics, and the results. The first prize went to an approach that
formalized the problem as a constraint satisfaction problem and solved it using
a lightweight formal method; the second prize went to an approach that used
SPARQL and rules; the best resource prize went to a submission that constructed
word embeddings of characters from all sentences of the Sherlock Holmes novels;
and the best idea prize went to a multi-agent discussion model. We conclude this
paper with the plans and issues for the next challenge in 2019.
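To make the winning formulation concrete, here is a minimal sketch of casting "who is the criminal?" as a constraint satisfaction problem. The characters and facts are toy stand-ins for the challenge's knowledge graph, and the `python-constraint` package is just one readily available solver, not necessarily the lightweight formal method the winning team used. The satisfied constraints double as the explanation, which is the explainability angle the challenge rewards.

```python
# Hypothetical sketch: criminal identification as a CSP.
from constraint import Problem  # pip install python-constraint

suspects = ["roylott", "helen", "julia"]
# Toy background facts that would come from the knowledge graph.
had_motive = {"roylott": True, "helen": False, "julia": False}
had_access = {"roylott": True, "helen": True, "julia": False}

problem = Problem()
for s in suspects:
    problem.addVariable(s, [True, False])  # True = this character is the criminal

# Exactly one criminal in the story.
problem.addConstraint(lambda *vals: sum(vals) == 1, suspects)
# The criminal must have both a motive and access to the crime scene.
for s in suspects:
    problem.addConstraint(
        lambda guilty, s=s: (not guilty) or (had_motive[s] and had_access[s]), [s]
    )

for solution in problem.getSolutions():
    print([s for s, guilty in solution.items() if guilty])  # -> ['roylott'] here
```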
KG^2: Learning to Reason Science Exam Questions with Contextual Knowledge Graph Embeddings
The AI2 Reasoning Challenge (ARC), a new benchmark dataset for question
answering (QA), has recently been released. ARC contains only natural science
questions authored for human exams, which are hard to answer and require
advanced logical reasoning. On the ARC Challenge Set, existing state-of-the-art
QA systems fail to significantly outperform a random baseline, reflecting the
difficult nature of this task. In this paper, we propose a novel framework for
answering science exam questions that mimics the human solving process in an
open-book exam. To address the reasoning challenge, we construct contextual
knowledge graphs for the question itself and for the supporting sentences.
Our model learns to reason with neural embeddings of both knowledge graphs.
Experiments on the ARC Challenge Set show that our model outperforms the
previous state-of-the-art QA systems.
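As a rough illustration of "reasoning with neural embeddings of both knowledge graphs" (a simplified sketch, not the authors' architecture): reduce each graph to a bag of (head, relation, tail) triples, pool their embeddings, and score how well the support graph matches the question graph. All dimensions and data below are toy placeholders.

```python
import torch
import torch.nn as nn

class GraphMatcher(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.triple_mlp = nn.Sequential(nn.Linear(3 * dim, dim), nn.ReLU())
        self.bilinear = nn.Bilinear(dim, dim, 1)

    def encode(self, triples: torch.Tensor) -> torch.Tensor:
        # triples: (num_triples, 3) ids for (head, relation, tail)
        e = self.emb(triples).flatten(1)        # (num_triples, 3*dim)
        return self.triple_mlp(e).mean(dim=0)   # pooled graph embedding (dim,)

    def forward(self, question_graph, support_graph) -> torch.Tensor:
        q = self.encode(question_graph).unsqueeze(0)
        s = self.encode(support_graph).unsqueeze(0)
        return self.bilinear(q, s)  # higher score = better-supported answer

model = GraphMatcher(vocab_size=1000)
q_graph = torch.randint(0, 1000, (4, 3))   # toy question graph: 4 triples
s_graph = torch.randint(0, 1000, (7, 3))   # toy support graph: 7 triples
print(model(q_graph, s_graph))             # unnormalised support score
```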
Improving Question Answering by Commonsense-Based Pre-Training
Although neural network approaches achieve remarkable success on a variety of
NLP tasks, many of them struggle to answer questions that require commonsense
knowledge. We believe the main reason is the lack of commonsense
connections between concepts. To remedy this, we provide a simple and
effective method that leverages an external commonsense knowledge base such as
ConceptNet. We pre-train direct and indirect relational functions between
concepts, and show that these pre-trained functions can easily be added to
existing neural network models. Results show that incorporating the
commonsense-based functions improves the baseline on three question answering
tasks that require commonsense reasoning. Further analysis shows that our
system discovers and leverages useful evidence from an external commonsense
knowledge base, which is missing from existing neural network models, and
helps derive the correct answer. Comment: 7 pages
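A minimal sketch of pre-training such a relational function (not the paper's exact method): learn a scorer f(concept_a, concept_b) on ConceptNet-style triples so that related pairs score above random corruptions. The vocabulary and pairs below are toy placeholders.

```python
import torch
import torch.nn as nn

vocab = {"bird": 0, "fly": 1, "fish": 2, "swim": 3, "stone": 4}
positive_pairs = [("bird", "fly"), ("fish", "swim")]    # e.g., CapableOf edges
negative_pairs = [("stone", "fly"), ("stone", "swim")]  # random corruptions

emb = nn.Embedding(len(vocab), 32)
relation_fn = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(list(emb.parameters()) + list(relation_fn.parameters()),
                       lr=1e-2)

def score(a: str, b: str) -> torch.Tensor:
    pair = torch.cat([emb(torch.tensor(vocab[a])), emb(torch.tensor(vocab[b]))])
    return relation_fn(pair)

for _ in range(200):  # margin loss: push positives above negatives
    loss = sum(torch.relu(1.0 - score(*p) + score(*n))
               for p, n in zip(positive_pairs, negative_pairs)).squeeze()
    opt.zero_grad()
    loss.backward()
    opt.step()

# The pre-trained `score` can then be plugged into a QA model as a feature.
print(score("bird", "fly").item(), score("stone", "fly").item())
```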
Beyond Leaderboards: A survey of methods for revealing weaknesses in Natural Language Inference data and models
Recent years have seen a growing number of publications that analyse Natural
Language Inference (NLI) datasets for superficial cues, asking whether such
cues undermine the complexity of the tasks underlying those datasets and how
they affect the models that are optimised and evaluated on this data. This
structured survey provides an overview of the evolving research area by
categorising reported weaknesses in models and datasets and the methods
proposed to reveal and alleviate those weaknesses for the English language. We
summarise and discuss the findings and conclude with a set of recommendations
for possible future research directions. We hope it will be a useful resource
for researchers who propose new datasets, giving them a set of tools to assess
the suitability and quality of their data for evaluating various phenomena of
interest, as well as for those who develop novel architectures, helping them
understand the implications of their improvements with respect to their
models' acquired capabilities. Comment: 10 pages
Rethinking Dialogue State Tracking with Reasoning
Tracking dialogue states to better interpret user goals and feed downstream
policy learning is a bottleneck in dialogue management. Common practice has
been to treat it as a problem of classifying dialogue content into a set of
pre-defined slot-value pairs, or generating values for different slots given
the dialogue history. Both have limitations in considering the dependencies
that occur within dialogues, and both lack reasoning capabilities. This paper
proposes to track dialogue states gradually, reasoning over dialogue turns
with the help of back-end data. Empirical results demonstrate that our method
significantly outperforms state-of-the-art methods by 38.6% in terms of joint
belief accuracy on MultiWOZ 2.1, a large-scale human-human dialogue dataset
spanning multiple domains. Comment: further modification needed
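For context, here is a minimal sketch of the common-practice baseline the abstract criticises (not the paper's reasoning-based tracker): classify an encoding of the dialogue history into a value for each predefined slot. Slots, values, and the random "encoder" output are toy placeholders.

```python
import torch
import torch.nn as nn

slot_values = {
    "restaurant-area":  ["none", "centre", "north", "south"],
    "restaurant-price": ["none", "cheap", "moderate", "expensive"],
}

class SlotValueClassifier(nn.Module):
    def __init__(self, ctx_dim: int = 128):
        super().__init__()
        # One classification head per slot over that slot's candidate values.
        self.heads = nn.ModuleDict({
            slot.replace("-", "_"): nn.Linear(ctx_dim, len(vals))
            for slot, vals in slot_values.items()
        })

    def forward(self, ctx: torch.Tensor) -> dict:
        # ctx: (ctx_dim,) encoding of the dialogue history from any encoder.
        state = {}
        for slot, vals in slot_values.items():
            logits = self.heads[slot.replace("-", "_")](ctx)
            state[slot] = vals[int(logits.argmax())]
        return state

tracker = SlotValueClassifier()
ctx = torch.randn(128)  # stand-in for an encoded dialogue history
print(tracker(ctx))     # e.g. {'restaurant-area': 'north', ...}
```

This per-slot classification treats slots independently, which is exactly the missing-dependency problem the paper's turn-by-turn reasoning is meant to address.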
WinoWhy: A Deep Diagnosis of Essential Commonsense Knowledge for Answering Winograd Schema Challenge
In this paper, we present the first comprehensive categorization of essential
commonsense knowledge for answering the Winograd Schema Challenge (WSC). For
each of the questions, we invite annotators to first provide reasons for making
correct decisions and then categorize them into six major knowledge categories.
By doing so, we better understand the limitation of existing methods (i.e.,
what kind of knowledge cannot be effectively represented or inferred with
existing methods) and shed some light on the commonsense knowledge that we need
to acquire in the future for better commonsense reasoning. Moreover, to
investigate whether current WSC models truly understand the commonsense
involved or simply solve the WSC questions based on statistical biases in the
dataset, we
leverage the collected reasons to develop a new task called WinoWhy, which
requires models to distinguish plausible reasons from very similar but wrong
reasons for all WSC questions. Experimental results show that even though
pre-trained language representation models have achieved promising progress on
the original WSC dataset, they still struggle on WinoWhy. Further experiments
show that even though supervised models can achieve better performance, their
performance can be sensitive to the dataset distribution. WinoWhy and all code
are available at: https://github.com/HKUST-KnowComp/WinoWhy. Comment: Accepted
by ACL 2020
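One natural way to approach the WinoWhy task (a sketch of a plausible setup, not necessarily the paper's) is to treat a (question, candidate reason) pair as a sentence pair and classify it as plausible or implausible with a pretrained encoder. The model choice and example texts below are illustrative.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = implausible, 1 = plausible
)

question = ("The trophy doesn't fit into the brown suitcase "
            "because it is too large. What is too large?")
reason = "Trophies are put inside suitcases, so the trophy must fit inside."

inputs = tokenizer(question, reason, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
# The freshly initialised head is untrained; scores only become meaningful
# after fine-tuning on the WinoWhy training split.
print(logits.softmax(-1))
```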
A Review of Winograd Schema Challenge Datasets and Approaches
The Winograd Schema Challenge is both a commonsense reasoning and natural
language understanding challenge, introduced as an alternative to the Turing
test. A Winograd schema is a pair of sentences differing in one or two words
with a highly ambiguous pronoun, resolved differently in the two sentences,
that appears to require commonsense knowledge to be resolved correctly. The
examples were designed to be easily solvable by humans but difficult for
machines, in principle requiring a deep understanding of the content of the
text and the situation it describes. This paper reviews existing Winograd
Schema Challenge benchmark datasets and approaches that have been published
since its introduction.
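For a concrete example, take the classic schema "The city councilmen refused the demonstrators a permit because they feared/advocated violence": swapping one word flips the referent of "they". A common zero-shot recipe in the surveyed literature, shown here as a sketch, replaces the pronoun with each candidate and keeps the reading the language model finds more probable.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def lm_loss(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        return model(ids, labels=ids).loss.item()  # mean token NLL

for verb in ("feared", "advocated"):
    candidates = ("the city councilmen", "the demonstrators")
    # Substitute each candidate for the ambiguous pronoun and compare losses.
    scores = {c: lm_loss(
        f"The city councilmen refused the demonstrators a permit "
        f"because {c} {verb} violence.") for c in candidates}
    print(verb, "->", min(scores, key=scores.get))
```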
A Survey of Document Grounded Dialogue Systems (DGDS)
Dialogue systems (DS) attract great attention from industry and academia
because of their wide application prospects. Researchers usually categorize DS
according to their function. However, many conversations require a DS to
switch between different functions: a movie discussion can change from
chit-chat to QA, a conversational recommendation can shift from chit-chat to
recommendation, and so on. Classification by function alone may therefore not
be enough to help us appreciate the current development trend. Instead, we
classify DS according to their background knowledge, and specifically study
the latest DS grounded in unstructured document(s). We define a Document
Grounded Dialogue System (DGDS) as a DS whose dialogues center on the given
document(s). A DGDS can be used in scenarios such as discussing merchandise
with reference to a product manual or commenting on news reports. We believe
that exploiting information from unstructured document(s) is the future trend
of DS, because a great amount of human knowledge lies in such documents.
Research on DGDS not only has broad application prospects but also helps AI
better understand human knowledge and natural language. We analyze the
classification, architecture, datasets, models, and future development trends
of DGDS, hoping to help researchers in this field. Comment: 30 pages, 4
figures, 13 tables
SocialIQA: Commonsense Reasoning about Social Interactions
We introduce Social IQa, the first large-scale benchmark for commonsense
reasoning about social situations. Social IQa contains 38,000 multiple choice
questions for probing emotional and social intelligence in a variety of
everyday situations (e.g., Q: "Jordan wanted to tell Tracy a secret, so Jordan
leaned towards Tracy. Why did Jordan do this?" A: "Make sure no one else could
hear"). Through crowdsourcing, we collect commonsense questions along with
correct and incorrect answers about social interactions, using a new framework
that mitigates stylistic artifacts in incorrect answers by asking workers to
provide the right answer to a different but related question. Empirical results
show that our benchmark is challenging for existing question-answering models
based on pretrained language models, compared to human performance (>20% gap).
Notably, we further establish Social IQa as a resource for transfer learning of
commonsense knowledge, achieving state-of-the-art performance on multiple
commonsense reasoning tasks (Winograd Schemas, COPA). Comment: the first two
authors contributed equally; accepted to EMNLP 2019; camera ready version
Factor Graph Attention
Dialog is an effective way to exchange information, but subtle details and
nuances are extremely important. While significant progress has paved a path to
address visual dialog with algorithms, details and nuances remain a challenge.
Attention mechanisms have demonstrated compelling results to extract details in
visual question answering and also provide a convincing framework for visual
dialog due to their interpretability and effectiveness. However, the many data
utilities that accompany visual dialog challenge existing attention techniques.
We address this issue and develop a general attention mechanism for visual
dialog which operates on any number of data utilities. To this end, we design a
factor graph based attention mechanism which combines any number of utility
representations. We illustrate the applicability of the proposed approach on
the challenging and recently introduced VisDial datasets, outperforming recent
state-of-the-art methods by 1.1% for VisDial0.9 and by 2% for VisDial1.0 on
MRR. Our ensemble model improved the MRR score on VisDial1.0 by more than 6%.
Comment: Accepted to CVPR 2019; revised version includes bottom-up feature
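The factor-graph idea can be gestured at with a highly simplified sketch (far from the paper's full model): each utility's attention potential is the sum of a local (unary) factor and pairwise factors with every other utility, realised here as plain dot-product interactions and normalised with a softmax. All feature tensors are toy placeholders.

```python
import torch

def factor_graph_attention(utilities: list[torch.Tensor]) -> list[torch.Tensor]:
    # utilities: list of (n_i, d) tensors, one per data utility
    # (e.g., image regions, question tokens, dialog-history turns).
    attended = []
    for i, U in enumerate(utilities):
        potential = U.norm(dim=-1)  # toy unary factor, shape (n_i,)
        for j, V in enumerate(utilities):
            if i == j:
                continue
            # Pairwise factor: interaction of each element of utility i
            # with utility j, marginalised over j's elements.
            potential = potential + (U @ V.T).mean(dim=-1)
        weights = potential.softmax(dim=0)  # attention over utility i
        attended.append(weights @ U)        # (d,) attended summary
    return attended

img = torch.randn(36, 64)   # toy image-region features
qst = torch.randn(10, 64)   # toy question-token features
hist = torch.randn(5, 64)   # toy dialog-history features
print([a.shape for a in factor_graph_attention([img, qst, hist])])
```

The point of the construction is that it works for any number of utilities: adding a new one only adds pairwise factors, rather than requiring a redesigned attention module.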