206 research outputs found
A large annotated corpus for learning natural language inference
Understanding entailment and contradiction is fundamental to understanding
natural language, and inference about entailment and contradiction is a
valuable testing ground for the development of semantic representations.
However, machine learning research in this area has been dramatically limited
by the lack of large-scale resources. To address this, we introduce the
Stanford Natural Language Inference corpus, a new, freely available collection
of labeled sentence pairs, written by humans doing a novel grounded task based
on image captioning. At 570K pairs, it is two orders of magnitude larger than
all other resources of its type. This increase in scale allows lexicalized
classifiers to outperform some sophisticated existing entailment models, and it
allows a neural network-based model to perform competitively on natural
language inference benchmarks for the first time.Comment: To appear at EMNLP 2015. The data will be posted shortly before the
conference (the week of 14 Sep) at http://nlp.stanford.edu/projects/snli
Textual Entailment for Cybersecurity: an Applicative Case
Recognizing Textual Entailment (RTE) is the task of recognizing the relation between two sentences, in order to measure whether and to what extent one of the two is inferred from the other. It is used in many Natural Language Processing (NLP) tasks. In the last decades, with the digitization of manylegal documents, NLP applied to the legal domain has became prominent, due to the need of knowing which norms are complied with in case other norms are. In this context, from a set of obligations that are known to be complied with, RTE may be used to infer which other norms are complied with as well. We propose a dataset, regarding cybersecurity controls, for RTE on the legal domain. The dataset has been constructed using information available online, provided by domain experts from NIST (https://www.nist.gov)
multi level alignments as an extensible representation basis for textual entailment algorithms
A major problem in research on Textual Entailment (TE) is the high implementation effort for TE systems. Recently, interoperable standards for annotation and preprocessing have been proposed. In contrast, the algorithmic level remains unstandardized, which makes component re-use in this area very difficult in practice. In this paper, we introduce multi-level alignments as a central, powerful representation for TE algorithms that encourages modular, reusable, multilingual algorithm development. We demonstrate that a pilot open-source implementation of multi-level alignment with minimal features competes with state-of-theart open-source TE engines in three languages
The SNLI Corpus
The SNLI corpus (version 1.0) is a collection of 570k human-written English sentence pairs manually labeled for balanced classification with the labels entailment, contradiction, and neutral, supporting the task of natural language inference (NLI), also known as recognizing textual entailment (RTE). We aim for it to serve both as a benchmark for evaluating representational systems for text, especially including those induced by representation learning methods, as well as a resource for developing NLP models of any kind.We gratefully acknowledge support from a Google Faculty Research Award, a gift from Bloomberg L.P., the Defense Advanced Research Projects Agency (DARPA) Deep Exploration and Filtering of Text (DEFT) Program under Air Force Research Laboratory (AFRL) contract no. FA8750-13-2-0040, the National Science Foundation under grant no. IIS 1159679, and the Department of the Navy, Office of Naval Research, under grant no. N00014-10-1-0109. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of Google, Bloomberg L.P., DARPA, AFRL NSF, ONR, or the US government. We also thank our many excellent Mechanical Turk contributors
Generating and applying textual entailment graphs for relation extraction and email categorization
Recognizing that the meaning of one text expression is semantically related to the meaning of another can be of help in many natural language processing applications. One semantic relationship between two text expressions is captured by the textual entailment paradigm, which is defined as a relation between exactly two text expressions. Entailment relations holding among a set of more than two text expressions can be captured in the form of a hierarchical knowledge structure referred to as entailment graphs. Despite the fact that several people have worked on building entailment graphs for different types of textual expressions, little research has been carried out regarding the applicability of such entailment graphs in NLP applications. This thesis fills this research gap by investigating how entailment graphs can be generated and used for addressing two specific NLP tasks: First, the task of validating automatically derived relation extraction patterns and, second, the task of automatically categorizing German customer emails. After laying a theoretical foundation, the research problem is approached in an empirical way, i.e., by drawing conclusions from analyzing, processing, and experimenting with specific task-related datasets. The experimental results show that both tasks can benefit from the integration of semantic knowledge, as expressed by entailment graphs
Stance Detection in Web and Social Media: A Comparative Study
Online forums and social media platforms are increasingly being used to
discuss topics of varying polarities where different people take different
stances. Several methodologies for automatic stance detection from text have
been proposed in literature. To our knowledge, there has not been any
systematic investigation towards their reproducibility, and their comparative
performances. In this work, we explore the reproducibility of several existing
stance detection models, including both neural models and classical
classifier-based models. Through experiments on two datasets -- (i)~the popular
SemEval microblog dataset, and (ii)~a set of health-related online news
articles -- we also perform a detailed comparative analysis of various methods
and explore their shortcomings. Implementations of all algorithms discussed in
this paper are available at
https://github.com/prajwal1210/Stance-Detection-in-Web-and-Social-Media
Reasoning-Driven Question-Answering For Natural Language Understanding
Natural language understanding (NLU) of text is a fundamental challenge in AI, and it has received significant attention throughout the history of NLP research. This primary goal has been studied under different tasks, such as Question Answering (QA) and Textual Entailment (TE). In this thesis, we investigate the NLU problem through the QA task and focus on the aspects that make it a challenge for the current state-of-the-art technology. This thesis is organized into three main parts:
In the first part, we explore multiple formalisms to improve existing machine comprehension systems. We propose a formulation for abductive reasoning in natural language and show its effectiveness, especially in domains with limited training data. Additionally, to help reasoning systems cope with irrelevant or redundant information, we create a supervised approach to learn and detect the essential terms in questions.
In the second part, we propose two new challenge datasets. In particular, we create two datasets of natural language questions where (i) the first one requires reasoning over multiple sentences; (ii) the second one requires temporal common sense reasoning. We hope that the two proposed datasets will motivate the field to address more complex problems.
In the final part, we present the first formal framework for multi-step reasoning algorithms,
in the presence of a few important properties of language use, such as incompleteness, ambiguity, etc. We apply this framework to prove fundamental limitations for reasoning algorithms. These theoretical results provide extra intuition into the existing empirical evidence in the field
- …