A Survey of Paraphrasing and Textual Entailment Methods
Paraphrasing methods recognize, generate, or extract phrases, sentences, or
longer natural language expressions that convey almost the same information.
Textual entailment methods, on the other hand, recognize, generate, or extract
pairs of natural language expressions, such that a human who reads (and trusts)
the first element of a pair would most likely infer that the other element is
also true. Paraphrasing can be seen as bidirectional textual entailment and
methods from the two areas are often similar. Both kinds of methods are useful,
at least in principle, in a wide range of natural language processing
applications, including question answering, summarization, text generation, and
machine translation. We summarize key ideas from the two areas by considering
in turn recognition, generation, and extraction methods, also pointing to
prominent articles and resources. Comment: Technical Report, Natural Language Processing Group, Department of
Informatics, Athens University of Economics and Business, Greece, 201
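The idea that paraphrasing is bidirectional textual entailment can be made concrete in a few lines. The sketch below is a minimal illustration under a loud assumption: `entails` is a hypothetical toy heuristic (every hypothesis word must appear in the premise), not a real recognizer from the survey. Only `is_paraphrase`, which requires entailment in both directions, reflects the definition stated above.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a deliberately crude representation."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical lexical-containment heuristic, not a real RTE system:
    the premise 'entails' the hypothesis if every hypothesis token occurs in it."""
    return tokens(hypothesis) <= tokens(premise)


def is_paraphrase(a: str, b: str) -> bool:
    """Paraphrase as bidirectional entailment: a entails b AND b entails a."""
    return entails(a, b) and entails(b, a)


if __name__ == "__main__":
    print(entails("The cat sat on the mat", "The cat sat"))        # True
    print(is_paraphrase("The cat sat on the mat", "The cat sat"))  # False
    print(is_paraphrase("The cat sat on the mat",
                        "On the mat the cat sat"))                 # True
```

Because the toy heuristic ignores word order, it would also call "dog bites man" and "man bites dog" paraphrases; real recognition methods need richer representations, which is what the surveyed approaches provide.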
Towards a Benchmark of Natural Language Arguments
In recent years, the connections between natural language processing and argumentation theory have
grown stronger, with an increasing body of work in this direction, across different scenarios and
applying heterogeneous techniques. In this paper, we present two datasets we built to address the
combination of the Textual Entailment framework and bipolar abstract
argumentation. In our approach, these datasets are used to automatically
identify, through a Textual Entailment system, the relations among the arguments
(i.e., attack and support); the resulting bipolar argumentation graphs are then
analyzed to compute the accepted arguments.
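Computing accepted arguments from a graph of attacks can be sketched with the grounded semantics of abstract argumentation. This is a simplified sketch, not the paper's pipeline: it uses only attack edges and ignores supports (bipolar frameworks differ in how supports combine with attacks, and this sketch does not choose one), and the argument names and edges in the example are hypothetical stand-ins for relations a Textual Entailment system might produce.

```python
def grounded_extension(arguments, attacks):
    """Grounded extension of an attack-only framework via iterative labelling.

    An argument is accepted (IN) once all its attackers are defeated (OUT);
    it is defeated once some attacker is accepted. Iterating to a fixed point
    yields the grounded labelling; arguments never labelled stay undecided.
    """
    attackers = {a: set() for a in arguments}
    for src, dst in attacks:
        attackers[dst].add(src)

    accepted, defeated = set(), set()
    changed = True
    while changed:
        changed = False
        for a in arguments:
            if a not in accepted and a not in defeated and attackers[a] <= defeated:
                accepted.add(a)
                changed = True
        for a in arguments:
            if a not in defeated and attackers[a] & accepted:
                defeated.add(a)
                changed = True
    return accepted


if __name__ == "__main__":
    # Hypothetical attack edges, e.g. derived from contradiction judgements.
    args = ["a1", "a2", "a3"]
    attacks = [("a2", "a1"), ("a3", "a2")]
    print(sorted(grounded_extension(args, attacks)))  # ['a1', 'a3']
```

The grounded extension is the most skeptical choice: in a two-argument mutual attack neither argument is accepted, which is why other semantics (preferred, stable) are often computed as well.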
Collecting Diverse Natural Language Inference Problems for Sentence Representation Evaluation
We present a large-scale collection of diverse natural language inference
(NLI) datasets that help provide insight into how well a sentence
representation captures distinct types of reasoning. The collection results
from recasting 13 existing datasets from 7 semantic phenomena into a common NLI
structure, resulting in over half a million labeled context-hypothesis pairs in
total. We refer to our collection as the DNC: Diverse Natural Language
Inference Collection. The DNC is available online at https://www.decomp.net,
and will grow over time as additional resources are recast and added from novel
sources. Comment: To be presented at EMNLP 2018. 15 pages
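Recasting turns an existing labelled example into one or more context-hypothesis pairs with an entailment label. The sketch below illustrates the general idea for a sentiment example; the hypothesis templates, the `NLIPair` type, and the two-way label names are assumptions chosen for illustration, not the DNC's actual templates or schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NLIPair:
    context: str
    hypothesis: str
    label: str  # "entailed" or "not-entailed" (two-way scheme assumed here)


def recast_sentiment(review: str, is_positive: bool) -> list[NLIPair]:
    """Recast one sentiment-labelled example into two NLI pairs.

    The hypothesis templates are hypothetical, chosen only for illustration.
    """
    liked = "The author liked the product."
    disliked = "The author did not like the product."
    return [
        NLIPair(review, liked, "entailed" if is_positive else "not-entailed"),
        NLIPair(review, disliked, "not-entailed" if is_positive else "entailed"),
    ]


if __name__ == "__main__":
    for pair in recast_sentiment("Works perfectly, would buy again.", True):
        print(pair.label, "|", pair.hypothesis)
```

Emitting both a positive and a negated hypothesis per source example keeps the label distribution balanced, so a model cannot score well just by always predicting one label.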
A Survey on Recognizing Textual Entailment as an NLP Evaluation
Recognizing Textual Entailment (RTE) was proposed as a unified evaluation
framework to compare semantic understanding of different NLP systems. In this
survey paper, we provide an overview of different approaches for evaluating and
understanding the reasoning capabilities of NLP systems. We then focus our
discussion on RTE by highlighting prominent RTE datasets as well as advances in
RTE datasets that focus on specific linguistic phenomena and can be used to
evaluate NLP systems at a fine-grained level. We conclude by arguing that, when
evaluating NLP systems, the community should use newly introduced RTE
datasets that focus on specific linguistic phenomena. Comment: 1st Workshop on Evaluation and Comparison for NLP systems (Eval4NLP)
at EMNLP 2020; 18 pages
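Fine-grained evaluation amounts to reporting a score per linguistic phenomenon rather than one aggregate number. A minimal sketch, assuming each test example carries a phenomenon tag alongside its gold and predicted labels; the tags and predictions below are hypothetical.

```python
from collections import defaultdict


def accuracy_by_phenomenon(examples):
    """examples: iterable of (phenomenon, gold_label, predicted_label)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for phenomenon, gold, pred in examples:
        total[phenomenon] += 1
        correct[phenomenon] += int(gold == pred)
    return {p: correct[p] / total[p] for p in total}


if __name__ == "__main__":
    # Hypothetical predictions from some NLI system.
    results = [
        ("negation", "contradiction", "contradiction"),
        ("negation", "entailment", "contradiction"),
        ("lexical", "entailment", "entailment"),
        ("lexical", "neutral", "neutral"),
    ]
    print(accuracy_by_phenomenon(results))  # {'negation': 0.5, 'lexical': 1.0}
```

Here the overall accuracy would be 0.75, which hides the fact that the system fails half of the negation cases; the per-phenomenon breakdown exposes it.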
Semantic relations between sentences: from lexical to linguistically inspired semantic features and beyond
This thesis addresses the identification of semantic equivalence between pairs of natural language
sentences, studying and building computational models for Natural Language Processing tasks in which some
form of semantic equivalence is assessed. In such tasks, given two sentences, our models output either
a class label, corresponding to the semantic relation between the sentences, based on a predefined set
of semantic relations, or a continuous score, corresponding to their similarity on a predefined scale. The
former setup corresponds to the tasks of Paraphrase Identification and Natural Language Inference, while
the latter corresponds to the task of Semantic Textual Similarity.
We present several models for English and Portuguese, in which various types of features are considered,
for instance features based on distances between alternative representations of each sentence (following lexical
and semantic frameworks) or embeddings from pre-trained Bidirectional Encoder Representations from
Transformers (BERT) models. For English, a new set of semantic features is proposed, derived from the formal semantic
representation of Discourse Representation Structure (DRS). For Portuguese, suitable corpora are scarce and formal
semantic representations are unavailable; hence an evaluation of currently available features and corpora is
conducted, following the modelling setup employed for English.
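Distance-based features of the kind described above can be illustrated with surface representations alone. This is a sketch under stated assumptions: the feature names and the choice of token-set Jaccard distance, bag-of-words cosine, and length difference are hypothetical illustrations, not the thesis's actual feature set (which also draws on DRS and embeddings). The resulting dictionary would feed a classifier for Paraphrase Identification or NLI, or a regressor for Semantic Textual Similarity.

```python
import math
import re
from collections import Counter


def word_tokens(sentence: str) -> list[str]:
    """Lowercased word tokens."""
    return re.findall(r"\w+", sentence.lower())


def pair_features(s1: str, s2: str) -> dict[str, float]:
    """Hypothetical lexical features for a sentence pair."""
    t1, t2 = word_tokens(s1), word_tokens(s2)
    set1, set2 = set(t1), set(t2)
    union = set1 | set2
    jaccard = len(set1 & set2) / len(union) if union else 1.0

    # Cosine similarity between bag-of-words count vectors.
    c1, c2 = Counter(t1), Counter(t2)
    dot = sum(c1[w] * c2[w] for w in c1)
    norm = (math.sqrt(sum(v * v for v in c1.values()))
            * math.sqrt(sum(v * v for v in c2.values())))
    cosine = dot / norm if norm else 0.0

    return {
        "jaccard_distance": 1.0 - jaccard,
        "bow_cosine": cosine,
        "length_difference": float(abs(len(t1) - len(t2))),
    }


if __name__ == "__main__":
    print(pair_features("A man is playing a guitar.", "A man plays the guitar."))
```

Purely lexical features miss that "playing" and "plays" share a lemma; lemmatization, semantic representations, or embeddings are the usual remedies, which matches the progression the thesis title describes.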
Competitive results are achieved on all tasks, for both English and Portuguese, particularly considering
that our models are based on generally available tools and technologies, and that all features and models are
suitable for computation on most modern computers, except for those based on embeddings. In particular,
for English, our DRS-based semantic features improve the performance of other models when
integrated into their feature sets, and state-of-the-art results are achieved for Portuguese with
models based on fine-tuning embeddings for a specific task.
Inconsistencies Detection in Bipolar Entailment Graphs
In recent years, a number of real-world applications have highlighted the need to move from Textual Entailment (TE) pairs to TE graphs, in which pairs are no longer independent. Moving from single pairs to a graph provides an overall view of the issue discussed in the text, but combining the TE pairs into a single graph may introduce inconsistencies. In this paper, we adopt argumentation theory to support human annotators in detecting the possible sources of inconsistency.
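One kind of inconsistency that can arise when TE pairs are merged into a single graph is an argument that supports another (directly or through a chain) while also attacking it. The sketch below flags such pairs. It assumes, as is common in this line of work, that entailment is read as support and contradiction as attack; the specific check is a hypothetical illustration, not the detection method of the paper, which relies on argumentation theory to guide annotators.

```python
from collections import defaultdict


def support_closure(supports):
    """All (x, y) such that x reaches y through one or more support edges."""
    graph = defaultdict(set)
    for src, dst in supports:
        graph[src].add(dst)
    reach = set()
    for start in list(graph):
        stack, seen = list(graph[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            reach.add((start, node))
            stack.extend(graph.get(node, ()))
    return reach


def find_inconsistencies(supports, attacks):
    """Pairs where x (directly or transitively) supports y but also attacks y."""
    return sorted(support_closure(supports) & set(attacks))


if __name__ == "__main__":
    # Hypothetical edges from independently annotated TE pairs.
    supports = [("a", "b"), ("b", "c")]
    attacks = [("a", "c"), ("c", "a")]
    print(find_inconsistencies(supports, attacks))  # [('a', 'c')]
```

Each flagged pair points an annotator at a small set of edges to re-examine, rather than requiring a review of the whole graph.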
- …