Generating and applying textual entailment graphs for relation extraction and email categorization
Recognizing that the meaning of one text expression is semantically related to the meaning of another can help in many natural language processing applications. One semantic relationship between two text expressions is captured by the textual entailment paradigm, which is defined as a relation between exactly two text expressions. Entailment relations holding among a set of more than two text expressions can be captured in the form of a hierarchical knowledge structure referred to as an entailment graph. Although several researchers have worked on building entailment graphs for different types of textual expressions, little research has been carried out on the applicability of such entailment graphs in NLP applications. This thesis fills this research gap by investigating how entailment graphs can be generated and used for addressing two specific NLP tasks: first, validating automatically derived relation extraction patterns and, second, automatically categorizing German customer emails. After laying a theoretical foundation, the research problem is approached empirically, i.e., by drawing conclusions from analyzing, processing, and experimenting with specific task-related datasets. The experimental results show that both tasks can benefit from the integration of semantic knowledge, as expressed by entailment graphs.
Becoming JILDA
The difficulty in finding useful dialogic data to train a conversational agent is an open issue even nowadays, when chatbots and spoken dialogue systems are widely used. For this reason we decided to build JILDA, a novel data collection of chat-based dialogues, produced by Italian native speakers and related to the job-offer domain. JILDA is the first dialogue collection related to this domain for the Italian language. Because of its collection modalities, we believe that JILDA can be a useful resource not only for the Italian research community, but also for the international one.
Multi-level alignments as an extensible representation basis for textual entailment algorithms
A major problem in research on Textual Entailment (TE) is the high implementation effort for TE systems. Recently, interoperable standards for annotation and preprocessing have been proposed. In contrast, the algorithmic level remains unstandardized, which makes component re-use in this area very difficult in practice. In this paper, we introduce multi-level alignments as a central, powerful representation for TE algorithms that encourages modular, reusable, multilingual algorithm development. We demonstrate that a pilot open-source implementation of multi-level alignment with minimal features competes with state-of-the-art open-source TE engines in three languages.
Learning Attention-based Embeddings for Relation Prediction in Knowledge Graphs
The recent proliferation of knowledge graphs (KGs) coupled with incomplete or partial information, in the form of missing relations (links) between entities, has fueled a lot of research on knowledge base completion (also known as relation prediction). Several recent works suggest that convolutional neural network (CNN) based models generate richer and more expressive feature embeddings and hence also perform well on relation prediction. However, we observe that these KG embeddings treat triples independently and thus fail to cover the complex and hidden information that is inherently implicit in the local neighborhood surrounding a triple. To this effect, our paper proposes a novel attention-based feature embedding that captures both entity and relation features in any given entity’s neighborhood. Additionally, we also encapsulate relation clusters and multi-hop relations in our model. Our empirical study offers insights into the efficacy of our attention-based model and we show marked performance gains in comparison to state-of-the-art methods on all datasets.
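As a rough illustration of the idea of attending over an entity's neighborhood, the following minimal Python sketch scores each neighboring triple and forms an attention-weighted combination of the triple features. It is not the authors' model: the function names, the single attention vector `attn_vec`, and the plain dot-product scoring are assumptions made for illustration, and relation clusters and multi-hop relations are left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_neighborhood(entity_vec, neighbors, attn_vec):
    """Return (weights, new_vec) for one entity.

    neighbors: list of (relation_vec, neighbor_entity_vec) pairs.
    attn_vec:  hypothetical attention parameters, one weight per
               dimension of the concatenated triple feature.
    """
    # Each triple feature is [entity ; relation ; neighbor], so both
    # entity and relation information enter the attention score.
    feats = [entity_vec + rel + nbr for rel, nbr in neighbors]
    # Assumed scoring function: a dot product with attn_vec.
    scores = [sum(a * x for a, x in zip(attn_vec, f)) for f in feats]
    weights = softmax(scores)
    dim = len(feats[0])
    # Attention-weighted sum of the triple features.
    new_vec = [sum(w * f[i] for w, f in zip(weights, feats)) for i in range(dim)]
    return weights, new_vec


# Toy data: one entity with two neighboring triples (values are made up).
entity = [1.0, 0.0]
neighbors = [([0.0, 1.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 0.0])]
uniform_weights, uniform_vec = aggregate_neighborhood(entity, neighbors, [0.0] * 6)
skewed_weights, _ = aggregate_neighborhood(entity, neighbors, [0.0, 0.0, 0.0, 10.0, 0.0, 0.0])
```

With an all-zero `attn_vec` every neighbor receives the same weight, so the result is a plain average; a non-zero `attn_vec` shifts weight toward the triples it scores highest, which is the behavior a model that treats triples independently cannot express.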
Textual entailment from image caption denotations
Understanding the meaning of linguistic expressions is a fundamental task of natural language processing. While distributed representations have become a powerful technique for modeling lexical semantics, they have traditionally relied on ungrounded text corpora to identify semantically similar words. In contrast, this thesis explicitly models the denotation of linguistic expressions by building representations from grounded image captions. This allows us to use descriptions of the world to learn connections that would be difficult to identify in text-based corpora. In particular, we explore novel approaches to entailment that capture everyday world knowledge missing from other NLP tasks, on both existing datasets and our own new dataset. We also present a novel embedding model that produces phrase representations that are informed by our grounded representation. We conclude with an analysis of how grounded embeddings differ from standard distributional embeddings and suggestions for future refinement of this approach.
Temporality and modality in entailment graph induction
The ability to draw inferences is core to semantics and the field of Natural Language Processing. Answering a seemingly simple question like ‘Did Arsenal play Manchester yesterday’ from textual evidence that says ‘Arsenal won against Manchester yesterday’ requires modeling the inference that ‘winning’ entails ‘playing’. One way of modeling this type of lexical semantics is with Entailment Graphs, collections of meaning postulates that can be learned in an unsupervised way from large text corpora. In this work, we explore the role that temporality and linguistic modality can play in inducing Entailment Graphs. We identify inferences that were previously not supported by Entailment Graphs (such as that ‘visiting’ entails an ‘arrival’ before the visit) and inferences that were likely to be learned incorrectly (such as that ‘winning’ entails ‘losing’). Temporality is shown to be useful in alleviating these challenges, in the Entailment Graph representation as well as the learning algorithm. An exploration of linguistic modality in the training data shows, counterintuitively, that there is valuable signal in modalized predications. We develop three datasets for evaluating a system’s capability of modeling these inferences, which were previously underrepresented in entailment rule evaluations. Finally, in support of the work on modality, we release a relation extraction system that is capable of annotating linguistic modality, together with a comprehensive modality lexicon.
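The ‘winning’ entails ‘playing’ example can be pictured as a lookup in a small directed graph. The Python sketch below is only an illustration under stated assumptions: the class `EntailmentGraph`, its methods, and the extra predicate ‘thrash’ are hypothetical, the rules are entered by hand rather than learned from a corpus, and temporality and modality are ignored.

```python
from collections import defaultdict


class EntailmentGraph:
    """Toy directed graph: an edge p -> q means predicate p entails q."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add_rule(self, premise, hypothesis):
        """Record that `premise` entails `hypothesis`."""
        self.edges[premise].add(hypothesis)

    def entails(self, premise, hypothesis):
        """True if `hypothesis` is reachable from `premise`.

        Entailment is treated here as reflexive and transitive,
        which is an assumption of this sketch.
        """
        stack, seen = [premise], set()
        while stack:
            node = stack.pop()
            if node == hypothesis:
                return True
            if node in seen:
                continue
            seen.add(node)
            # .get avoids creating empty entries in the defaultdict.
            stack.extend(self.edges.get(node, ()))
        return False


graph = EntailmentGraph()
graph.add_rule("win against", "play")      # example from the abstract
graph.add_rule("thrash", "win against")    # hypothetical extra rule

answer = graph.entails("win against", "play")
chained = graph.entails("thrash", "play")
reverse = graph.entails("play", "win against")
```

Here `answer` and `chained` are true while `reverse` is false, reflecting that entailment is directional: evidence of a win supports the claim that a match was played, but not the other way round.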