512 research outputs found
A Large-Scale Multilingual Disambiguation of Glosses
Linking concepts and named entities to knowledge bases has become a crucial Natural Language Understanding task. In this respect, recent works have shown the key advantage of exploiting textual definitions in various Natural Language Processing applications. However, to date there are no reliable large-scale corpora of sense-annotated textual definitions available to the research community. In this paper we present a large-scale high-quality corpus of disambiguated glosses in multiple languages, comprising sense annotations of
both concepts and named entities from a unified sense inventory. Our approach for the construction and disambiguation of the corpus builds upon the structure of a large multilingual semantic network and a state-of-the-art disambiguation system; first, we gather complementary information of equivalent definitions across different languages to provide context for disambiguation, and then we combine it with a semantic similarity-based refinement. As a result we obtain a multilingual corpus of textual definitions featuring over 38 million definitions in 263 languages, and we make it freely available at http://lcl.uniroma1.it/disambiguated-glosses. Experiments on Open Information Extraction and Sense Clustering show how two state-of-the-art approaches improve their performance by integrating our disambiguated corpus into their pipeline
Towards Avatars with Artificial Minds: Role of Semantic Memory
he first step towards creating avatars with human-like artificial minds is to give them human-like memory structures with an access to general knowledge about the world. This type of knowledge is stored in semantic memory. Although many approaches to modeling of semantic memories have been proposed they are not very useful in real life applications because they lack knowledge comparable to the common sense that humans have, and they cannot be implemented in a computationally efficient way. The most drastic simplification of semantic memory leading to the simplest knowledge representation that is sufficient for many applications is based on the Concept Description Vectors (CDVs) that store, for each concept, an information whether a given property is applicable to this concept or not. Unfortunately even such simple information about real objects or concepts is not available. Experiments with automatic creation of concept description vectors from various sources, including ontologies, dictionaries, encyclopedias and unstructured text sources are described. Haptek-based talking head that has an access to this memory has been created as an example of a humanized interface (HIT) that can interact with web pages and exchange information in a natural way. A few examples of applications of an avatar with semantic memory are given, including the twenty questions game and automatic creation of word puzzles
SenseDefs : a multilingual corpus of semantically annotated textual definitions
Definitional knowledge has proved to be essential in various Natural Language Processing tasks and applications, especially when information at the level of word senses is exploited. However, the few sense-annotated corpora of textual definitions available to date are of limited size: this is mainly due to the expensive and time-consuming process of annotating a wide variety of word senses and entity mentions at a reasonably high scale. In this paper we present SenseDefs, a large-scale high-quality corpus of disambiguated definitions (or glosses) in multiple languages, comprising sense annotations of both concepts and named entities from a wide-coverage unified sense inventory. Our approach for the construction and disambiguation of this corpus builds upon the structure of a large multilingual semantic network and a state-of-the-art disambiguation system: first, we gather complementary information of equivalent definitions across different languages to provide context for disambiguation; then we refine the disambiguation output with a distributional approach based on semantic similarity. As a result, we obtain a multilingual corpus of textual definitions featuring over 38 million definitions in 263 languages, and we publicly release it to the research community. We assess the quality of SenseDefs’s sense annotations both intrinsically and extrinsically on Open Information Extraction and Sense Clustering tasks.Peer reviewe
- …