NASARI: a novel approach to a Semantically-Aware Representation of items
The semantic representation of individual word senses and concepts is of fundamental importance to several applications in Natural Language Processing. To date, concept modeling techniques have in the main based their representation either on lexicographic resources, such as WordNet, or on encyclopedic resources, such as Wikipedia. We propose a vector representation technique that combines the complementary knowledge of both these types of resource. Thanks to its use of explicit semantics combined with a novel cluster-based dimensionality reduction and an effective weighting scheme, our representation attains state-of-the-art performance on multiple datasets in two standard benchmarks: word similarity and sense clustering. We are releasing our vector representations at http://lcl.uniroma1.it/nasari/
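Explicit (sparse, interpretable) sense vectors of the kind described above can be compared with standard similarity measures. A minimal sketch, assuming sense vectors stored as feature-to-weight dicts; the dict layout and the max-over-senses word similarity are illustrative conventions, not NASARI's exact formulation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as feature -> weight dicts."""
    dot = sum(w * v[f] for f, w in u.items() if f in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def word_similarity(senses_a, senses_b):
    """Word-level similarity as the best match over the two words' sense vectors."""
    return max(cosine(u, v) for u in senses_a for v in senses_b)
```

Taking the maximum over sense pairs lets an ambiguous word score high against another word as long as one of its senses matches well.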
Distant Supervision for Entity Linking
Entity linking is an indispensable operation in populating knowledge
repositories for information extraction. It concerns aligning a textual
entity mention to its corresponding disambiguated entry in a knowledge
repository. In this paper, we propose a new paradigm named distantly supervised
entity linking (DSEL), in the sense that the disambiguated entities that belong
to a huge knowledge repository (Freebase) are automatically aligned to the
corresponding descriptive webpages (Wiki pages). In this way, large-scale
weakly labeled data can be generated without manual annotation and fed to a
classifier to link newly discovered entities. Compared with
traditional paradigms based on a single knowledge base, DSEL benefits from
jointly leveraging the respective advantages of Freebase and Wikipedia.
Specifically, the proposed paradigm facilitates bridging the disambiguated
labels (Freebase) of entities and their textual descriptions (Wikipedia) for
Web-scale entities. Experiments conducted on a dataset of 140,000 items and
60,000 features achieve a baseline F1-measure of 0.517. Furthermore, we analyze
the feature performance and improve the F1-measure to 0.545.
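The automatic alignment step that produces the weak labels can be sketched as follows; the record schema (`mid`, `name`, `title`, `text`) is hypothetical and stands in for Freebase entries and Wiki pages, and real pipelines would use curated Freebase-to-Wikipedia links rather than a bare name match:

```python
def generate_weak_labels(freebase_entities, wiki_pages):
    """Align disambiguated KB entities to descriptive pages, yielding weakly
    labeled (context, entity-id) training examples without manual annotation."""
    pages_by_title = {page["title"]: page for page in wiki_pages}
    examples = []
    for entity in freebase_entities:
        # Crude alignment key for illustration only.
        page = pages_by_title.get(entity["name"])
        if page is not None:
            examples.append({"mention": entity["name"],
                             "context": page["text"],
                             "label": entity["mid"]})
    return examples
```

Each emitted example pairs a descriptive text with a disambiguated entity ID, which is exactly the supervision a linking classifier needs.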
Neural Cross-Lingual Entity Linking
A major challenge in Entity Linking (EL) is making effective use of
contextual information to disambiguate mentions to Wikipedia that might refer
to different entities in different contexts. The problem is exacerbated in
cross-lingual EL, which involves linking mentions written in non-English
documents to entries in the English Wikipedia: to compare textual clues across
languages we need to compute similarity between textual fragments across
languages. In this paper, we propose a neural EL model that learns fine-grained
similarities and dissimilarities between the query and candidate document from
multiple perspectives, combined with convolution and tensor networks. Further,
we show that this English-trained system can be applied, in zero-shot learning,
to other languages by making surprisingly effective use of multi-lingual
embeddings. The proposed system yields state-of-the-art results on English as
well as on the cross-lingual Spanish and Chinese TAC 2015 datasets.
Comment: Association for the Advancement of Artificial Intelligence (AAAI),
201
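The zero-shot transfer rests on mapping text from any language into one shared embedding space, so that a scorer trained only on English can still compare non-English contexts against English Wikipedia candidates. A minimal sketch; the toy embedding table is illustrative:

```python
import math

def average_embedding(tokens, embeddings):
    """Average the multilingual word vectors of the tokens found in the table."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Because translation pairs sit near one another in the shared space, a Spanish mention context scores higher against its English counterpart than against unrelated candidates, with no Spanish training data.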
An effective, low-cost measure of semantic relatedness obtained from Wikipedia links
This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Our approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation with manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter.
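A link-based relatedness measure of this kind is commonly formulated, by analogy with the Normalized Google Distance, over the sets of articles that link to each term. A minimal sketch of that standard formulation (the argument names are illustrative):

```python
import math

def link_relatedness(links_a, links_b, total_articles):
    """Relatedness of two articles from their sets of incoming Wikipedia links,
    Normalized Google Distance style: more shared in-links means more related."""
    a, b = set(links_a), set(links_b)
    shared = a & b
    if not shared:
        return 0.0
    num = math.log(max(len(a), len(b))) - math.log(len(shared))
    den = math.log(total_articles) - math.log(min(len(a), len(b)))
    return max(0.0, 1.0 - num / den)
```

The measure needs only link sets and a corpus size, which is what makes it so much cheaper than text-based alternatives.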
WikiM: Metapaths based Wikification of Scientific Abstracts
In order to disseminate the exponentially growing body of knowledge being
produced in the form of scientific publications, it would be best to design
mechanisms that connect it with an already existing rich repository of
concepts -- Wikipedia.
Not only does it make scientific reading simple and easy (by connecting the
involved concepts used in the scientific articles to their Wikipedia
explanations) but also improves the overall quality of the article. In this
paper, we present a novel metapath based method, WikiM, to efficiently wikify
scientific abstracts -- a topic that has been rarely investigated in the
literature. One of the prime motivations for this work comes from the
observation that, wikified abstracts of scientific documents help a reader to
decide better, in comparison to the plain abstracts, whether (s)he would be
interested in reading the full article. We perform mention extraction mostly
through traditional tf-idf measures coupled with a set of smart filters. The
entity linking step heavily leverages the rich citation and author publication
networks. Our observation is that various metapaths defined over these networks
can significantly enhance the overall performance of the system. For mention
extraction and entity linking, we outperform most of the competing
state-of-the-art techniques by a large margin arriving at precision values of
72.42% and 73.8% respectively over a dataset from the ACL Anthology Network. In
order to establish the robustness of our scheme, we wikify three other datasets
and get precision values of 63.41%-94.03% and 67.67%-73.29% respectively for
the mention extraction and the entity linking phases.
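The tf-idf stage of mention extraction can be sketched as follows; the tokenization is illustrative and the paper's "smart filters" are not reproduced here:

```python
import math
from collections import Counter

def tfidf(doc_tokens, corpus):
    """Score the terms of one document by term frequency times inverse
    document frequency, computed over a small tokenized corpus."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # each document counts a term at most once
    tf = Counter(doc_tokens)
    return {term: (count / len(doc_tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()}
```

High-scoring terms become candidate mentions to link; a term that occurs in every abstract scores exactly zero and drops out before the linking phase.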