User driven information extraction with LODIE
Information Extraction (IE) is the technique of transforming unstructured or semi-structured data into a structured representation
that can be understood by machines. In this paper we use a user-driven
Information Extraction technique to wrap entity-centric Web pages. The
user can select concepts and properties of interest from available Linked
Data. Given a number of websites containing pages about the concepts of
interest, the method exploits (i) recurrent structures in the Web pages
and (ii) available knowledge in Linked Data to extract the information
of interest from the Web pages.
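To make the interplay of (i) and (ii) concrete, here is a minimal wrapper-induction sketch. It is not LODIE's algorithm. It assumes that a value already known from Linked Data (for example, a film's director) appears verbatim on some pages of a site, votes for the HTML tag path that holds that value across pages (the "recurrent structure"), and then reuses the winning path on pages with no Linked Data annotation. The functions `text_paths`, `induce_path` and `extract` and the example pages are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TextPathCollector(HTMLParser):
    """Collects (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []  # (tag, label) for each currently open element
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        label = ".".join([tag] + classes)  # e.g. "span.director"
        self.stack.append((tag, label))

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; ignore stray closing tags.
        if any(open_tag == tag for open_tag, _ in self.stack):
            while self.stack and self.stack.pop()[0] != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            path = "/".join(label for _, label in self.stack)
            self.pairs.append((path, text))


def text_paths(html):
    parser = TextPathCollector()
    parser.feed(html)
    parser.close()
    return parser.pairs


def induce_path(annotated_pages):
    """Vote for the tag path whose text equals a value already known from Linked Data."""
    votes = Counter()
    for html, known_value in annotated_pages:
        target = known_value.strip().casefold()
        for path, text in text_paths(html):
            if text.casefold() == target:
                votes[path] += 1
    return votes.most_common(1)[0][0] if votes else None


def extract(html, path):
    """Apply the induced path to a page that has no Linked Data annotation."""
    return [text for p, text in text_paths(html) if p == path]


# Hypothetical pages from one site; the known values stand in for Linked Data facts.
training = [
    ("<html><body><h1>Alien</h1><span class='director'>Ridley Scott</span></body></html>", "Ridley Scott"),
    ("<html><body><h1>Jaws</h1><span class='director'>Steven Spielberg</span></body></html>", "Steven Spielberg"),
]
path = induce_path(training)
unseen = "<html><body><h1>Heat</h1><span class='director'>Michael Mann</span></body></html>"
print(path)                   # html/body/span.director
print(extract(unseen, path))  # ['Michael Mann']
```

Exact string matching is the simplest possible way to align Linked Data values with page text; a real system would need fuzzier matching and would have to cope with pages whose layout varies more than these toy examples.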
Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval
Tables contain valuable knowledge in a structured form. We employ neural
language modeling approaches to embed tabular data into vector spaces.
Specifically, we consider different table elements, such as the caption, column
headings, and cells, for training word and entity embeddings. These embeddings
are then utilized in three table-related tasks (row population,
column population, and table retrieval) by incorporating them into existing
retrieval models as additional semantic similarity signals. Evaluation results
show that table embeddings can significantly improve upon the performance of
state-of-the-art baselines.
Comment: Proceedings of the 42nd International ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR '19), 201
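As a rough illustration of "additional semantic similarity signals", the sketch below averages the embeddings of a query's terms and of a table's terms (for instance, caption and heading words), takes their cosine similarity, and interpolates it with a baseline retrieval score. The linear interpolation, the weight `0.3`, the toy two-dimensional vectors and all function names are assumptions for illustration; the paper may combine the signals differently (for example, as features in a learned ranker). The same code applies whether the embedding keys are words or entity identifiers.

```python
import math


def mean_vector(terms, embeddings):
    """Average the embeddings of the terms that are in the vocabulary."""
    vectors = [embeddings[t] for t in terms if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_similarity(query_terms, table_terms, embeddings):
    q = mean_vector(query_terms, embeddings)
    t = mean_vector(table_terms, embeddings)
    return 0.0 if q is None or t is None else cosine(q, t)


def combined_score(baseline_score, similarity, weight=0.3):
    """Assumed linear interpolation of a baseline score and the embedding signal."""
    return (1 - weight) * baseline_score + weight * similarity


# Toy vectors (hypothetical); real embeddings would be trained on table elements.
embeddings = {
    "film": [1.0, 0.0],
    "movie": [0.9, 0.1],
    "director": [0.2, 0.8],
    "city": [0.0, 1.0],
}
query = ["movie"]
tables = {
    "films_table": ["film", "director"],  # e.g. caption and heading terms
    "cities_table": ["city"],
}
# "movie" matches neither table lexically, so the baseline cannot separate them.
baseline = {"films_table": 0.5, "cities_table": 0.5}

scores = {
    name: combined_score(baseline[name], semantic_similarity(query, terms, embeddings))
    for name, terms in tables.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # ['films_table', 'cities_table']
```

The example shows the kind of case where an embedding signal helps: a query term with no lexical match in either table still ranks the semantically related table first.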
WibNED: Wikipedia based Named Entity Disambiguation
Natural language is a means of expressing and discussing concepts, which are taken to be abstractions from perceptions of the experienced real world: what texts describe consists of objects and events. Objects of the real world are identified
by proper names, which are words, thus raising the problem of properly linking
the textual reference to the real object. This work addresses the problem
of automatically associating meanings with words in unstructured text, focusing on words that represent Named Entities. The proposed solution is a knowledge-based algorithm for Named Entity Disambiguation; to demonstrate the soundness of the algorithm, we used an ad hoc corpus built from Wikipedia articles.
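To show what "knowledge-based" disambiguation can mean in its simplest form, the sketch below picks, for an ambiguous mention, the candidate entity whose description shares the most content words with the surrounding text (a Lesk-style overlap heuristic). This is a generic baseline, not the WibNED algorithm; the candidate set, the descriptions (standing in for text from Wikipedia articles), the stopword list and the function names are all hypothetical.

```python
import re

STOPWORDS = {"a", "an", "and", "as", "at", "by", "for", "in", "is", "of", "on", "the", "to", "was", "with"}


def content_words(text):
    """Lower-cased alphabetic tokens minus a tiny stopword list."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(context, candidates):
    """Return the candidate whose description shares the most words with the context.

    Ties go to the candidate listed first.
    """
    context_words = content_words(context)
    best, best_overlap = None, -1
    for entity, description in candidates.items():
        overlap = len(context_words & content_words(description))
        if overlap > best_overlap:
            best, best_overlap = entity, overlap
    return best


# Hypothetical candidates for the mention "Jordan"; descriptions stand in for Wikipedia text.
candidates = {
    "Jordan": "Jordan is a country in Western Asia bordering Israel, Saudi Arabia and Iraq.",
    "Michael_Jordan": "Michael Jordan is an American former professional basketball player "
                      "who played for the Chicago Bulls in the NBA.",
}
sentence = "Jordan scored 30 points for the Bulls in the NBA finals."
print(disambiguate(sentence, candidates))  # Michael_Jordan
```

Here the context shares "jordan", "bulls" and "nba" with the basketball player's description but only "jordan" with the country's, so the player wins.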
- …