49,410 research outputs found

MinoanER: Schema-Agnostic, Non-Iterative, Massively Parallel Resolution of Web Entities

Author: Christophides Vassilis
Efthymiou Vasilis
Papadakis George
Stefanidis Kostas
Publication date: 15/05/2019

Entity Resolution (ER) aims to identify different descriptions in various Knowledge Bases (KBs) that refer to the same entity. ER is challenged by the Variety, Volume and Veracity of entity descriptions published in the Web of Data. To address them, we propose the MinoanER framework that simultaneously fulfills full automation, support of highly heterogeneous entities, and massive parallelization of the ER process. MinoanER leverages a token-based similarity of entities to define a new metric that derives the similarity of neighboring entities from the most important relations, as they are indicated only by statistics. A composite blocking method is employed to capture different sources of matching evidence from the content, neighbors, or names of entities. The search space of candidate pairs for comparison is compactly abstracted by a novel disjunctive blocking graph and processed by a non-iterative, massively parallel matching algorithm that consists of four generic, schema-agnostic matching rules that are quite robust with respect to their internal configuration. We demonstrate that the effectiveness of MinoanER is comparable to existing ER tools over real KBs exhibiting low Variety, but it outperforms them significantly when matching KBs with high Variety. Comment: Presented at EDBT 2019
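
To make the idea of token-based, schema-agnostic blocking more concrete, here is a minimal sketch in Python. It is not the MinoanER algorithm: it only shows plain token blocking (two descriptions become a candidate pair if they share any value token, regardless of attribute names) together with a Jaccard similarity over tokens. The function names and the toy KBs are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from itertools import product


def tokens(description):
    """Schema-agnostic view: ignore attribute names, keep lowercase value tokens."""
    return {tok for value in description.values() for tok in str(value).lower().split()}


def token_blocking(kb1, kb2):
    """Return candidate pairs (id1, id2) whose descriptions share at least one token."""
    # One block per token; each block holds the entity ids of KB1 and KB2 separately.
    blocks = defaultdict(lambda: (set(), set()))
    for eid, desc in kb1.items():
        for tok in tokens(desc):
            blocks[tok][0].add(eid)
    for eid, desc in kb2.items():
        for tok in tokens(desc):
            blocks[tok][1].add(eid)
    candidates = set()
    for left, right in blocks.values():
        candidates.update(product(left, right))
    return candidates


def jaccard(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy KBs with deliberately different attribute names (hypothetical data).
kb1 = {
    "a1": {"name": "Heraklion Airport", "city": "Heraklion"},
    "a2": {"label": "Knossos Palace"},
}
kb2 = {
    "b1": {"title": "Airport of Heraklion"},
    "b2": {"title": "Palace of Knossos", "country": "Greece"},
}

candidate_pairs = token_blocking(kb1, kb2)
for id1, id2 in sorted(candidate_pairs):
    print(id1, id2, round(jaccard(tokens(kb1[id1]), tokens(kb2[id2])), 3))
```

Because blocks are keyed by token rather than by attribute, the differing schemas of the two toy KBs do not prevent the matching descriptions from becoming candidate pairs.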

arXiv.org e-Print Archive

TamPub Julkaisuarkisto - TamPub Institutional Repository

Trepo - Institutional Repository of Tampere University

Hybrid Similarity Function for Big Data Entity Matching with R-Swoosh

Author: Gorijala Vimal Chandra
Publication venue: SJSU ScholarWorks
Publication date: 26/05/2016

Entity Matching (EM) is the problem of determining if two entities in a data set refer to the same real-world object. For example, it decides if two given mentions in the data, such as “Helen Hunt” and “H. M. Hunt”, refer to the same real-world entity by using different similarity functions. This problem plays a key role in information integration, natural language understanding, information processing on the World-Wide Web, and on the emerging Semantic Web. This project deals with the similarity functions and the thresholds used with them to determine the similarity of the entities. The work contains two major parts: implementation of a hybrid similarity function, which combines three different similarity functions to determine the similarity of entities, and an efficient method to determine the optimum threshold value for similarity functions to get accurate results.
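
The two parts described above (a hybrid of three similarity functions, plus a search for a good threshold) can be sketched as follows. The abstract does not say which three functions the project combines or how the threshold is found, so everything here is an assumption for illustration: token Jaccard, character-bigram Dice, and `difflib.SequenceMatcher` ratio are averaged with equal weights, and the threshold is picked by a simple sweep that maximizes F1 on labeled pairs. All function names and the sample pairs are hypothetical.

```python
from difflib import SequenceMatcher


def jaccard_tokens(a, b):
    """Jaccard similarity over whitespace-separated lowercase tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def bigram_dice(a, b):
    """Dice coefficient over character bigrams."""
    def grams(s):
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}
    ga, gb = grams(a), grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb)) if ga or gb else 0.0


def edit_ratio(a, b):
    """Sequence-based similarity from the standard library, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def hybrid_similarity(a, b, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three component similarities (assumed combination)."""
    scores = (jaccard_tokens(a, b), bigram_dice(a, b), edit_ratio(a, b))
    return sum(w * s for w, s in zip(weights, scores))


def best_threshold(labeled_pairs, steps=100):
    """Sweep thresholds in [0, 1] and return (threshold, F1) with the highest F1."""
    scored = [(hybrid_similarity(a, b), is_match) for a, b, is_match in labeled_pairs]
    best_t, best_f1 = 0.0, -1.0
    for i in range(steps + 1):
        t = i / steps
        tp = sum(1 for s, m in scored if s >= t and m)
        fp = sum(1 for s, m in scored if s >= t and not m)
        fn = sum(1 for s, m in scored if s < t and m)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical labeled pairs: (mention_a, mention_b, is_same_entity).
pairs = [
    ("Helen Hunt", "Helen Hunt", True),
    ("Helen Hunt", "Tom Hanks", False),
]
threshold, f1 = best_threshold(pairs)
print(threshold, f1)
print(hybrid_similarity("Helen Hunt", "H. M. Hunt"))
```

A linear sweep is the simplest possible threshold search; it is shown only to make the role of the threshold visible and is not claimed to be the efficient method the project proposes.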

SJSU ScholarWorks

Deep Joint Entity Disambiguation with Local Neural Attention

Author: Ganea Octavian-Eugen
Hofmann Thomas
Publication date: 31/07/2017

We propose a novel deep learning model for joint document-level entity disambiguation, which leverages learned neural representations. Key components are entity embeddings, a neural attention mechanism over local context windows, and a differentiable joint inference stage for disambiguation. Our approach thereby combines benefits of deep learning with more traditional approaches such as graphical models and probabilistic mention-entity maps. Extensive experiments show that we are able to obtain competitive or state-of-the-art accuracy at moderate computational costs. Comment: Conference on Empirical Methods in Natural Language Processing (EMNLP) 2017 long paper

arXiv.org e-Print Archive

Repository for Publications and Research Data

End-to-end Neural Coreference Resolution

Author: He Luheng
Lee Kenton
Lewis Mike
Zettlemoyer Luke
Publication date: 01/01/2017

We introduce the first end-to-end coreference resolution model and show that it significantly outperforms all previous work without using a syntactic parser or hand-engineered mention detector. The key idea is to directly consider all spans in a document as potential mentions and learn distributions over possible antecedents for each. The model computes span embeddings that combine context-dependent boundary representations with a head-finding attention mechanism. It is trained to maximize the marginal likelihood of gold antecedent spans from coreference clusters and is factored to enable aggressive pruning of potential mentions. Experiments demonstrate state-of-the-art performance, with a gain of 1.5 F1 on the OntoNotes benchmark and of 3.1 F1 using a 5-model ensemble, despite the fact that this is the first approach to be successfully trained with no external resources. Comment: Accepted to EMNLP 2017
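
The core ideas in this abstract (treat every span as a potential mention, keep a distribution over candidate antecedents, and train on the marginal likelihood of the gold antecedents) can be illustrated with a toy sketch. This is not the authors' model: the antecedent scores are supplied by hand rather than computed from span embeddings, and the dummy "no antecedent" option with a fixed score of 0 is an assumption made here so that spans without an antecedent still have a valid distribution. All names are hypothetical.

```python
import math


def enumerate_spans(num_tokens, max_width):
    """All (start, end) token spans, end inclusive, of at most max_width tokens."""
    return [
        (start, end)
        for start in range(num_tokens)
        for end in range(start, min(start + max_width, num_tokens))
    ]


def antecedent_distribution(scores):
    """Softmax over antecedent scores, with a dummy option (score 0) at index 0."""
    logits = [0.0] + list(scores)
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


def marginal_log_likelihood(scores, gold_indices):
    """Log of the total probability assigned to all gold antecedents.

    gold_indices index into `scores`; an empty list means the dummy option is gold.
    """
    probs = antecedent_distribution(scores)
    gold = [i + 1 for i in gold_indices] or [0]
    return math.log(sum(probs[i] for i in gold))


spans = enumerate_spans(num_tokens=3, max_width=2)
print(spans)

# Hand-written scores for two candidate antecedents of some span (hypothetical).
scores = [2.0, -1.0]
print(antecedent_distribution(scores))
print(marginal_log_likelihood(scores, gold_indices=[0]))
```

Summing the probability over all gold antecedents before taking the log is what makes the objective a marginal likelihood: the model is not told which single antecedent in the cluster to pick.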

arXiv.org e-Print Archive

Crossref

Teaching Machines to Read and Comprehend

Author: Blunsom Phil
Espeholt Lasse
Grefenstette Edward
Hermann Karl Moritz
Kay Will
Kočiský Tomáš
Suleyman Mustafa
Publication date: 19/11/2015

Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure. Comment: Appears in: Advances in Neural Information Processing Systems 28 (NIPS 2015). 14 pages, 13 figures

arXiv.org e-Print Archive

Oxford University Research Archive