SiGMa: Simple Greedy Matching for Aligning Large Knowledge Bases
The Internet has enabled the creation of a growing number of large-scale
knowledge bases in a variety of domains containing complementary information.
Tools for automatically aligning these knowledge bases would make it possible
to unify many sources of structured knowledge and answer complex queries.
However, the efficient alignment of large-scale knowledge bases still poses a
considerable challenge. Here, we present Simple Greedy Matching (SiGMa), a
simple algorithm for aligning knowledge bases with millions of entities and
facts. SiGMa is an iterative propagation algorithm that leverages both the
structural information from the relationship graph and flexible similarity
measures between entity properties in a greedy local search, which keeps it
scalable. Despite its greedy nature, our experiments indicate that
SiGMa can efficiently match some of the world's largest knowledge bases with
high precision. We provide additional experiments on benchmark datasets which
demonstrate that SiGMa can outperform state-of-the-art approaches both in
accuracy and efficiency.
Comment: 10 pages + 2 pages appendix; 5 figures -- initial preprint
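The greedy propagation loop the abstract describes can be sketched as a best-first search over candidate pairs, where matching a pair boosts the priority of its neighbour pairs. This is a minimal illustration, not the paper's algorithm: the all-pairs seeding, the `sim` callback, and the fixed propagation bonus are assumptions (SiGMa itself avoids quadratic seeding through candidate generation).

```python
import heapq

def sigma_align(entities_a, entities_b, neighbors_a, neighbors_b, sim):
    """Greedy one-to-one alignment sketch: repeatedly take the
    best-scoring unmatched pair, then re-queue neighbour pairs of each
    new match with a bonus (the propagation step)."""
    # Seed with property-similarity scores (negated for the min-heap).
    heap = [(-sim(a, b), a, b) for a in entities_a for b in entities_b]
    heapq.heapify(heap)
    matched_a, matched_b, matches = set(), set(), {}
    while heap:
        _, a, b = heapq.heappop(heap)
        if a in matched_a or b in matched_b:
            continue  # one side already aligned; stale entry, skip it
        matches[a] = b
        matched_a.add(a)
        matched_b.add(b)
        # Propagation: neighbours of a matched pair become likelier
        # matches, so push them back with an (assumed) fixed bonus.
        for na in neighbors_a.get(a, ()):
            for nb in neighbors_b.get(b, ()):
                if na not in matched_a and nb not in matched_b:
                    heapq.heappush(heap, (-(sim(na, nb) + 1.0), na, nb))
    return matches
```

The heap makes the local search greedy in the paper's sense: once a pair is matched it is never revisited, and structural evidence only ever raises the priority of its neighbours.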
Searching by approximate personal-name matching
We discuss the design, construction, and evaluation of a method for retrieving
the information of a person using the name as a search key, even when the name
contains deformations. We present a similarity function, the DEA function,
based on the probabilities of the edit operations according to the letters
involved and their positions, and using a variable threshold. The efficacy
of DEA, evaluated quantitatively without human relevance judgments, is far
superior to that of known methods. A very efficient approximate-search
technique for the DEA function, based on a compacted trie structure, is also
presented.
Postprint (published version)
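The letter- and position-dependent edit costs behind DEA can be illustrated with a weighted Levenshtein distance. The `sub_cost` callback below is a placeholder for DEA's operation probabilities, which the abstract does not specify; here only substitution cost varies, while insertions and deletions keep unit cost.

```python
def weighted_edit_distance(s, t, sub_cost):
    """Levenshtein variant where substitution cost may depend on the
    letters involved and their position, loosely in the spirit of the
    DEA function (its actual probabilities are not reproduced here)."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + 1.0          # delete s[i-1]
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + 1.0          # insert t[j-1]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + 1.0,           # deletion
                d[i][j - 1] + 1.0,           # insertion
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1], i - 1),
            )
    return d[m][n]
```

With a unit substitution cost this reduces to the classical edit distance; a DEA-style cost table would instead make, say, a substitution between visually or phonetically confusable letters cheap at positions where such errors are probable.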
Use of Multi-Terms in Legal Text Search
Dissertation (master's) - Universidade Federal de Santa Catarina, Centro Tecnológico, Graduate Program in Computer Science. Multi-term search supports querying textual databases by combining the words occurring in each document and producing an index ranked by the frequency of occurrence of each generated term. Applying the methodology to legal research proves highly efficient. The study finds that multi-term queries return fewer documents, with a higher level of quality. The generation of search indexes is optimized by excluding words of very high or very low frequency, as well as by limiting the number of words that form each term.
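The indexing scheme described, combining words into multi-terms and pruning by frequency, can be sketched as adjacent n-gram extraction with a frequency band. The parameter names, defaults, and the simple band filter are illustrative assumptions, not the dissertation's exact rules.

```python
from collections import Counter

def build_multiterm_index(docs, n=2, min_freq=2, max_freq=None):
    """Sketch of a multi-term index: extract adjacent word n-grams per
    document, count their corpus frequency, and drop terms outside a
    frequency band (hypothetical parameters)."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    if max_freq is None:
        max_freq = float("inf")
    # Keep only terms whose frequency falls inside the band, mirroring
    # the exclusion of very high- and very low-frequency words.
    return {term: c for term, c in counts.items() if min_freq <= c <= max_freq}
```

On a toy corpus of three short "documents", only the bigram shared by two of them survives the `min_freq=2` cut, illustrating how the pruning shrinks the index while keeping the discriminative terms.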
A study on Analysis and Utilization of Crowd-sourced Spatio-temporal Contexts from Social Media
Graduate School, University of Hyogo, 201
Constructing Features and Pseudo-intersections to Map Unreliable Domain Specific Data Items Found in Disjoint Sets
This research studies the problem of identifying related tuples from two disjoint sets A and B of tuples of aircraft part data. The tuples in set B are defined as unique classifications, or candidates, to which tuples from set A map. The mapping studied is a many-to-one mapping. A context-free grammar (CFG) based on a subset of the data tuples being processed is used to construct relevant features from a single attribute field within the tuples. The notion of discovery items is introduced to assist in feature construction. Once constructed, features are assigned weight values. A sum-ordering feature-weighting approach to systematically compute weight values corresponding to analyst-defined ranks and constraints is presented. A series of record comparisons is conducted, and an Object Translation Score (OTS), based on the weight values, is computed with each comparison. The OTS is a quality-of-match score. Record Objects and the OTS are introduced to establish a method of quantifying the relationships, thus providing a mathematical means to measure and validate relationships. To boost a tuple's probability of registering an optimal OTS, learned data as well as checkpoint data are introduced; these data items are denoted as Enhancement data.
Findings and Conclusions: A new algorithm was introduced and compared to the popular EM-based probabilistic record linkage algorithm. The new algorithm outperformed the EM-based algorithm; however, it made some incorrect mappings as a result of poorly cleaned data, incorrectly classified terms, and the use of an inefficient string-comparison model. One difference between our approach and most traditional approaches is that each feature contained multiple values, whereas in traditional record linkage solutions there is normally a single value associated with each feature. Our approach creates features from one record field, in this case the part description field. In addition, no training data was needed, and external data was used to make optimal record mappings.
Computer Science Department
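The Object Translation Score, a sum of feature weights over matching multi-valued features, can be sketched as follows. The set-intersection match rule and the weight table are assumptions inferred from the description above, not the thesis's actual scoring rules.

```python
def object_translation_score(features_a, features_b, weights):
    """Hypothetical quality-of-match score between two records whose
    features each hold a SET of values (as in the described approach,
    where one record field yields multiple values per feature).
    A feature contributes its weight when the records share a value."""
    score = 0.0
    for feat, weight in weights.items():
        if features_a.get(feat, set()) & features_b.get(feat, set()):
            score += weight  # at least one shared value: feature matches
    return score
```

Because each feature carries a set of values, a single shared value is enough to trigger the feature's weight; analyst-defined ranks would be encoded by choosing larger weights for more discriminative features.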
Applications of Approximate Word Matching in Information Retrieval
As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. The need to discover and reconcile variant forms of strings in bibliographic entries, i.e., authority work, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. Approximate string matching has traditionally been used to help with this problem. In this paper we introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms.
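As a rough illustration of word-level variant detection (not the paper's method), the standard library's difflib can rank candidate variant spellings of a word against a vocabulary:

```python
import difflib

def variant_forms(word, vocabulary, cutoff=0.8):
    """Return vocabulary entries similar to `word`, best match first.
    difflib's similarity ratio stands in for the paper's approximate
    word-matching technique; the cutoff is an illustrative choice."""
    return difflib.get_close_matches(word, vocabulary, n=5, cutoff=cutoff)
```

For example, given a vocabulary containing the misspelling "transllteration", a query for "transliteration" would surface it ahead of less similar words, which is the kind of variant detection and grouping authority work requires.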