
96 research outputs found

RefConcile – automated online reconciliation of bibliographic references

Author: A. Polaszek
D. Defays
D. Geer
G. Sautter
H. Köpcke
J. Beall
K. Davies
K.S. Jones
M.A. Jaro
T. Blakely
V.I. Levenshtein
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Comprehensive bibliographies often rely on community contributions. In such a setting, de-duplication is mandatory for the bibliography to be useful. Ideally, it works online, i.e., during the addition of new references, so the bibliography remains duplicate-free at all times. While de-duplication is well researched, generic approaches do not achieve the result quality required for automated reconciliation. To overcome this problem, we propose a new duplicate detection and reconciliation technique called RefConcile. Aimed specifically at bibliographic references, it uses dedicated blocking and matching techniques tailored to this type of data. Our evaluation, based on a large real-world collection of bibliographic references, shows that RefConcile scales well and that it detects and reconciles duplicates with high accuracy.
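The abstract above mentions blocking and matching without describing RefConcile's own techniques. As a generic illustration only, and not the authors' method, the following minimal Python sketch groups references into blocks and compares titles only within each block. The blocking key (year plus first letter of the author field), the record fields, and the 0.9 threshold are all assumptions chosen for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def blocking_key(ref):
    # Hypothetical key: publication year plus first letter of the author field.
    return (ref["year"], ref["author"][:1].lower())


def find_duplicates(refs, threshold=0.9):
    """Return index pairs of references whose titles look like duplicates."""
    # Blocking: only references sharing a key are ever compared.
    blocks = defaultdict(list)
    for idx, ref in enumerate(refs):
        blocks[blocking_key(ref)].append(idx)

    # Matching: pairwise title similarity inside each block.
    pairs = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            a = refs[i]["title"].lower()
            b = refs[j]["title"].lower()
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.append((i, j))
    return pairs
```

Blocking keeps the number of comparisons far below the quadratic all-pairs count, at the cost of missing duplicates whose keys differ, which is why the choice of key matters for this kind of data.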

Crossref

Open Research Online (The Open University)

Flexible and Efficient Distributed Resolution of Large Entities

Author: C.I. Sidló
D. Menestrina
H. Köpcke
I. Bhattacharya
I. Fellegi
J. Dean
L. Getoor
M. Boley
M. Hernández
M. Weis
M. Yakout
O. Benjelloun
P. Christen
S. Guo
S.E. Whang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Crossref

SZTAKI Publication Repository

On Feeding Business Systems with Linked Resources from the Web of Data

Author: A Nikolov
A-CN Ngomo
Dmitri V. Kalashnikov
E Jiménez-Ruiz
H Alili
H Köpcke
I Bhattacharya
IF Cruz
M Holub
Mauricio A. Hernández
P Szekely
R Isele
Rohit Ananthakrishna
V Rastogi
W Hu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2018

Business systems that are fed with data from the Web of Data require transparent interoperability. The Linked Data principles establish that different resources that represent the same real-world entities must be linked for this purpose. Link rules are paramount to transparent interoperability since they produce the links between resources. State-of-the-art link rules are learnt by genetic programming and build on comparing the values of the attributes of the resources. Unfortunately, this approach falls short in cases in which resources have similar values for their attributes but represent different real-world entities. In this paper, we present a proposal that combines a genetic programming approach that learns link rules with an ad-hoc filtering technique that boosts them by deciding whether the links they produce should be selected. Our analysis of the literature reveals that our approach is novel, and our experimental analysis confirms that it helps improve the F1 score by increasing precision without a significant penalty on recall. Ministerio de Economía y Competitividad TIN2013-40848-R; Ministerio de Economía y Competitividad TIN2016-75394-

Crossref

idUS. Depósito de Investigación Universidad de Sevilla

Matching titles with cross title web-search enrichment and community detection

Author: Blondel V. D.
Brizan D. G.
Broder A.
Buchuk Daniel
Chaudhuri S.
Dunn H. L.
Fellegi I. P.
Fortunato S.
Girvan M.
Gopalakrishnan V.
Hepp M.
Kannan A.
Köpcke H.
Sarawagi S.
Wang H.
Willinger W.
Zhang W. V.
Publication venue: 'VLDB Endowment'

Crossref

Digital Footprints: Your Unique Identity

Author: Bernado J Smith A
Fish T
Fox D Hightower J, Kauz H, Liao L, Patterson D
Haimson O Brubaker J, Dombrowski L, Hayes G
Kean T Kojm C, Zelikow P, Thompson J, Gorton S, Roemer T, Gorelick J, Lehman J, Fielding F, F.F., Kerrey B
Köpcke H Rahm E
Li J Wang G, Chen H
Park M
Spalevic Z Ilic M
Vignoles V
Wang G Chen H, Xu J, Atabakhsh H
Weisstein E
Xiang R Neville J, Rogati M
Publication date: 03/07/2018

Crossref

Ulster University's Research Portal

A proficient cost reduction framework for de-duplication of records in data integration

Author: AK Elmagarmid
Asif Sohail
Data Integration Manual
E Rahm
F Bauer
F Maggi
H Köpcke
IP Fellegi
J Bleiholder
K Goiser
L Gu
L Jiang
L Patrick
M Michelson
M Odell
M Samwald
MA Hernandez
MG Elfeky
Muhammad Murtaza Yousaf
P Christen
P Giang
R Baxter
S Chaudhuri
S Yan
SE Whang
SM Randall
T Fawcett
U Draisbach
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Partial motivation, multiple motivation: the role of output schemas in morphology

Author: A Jager De
A Loey Van
D Jurafsky
DL Bolinger
G Booij
H Gundersen
J Bybee
J Marle Van
K-M Köpcke
M Aronoff
N Kwon
R Jackendoff
T Weidhaas
W Haas De
Publication date: 31/12/2018

Theoretical and Experimental Linguistic

Crossref

Leiden University Scholarly Publications

Improving Record Linkage Accuracy with Hierarchical Feature Level Information and Parsed Data

Author: AK Elmagarmid
CP Campos de
D Heckerman
E Rahm
H Köpcke
HL Dunn
IP Fellegi
J. Mark Bishop
John Howroyd
L Leitão
M Hall
M Tromp
MA Jaro
Minlue Wang
N Friedman
Sebastian Danicic
T Churches
Valeriia Haberland
Y Zhou
Yun Zhou
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/01/2017

Probabilistic record linkage is a well-established topic in the literature. Fellegi-Sunter probabilistic record linkage and its enhanced versions are commonly used methods, which calculate match and non-match weights for each pair of records. Bayesian network classifiers, namely the naive Bayes classifier and TAN, have also been successfully used here. Recently, an extended version of TAN (called ETAN) has been developed and proved superior in classification accuracy to conventional TAN. However, no previous work has applied ETAN to record linkage and investigated the benefits of using naturally existing hierarchical feature level information and parsed fields of the datasets. In this work, we extend the naive Bayes classifier with such hierarchical feature level information. Finally, we illustrate the benefits of our method over previously proposed methods on 4 datasets in terms of the linkage performance (F1 score). We also show the results can be further improved by evaluating the benefit provided by additionally parsing the fields of these datasets.
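The match and non-match weights mentioned in the abstract can be illustrated with the standard Fellegi-Sunter formulation. This is a minimal sketch of that textbook scheme, not the paper's extended classifier. It assumes that fields are conditionally independent, that m is the probability a field agrees given a true match, and that u is the probability it agrees given a non-match. The function names and the example probabilities are hypothetical.

```python
import math


def field_weights(m, u):
    """Return (agreement weight, disagreement weight) for one field.

    m: P(field agrees | records are a match)
    u: P(field agrees | records are not a match)
    """
    agree = math.log2(m / u)
    disagree = math.log2((1 - m) / (1 - u))
    return agree, disagree


def pair_score(agreements, params):
    """Sum the field weights over one record pair's comparison vector.

    agreements: dict mapping field name to True (agrees) or False (disagrees)
    params: dict mapping field name to an (m, u) tuple
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = params[field]
        agree_w, disagree_w = field_weights(m, u)
        total += agree_w if agrees else disagree_w
    return total
```

A pair is then classified by comparing its total score against upper and lower thresholds, with scores in between typically sent to clerical review.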

Goldsmiths Research Online

Crossref

Explore Bristol Research