An ensemble approach for record matching in data linkage
Authors
KL Chan
WHN Cheung
JYL Ching
MK Lam
AY Lau
VCT Mok
J Poon
SK Poon
DMY Sze
JCY Wu
Q Yin
Publication date
1 January 2016
Publisher
IOS Press
Abstract
© 2016 The authors and IOS Press.
Objectives: To develop and test an optimal ensemble configuration of two complementary probabilistic data matching techniques, namely Fellegi-Sunter (FS) and Jaro-Winkler (JW), with the goal of improving record matching accuracy.
Methods: Experiments and comparative analyses were carried out to compare the matching performance of ensemble configurations combining FS and JW against that of the two techniques applied independently.
Results: Our results show that an improvement can be achieved when the FS technique is applied to the remaining unsure and unmatched records after the JW technique has been applied.
Discussion: Whilst all data matching techniques rely on the quality of a diverse set of demographic data, the FS technique focuses on aggregating matching accuracy from a number of useful variables, whereas JW looks more closely at matching the data content (spelling in this case) of each field. Hence, these two techniques are shown to be complementary. In addition, the sequence in which the two techniques are applied is critical.
Conclusion: We have demonstrated a useful ensemble approach that has the potential to improve data matching accuracy, particularly when the number of demographic variables is limited. This ensemble technique is particularly useful when there are multiple acceptable spellings in the fields, such as names and addresses.
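The sequencing described in the abstract (JW first, then FS on whatever JW leaves unsure or unmatched) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The field names, the m/u probabilities, the thresholds, the use of a mean JW score across name fields, and the use of exact equality as the FS agreement test are all assumptions made for illustration only.

```python
import math

def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted by a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

# Hypothetical (m, u) probabilities per field; not taken from the paper.
FS_PARAMS = {
    "first": (0.90, 0.01),
    "last": (0.90, 0.01),
    "dob": (0.95, 0.001),
    "postcode": (0.90, 0.05),
}

def fs_weight(rec_a: dict, rec_b: dict, params: dict = FS_PARAMS) -> float:
    """Sum of Fellegi-Sunter log2 agreement/disagreement weights."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:   # assumed agreement test: exact equality
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total

def classify(rec_a: dict, rec_b: dict, jw_fields=("first", "last"),
             jw_accept=0.9, fs_upper=10.0, fs_lower=0.0):
    """Stage 1: accept confident JW matches. Stage 2: FS on everything else."""
    jw_score = sum(jaro_winkler(rec_a[f], rec_b[f]) for f in jw_fields) / len(jw_fields)
    if jw_score >= jw_accept:              # assumed threshold
        return ("match", "JW")
    weight = fs_weight(rec_a, rec_b)
    if weight >= fs_upper:                 # assumed upper threshold
        return ("match", "FS")
    if weight <= fs_lower:                 # assumed lower threshold
        return ("non-match", "FS")
    return ("unsure", "FS")
```

In this sketch a pair whose names differ only by a small spelling variation is accepted at the JW stage, while a pair whose names differ substantially but whose other demographic fields agree is passed on to FS, which aggregates evidence across all fields. Swapping the order of the two stages would change which pairs each technique sees, which is consistent with the abstract's remark that the sequence matters.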
Available Versions
OPUS - University of Technology Sydney
oai:opus.lib.uts.edu.au:10453/...
Last time updated on 18/10/2019