Identifying data set specific duplicate patient records

Abstract

posterProbabilistic models are commonly used in the identification of duplicate records. These methods are usually more accurate than deterministic methods, but are exponentially more computationally complex. Thus to make them computationally feasible, they rely on deterministic blocking strategies. This project investigates how machine learning methods can be used to automatically determine an optimal blocking strategy using duplicate records already identified

    Similar works