1 research outputs found
Ignoring Dependency between Linking Variables and Its Impact on the Outcome of Probabilistic Record Linkage Studies
A b s t r a c t Objectives: This study sought to examine the differences between ignoring (naïve) and incorporating dependency (nonnaïve) among linkage variables on the outcome of a probabilistic record linkage study. Design and Measurements: We used the outcomes of a previously developed probabilistic linkage procedure for different registries in perinatal care assuming independence among linkage variables. We estimated the impact of ignoring dependency by re-estimating the linkage weights after constructing a variable that combines the outcomes of the comparison of 2 correlated linking variables. The results of the original naïve and the new nonnaïve strategy were systematically compared for 3 scenarios: the empirical dataset using 9 variables, the empirical dataset using 5 variables, and a simulated dataset using 5 variables. Results: The linking weight for agreement on 2 correlated variables among nonmatches was estimated considerably higher in the naïve strategy than in the nonnaïve strategy (16.87 vs. 13.55). Therefore, ignoring dependency overestimates the amount of identifying information if both correlated variables agree. The impact on the number of pairs that was classified differently with both approaches was modest in the situation in which there were many different linking variables but grew substantially with fewer variables. The simulation study confirmed the results of the empirical study and suggests that the number of misclassifications can increase substantially by ignoring dependency under less favorable linking conditions. Conclusion: Dependency often exists between linking variables and has the potential to bias the outcome of a linkage study. The nonnaïve approach is a straightforward method for creating linking weights that accommodate dependency. The impact on the number of misclassifications depends on the quality and number of linking variables relative to the number of correlated linking variables. Ⅲ J Am Med Inform Assoc. 2008;15:654 -660