3 research outputs found

    Optimal assignment problem on record linkage

    We present an application of the Hungarian Method, an optimal assignment algorithm from graph theory, to record linkage in order to improve disclosure risk assessment. The Hungarian Method has O(n^3) complexity, so three different methods are presented to reduce its computational cost.
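As a rough illustration of the idea in this abstract, the sketch below links original records to masked records by solving an assignment problem on a matrix of squared distances. It is a minimal sketch and not the authors' implementation: the function name `hungarian`, the toy records, and the squared-distance cost are all assumptions made for the example, and the solver is the standard potentials-based O(n^3) formulation of the Hungarian Method for a square cost matrix.

```python
from math import inf


def hungarian(cost):
    """Solve the assignment problem for a square cost matrix.

    Returns (total_cost, assignment), where assignment[i] is the column
    matched to row i and the summed cost is minimal. Runs in O(n^3).
    """
    n = len(cost)
    u = [0] * (n + 1)    # row potentials (1-indexed)
    v = [0] * (n + 1)    # column potentials (1-indexed)
    p = [0] * (n + 1)    # p[j] = row currently matched to column j (0 = none)
    way = [0] * (n + 1)  # back-pointers along the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0 = p[j0]
            delta = inf
            j1 = 0
            # Find the unused column with the smallest reduced cost.
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j] = cur
                        way[j] = j0
                    if minv[j] < delta:
                        delta = minv[j]
                        j1 = j
            # Shift potentials so the chosen edge becomes tight.
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while True:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
            if j0 == 0:
                break

    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return total, assignment


# Hypothetical toy data: (age, income in thousands) before and after masking.
original = [(30, 50), (45, 80), (62, 40)]
masked = [(44, 79), (60, 43), (33, 52)]

# Assumed linkage cost: squared Euclidean distance between record pairs.
cost = [[sum((a - b) ** 2 for a, b in zip(o, m)) for m in masked]
        for o in original]

total, links = hungarian(cost)
print(total, links)
```

Unlike nearest-neighbour linkage, where several original records may claim the same masked record, the assignment formulation forces a one-to-one linkage with minimal total distance. For the cost matrix `[[1, 2], [2, 100]]`, for instance, both rows prefer column 0, but the optimal assignment gives column 0 to the second row.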

    Essays in Labor and Organization Economics

    The following dissertation is a collection of three independent essays. The first two essays contribute to the literature on Organization Economics. The third essay contributes to the literature on data confidentiality. Essay 1, "Turnover as a Gateway to Symmetric Information," explores high-ability turnover in highly competitive labor markets. Why do workers who are successful at a given firm decide to leave? Essay 1 asserts that such movement is driven by the presence of asymmetric information. In particular, it is shown that when competing firms have less knowledge of a worker's ability than his current firm, there exists an incentive for high-ability workers to leave their current job in pursuit of a higher wage. Such an incentive generates a set of testable predictions. The predictions are tested using the personnel records from the management of a medium-size firm in the US financial services industry. The data is consistent with the theory. Essay 2, "Piece-Rates, Salary, Performance and Job Level," explores the effect of monitoring and hierarchy on compensation structure. Previous work has shown that monitoring worker effort is more difficult at lower levels of the hierarchy, and, simultaneously, that compensation should rely more on salary payments than piece-rate payments when effort is more difficult to monitor. Essay 2 formalizes these ideas in a simple model of moral hazard. The model generates a set of predictions about how salary, bonus and performance should vary across levels of the hierarchy. The predictions are tested using the same data as Essay 1 and strong support is found. Essay 3, "Synthetic Data and Risk of Disclosure," explores how well synthetic data protects confidential data. Using a unique Census dataset and four synthetic implicates, the risk of disclosure is found to be quite small. In a secondary analysis, the effectiveness of distance-based and probabilistic re-identification methods is also explored. Contrary to previous experiments, it is found that probabilistic re-identification outperforms distance-based re-identification. Further, it appears that the difference in performance is driven by the number of matching variables: as more matching variables are added, the success rate of probabilistic matching increases more quickly.