4,656 research outputs found

    Imprecise Imputation: A Nonparametric Micro Approach Reflecting the Natural Uncertainty of Statistical Matching with Categorical Data

    We develop the first statistical matching micro approach reflecting the natural uncertainty arising during the integration of categorical data. A complete synthetic file is obtained by imprecise imputation, replacing missing entries by sets of suitable values. We discuss three imprecise imputation strategies and raise ideas on potential refinements by logical constraints or likelihood-based arguments. Additionally, we show how imprecise imputation can be embedded into the theory of finite random sets, providing tight lower and upper bounds for parameters. Our simulation results corroborate that their narrowness is practically relevant and that they almost always cover the true parameters.
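A minimal sketch of the set-valued idea described in this abstract, under stated assumptions: each missing categorical entry is replaced by the full set of categories (an illustrative choice, not necessarily one of the paper's three strategies), and bounds on a category proportion are read off in the random-set style. The lower bound counts records whose set lies entirely inside the event, the upper bound counts records whose set overlaps it. The function names `impute_sets` and `proportion_bounds` and the toy data are hypothetical.

```python
from fractions import Fraction


def impute_sets(values, categories):
    """Replace each missing entry (None) by the set of all categories.

    Observed entries become singleton sets. Using the full category set
    for missing entries is an assumption made for illustration only.
    """
    full = frozenset(categories)
    return [full if v is None else frozenset([v]) for v in values]


def proportion_bounds(set_values, event):
    """Return (lower, upper) bounds on the proportion of records in `event`.

    lower: share of records whose set is contained in the event.
    upper: share of records whose set intersects the event.
    """
    event = frozenset(event)
    n = len(set_values)
    lower = sum(1 for s in set_values if s <= event)
    upper = sum(1 for s in set_values if s & event)
    return Fraction(lower, n), Fraction(upper, n)


# Hypothetical toy column with two missing entries.
observed = ["a", None, "b", "a", None]
sets = impute_sets(observed, ["a", "b", "c"])
low, high = proportion_bounds(sets, {"a"})
print(low, high)
```

Any single-valued completion of the missing entries yields a proportion between the two bounds, which is why the interval, rather than a point estimate, reflects the uncertainty of the matching step.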

    Marginal Release Under Local Differential Privacy

    Many analysis and machine learning tasks require the availability of marginal statistics on multidimensional datasets while providing strong privacy guarantees for the data subjects. Applications for these statistics range from finding correlations in the data to fitting sophisticated prediction models. In this paper, we provide a set of algorithms for materializing marginal statistics under the strong model of local differential privacy. We prove the first tight theoretical bounds on the accuracy of marginals compiled under each approach, perform an empirical evaluation to confirm these bounds, and evaluate the approaches for tasks such as modeling and correlation testing. Our results show that releasing information based on (local) Fourier transformations of the input is preferable to alternatives based directly on (local) marginals.
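For orientation, the local model can be illustrated with the classic randomized-response primitive on a single binary attribute. This is a sketch of a basic local differential privacy building block, not the paper's marginal or Fourier-based algorithms; the function names, the seed, and the simulated population are assumptions. Each user reports the true bit with probability e^ε / (e^ε + 1) and the flipped bit otherwise, and the collector debiases the observed mean.

```python
import math
import random


def randomize_bit(bit, epsilon, rng):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def estimate_mean(reports, epsilon):
    """Unbiased estimate of the true share of ones from randomized reports.

    E[report] = (2p - 1) * mu + (1 - p), so we invert that relation.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


# Hypothetical population: 20,000 users, 30% of whom hold the value 1.
rng = random.Random(0)
true_bits = [1] * 6000 + [0] * 14000
reports = [randomize_bit(b, 1.0, rng) for b in true_bits]
estimate = estimate_mean(reports, 1.0)
print(round(estimate, 3))
```

The estimate is noisy for small ε or few users, which is the accuracy trade-off that the bounds mentioned in the abstract quantify for full marginals.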

    Eliminating small cells from census counts tables: empirical vs. design transition probabilities

    The software SAFE has been developed at the State Statistical Institute Berlin-Brandenburg and has been in regular use there for several years now. It involves an algorithm that yields a controlled cell frequency perturbation. When a microdata set has been protected by this method, any table which can be computed on the basis of this microdata set will not contain any small cells, e.g. cells with frequency counts 1 or 2. We compare empirically observed transition probabilities resulting from this pre-tabular method to transition matrices in the context of variants of microdata key based post-tabular random perturbation methods suggested in the literature, e.g. Shlomo, N., Young, C. (2008) and Fraser, B., Wooton, J. (2006).
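As a rough illustration of what "empirically observed transition probabilities" means here, the sketch below tabulates how often an original cell count i ends up as a perturbed count j and normalizes each row. It does not reproduce SAFE or any of the cited perturbation methods; the function name `empirical_transitions` and the toy counts are hypothetical.

```python
from collections import Counter, defaultdict


def empirical_transitions(original, perturbed):
    """Estimate P(perturbed count = j | original count = i) from paired cells."""
    if len(original) != len(perturbed):
        raise ValueError("both tables must list the same cells")
    counts = defaultdict(Counter)
    for o, p in zip(original, perturbed):
        counts[o][p] += 1
    # Normalize each row so its probabilities sum to one.
    return {
        o: {p: c / sum(row.values()) for p, c in row.items()}
        for o, row in counts.items()
    }


# Hypothetical cell counts before and after a perturbation that
# leaves no cells with frequency 1 or 2.
original = [1, 1, 2, 2, 3, 1]
perturbed = [0, 3, 3, 0, 3, 0]
probs = empirical_transitions(original, perturbed)
print(probs)
```

A matrix estimated this way can be set side by side with the design transition matrix of a post-tabular method, which is the kind of comparison the abstract describes.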