    p-probabilistic k-anonymous microaggregation for the anonymization of surveys with uncertain participation

    We develop a probabilistic variant of k-anonymous microaggregation, which we term p-probabilistic, resorting to a statistical model of respondent participation in order to aggregate quasi-identifiers in such a manner that k-anonymity is concordantly enforced with a parametric probabilistic guarantee. Succinctly, owing to the possibility that some respondents may not finally participate, sufficiently larger cells are created, striving to satisfy k-anonymity with probability at least p. The microaggregation function is designed before the respondents submit their confidential data. More precisely, a specification of the function is sent to them, which they may verify and apply to their quasi-identifying demographic variables prior to submitting the microaggregated data along with the confidential attributes to an authorized repository. We propose a number of metrics to assess the performance of our probabilistic approach in terms of anonymity and distortion, which we proceed to investigate theoretically in depth and empirically with synthetic and standardized data. We stress that, in addition to constituting a functional extension of traditional microaggregation, thereby broadening its applicability to the anonymization of statistical databases in a wide variety of contexts, the relaxation of trust assumptions is arguably expected to have a considerable impact on user acceptance and ultimately on data utility through mere availability. Peer reviewed. Postprint (author's final draft).
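The idea of enlarging cells so that k-anonymity holds with probability at least p can be illustrated with a minimal sketch. It assumes, purely for illustration, that each respondent assigned to a cell participates independently with the same probability `q`, so the number of participants is binomial; the abstract does not state the paper's actual participation model, and the names `prob_at_least_k`, `min_cell_size`, and `q` are hypothetical.

```python
from math import comb


def prob_at_least_k(n, k, q):
    """P(X >= k) for X ~ Binomial(n, q): chance that at least k of the
    n respondents assigned to a cell actually participate."""
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))


def min_cell_size(k, q, p):
    """Smallest planned cell size n such that at least k respondents
    participate with probability at least p.

    Assumption (not from the source): independent participation with a
    common probability q.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    if not 0 < p < 1:
        raise ValueError("p must lie in (0, 1)")
    n = k  # a cell can never satisfy k-anonymity with fewer than k members
    while prob_at_least_k(n, k, q) < p:
        n += 1
    return n


if __name__ == "__main__":
    # Planned cell size for k = 5 when 80% of respondents are expected
    # to participate and the guarantee should hold with probability 0.95.
    print(min_cell_size(k=5, q=0.8, p=0.95))
```

When `q` equals 1 the sketch reduces to ordinary k-anonymous cells of size k; lower participation probabilities or stricter values of p lead to larger planned cells, which is the trade-off between anonymity and distortion that the abstract refers to.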

    Consistent Estimation of a Simple Linear Model Under Microaggregation

    A problem statistical offices are increasingly faced with is guaranteeing confidentiality when releasing microdata sets. One method to provide safe microdata is to reduce the information content of a data set by means of masking procedures. A widely discussed masking procedure is microaggregation, a technique where observations are grouped and replaced with their corresponding group means. However, while reducing the disclosure risk of a data file, microaggregation also affects the results of statistical analyses. The paper deals with the impact of microaggregation on a simple linear model. We show that parameter estimates are biased if the dependent variable is used to group the data. It turns out that the bias of the slope parameter estimate is a non-monotonic function of this parameter. By means of this non-monotonic relationship we develop a method for consistently estimating the model parameters.
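The masking step described here, grouping observations and replacing them with group means, can be sketched as follows. This is an illustrative implementation under stated assumptions, not the procedure used in the paper: records are sorted by a single variable, cut into consecutive groups of size `k`, and any remainder is merged into the last group. The function name `microaggregate` and its parameters are hypothetical.

```python
from statistics import fmean


def microaggregate(rows, sort_index, k):
    """Replace every record by the mean record of its group.

    rows       -- list of equal-length tuples of numbers
    sort_index -- column used to sort the records before grouping
    k          -- minimum group size

    Assumption (not from the source): groups are consecutive runs of k
    records in sorted order, and leftover records join the last group.
    The result is returned in the original record order.
    """
    n = len(rows)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= number of rows")
    order = sorted(range(n), key=lambda i: rows[i][sort_index])
    n_groups = n // k
    n_cols = len(rows[0])
    result = [None] * n
    for g in range(n_groups):
        start = g * k
        stop = n if g == n_groups - 1 else start + k
        members = order[start:stop]
        means = tuple(fmean(rows[i][c] for i in members) for c in range(n_cols))
        for i in members:
            result[i] = means
    return result


if __name__ == "__main__":
    # Columns are (x, y); grouping on y, the dependent variable.
    data = [(1.0, 10.0), (2.0, 40.0), (3.0, 20.0), (4.0, 30.0)]
    for original, masked in zip(data, microaggregate(data, sort_index=1, k=2)):
        print(original, "->", masked)
```

Passing the index of the dependent variable as `sort_index` corresponds to the case the abstract identifies as producing biased parameter estimates.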

    Statistical Inference in a Simple Linear Model Under Microaggregation

    A problem statistical offices are increasingly faced with is guaranteeing confidentiality when releasing microdata sets. One method to provide safe microdata is to reduce the information content of a data set by means of masking procedures. A widely discussed masking procedure is microaggregation, a technique where observations are grouped and replaced with their corresponding group means. However, while reducing the disclosure risk of a data file, microaggregation also affects the results of statistical analyses. We focus on the effect of microaggregation on a simple linear model. In a previous paper we showed how to correct for the aggregation bias of the naive least-squares estimator that occurs when the dependent variable is used to group the data. The present paper deals with the asymptotic variance of the corrected least-squares estimator and with the asymptotic variance of the naive least-squares estimator when either the dependent variable or the regressor is used to group the data. We derive asymptotic confidence intervals for the slope parameter. Furthermore, we show how to test for the significance of the slope parameter by analyzing the effect of microaggregation on the asymptotic power function of the naive t-test.

    The Effect of Microaggregation Procedures on the Estimation of Linear Models: A Simulation Study

    Microaggregation is a set of procedures that distort empirical data in order to guarantee the factual anonymity of the data. At the same time, the information content of data sets should not be reduced too much, and the data should still be useful for scientific research. This paper investigates the effect of microaggregation on the estimation of a linear regression by ordinary least squares. It studies, by way of an extensive simulation experiment, the bias of the slope parameter estimator induced by various microaggregation techniques. Some microaggregation procedures lead to consistent estimates while others imply an asymptotic bias for the estimator.
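A toy version of such a simulation experiment might look like the sketch below. It is not the study's design: the data-generating model (standard normal regressor and error), the sample size, the group size, and the two grouping rules (sorting on the regressor versus sorting on the response) are all assumptions chosen for illustration, and the helper names `ols_slope`, `group_means`, and `simulate` are hypothetical.

```python
import random
from statistics import fmean


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def group_means(x, y, key, k):
    """Sort records by `key`, cut them into consecutive groups of size k
    (remainder joins the last group), and replace x and y by group means."""
    order = sorted(range(len(x)), key=lambda i: key[i])
    n_groups = len(x) // k
    xa, ya = [], []
    for g in range(n_groups):
        start = g * k
        stop = len(x) if g == n_groups - 1 else start + k
        idx = order[start:stop]
        xa.extend([fmean(x[i] for i in idx)] * len(idx))
        ya.extend([fmean(y[i] for i in idx)] * len(idx))
    return xa, ya


def simulate(n=3000, k=3, beta=1.0, seed=0):
    """Draw y = beta * x + e once and estimate the slope from the original
    data and from two microaggregated versions of it."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]
    return {
        "original": ols_slope(x, y),
        "sorted_by_x": ols_slope(*group_means(x, y, x, k)),
        "sorted_by_y": ols_slope(*group_means(x, y, y, k)),
    }


if __name__ == "__main__":
    for name, slope in simulate().items():
        print(f"{name:12s} slope = {slope:.3f}")
```

Repeating `simulate` over many seeds and averaging the slopes gives a rough picture of which grouping rule leaves the estimator close to `beta` and which one shifts it.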

    The effect of microaggregation on regression results: an application to Spanish innovation data

    Microaggregation is a technique for masking confidential data by aggregation. The aim of this paper is to analyze the extent to which microaggregated data can be used for rigorous empirical research. In doing this, I adopt an empirical perspective. I use data from the Technological Innovation Panel (PITEC) and compare regression results using both original and anonymized data. PITEC is a new firm-level panel database for innovative activities of Spanish firms based on CIS data. I find that the microaggregation procedure used has a slight effect on the coefficient estimates and their estimated standard errors, especially when estimating linear models. Keywords: Microaggregation; Individual ranking; Bias; Innovation data

    Estimation of a Linear Regression under Microaggregation with the Response Variable as a Sorting Variable

    Microaggregation is one of the most frequently applied statistical disclosure control techniques for continuous data. The basic principle of microaggregation is to group the observations in a data set and to replace them by their corresponding group means. However, while reducing the disclosure risk of data files, the technique also affects the results of statistical analyses. The paper deals with the impact of microaggregation on a linear model in continuous variables. We show that parameter estimates are biased if the dependent variable is used to form the groups. Using this result, we develop a consistent estimator that removes the aggregation bias. Moreover, we derive the asymptotic covariance matrix of the corrected least squares estimator.