
    Assessing the disclosure risk of CTA-like methods

    Minimum distance controlled tabular adjustment (CTA) is a recent perturbative approach for statistical disclosure control in tabular data. CTA looks for the closest safe table, using some particular distance. In this talk we provide empirical results to assess the disclosure risk of the method. A set of 33 instances from the literature and four different attacker scenarios are considered. The results show that, unless the attacker has good information about the original table, CTA has low disclosure risk. This talk summarizes results reported in the paper: Castro, J. (2013). On assessing the disclosure risk of controlled adjustment methods for statistical tabular data, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 20, 921–941.

    On assessing the disclosure risk of controlled adjustment methods for statistical tabular data

    Minimum distance controlled tabular adjustment is a recent perturbative approach for statistical disclosure control in tabular data. Given a table to be protected, it looks for the closest safe table, using some particular distance. Controlled adjustment is known to provide high data utility. However, the disclosure risk has only been partially analyzed using theoretical results from optimization. This work extends these previous results, providing both a more detailed theoretical analysis and an extensive empirical assessment of the disclosure risk of the method. A set of 25 instances from the literature and four different attacker scenarios are considered, with several random replications for each scenario, both for L1 and L2 distances. This amounts to the solution of more than 2000 optimization problems. The analysis of the results shows that the approach has low disclosure risk when the attacker has no good information on the bounds of the optimization problem. On the other hand, when the attacker has good estimates of the bounds, and the only uncertainty is in the objective function (which is a very strong assumption), the disclosure risk of controlled adjustment is high and it should be avoided.
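    As an illustration, the minimum-distance CTA problem under the L1 norm can be cast as a linear program by splitting each cell's deviation into nonnegative parts. The tiny instance below (one row of two interior cells plus their total, with a hypothetical protection level of 3 on the sensitive cell) is an assumption for illustration, not one of the 25 instances from the paper:

    ```python
    # Toy L1-CTA: find the closest safe table to a = (10, 15, 25), where
    # cell 3 is the total (a1 + a2 = a3) and cell 1 is sensitive: it must
    # be published at least 3 units above its true value (upper protection).
    # Write z_i = a_i + u_i - v_i with u_i, v_i >= 0, so that minimizing
    # sum(u_i + v_i) minimizes the L1 distance to the original table.
    from scipy.optimize import linprog

    c = [1, 1, 1, 1, 1, 1]              # costs of u1, v1, u2, v2, u3, v3
    # Additivity must still hold after adjustment: (u1-v1) + (u2-v2) = (u3-v3)
    A_eq = [[1, -1, 1, -1, -1, 1]]
    b_eq = [0]
    # Protection: u1 - v1 >= 3, written as -(u1 - v1) <= -3
    A_ub = [[-1, 1, 0, 0, 0, 0]]
    b_ub = [-3]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    print(res.fun)  # minimum total L1 adjustment
    ```

    Here the protection forces a deviation of 3 on the sensitive cell, and additivity forces a matching net deviation of 3 elsewhere, so the optimal L1 cost is 6.
    
    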

    A second order cone formulation of continuous CTA model

    In this paper we consider a minimum distance Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation (control) of tabular data. The goal of the CTA model is to find the closest safe table to some original tabular data set that contains sensitive information. Closeness is usually measured using the l1 or l2 norm, with each measure having its advantages and disadvantages. Recently, in [4], a regularization of l1-CTA using the Pseudo-Huber function was introduced in an attempt to combine positive characteristics of both l1-CTA and l2-CTA. All three models can be solved using appropriate versions of Interior-Point Methods (IPM). It is known that IPM in general works better on well-structured problems such as conic optimization problems; thus, reformulating these CTA models as conic optimization problems may be advantageous. We present reformulations of Pseudo-Huber-CTA and l1-CTA as Second-Order Cone (SOC) optimization problems and test the validity of the approach on a small example of a two-dimensional tabular data set.
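    The Pseudo-Huber regularization mentioned above interpolates between the l2 and l1 behaviors: it is smooth and approximately quadratic near zero, and asymptotically linear for large deviations. A minimal sketch of the function itself (delta is the usual Pseudo-Huber shape parameter):

    ```python
    import math

    def pseudo_huber(t: float, delta: float = 1.0) -> float:
        """Pseudo-Huber loss: ~ t**2 / 2 near 0, ~ delta * |t| for large |t|."""
        return delta * delta * (math.sqrt(1.0 + (t / delta) ** 2) - 1.0)

    # Quadratic regime: pseudo_huber(0.01) is close to 0.01**2 / 2
    # Linear regime: pseudo_huber(200) - pseudo_huber(100) is close to 100
    print(pseudo_huber(0.01), pseudo_huber(100.0))
    ```

    Because this loss is smooth everywhere (unlike the l1 norm), it avoids the nondifferentiability at zero while keeping the l1-like robustness to large cell adjustments.
    
    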

    A linear optimization based method for data privacy in statistical tabular data

    National Statistical Agencies routinely disseminate large amounts of data. Prior to dissemination these data have to be protected to avoid releasing confidential information. Controlled tabular adjustment (CTA) is one of the available methods for this purpose. CTA formulates an optimization problem that looks for the safe table which is closest to the original one. The standard CTA approach results in a mixed integer linear optimization (MILO) problem, which is very challenging for current technology. In this work we present a much less costly variant of CTA that formulates a multiobjective linear optimization (LO) problem, where binary variables are pre-fixed, and the resulting continuous problem is solved by lexicographic optimization. Extensive computational results are reported using both commercial (CPLEX and XPRESS) and open source (Clp) solvers, with either simplex or interior-point methods, on a set of real instances. Most instances were successfully solved with the LO-CTA variant in less than one hour, while many of them are computationally very expensive with the MILO-CTA formulation. The interior-point method outperformed simplex in this particular application.
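    Lexicographic optimization, as used in the LO-CTA variant, solves the objectives in priority order, fixing each optimal value as a constraint before optimizing the next. A hypothetical two-stage toy problem (not one of the paper's instances) sketches the mechanism:

    ```python
    # Stage 1: minimize x + y subject to x + y >= 2, x, y >= 0.
    # Stage 2: among the stage-1 optima, maximize x (i.e. minimize -x).
    from scipy.optimize import linprog

    stage1 = linprog(c=[1, 1], A_ub=[[-1, -1]], b_ub=[-2])
    best = stage1.fun                      # optimal value of the first objective

    # Fix the first objective at its optimum (small tolerance) and re-solve.
    stage2 = linprog(c=[-1, 0],
                     A_ub=[[-1, -1], [1, 1]],
                     b_ub=[-2, best + 1e-9])
    print(stage2.x)  # a stage-1-optimal point that also maximizes x
    ```

    In the CTA setting each stage would carry a higher-priority group of cell deviations; the pattern of re-solving with the previous optimum pinned down is the same.
    
    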

    Geographically intelligent disclosure control for flexible aggregation of census data

    This paper describes a geographically intelligent approach to disclosure control for protecting flexibly aggregated census data. Increased analytical power has stimulated user demand for more detailed information for smaller geographical areas and customized boundaries. Consequently, it is vital that improved methods of statistical disclosure control are developed to protect against the increased disclosure risk. Traditionally, methods of statistical disclosure control have been aspatial in nature. Here we present a geographically intelligent approach that takes into account the spatial distribution of risk. We describe empirical work illustrating how the flexibility of this new method, called local density swapping, is an improved alternative to random record swapping in terms of the risk-utility trade-off.

    Thirty years of optimization-based SDC methods for tabular data

    In 1966 Bacharach published in Management Science a work on matrix rounding problems in two-way tables of economic statistics, formulated as a network optimization problem. This is likely the first application of optimization/operations research for statistical disclosure control (SDC) in tabular data. Years later, in 1982, Cox and Ernst used the same approach in a work in INFOR for a similar problem: controlled rounding. And thirty years ago, in 1992, a paper by Kelly, Golden and Assad appeared in Networks about the solution of the cell suppression problem, also using network optimization. Cell suppression was used for years as the main SDC technique for tabular data, and it was an active field of research which resulted in several lines of work and many publications. The above are some of the seminal works on the use of optimization methods for SDC when releasing tabular data. This paper discusses some of the research done in this field since then, with a focus on the approaches that were of practical use. It also discusses their pros and cons compared to recent techniques that are not based on optimization methods.

    Optimization Methods for Tabular Data Protection

    In this thesis we consider a minimum distance Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation (control) of tabular data. The goal of the CTA model is to find the closest safe table to some original tabular data set that contains sensitive information. Closeness is usually measured using the l1 or l2 norm, with each measure having its advantages and disadvantages. According to the given norm, CTA can be formulated as an optimization problem: Linear Programming (LP) for l1, Quadratic Programming (QP) for l2. In this thesis we present an alternative reformulation of l1-CTA as a Second-Order Cone (SOC) optimization problem. All three models can be solved using appropriate versions of Interior-Point Methods (IPM). The validity of the new approach was tested on randomly generated two-dimensional tabular data sets. It was shown numerically that the SOC formulation compares favorably to the QP and LP formulations.
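    For the l2 norm, the same kind of toy instance becomes an equality-constrained least-squares problem, solvable directly from its KKT system. The table and protection constraint below are hypothetical, chosen only to keep the sketch small:

    ```python
    # Toy l2-CTA: minimize ||z - a||^2 subject to additivity z1 + z2 = z3
    # and a hypothetical protection constraint z1 = 13 (cell 1 is sensitive
    # and must be published at 13 instead of its true value 10).
    import numpy as np

    a = np.array([10.0, 15.0, 25.0])
    C = np.array([[1.0, 1.0, -1.0],    # additivity: z1 + z2 - z3 = 0
                  [1.0, 0.0,  0.0]])   # protection: z1 = 13
    d = np.array([0.0, 13.0])

    # KKT system of min ||z - a||^2 s.t. C z = d:
    #   [ I   C^T ] [z  ]   [a]
    #   [ C    0  ] [lam] = [d]
    n, m = 3, 2
    K = np.block([[np.eye(n), C.T], [C, np.zeros((m, m))]])
    z = np.linalg.solve(K, np.concatenate([a, d]))[:n]
    print(z)  # closest (in l2) additive table satisfying the protection
    ```

    The l2 solution spreads the forced adjustment smoothly over the remaining cells (here z = (13, 13.5, 26.5)), whereas the l1 solution tends to concentrate it on few cells; this is the trade-off between the two norms the abstract alludes to.
    
    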

    Priv Stat Databases

    In this paper, we consider a Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation of tabular data. The goal of the CTA model is to find the closest safe (masked) table to the original table that contains sensitive information. Closeness is usually measured using the l1 or l2 norm. However, in the norm-based CTA model there is no control of how well the statistical properties of the data in the original table are preserved in the masked table. Hence, we propose a different criterion of "closeness" between the masked and original table, which attempts to minimally change certain statistics used in the analysis of the table. The Chi-square statistic is among the most utilized measures for the analysis of data in two-dimensional tables. Hence, we propose a Chi-square CTA model which minimizes an objective function that depends on the difference of the Chi-square statistics of the original and masked table. The model is non-linear and non-convex, and therefore harder to solve, which prompted us to also consider a modification of this model which can be transformed into a linear programming model that can be solved more efficiently. We present numerical results for two-dimensional tables illustrating our novel approach and providing a comparison with norm-based CTA models.
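    The Chi-square statistic that the proposed objective tries to preserve is the usual Pearson statistic of a two-way contingency table. A self-contained sketch (the 2x2 tables are made up for illustration, not taken from the paper):

    ```python
    def chi_square(table):
        """Pearson Chi-square statistic of a two-way contingency table."""
        row = [sum(r) for r in table]
        col = [sum(c) for c in zip(*table)]
        grand = sum(row)
        # Sum of (observed - expected)^2 / expected over all cells,
        # where expected[i][j] = row[i] * col[j] / grand.
        return sum((table[i][j] - row[i] * col[j] / grand) ** 2
                   / (row[i] * col[j] / grand)
                   for i in range(len(table)) for j in range(len(table[0])))

    original = [[10, 20], [30, 40]]
    masked = [[11, 19], [29, 41]]
    # A Chi-square-preserving CTA would drive this difference toward zero:
    print(abs(chi_square(masked) - chi_square(original)))
    ```

    The objective described in the abstract penalizes exactly this difference, so that the masked table supports the same inferential conclusions as the original.
    
    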