2,579 research outputs found

    Geographically intelligent disclosure control for flexible aggregation of census data

    No full text
    This paper describes a geographically intelligent approach to disclosure control for protecting flexibly aggregated census data. Increased analytical power has stimulated user demand for more detailed information for smaller geographical areas and customized boundaries. Consequently, it is vital that improved methods of statistical disclosure control are developed to protect against the increased disclosure risk. Traditionally, methods of statistical disclosure control have been aspatial in nature. Here we present a geographically intelligent approach that takes into account the spatial distribution of risk. We describe empirical work illustrating how this new method, called local density swapping, is an improved alternative to random record swapping in terms of the risk-utility trade-off.
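    The abstract gives no algorithmic detail, but the general shape of geographically weighted record swapping can be sketched. The toy function below is a minimal illustration under assumptions of ours, not the authors' local density swapping: records are dicts with hypothetical `area` and `coords` fields, and swap donors are drawn with probability decaying in distance, so that `decay = 0` recovers plain random record swapping.

```python
import math
import random

def swap_records(records, risky_ids, decay=1.0, seed=0):
    """Toy geographically weighted record swapping (illustrative only).

    For each risky record, choose a donor with probability decaying in
    geographic distance, then exchange the two records' area codes.
    With decay = 0 all donors are equally likely, i.e. plain random
    record swapping.
    """
    rng = random.Random(seed)
    for rid in risky_ids:
        rec = records[rid]
        donors = [r for i, r in enumerate(records) if i != rid]
        # Inverse-distance weights: nearby donors are preferred.
        weights = [1.0 / (1.0 + math.dist(rec["coords"], d["coords"])) ** decay
                   for d in donors]
        donor = rng.choices(donors, weights=weights, k=1)[0]
        # Swap only the geography, keeping all other attributes intact.
        rec["area"], donor["area"] = donor["area"], rec["area"]
    return records
```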

    A second order cone formulation of continuous CTA model

    Get PDF
    The final publication is available at link.springer.com. In this paper we consider a minimum distance Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation (control) of tabular data. The goal of the CTA model is to find the closest safe table to some original tabular data set that contains sensitive information. Closeness is usually measured using the l1 or l2 norm, each measure having its advantages and disadvantages. Recently, in [4], a regularization of l1-CTA using the Pseudo-Huber function was introduced in an attempt to combine positive characteristics of both l1-CTA and l2-CTA. All three models can be solved using appropriate versions of Interior-Point Methods (IPM). It is known that IPMs generally work better on well-structured problems such as conic optimization problems; thus, reformulating these CTA models as conic optimization problems may be advantageous. We present reformulations of Pseudo-Huber-CTA and l1-CTA as Second-Order Cone (SOC) optimization problems and test the validity of the approach on a small example of a two-dimensional tabular data set.
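    For orientation, the generic minimum-distance CTA problem and the standard second-order cone view of its l1 variant can be written as follows. This is a textbook-style rendering in our own notation (M, b for the table's additivity relations, protection requirements folded into the bounds), not necessarily the paper's exact formulation.

```latex
% Minimum-distance CTA: find the closest safe table x to the original a.
\begin{align*}
\min_{x}\;   & \|x - a\|_{p} \qquad (p = 1 \text{ or } 2)\\
\text{s.t. } & M x = b          \quad\text{(additivity of the table)}\\
             & \ell \le x \le u \quad\text{(bounds and protection levels)}
\end{align*}
% l1-CTA as an SOC program: one auxiliary variable t_i per cell, with
% |x_i - a_i| <= t_i, a two-dimensional second-order cone constraint.
\begin{align*}
\min_{x,\,t}\; & \textstyle\sum_i t_i\\
\text{s.t. }   & M x = b,\quad \ell \le x \le u,\\
               & \|x_i - a_i\|_2 \le t_i \quad \forall i.
\end{align*}
```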

    Thirty years of optimization-based SDC methods for tabular data

    Get PDF
    In 1966 Bacharach published in Management Science a work on matrix rounding problems in two-way tables of economic statistics, formulated as a network optimization problem. This is likely the first application of optimization/operations research to statistical disclosure control (SDC) in tabular data. Years later, in 1982, Cox and Ernst used the same approach in a work in INFOR for a similar problem: controlled rounding. And thirty years ago, in 1992, a paper by Kelly, Golden and Assad appeared in Networks about the solution of the cell suppression problem, also using network optimization. Cell suppression was used for years as the main SDC technique for tabular data, and it was an active field of research which resulted in several lines of work and many publications. The above are some of the seminal works on the use of optimization methods for SDC when releasing tabular data. This paper discusses some of the research done in this field since then, with a focus on the approaches that were of practical use. It also discusses their pros and cons compared to recent techniques that are not based on optimization methods.

    A linear optimization based method for data privacy in statistical tabular data

    Get PDF
    National Statistical Agencies routinely disseminate large amounts of data. Prior to dissemination, these data have to be protected to avoid releasing confidential information. Controlled tabular adjustment (CTA) is one of the available methods for this purpose. CTA formulates an optimization problem that looks for the safe table closest to the original one. The standard CTA approach results in a mixed integer linear optimization (MILO) problem, which is very challenging for current technology. In this work we present a much less costly variant of CTA that formulates a multiobjective linear optimization (LO) problem, where binary variables are pre-fixed and the resulting continuous problem is solved by lexicographic optimization. Extensive computational results are reported using both commercial (CPLEX and XPRESS) and open source (Clp) solvers, with either simplex or interior-point methods, on a set of real instances. Most instances were successfully solved with the LO-CTA variant in less than one hour, while many of them are computationally very expensive with the MILO-CTA formulation. The interior-point method outperformed simplex in this particular application.
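    The lexicographic step can be sketched as two optimizations in sequence: solve for a first objective, then re-solve for a second while pinning the first at its optimum. The sketch below uses the open-source cvxpy package rather than the solvers tested in the paper, assumes the binary decisions are already fixed into the bounds and constraints, and picks two illustrative objectives (total deviation, then worst-case deviation) that are not necessarily the paper's.

```python
import cvxpy as cp

def lexicographic_cta(a, M, b, lb, ub, tol=1e-6):
    """Two-stage lexicographic LO for a continuous CTA relaxation.

    Stage 1 minimizes the total l1 deviation from the original table a;
    stage 2 minimizes the largest single-cell deviation while keeping
    the stage-1 optimum (up to tol) as a hard constraint.
    """
    x = cp.Variable(a.size)
    base = [M @ x == b, x >= lb, x <= ub]

    # Stage 1: total absolute adjustment.
    dev1 = cp.norm1(x - a)
    stage1 = cp.Problem(cp.Minimize(dev1), base)
    stage1.solve()

    # Stage 2: worst cell, with the first objective frozen.
    stage2 = cp.Problem(cp.Minimize(cp.norm(x - a, "inf")),
                        base + [dev1 <= stage1.value + tol])
    stage2.solve()
    return x.value
```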

    Revisiting interval protection, a.k.a. partial cell suppression, for tabular data

    Get PDF
    The final publication is available at link.springer.com. Interval protection, or partial cell suppression, was introduced in “M. Fischetti, J.-J. Salazar, Partial cell suppression: A new methodology for statistical disclosure control, Statistics and Computing, 13, 13–21, 2003” as a “linearization” of the difficult cell suppression problem. Interval protection replaces some cells by intervals containing the original cell value, unlike cell suppression, where the values are suppressed. Although the resulting optimization problem is still huge, as in cell suppression, it is linear, which allows the application of efficient procedures. In this work we present preliminary results with a prototype implementation of Benders decomposition for interval protection. Although the above seminal publication on partial cell suppression applied a similar methodology, our approach differs in two aspects: (i) the boundaries of the intervals are completely independent in our implementation, whereas the 2003 implementation solved a simpler variant where boundaries must satisfy a certain ratio; (ii) our prototype is applied to a set of seven general and hierarchical tables, whereas only three two-dimensional tables were solved with the 2003 implementation.
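    The computation at the heart of interval protection is an "attacker problem": given the published intervals and the table's linear relations, the tightest range an intruder can deduce for a cell is found by minimizing and maximizing that cell over the feasible region. A minimal cvxpy sketch with our own variable names (M, b for additivity; lb, ub for the published intervals), not the paper's Benders machinery:

```python
import cvxpy as cp

def inferred_interval(i, M, b, lb, ub):
    """Range an attacker can deduce for cell i from the published table.

    Cells published exactly have lb == ub. Cell i is protected only if
    the returned range covers its required protection interval.
    """
    x = cp.Variable(len(lb))
    cons = [M @ x == b, x >= lb, x <= ub]
    lo = cp.Problem(cp.Minimize(x[i]), cons)
    lo.solve()
    hi = cp.Problem(cp.Maximize(x[i]), cons)
    hi.solve()
    return lo.value, hi.value
```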

    Testing variants of minimum distance controlled tabular adjustment

    Get PDF
    Controlled tabular adjustment (CTA), together with its minimum distance variants, is a recent methodology for the protection of tabular data. Given a table to be protected, the purpose of the method is to find the closest one that guarantees the confidentiality of the sensitive cells. This is achieved by adding slight adjustments to the remaining cells, preferably excluding total cells, whose values are preserved. Unlike other approaches, this methodology can efficiently protect large tables of any number of dimensions and structure. In this work, we test some minimum distance variants of CTA on a close-to-real data set, and analyze the quality of the solutions provided. As another alternative, we suggest a restricted CTA (RCTA) approach, where adjustments are only allowed in a subset of cells. This subset is computed a priori, for instance by a fast heuristic for the cell suppression problem. We discuss benefits of RCTA, and suggest several approaches for its solution.
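    The restriction that defines RCTA amounts to one extra family of constraints on any minimum-distance CTA model: cells outside the pre-computed adjustable subset are pinned to their original values. A hedged sketch on top of the l1 model, again in cvxpy with our own names; how the subset is chosen (the paper suggests a fast cell suppression heuristic) is left outside the function:

```python
import cvxpy as cp
import numpy as np

def rcta_l1(a, M, b, adjustable):
    """l1-CTA where only cells flagged in `adjustable` may deviate.

    `adjustable` is a boolean mask over the cells; all other cells are
    fixed to their original values, which shrinks the effective problem.
    """
    x = cp.Variable(a.size)
    fixed = np.flatnonzero(~adjustable)
    cons = [M @ x == b, x[fixed] == a[fixed]]
    prob = cp.Problem(cp.Minimize(cp.norm1(x - a)), cons)
    prob.solve()
    return x.value
```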

    Optimization Methods for Tabular Data Protection

    Get PDF
    In this thesis we consider a minimum distance Controlled Tabular Adjustment (CTA) model for statistical disclosure limitation (control) of tabular data. The goal of the CTA model is to find the closest safe table to some original tabular data set that contains sensitive information. Closeness is usually measured using the l1 or l2 norm, each measure having its advantages and disadvantages. Depending on the chosen norm, CTA can be formulated as an optimization problem: Linear Programming (LP) for l1, Quadratic Programming (QP) for l2. In this thesis we present an alternative reformulation of l1-CTA as a Second-Order Cone (SOC) optimization problem. All three models can be solved using appropriate versions of Interior-Point Methods (IPM). The validity of the new approach was tested on randomly generated two-dimensional tabular data sets. It was shown numerically that the SOC formulation compares favorably to the QP and LP formulations.
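    In code, the SOC reformulation sketched in LaTeX above comes down to one auxiliary variable per cell and a two-dimensional cone constraint |x_i - a_i| <= t_i. A minimal cvxpy rendering under our assumptions, not the thesis's actual implementation; conic interior-point solvers such as ECOS or Clarabel handle this form directly:

```python
import cvxpy as cp

def soc_l1_cta(a, M, b, lb, ub):
    """l1-CTA written with explicit cone variables t_i.

    Each elementwise constraint |x_i - a_i| <= t_i is a two-dimensional
    second-order cone, so the whole problem is an SOC program.
    """
    n = a.size
    x, t = cp.Variable(n), cp.Variable(n)
    cons = [M @ x == b, x >= lb, x <= ub,
            cp.abs(x - a) <= t]  # elementwise 2-dim cones
    prob = cp.Problem(cp.Minimize(cp.sum(t)), cons)
    prob.solve()
    return x.value, prob.value
```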

    Eliminating small cells from census counts tables: empirical vs. design transition probabilities

    Get PDF
    The software SAFE has been developed at the State Statistical Institute Berlin-Brandenburg and has been in regular use there for several years. It involves an algorithm that yields a controlled cell frequency perturbation. When a microdata set has been protected by this method, any table computed on the basis of this microdata set will not contain any small cells, i.e. cells with frequency counts 1 or 2. We compare empirically observed transition probabilities resulting from this pre-tabular method to transition matrices in the context of variants of microdata-key-based post-tabular random perturbation methods suggested in the literature, e.g. Shlomo and Young (2008) and Fraser and Wooton (2006).
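    The transition matrices being compared can be made concrete: entry P[i][j] is the probability that an original count i is published as count j, and "no small cells" means the columns for counts 1 and 2 carry zero mass. A toy sketch with invented numbers, purely to fix the idea (not SAFE's actual design matrix); empirical transition probabilities are then just the tallies obtained by running the method many times:

```python
import random

# Toy transition matrix: row = original count (4 stands for "4 or more"),
# mapping to {published count: probability}. Counts 1 and 2 never appear
# as outputs. The numbers are invented for illustration.
P = {
    0: {0: 1.0},
    1: {0: 0.6, 3: 0.4},
    2: {0: 0.3, 3: 0.7},
    3: {3: 0.8, 4: 0.2},
    4: {3: 0.1, 4: 0.9},
}

def perturb_count(count, rng=random):
    """Draw a published frequency for an original cell count."""
    row = P[min(count, 4)]
    return rng.choices(list(row), weights=list(row.values()), k=1)[0]
```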
