
    Rounding methods for protecting EU-aggregates

    In the European Statistical System, statistical information is collected by the National Statistical Institutes (NSIs). The NSIs produce aggregate tables at the national level. They are also responsible for protecting these tables and therefore have to keep certain cells confidential, suppressing them from publication. Eurostat produces statistical information at the EU level. However, the national suppressions severely hamper the publication of EU aggregates, although often only a few smaller countries have to keep their contribution to the EU total confidential. This paper reports on a research project that aims to make more EU aggregates available while guaranteeing that the nationally suppressed figures remain confidential. Postprint (published version)

    A French Anonymization Experiment with Health Data

    In this paper, a case study of a microdata anonymization test is presented. The work was carried out on a French administrative health dataset with indirect identifiers and sensitive variables about hospital stays. Two approaches to building a k-anonymized file are described, and the software tools used in the test are compared.
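
As a minimal sketch of what "k-anonymized" means in practice, the check below groups records by their indirect identifiers and reports the smallest group size. It does not reproduce either approach from the paper; the field names (`age_band`, `region`, `diagnosis`) and the records are hypothetical.

```python
from collections import Counter


def k_anonymity_level(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same combination of quasi-identifier values (0 if no records)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def is_k_anonymous(records, quasi_identifiers, k):
    """A file is k-anonymous if every combination occurs at least k times."""
    return k_anonymity_level(records, quasi_identifiers) >= k


# Hypothetical hospital-stay records (invented values, for illustration only).
stays = [
    {"age_band": "40-49", "region": "North", "diagnosis": "A"},
    {"age_band": "40-49", "region": "North", "diagnosis": "B"},
    {"age_band": "50-59", "region": "South", "diagnosis": "A"},
    {"age_band": "50-59", "region": "South", "diagnosis": "C"},
    {"age_band": "50-59", "region": "South", "diagnosis": "B"},
]

level = k_anonymity_level(stays, ["age_band", "region"])
```

Here the sensitive variable (`diagnosis`) is deliberately left out of the grouping key, since k-anonymity is defined over the indirect identifiers only.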

    A linear optimization based method for data privacy in statistical tabular data

    National Statistical Agencies routinely disseminate large amounts of data. Prior to dissemination these data have to be protected to avoid releasing confidential information. Controlled tabular adjustment (CTA) is one of the available methods for this purpose. CTA formulates an optimization problem that looks for the safe table which is closest to the original one. The standard CTA approach results in a mixed integer linear optimization (MILO) problem, which is very challenging for current technology. In this work we present a much less costly variant of CTA that formulates a multiobjective linear optimization (LO) problem, where binary variables are pre-fixed, and the resulting continuous problem is solved by lexicographic optimization. Extensive computational results are reported using both commercial (CPLEX and XPRESS) and open source (Clp) solvers, with either simplex or interior-point methods, on a set of real instances. Most instances were successfully solved with the LO-CTA variant in less than one hour, while many of them are computationally very expensive with the MILO-CTA formulation. The interior-point method outperformed simplex in this particular application. Peer Reviewed. Preprint
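
The idea of a "safe table closest to the original" can be illustrated with a small feasibility check. The sketch below does not solve the CTA optimization problem; it only verifies a candidate adjusted table. It assumes, for simplicity, a two-dimensional table whose row and column totals are kept fixed, a symmetric protection level per sensitive cell, and an L1 distance. The function names and numbers are hypothetical.

```python
def is_safe_adjustment(original, adjusted, protection, tol=1e-9):
    """Check a candidate adjusted table against simplified CTA-style conditions.

    original, adjusted: lists of rows holding the interior cells of a 2D table.
    protection: dict mapping (row, col) of each sensitive cell to the minimum
                absolute deviation required for that cell (an assumption here).
    """
    n_rows, n_cols = len(original), len(original[0])
    # Additivity: every row total must be unchanged.
    for i in range(n_rows):
        if abs(sum(original[i]) - sum(adjusted[i])) > tol:
            return False
    # Additivity: every column total must be unchanged.
    for j in range(n_cols):
        col_orig = sum(original[i][j] for i in range(n_rows))
        col_adj = sum(adjusted[i][j] for i in range(n_rows))
        if abs(col_orig - col_adj) > tol:
            return False
    # Protection: each sensitive cell must move far enough from its true value.
    for (i, j), level in protection.items():
        if abs(adjusted[i][j] - original[i][j]) < level - tol:
            return False
    return True


def l1_distance(original, adjusted):
    """Total absolute change, the quantity a CTA-style objective would minimize."""
    return sum(abs(a - o) for ro, ra in zip(original, adjusted)
               for o, a in zip(ro, ra))


original = [[10, 20], [30, 40]]
candidate = [[13, 17], [27, 43]]   # shifts cell (0, 0) by 3, totals preserved
protection = {(0, 0): 3}           # hypothetical protection level

safe = is_safe_adjustment(original, candidate, protection)
cost = l1_distance(original, candidate)
```

An optimizer would search over all tables passing this check for the one with the smallest distance; the direction in which each sensitive cell moves (up or down) is what the binary variables in the MILO formulation encode.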

    Statistical disclosure control in tabular data

    Data disseminated by National Statistical Agencies (NSAs) can be classified as either microdata or tabular data. Tabular data is obtained from microdata by crossing one or more categorical variables. Although tables provide only aggregated information, they also need to be protected. This chapter is a short introduction to tabular data protection. It contains three main sections. The first shows the different types of tables that can be obtained and how they are modeled. The second describes the practical rules that NSAs use to detect sensitive cells. Finally, an overview of protection methods is provided, with a particular focus on two of them: the "cell suppression problem" and "controlled tabular adjustment". Postprint (published version)
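
The abstract does not name the detection rules, so the sketch below is an assumption: it shows three rules commonly cited in the disclosure control literature (minimum frequency, (n, k) dominance, and the p% rule), applied to the list of individual contributions to one cell. Thresholds and values are illustrative only.

```python
def violates_min_frequency(contributions, threshold=3):
    """Cell is sensitive if it has fewer than `threshold` contributors."""
    return 0 < len(contributions) < threshold


def violates_dominance(contributions, n=1, k=75.0):
    """(n, k) rule: sensitive if the n largest contributors together
    exceed k percent of the cell total."""
    total = sum(contributions)
    if total <= 0:
        return False
    top = sum(sorted(contributions, reverse=True)[:n])
    return top > k / 100.0 * total


def violates_p_percent(contributions, p=10.0):
    """p% rule: sensitive if the total minus the two largest contributions
    is less than p percent of the largest contribution."""
    ordered = sorted(contributions, reverse=True)
    if not ordered:
        return False
    remainder = sum(ordered[2:])
    return remainder < p / 100.0 * ordered[0]


dominated_cell = [80, 15, 5]    # one contributor holds 80% of the total
balanced_cell = [40, 35, 25]    # no contributor dominates
```

Cells flagged by such rules are the primary sensitive cells that methods like cell suppression or controlled tabular adjustment then have to protect.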

    An ethical framework for sharing patient data without consent

    Background: There is no consensus on how to share patient records privately. Data privacy concepts are surveyed and a framework is presented for the safe sharing of sensitive data. It is argued that tailoring the data sharing to the privacy breach risks of each project holds out the best compromise for keeping the trust of the public and providing for the best quality data where detailed patient consent is not possible. Objective: To improve the protection of data by reducing privacy breaches and thus enable appropriate patient data sharing without consent. Framework: Any harm arising from data sharing must come from the data being identified, either fully or partially. The first step is an agreement on an acceptable privacy breach risk. Next, proceed to measure that risk for the proposed data when held by a given recipient. Finally, select from a menu of mitigation strategies (people, process and technical) to achieve acceptable risk. The framework is tested against the current UK approach administered by the Patient Information Advisory Group. Discussion: The hard problem of non-consented data sharing should be divided into the easier (though non-trivial) ones of data and recipient breach risk measurement. Directed research in these two areas will help move the data sharing problem into the 'solved' pile.

    On Utilizing Association and Interaction Concepts for Enhancing Microaggregation in Secure Statistical Databases

    This paper presents a possibly pioneering endeavor to tackle the microaggregation techniques (MATs) in secure statistical databases by resorting to the principles of associative neural networks (NNs). The prior art has improved the available solutions to the MAT by incorporating proximity information, and this approach is done by recursively reducing the size of the data set by excluding points that are farthest from the centroid and points that are closest to these farthest points. Thus, although the method is extremely effective, arguably, it uses only the proximity information while ignoring the mutual interaction between the records. In this paper, we argue that interrecord relationships can be quantified in terms of the following two entities: 1) their "association" and 2) their "interaction." This case means that records that are not necessarily close to each other may still be "grouped," because their mutual interaction, which is quantified by invoking transitive-closure-like operations on the latter entity, could be significant, as suggested by the theoretically sound principles of NNs. By repeatedly invoking the interrecord associations and interactions, the records are grouped into sizes of cardinality "k," where k is the security parameter in the algorithm. Our experimental results, which are done on artificial data and benchmark real-life data sets, demonstrate that the newly proposed method is superior to the state of the art not only based on the information loss (IL) perspective but also when it concerns a criterion that involves a combination of the IL and the disclosure risk (DR).
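
For readers unfamiliar with microaggregation, the sketch below shows the basic idea in its simplest form: a univariate, fixed-size, proximity-only grouping in which each value is replaced by the mean of its group of at least k records. This is a plain baseline and not the association/interaction method proposed in the paper; the function name and data are hypothetical.

```python
def microaggregate(values, k):
    """Replace each value by the mean of its group, where groups are formed
    from the sorted values and each group holds at least k records."""
    if k < 1:
        raise ValueError("k must be at least 1")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # indices sorted by value
    result = [0.0] * n
    start = 0
    while start < n:
        end = start + k
        # If fewer than k records would remain, merge them into this group
        # so that no group ends up smaller than k.
        if n - end < k:
            end = n
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
        start = end
    return result


salaries = [10, 1, 12, 2, 11, 3, 13]   # invented values
protected = microaggregate(salaries, k=3)
```

Larger k gives stronger protection at the cost of more information loss, which is the trade-off the IL and DR criteria in the abstract are meant to measure.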


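    Identification and protection of tabular data: the case of the INSS pension statistics (original title: Identificació i protecció de dades tabulars: el cas de l'estadística sobre pensions de l'INSS)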

    Departament d'Econometria, Estadística i Economia Espanyola (Universitat de Barcelona). Application of statistical disclosure control to contributory pension data from the Social Security system. The cell suppression problem in tabular data is solved with the statistical software R through the sdcTable package. The aim is to build automated routines and statistical disclosure control criteria that effectively protect the tabular data produced by the systematic exploitation of the information held in Social Security records. Using an R package, routines would be created to identify unsafe cells and protect them (via recoding and other techniques), and the computations would then be integrated into the statistical package SPSS through the R Integration Package for IBM Statistics.

    Exact and heuristic methods for statistical tabular data protection

    One of the main purposes of National Statistical Agencies (NSAs) is to provide citizens and researchers with a large amount of trustworthy, high-quality statistical information. NSAs must guarantee that no confidential individual information can be obtained from the released statistical outputs. The discipline of statistical disclosure control (SDC) aims to prevent confidential information from being derived from released data while preserving as much data utility as possible. NSAs work with two types of data: microdata and tabular data. Microdata files contain records of individuals or respondents (persons or enterprises) with attributes. For instance, a national census might collect attributes such as age, address and salary. Tabular data contains aggregated information obtained by crossing one or more categorical variables from those microdata files. Several SDC methods are available to ensure that no confidential individual information can be obtained from the released microdata or tabular data. This thesis focuses on tabular data protection, although the research carried out can be applied to other classes of problems. Controlled Tabular Adjustment (CTA) and the Cell Suppression Problem (CSP) have attracted most of the recent research in the tabular data protection field. Both methods formulate Mixed Integer Linear Programming problems (MILPs), which are challenging even for tables of moderate size. Even finding a feasible initial solution may be a challenging task for large instances. Because many end users give priority to fast executions and are thus satisfied, in practice, with suboptimal solutions, the first result of this thesis is an improvement of a known and successful heuristic for finding feasible solutions of MILPs, called the feasibility pump. The new approach, based on the computation of analytic centers, is named the Analytic Center Feasibility Pump. The second contribution consists of the application of the fix-and-relax heuristic (FR) to the CTA method. FR (alone or in combination with other heuristics) is shown to be competitive with CPLEX branch-and-cut in terms of quickly finding either a feasible solution or a good upper bound. The last contribution of this thesis deals with general Benders decomposition, which is improved with the application of stabilization techniques. A stabilized Benders decomposition is presented, which focuses on finding new solutions in the neighborhood of "good" points. This approach is efficiently applied to the solution of realistic and real-world CSP instances, outperforming alternative approaches. The first two contributions are already published in indexed journals (Operations Research Letters and Computers and Operations Research). The third contribution is a working paper to be submitted soon. Postprint (published version)

    Norby Collection Databases, Brookings Businesses Listed by Avenue Address

    The Databases sub-group is composed of material compiled by George Norby. This material covers Brookings (S.D.) related topics and includes businesses, historic homes, churches, city and county government, and South Dakota State University. As noted by George Norby within the collection, information compiled in these databases is as accurate as possible and was gathered from the following sources: Brookings County Press, Brookings Register, Brookings County Sentinel, Brookings telephone directories and business directories, Brookings City publications, Brookings County election returns, Brookings County Commission minutes, and records in the Brookings County Register of Deeds office. While this material is quite extensive, it is recommended that researchers verify information from more than one source in order to conduct an accurate search.