
45 research outputs found

Developing synthetic individual-level population datasets: The case of contextualizing maps of privacy-preserving census data

Author: Lin Yue
Xiao Ningchuan
Publication date: 01/04/2023

The purpose of this paper is to describe the development of a synthetic population dataset that is open and realistic and can be used to facilitate understanding the cartographic process and contextualizing the cartographic artifacts. We first discuss an optimization model that is designed to construct the synthetic population by minimizing the difference between the summarized information of the synthetic populations and the statistics published in census data tables. We then illustrate how the synthetic population dataset can be used to contextualize maps made using privacy-preserving census data. Two counties in Ohio are used as case studies. Comment: AutoCarto 202

arXiv.org e-Print Archive

Investigating MAUP Effects on Census Data Using Approximately Equal-Population Aggregations (Short Paper)

Author: Lin Yue
Xiao Ningchuan
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 12th International Conference on Geographic Information Science (GIScience 2023)
Publication date: 01/01/2023

The modifiable areal unit problem (MAUP) can significantly impact the use of census data as different choices in aggregating geographic zones can lead to varying outcomes. Previous research studied the effects using random aggregations, which, however, may lead to the use of impractical and unrealistic zones that deviate from recommended census geography criteria (e.g., equal population). To address this issue, this study proposes the use of approximately equal-population aggregations (AEPAs) for exploring MAUP effects on various statistical properties of census data, including Moran coefficients, correlation coefficients, and regression statistics. A multistart and recombination algorithm (MSRA) is used to generate multiple sets of high-quality AEPAs for testing MAUP effects. The results of our computational experiments highlight the need for more well-defined census geographies and realistic alternative zones to fully understand MAUP effects on census data.

Dagstuhl Research Online Publication Server

Exploring the Tradeoff Between Privacy and Utility of Complete-count Census Data Using a Multiobjective Optimization Approach

Author: Lin Yue
Xiao Ningchuan
Publication date: 31/01/2024

Privacy and utility are two important objectives to consider when releasing census data. However, these two objectives are often conflicting, as protecting privacy usually necessitates introducing noise into the data, which compromises data utility. Determining the appropriate level of privacy protection presents a significant challenge in the data release. Therefore, it is necessary to investigate the tradeoff between privacy and utility before making a final decision on the level of privacy protection. In this article, we propose a multiobjective optimization framework to generate multiple optimal solutions that satisfy the two objectives of privacy and utility, as well as to analyze the tradeoff between privacy and utility for decision-making. This framework relocates individuals susceptible to revealing their identities to protect their privacy. We maximize the number of individuals relocated while maximizing the utility of the data after relocations. The proposed framework is tested using synthetic population data in Franklin County, Ohio. Our experimental results show that the framework can efficiently generate a collection of optimal solutions and can be used to effectively balance privacy and utility.
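
The "collection of optimal solutions" in a two-objective setting like the one described above is usually read as a Pareto front: solutions that no other solution beats on both objectives at once. The sketch below only filters a given list of candidates; it does not reproduce how the article generates them. The `Solution` tuple layout and the utility score are hypothetical, introduced here for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (individuals relocated, utility score).
# Both objectives are treated as "larger is better".
Solution = Tuple[int, float]


def dominates(a: Solution, b: Solution) -> bool:
    """True if a is at least as good as b on both objectives and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(solutions: List[Solution]) -> List[Solution]:
    """Keep only the solutions that no other solution dominates."""
    return [s for s in solutions if not any(dominates(t, s) for t in solutions)]


# Made-up candidate solutions, not results from the article.
candidates = [(10, 0.9), (20, 0.7), (15, 0.6), (30, 0.4), (5, 0.95)]
front = pareto_front(candidates)
```

Here `(15, 0.6)` drops out because `(20, 0.7)` relocates more individuals and keeps more utility; the remaining points each trade one objective for the other, which is the tradeoff a decision-maker would inspect.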

Knowledge UChicago

A Computational Framework for Preserving Privacy and Maintaining Utility of Geographically Aggregated Data: A Stochastic Spatial Optimization Approach

Author: Ningchuan Xiao
Yue Lin
Publication date: 20/03/2023

Geographically aggregated data are often considered to be safe because information can be published by group as population counts rather than by individual. Identifiable information about individuals can still be disclosed when using such data, however. Conventional methods for protecting privacy, such as data swapping, often lack transparency because they do not quantify the reduction in disclosure risk. Recent methods, such as those based on differential privacy, could significantly compromise data utility by introducing excessive error. We develop a methodological framework to address the issues of privacy protection for geographically aggregated data while preserving data utility. In this framework, individuals at high risk of disclosure are moved to other locations to protect their privacy. Two spatial optimization models are developed to optimize these moves by maximizing privacy protection while maintaining data utility. The first model relocates all at-risk individuals while minimizing the error (hence maximizing the utility). The second model assumes a budget that specifies the maximum error to be introduced and maximizes the number of at-risk individuals being relocated within the error budget. Computational experiments performed on a synthetic population data set of two counties of Ohio indicate that the proposed models are effective and efficient in balancing data utility and privacy protection for real-world applications.
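
The difference between the two models can be illustrated with a heavily simplified sketch. It assumes each at-risk individual has a fixed, independent error cost per candidate destination and that costs simply add up. The abstract does not state this, and real moves in aggregated data can interact. The `MoveCosts` structure, the function names, and the sample numbers are all hypothetical; this is not the authors' stochastic spatial optimization formulation.

```python
from typing import Dict, Tuple

# Hypothetical input: person -> {destination unit -> error introduced by the move}.
# Costs are assumed additive and independent (a simplification).
MoveCosts = Dict[str, Dict[str, float]]


def relocate_all(costs: MoveCosts) -> Tuple[Dict[str, str], float]:
    """First-model analogue: move everyone, each to their lowest-error destination."""
    plan = {person: min(dests, key=dests.get) for person, dests in costs.items()}
    total = sum(costs[p][d] for p, d in plan.items())
    return plan, total


def relocate_within_budget(costs: MoveCosts, budget: float) -> Tuple[Dict[str, str], float]:
    """Second-model analogue: move as many people as possible without exceeding the budget.

    With independent additive costs, taking the cheapest moves first
    maximizes the number of people relocated.
    """
    cheapest = sorted((min(dests.values()), person) for person, dests in costs.items())
    plan: Dict[str, str] = {}
    spent = 0.0
    for cost, person in cheapest:
        if spent + cost > budget:
            break  # every remaining move costs at least this much
        plan[person] = min(costs[person], key=costs[person].get)
        spent += cost
    return plan, spent


# Made-up costs for three at-risk individuals.
costs = {
    "a": {"u1": 2.0, "u2": 1.0},
    "b": {"u1": 3.0, "u3": 4.0},
    "c": {"u2": 0.5},
}
full_plan, full_error = relocate_all(costs)
budget_plan, budget_error = relocate_within_budget(costs, budget=2.0)
```

With these numbers, relocating everyone costs 4.5 units of error, while a budget of 2.0 covers only the two cheapest moves ("c" and "a") at a cost of 1.5.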

FigShare

Statistical Systems and Census Data in the Spatial Sciences

Author: Cockings Samantha
Ningchuan Xiao
Spielman Seth
Tanton Robert
Publication venue: Elsevier BV
Publication date: 01/05/2017

University of Canberra Research Repository