A Computational Framework for Preserving Privacy and Maintaining Utility of Geographically Aggregated Data: A Stochastic Spatial Optimization Approach
Geographically aggregated data are often considered to be safe because information can be published by group as population counts rather than by individual. Identifiable information about individuals can still be disclosed when using such data, however. Conventional methods for protecting privacy, such as data swapping, often lack transparency because they do not quantify the reduction in disclosure risk. Recent methods, such as those based on differential privacy, could significantly compromise data utility by introducing excessive error. We develop a methodological framework to address the issues of privacy protection for geographically aggregated data while preserving data utility. In this framework, individuals at high risk of disclosure are moved to other locations to protect their privacy. Two spatial optimization models are developed to optimize these moves by maximizing privacy protection while maintaining data utility. The first model relocates all at-risk individuals while minimizing the error (hence maximizing the utility). The second model assumes a budget that specifies the maximum error to be introduced and maximizes the number of at-risk individuals being relocated within the error budget. Computational experiments performed on a synthetic population data set of two counties of Ohio indicate that the proposed models are effective and efficient in balancing data utility and privacy protection for real-world applications.
The mean difference in the moving distance measure between the simulations and the data for the year.
<p>The numbers in bold indicate the best measure among the three models for each year.</p>
Cumulative distributions of the days of stay.
<p>The dashed vertical line indicates the stay of 20 days.</p>
Transhumance routes and seasonality.
<p>The map shows all the pastoralists in the year 2007–2008. Circles represent campsite locations, and consecutive sites are linked by straight lines. Large circles represent sojourn campsites (≥ 20 days). Here, each season refers to the time when the pastoralists start the camp. The shaded area is part of the Far North Region.</p>
Transhumance modes for pastoralists in groups 1 and 2 (A) and group 3 (B).
Results from the MVN model.
<p>Each contour encloses an area where <i>G</i>(<i>x</i><sub><i>t</i></sub> = <i>x</i>) ≥ 95%. Each map shows the result after overlaying the contours of every 5 days throughout the year. Groups 1, 2, and 3 are shown from left to right. The parts of the contours outside the study area are not shown in the maps.</p>
The kernel density of the camps during the hot dry season.
<p>A kernel bandwidth of 5 km is used. The dots are the camp locations over the four years of the transhumance data.</p>
The mean difference in overlapped convex hull ratio between the simulations and the data for the year.
<p>The numbers in bold indicate the best measure among the three models for each year. Italic numbers refer to the highest ratios in all models except KRN1. The rows marked as DATA show the overlapped convex hull ratio between the data of the year and that of year 2007–2008 for each group.</p>
Results from the KRN model.
<p>Each contour encloses an area where <i>F</i>(<i>x</i><sub><i>t</i></sub> = <i>x</i>) ≥ 95%. Each map shows the result after overlaying the contours of every 5 days throughout the year. Groups 1, 2, and 3 are shown from left to right. Each map shows the results from three bandwidths: 1 km (light grey solid lines), 5 km (colored solid lines), and 20 km (dashed light grey lines).</p>
Daily mean closeness between pastoralists obtained from the 2007–2008 data and 100 simulations using STM, KRN, and MVN.