Where you go is who you are -- A study on machine learning based semantic privacy attacks
Concerns about data privacy are omnipresent, given the increasing usage of
digital applications and their underlying business model that includes selling
user data. Location data are particularly sensitive since they allow us to infer
activity patterns and interests of users, e.g., by categorizing visited
locations based on nearby points of interest (POI). On top of that, machine
learning methods provide new powerful tools to interpret big data. In light of
these considerations, we raise the following question: What is the actual risk
that realistic, machine learning based privacy attacks can obtain meaningful
semantic information from raw location data, subject to inaccuracies in the
data? In response, we present a systematic analysis of two attack scenarios,
namely location categorization and user profiling. Experiments on the
Foursquare dataset and tracking data demonstrate the potential for abuse of
high-quality spatial information, leading to a significant privacy loss even
with location inaccuracy of up to 200m. With location obfuscation of more than
1 km, spatial information hardly adds any value, but a high privacy risk solely
from temporal information remains. The availability of public context data such
as POIs plays a key role in inference based on spatial information. Our
findings point out the risks of ever-growing databases of tracking data and
spatial context data, which policymakers should consider for privacy
regulations, and which could guide individuals in their personal location
protection measures.
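The attack scenario sketched above (categorizing a visited location by its nearest POI, against location data obfuscated within some radius) can be illustrated with a toy script. The POI list, coordinates, and nearest-POI classifier below are illustrative assumptions, not the study's actual attack models or data:

```python
import math
import random

# Hypothetical POI database: (lon, lat, category). Entries are
# illustrative placeholders, not taken from the Foursquare dataset.
POIS = [
    (16.370, 48.210, "cafe"),
    (16.372, 48.208, "hospital"),
    (16.380, 48.215, "gym"),
]

def obfuscate(lon, lat, radius_m, rng=random):
    """Displace a point uniformly within a disc of the given radius.

    Uses the rough approximation 1 degree of latitude ~ 111,000 m,
    which is adequate for small displacements.
    """
    r = radius_m * math.sqrt(rng.random())   # uniform over the disc's area
    theta = rng.uniform(0, 2 * math.pi)
    dlat = (r * math.sin(theta)) / 111_000
    dlon = (r * math.cos(theta)) / (111_000 * math.cos(math.radians(lat)))
    return lon + dlon, lat + dlat

def categorize(lon, lat, pois=POIS):
    """Attack step: label a location with the category of its nearest POI."""
    def dist2(p):
        return (p[0] - lon) ** 2 + (p[1] - lat) ** 2
    return min(pois, key=dist2)[2]

# A visit recorded next to the hospital may still be labelled correctly
# after modest obfuscation, but larger radii degrade the inference.
random.seed(0)
visit = (16.3721, 48.2081)
noisy = obfuscate(*visit, radius_m=200)
print(categorize(*noisy))
```

Running the last lines with increasing `radius_m` mimics the abstract's finding: small inaccuracies leave the semantic label mostly recoverable, while kilometre-scale obfuscation removes most of the spatial signal.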
Computers, Environment and Urban Systems / Adaptive areal elimination (AAE): a transparent way of disclosing protected spatial datasets
Geographical masking is the conventional solution to protect the privacy of individuals involved in confidential spatial point datasets. The masking process displaces confidential locations to protect individual privacy while maintaining a fine level of spatial resolution. The adaptive form of this process aims to further minimize the displacement error by taking into account the underlying population density. We describe an alternative adaptive geomasking method, referred to as Adaptive Areal Elimination (AAE). AAE creates areas of a minimum K-anonymity, within which the original points are either randomly perturbed or aggregated to the median centers of the areas. In addition to the masked points, the K-anonymized areas can be safely disclosed as well without increasing the risk of re-identification. Using a burglary dataset from Vienna, AAE is compared with an existing adaptive geographical mask, the donut mask. The masking methods are evaluated for preserving a predefined K-anonymity and the spatial characteristics of the original points. The spatial characteristics are assessed with four measures of spatial error: displaced distance, correlation coefficient of density surfaces, hotspots' divergence, and clusters' specificity. Masked points from the point aggregation of AAE have the highest spatial error on all measures except displaced distance. In contrast, masked points from the donut mask are displaced the least, preserve the original spatial clusters better, and have the highest clusters' specificity and correlation coefficient of density surfaces. However, when the donut mask is adapted to achieve an actual K-anonymity, the random perturbation of AAE introduces less spatial error than the donut mask for all the measures of spatial error.
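The two AAE masking modes described above, random perturbation within a K-anonymous area and aggregation to the area's median center, can be sketched in a few lines. This is a toy sketch under strong simplifying assumptions: the real method derives areas from geography and population density, whereas here points are simply chunked into groups of at least K:

```python
import random
import statistics

def aae_mask(points, k, mode="perturb", rng=random):
    """Toy sketch of AAE-style masking.

    Groups points into K-anonymous "areas" (here: simple chunks of >= k
    points ordered along x, an illustrative stand-in for real areal
    units), then masks each point either by random perturbation inside
    its area's bounding box or by snapping to the area's median center.
    """
    pts = sorted(points)                      # order along x for grouping
    groups = [pts[i:i + k] for i in range(0, len(pts), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())       # keep every area at >= k points
    masked = []
    for g in groups:
        xs, ys = [p[0] for p in g], [p[1] for p in g]
        if mode == "aggregate":               # all points to the median center
            c = (statistics.median(xs), statistics.median(ys))
            masked.extend([c] * len(g))
        else:                                 # random perturbation inside the area
            for _ in g:
                masked.append((rng.uniform(min(xs), max(xs)),
                               rng.uniform(min(ys), max(ys))))
    return masked
```

In the aggregate mode every area collapses to a single disclosed point, which maximises displacement-type error but makes the K-anonymity of the output easy to verify, mirroring the trade-off the abstract reports.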
Privacy Threats and Protection Recommendations for the Use of Geosocial Network Data in Research
Inference attacks and protection measures are two sides of the same coin. Although the former aims to reveal information while the latter aims to hide it, they both increase awareness regarding the risks and threats from social media apps. On the one hand, inference attack studies explore the types of personal information that can be revealed and the methods used to extract it. An additional risk is that geosocial media data are collected massively for research purposes, and the processing or publication of these data may further compromise individual privacy. On the other hand, consistent and increasing research on location protection measures promises solutions that mitigate disclosure risks. In this paper, we examine recent research efforts on the spectrum of privacy issues related to geosocial network data and identify the contributions and limitations of these research efforts. Furthermore, we provide protection recommendations to researchers that share, anonymise, and store social media data or publish scientific results.
Towards geoprivacy guidelines for spatial data
This paper proposes an approach towards practical privacy guidelines for the different stages of a research effort that collects and/or uses “sensitive” spatial data. Specifically, we focus on: a) initial tasks prior to starting a survey; b) storage, anonymization, and assessment of datasets; and c) actions to eliminate disclosure from published data and deliverables, or when datasets are shared with third parties.
ISPRS International Journal of Geo-Information / Defining a threshold value for maximum spatial information loss of masked geo-data
Geographical masks are a group of location protection methods for the dissemination and publication of confidential and sensitive information, such as health- and crime-related geo-referenced data. The use of such masks ensures that privacy is protected for the individuals involved in the datasets. Nevertheless, the protection process introduces spatial error to the masked dataset. This study quantifies the spatial error of masked datasets using two approaches. First, a perceptual survey was employed where participants ranked the similarity of a diverse sample of masked and original maps. Second, a spatial statistical analysis was performed that provided quantitative results for the same pairs of maps. Spatial statistical similarity is calculated with three divergence indices that employ different spatial clustering methods. All indices are significantly correlated with the perceptual similarity. Finally, the results of the spatial analysis are used as the explanatory variable to estimate the perceptual similarity. Three prediction models are created that indicate upper boundaries for the spatial statistical results upon which the masked data are perceived differently from the original data. The results of the study aim to help potential “maskers” to quantify and evaluate the error of confidential masked visualizations.
Transactions in GIS / Spatial information divergence: Using Global and Local Indices to compare geographical masks applied to crime data
Advances in Geographic Information Science (GISc) and the increasing availability of location data have facilitated the dissemination of crime data and the abundance of crime mapping websites. However, data holders acknowledge that when releasing sensitive crime data there is a risk of compromising the victims' privacy. Hence, protection methodologies are primarily applied to the data to ensure that individual privacy is not violated. This article addresses one group of location protection methodologies, namely geographical masks that are applicable for crime data representations. The purpose is to identify which mask is the most appropriate for crime incident visualizations. A global divergence index (GDi) and a local divergence index (LDi) are developed to compare the effects that these masks have on the original crime point pattern. The indices calculate how dissimilar the spatial information of the masked data is from the spatial information of the original data with regard to the information obtained via spatial crime analysis. The results of the analysis show that the variable radius mask and the donut geomask should be primarily used for crime representations, as they produce less spatial information divergence from the original crime point pattern than the alternative local random rotation mask and circular mask.
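The idea behind a global divergence index, comparing the spatial information of a masked point pattern with that of the original, can be sketched with a simple grid-density comparison. Note this is an illustrative stand-in, not the GDi/LDi definitions from the article:

```python
def density_grid(points, bounds, n=10):
    """Bin points into an n x n grid and normalise to a probability mass."""
    xmin, ymin, xmax, ymax = bounds
    grid = [[0.0] * n for _ in range(n)]
    for x, y in points:
        i = min(int((x - xmin) / (xmax - xmin) * n), n - 1)
        j = min(int((y - ymin) / (ymax - ymin) * n), n - 1)
        grid[j][i] += 1
    total = len(points)
    return [[c / total for c in row] for row in grid]

def global_divergence(original, masked, bounds, n=10):
    """Toy global divergence: total absolute difference between the
    density surfaces of the original and masked patterns.
    0 means the two patterns place identical mass in every cell.
    """
    g1 = density_grid(original, bounds, n)
    g2 = density_grid(masked, bounds, n)
    return sum(abs(a - b)
               for r1, r2 in zip(g1, g2)
               for a, b in zip(r1, r2))
```

A mask that merely shuffles points within their grid cells scores 0 under this toy index, while one that drags points into new cells scores higher, which is the intuition behind ranking masks by how much spatial information they distort.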
Geosocial Media Data as Predictors in a GWR Application to Forecast Crime Hotspots (Short Paper)
In this paper we forecast hotspots of street crime in Portland, Oregon. Our approach uses geosocial media posts, which define the predictors in geographically weighted regression (GWR) models. We use two predictors, both derived from Twitter data: the population at risk of being a victim of street crime, and crime-related tweets. These two predictors were used in GWR to create models that depict future street crime hotspots. The predicted hotspots enclosed more than 23% of the future street crimes in 1% of the study area and also outperformed the prediction efficiency of a baseline approach. Future work will focus on optimizing the prediction parameters and testing the applicability of this approach to other mobile crime types.
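The core mechanism of GWR, fitting a separate regression at each location with observations weighted by their distance to it, can be sketched in a minimal single-predictor form. This is a sketch of the general GWR idea with an assumed Gaussian kernel and a fixed bandwidth, not the calibration or predictors used in the paper:

```python
import math

def gwr_predict(obs, x_new, loc_new, bandwidth):
    """Minimal geographically weighted regression with one predictor.

    obs: list of ((lon, lat), x, y) observations.
    Fits a locally weighted least-squares line y = a + b*x at loc_new,
    with Gaussian kernel weights on distance to each observation.
    """
    def weight(loc):
        d2 = (loc[0] - loc_new[0]) ** 2 + (loc[1] - loc_new[1]) ** 2
        return math.exp(-d2 / (2 * bandwidth ** 2))

    w = [weight(loc) for loc, _, _ in obs]
    sw = sum(w)
    mx = sum(wi * x for wi, (_, x, _) in zip(w, obs)) / sw
    my = sum(wi * y for wi, (_, _, y) in zip(w, obs)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, (_, x, _) in zip(w, obs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (_, x, y) in zip(w, obs))
    b = sxy / sxx if sxx else 0.0             # local slope
    a = my - b * mx                           # local intercept
    return a + b * x_new
```

Because the weights, and hence the fitted coefficients, change from location to location, the relationship between a predictor (such as crime-related tweet counts) and crime intensity is allowed to vary across the study area, which is what distinguishes GWR from a single global regression.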