You are your Metadata: Identification and Obfuscation of Social Media Users using Metadata Information
Metadata are associated with most of the information we produce in our daily
interactions and communication in the digital world. Yet, surprisingly,
metadata are often still categorized as non-sensitive. Indeed, in the past,
researchers and practitioners have mainly focused on the problem of
identifying a user from the content of a message.
In this paper, we use Twitter as a case study to quantify the uniqueness of
the association between metadata and user identity and to understand the
effectiveness of potential obfuscation strategies. More specifically, we
analyze atomic fields in the metadata and systematically combine them in an
effort to classify new tweets as belonging to an account using different
machine learning algorithms of increasing complexity. We demonstrate that
through the application of a supervised learning algorithm, we are able to
identify any user in a group of 10,000 with approximately 96.7% accuracy.
Moreover, if we broaden the scope of our search and consider the 10 most likely
candidates, the accuracy of the model increases to 99.22%. We also found that
data obfuscation is hard and ineffective for this type of data: even after
perturbing 60% of the training data, it is still possible to classify users
with an accuracy higher than 95%. These results have strong implications in
terms of the design of metadata obfuscation strategies, for example for data
set release, not only for Twitter, but, more generally, for most social media
platforms.

Comment: 11 pages, 13 figures. Published in the Proceedings of the 12th
International AAAI Conference on Web and Social Media (ICWSM 2018). June
2018. Stanford, CA, US.
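The identification task described in this abstract can be sketched in miniature. The metadata fields and the nearest-centroid rule below are illustrative assumptions, not the paper's actual pipeline (which evaluated several classifiers of increasing complexity over Twitter metadata):

```python
# Toy sketch: attributing a new tweet to an account from metadata alone.
# Feature fields and the classifier are hypothetical stand-ins for the
# paper's supervised-learning setup.

def centroid(rows):
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit(training):
    # training: {user: [metadata vectors of that user's past tweets]}
    return {user: centroid(rows) for user, rows in training.items()}

def predict(model, features):
    # Assign the tweet to the account with the nearest metadata centroid.
    return min(model, key=lambda u: sq_dist(model[u], features))

# Hypothetical metadata features: [followers, friends, statuses, listed]
training = {
    "alice": [[120, 300, 5000, 3], [125, 302, 5050, 3]],
    "bob":   [[9000, 150, 800, 40], [9100, 151, 810, 41]],
}
model = fit(training)
print(predict(model, [122, 301, 5020, 3]))  # metadata of a new tweet
```

In practice each tweet carries dozens of metadata fields, and the paper's point is that even such incidental features act as a near-unique fingerprint of an account.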
DiffProtect: Generate Adversarial Examples with Diffusion Models for Facial Privacy Protection
The increasingly pervasive facial recognition (FR) systems raise serious
concerns about personal privacy, especially for billions of users who have
publicly shared their photos on social media. Several attempts have been made
to protect individuals from identification by unauthorized FR systems, using
adversarial attacks to generate encrypted face images. However,
existing methods suffer from poor visual quality or low attack success rates,
which limit their utility. Recently, diffusion models have achieved tremendous
success in image generation. In this work, we ask: can diffusion models be used
to generate adversarial examples to improve both visual quality and attack
performance? We propose DiffProtect, which utilizes a diffusion autoencoder to
generate semantically meaningful perturbations on FR systems. Extensive
experiments demonstrate that DiffProtect produces more natural-looking
encrypted images than state-of-the-art methods while achieving significantly
higher attack success rates, e.g., 24.5% and 25.1% absolute improvements on the
CelebA-HQ and FFHQ datasets, respectively.

Comment: Code will be available at https://github.com/joellliu/DiffProtect
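DiffProtect's perturbations live in the semantic latent space of a diffusion autoencoder; as a rough intuition for the adversarial-attack component only, here is a generic FGSM-style step against a hypothetical linear face matcher (all names and numbers are illustrative, not the paper's method):

```python
# Generic gradient-sign adversarial perturbation against a toy linear
# matcher. DiffProtect optimizes in a diffusion autoencoder's latent
# space; this only illustrates the underlying adversarial-attack idea.

def score(w, x):
    # Toy matcher: higher score means "same identity" as template w.
    return sum(wi * xi for wi, xi in zip(w, x))

def fgsm(w, x, eps):
    # The gradient of score w.r.t. x is w; step against its sign
    # to lower the match score (FGSM-style).
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - eps * sign(wi) for xi, wi in zip(x, w)]

w = [0.5, -0.2, 0.8]   # hypothetical identity template
x = [1.0, 0.3, 1.2]    # face embedding to protect
x_adv = fgsm(w, x, eps=0.3)
print(score(w, x), score(w, x_adv))  # match score drops after the attack
```

The real challenge, which the paper targets, is making such perturbations both effective against deep FR models and visually natural.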
Privacy Intelligence: A Survey on Image Sharing on Online Social Networks
Image sharing on online social networks (OSNs) has become an indispensable
part of daily social activities, but it has also led to an increased risk of
privacy invasion. The recent image leaks from popular OSN services and the
abuse of personal photos using advanced algorithms (e.g. DeepFake) have
prompted the public to rethink individual privacy needs when sharing images on
OSNs. However, OSN image sharing itself is relatively complicated, and systems
currently in place to manage privacy in practice are labor-intensive yet fail
to provide personalized, accurate and flexible privacy protection. As a result,
a more intelligent environment for privacy-friendly OSN image sharing is in
demand. To fill this gap, we contribute a systematic survey of 'privacy
intelligence' solutions that target modern privacy issues related to OSN image
sharing. Specifically, we present a high-level analysis framework based on the
entire lifecycle of OSN image sharing to address the various privacy issues and
solutions facing this interdisciplinary field. The framework is divided into
three main stages: local management, online management and social experience.
At each stage, we identify typical sharing-related user behaviors and the
privacy issues those behaviors generate, and we review representative
intelligent solutions. The resulting analysis describes an intelligent privacy-enhancing
chain for closed-loop privacy management. We also discuss the challenges and
future directions existing at each stage, as well as in publicly available
datasets.

Comment: 32 pages, 9 figures. Under review.
The (Co)-Location Sharing Game
Most popular location-based social networks, such as Facebook and Foursquare, let their (mobile) users post location and co-location (involving other users) information. Such posts bring social benefits to the users who post them, as well as to their friends who view them. Yet, they also represent a severe threat to the users' privacy, as co-location information introduces interdependences between users. We propose the first game-theoretic framework for analyzing the strategic information-sharing behaviors of OSN users. To design parametric utility functions that are representative of the users' actual preferences, we also conduct a survey of 250 Facebook users and use conjoint analysis to quantify the users' benefits of sharing vs. viewing (co-)location information and their preference for privacy vs. benefits. Our survey findings expose a large variation among users in terms of these preferences. We extensively evaluate our framework through data-driven numerical simulations. We study how users' individual preferences influence each other's decisions, we identify several factors that significantly affect these decisions (among which, the mobility data of the users), and we determine situations where dangerous patterns can emerge (e.g., a vicious circle of sharing, or an incentive to over-share), even when the users share similar preferences.
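As a toy illustration of such a game-theoretic setup, the sketch below iterates best responses for two players whose utilities trade a sharing benefit against a privacy cost that also grows with the friend's co-location sharing. The utility form and all parameters are invented for illustration; the paper fits parametric utilities from its conjoint-analysis survey:

```python
# Minimal two-player (co-)location sharing game with best-response
# dynamics. Utility shape and parameters are illustrative assumptions.

def utility(share_i, share_j, benefit, cost):
    # share_* in {0, 1}: whether the player posts (co-)location info.
    # The friend's sharing leaks about me too (interdependent privacy).
    return benefit * share_i - cost * (share_i + 0.5 * share_j)

def best_response(share_j, benefit, cost):
    return max((0, 1), key=lambda s: utility(s, share_j, benefit, cost))

def equilibrium(b1, c1, b2, c2):
    s1 = s2 = 0
    for _ in range(10):  # iterate best responses until they stabilize
        s1, s2 = best_response(s2, b1, c1), best_response(s1, b2, c2)
    return s1, s2

# A benefit-seeking user paired with a privacy-conscious one:
print(equilibrium(b1=2.0, c1=0.5, b2=0.3, c2=1.0))
```

Even this caricature shows the interdependence: the privacy-conscious player bears a cost from the sharer's posts regardless of her own choice.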
Interdependent and Multi-Subject Privacy: Threats, Analysis and Protection
In Alan Westin's generally accepted definition of privacy, he describes it as an individual's right 'to control, edit, manage, and delete information about them[selves] and decide when, how, and to what extent information is communicated to others.' Therefore, privacy is an individual and independent human right. The great Mahatma Gandhi once said that 'interdependence is and ought to be as much the ideal of man as self-sufficiency. Man is a social being.' Ensuring this independent right for inherently social beings is difficult, if not impossible. This is especially true as today's world is highly interconnected, technology evolves rapidly, data sharing is increasingly abundant, and regulations do not provide sufficient guidance in the realm of interdependency.
In this thesis, we explore the topic of interdependent privacy from an adversarial point of view by exposing threats, as well as from an end-user point of view, by exploring awareness, preferences and privacy protection needs.
First, we quantify the effect of co-locations on location privacy, considering an adversary such as a social-network operator that has access to this information: Not only can a user be localized due to her reported locations and mobility patterns, but also due to those of her friends (and the friends of her friends and so on). We formalize this problem and propose effective inference algorithms that substantially reduce the complexity of localization attacks that make use of co-locations. Our results show that an adversary can effectively incorporate co-locations in attacks to substantially reduce users' location privacy; this exposes a real and severe threat.
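One way to picture the core of such a co-location attack: the adversary fuses the location distribution inferred for a user with that of a friend who reported being co-located with her. The pointwise-multiply-and-renormalize step below is a deliberately simplified stand-in for the thesis's inference algorithms, with invented locations and probabilities:

```python
# Toy co-location fusion step: combine a user's location distribution
# with that of a co-located friend. A simplified stand-in for the
# thesis's localization-attack inference.

def fuse(p_user, p_friend):
    # If the two are co-located, they must be at the same place, so the
    # joint evidence is the pointwise product, renormalized.
    joint = {loc: p_user.get(loc, 0) * p_friend.get(loc, 0) for loc in p_user}
    z = sum(joint.values())
    return {loc: p / z for loc, p in joint.items()}

p_user = {"home": 0.6, "work": 0.3, "gym": 0.1}    # adversary's prior on the user
p_friend = {"home": 0.1, "work": 0.8, "gym": 0.1}  # prior on the co-located friend
print(fuse(p_user, p_friend))
```

The friend's mobility pattern sharpens the adversary's belief about the user, which is exactly the interdependence the chapter quantifies.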
Second, we investigate the interplay between the privacy risks and the social benefits of users when sharing (co-)locations on OSNs. We propose a game-theoretic framework for analyzing users' strategic behaviors. We conduct a survey of Facebook users and quantify their benefits of sharing vs. viewing information and their preference for privacy vs. benefits. Our survey exposes deficits in users' awareness of privacy risks in OSNs. Our results further show how users' individual preferences influence, sometimes in a negative way, each other's decisions.
Third, we consider various types of interdependent and multi-subject data (photos, co-locations, genomic data, etc.) that often have privacy implications for data subjects other than the uploader, yet can be shared without their consent or awareness. We propose a system for sharing such data in a consensual and privacy-preserving manner. We implement it in the case of photos, relying on image-processing and cryptographic techniques, as well as on a two-tier architecture. We conduct a survey of Facebook users; it indicates that there is interest in such a system and that users have increasing privacy concerns due to prejudice or discrimination to which they have been, or could easily be, exposed.
In conclusion, this thesis provides new insights into users' privacy in the context of interdependence and constitutes a step towards the design of novel privacy-protection mechanisms. It should be seen as a warning message for service providers and regulatory institutions: unless the interdependent aspects of privacy are considered, this fundamental human right can never be guaranteed.
Data-Driven, Personalized Usable Privacy
We live in the "inverse-privacy" world, where service providers derive insights from users' data that the users do not even know about. This has been fueled by advancements in machine learning technologies, which have allowed providers to go beyond the superficial analysis of users' transactions to the deep inspection of users' content. Users themselves have been facing several problems in coping with this widening information discrepancy. Although the interfaces of apps and websites are generally equipped with privacy indicators (e.g., permissions, policies, ...), this has not been enough to create the counter-effect. We identify three gaps in particular that have hindered the effectiveness and usability of privacy indicators:

- Scale Adaptation: The scale at which service providers collect data has been growing on multiple fronts. Users, on the other hand, have limited time, effort, and technological resources to cope with this scale.
- Risk Communication: Although providers utilize privacy indicators to announce what and (less often) why they need particular pieces of information, they rarely relay what can potentially be inferred from this data. Without this knowledge, users are less equipped to make informed decisions when they sign in to a site or install an application.
- Language Complexity: The information practices of service providers are buried in complex, long privacy policies. Generally, users do not have the time, and sometimes the skills, to decipher such policies, even when they are interested in knowing particular parts of them.

In this thesis, we approach usable privacy from a data perspective. Instead of static privacy interfaces that are obscure, recurring, or unreadable, we develop techniques that bridge the understanding gap between users and service providers.
Towards that, we make the following contributions:

- Crowdsourced, data-driven privacy decision-making: In an effort to combat the growing scale of data exposure, we consider the context of files uploaded to cloud services. We propose C3P, a framework for automatically assessing the sensitivity of files, thus enabling real-time, fine-grained policy enforcement on top of unstructured data.
- Data-driven app privacy indicators: We introduce PrivySeal, which involves a new paradigm of dynamic, personalized app privacy indicators that bridge the risk-understanding gap between users and providers. Through PrivySeal's online platform, we also study the emerging problem of interdependent privacy in the context of cloud apps and provide a usable privacy indicator to mitigate it.
- Automated question answering about privacy practices: We introduce PriBot, the first automated question-answering system for privacy policies, which allows users to pose questions about the privacy practices of any company in their own language. Through a user study, we show its effectiveness at achieving high accuracy and relevance for users, thus narrowing the complexity gap in navigating privacy policies.

A core aim of this thesis is paving the road for a future where privacy indicators are not bound to a specific medium or pre-scripted wording. We design and develop techniques that enable privacy to be communicated effectively in an interface that is approachable to the user. To that end, we go beyond textual interfaces to enable dynamic, visual, and hands-free privacy interfaces that are fit for the variety of emerging technologies.
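The retrieval step behind a PriBot-style question-answering system can be caricatured with bag-of-words cosine similarity; the real system uses learned semantic representations, so this sketch (with an invented three-segment policy) only conveys the matching idea:

```python
# Toy retrieval step: match a free-form question to the most relevant
# privacy-policy segment by bag-of-words cosine similarity. PriBot uses
# learned semantic representations; this is only an illustration.
from collections import Counter
from math import sqrt

def cosine(a, b):
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = sqrt(sum(v * v for v in ca.values()))
    nb = sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def answer(question, segments):
    # Return the policy segment most similar to the question.
    return max(segments, key=lambda s: cosine(question, s))

policy = [
    "We collect your email address and device identifiers.",
    "We share aggregated data with advertising partners.",
    "You may delete your account at any time.",
]
print(answer("do you share my data with advertisers", policy))
```

Word-overlap retrieval fails exactly where policies paraphrase user vocabulary, which is why moving to semantic representations matters.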