Securing Social Media User Data - An Adversarial Approach
Social media users generate tremendous amounts of data. To better serve
users, this user-related data often needs to be shared with researchers,
advertisers and application developers, but publishing such data raises
concerns about user privacy. To encourage data sharing while mitigating these
concerns, a number of anonymization and de-anonymization algorithms have been
developed to help protect the privacy of social media users. In this work, we
propose a new adversarial attack specialized for social media data. We further
provide a principled way to assess the effectiveness of anonymizing different
aspects of social media data. Our work sheds light on new privacy risks in
social media data that arise from the innate heterogeneity of user-generated
data, which requires striking a balance between sharing user data and
protecting user privacy.
Comment: Published in the 29th ACM Conference on Hypertext and Social Media,
Baltimore, MD, USA (HT-18)
Privacy in Social Media: Identification, Mitigation and Applications
The increasing popularity of social media has attracted a huge number of
people to participate in numerous activities on a daily basis. This results in
tremendous amounts of rich user-generated data, which provides
opportunities for researchers and service providers to study and better
understand users' behaviors and further improve the quality of personalized
services. However, publishing user-generated data risks exposing individuals'
privacy. User privacy in social media is an emerging research topic that has
attracted increasing attention in recent years. Work in this area studies
privacy issues in social media from two points of view: identification of
vulnerabilities and mitigation of privacy risks. Recent research has shown the
vulnerability of user-generated data to two general types of attacks: identity
disclosure and attribute disclosure. These privacy issues require social media
data publishers to protect users' privacy by sanitizing user-generated data
before publishing it. Consequently, various protection techniques have been
proposed to anonymize user-generated social media data. There is a vast
literature on the privacy of social media users from many perspectives. In this
survey, we review the key achievements in research on user privacy in social
media. In particular, we review and compare state-of-the-art privacy leakage
attacks and anonymization algorithms. We overview the privacy risks from
different aspects of social media and categorize the relevant works into five
groups: 1) graph data anonymization and de-anonymization, 2) author
identification, 3) profile attribute disclosure, 4) user location and privacy,
and 5) recommender systems and privacy issues. We also discuss open problems
and future research directions for user privacy issues in social media.
Comment: This survey is currently under review
Protecting User Privacy: An Approach for Untraceable Web Browsing History and Unambiguous User Profiles
The overturning of the Internet Privacy Rules by the Federal Communications
Commission (FCC) in late March 2017 allows Internet Service Providers (ISPs)
to collect, share and sell their customers' Web browsing data without their
consent. With third-party trackers embedded in Web pages, this new rule has put
user privacy at greater risk. Users therefore need to protect their Web
browsing history from potential adversaries on their own. Although available
solutions such as Tor, VPN, and HTTPS can help users conceal their
online activities, their use can also significantly hamper personalized online
services, i.e., degrade utility. In this paper, we design an effective Web
browsing history anonymization scheme, PBooster, which aims to protect users'
privacy while retaining the utility of their Web browsing history. The proposed
model pollutes users' Web browsing history by automatically inferring how many
and which links should be added to the history while addressing the
utility-privacy trade-off. We conduct experiments to validate the
quality of the manipulated Web browsing history and examine the robustness of
the proposed approach for user privacy protection.
Comment: This paper is accepted at the 12th ACM International Conference on
Web Search and Data Mining (WSDM-2019)
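The abstract does not spell out how PBooster decides how many and which links to add, so the following is only a hypothetical sketch of the general idea of history pollution, not the PBooster algorithm. It assumes that an adversary profiles a user by the topic distribution of the visited links, uses the Shannon entropy of that distribution as a stand-in privacy measure, and treats a fixed budget of injected links as a crude utility control. The decoy `catalog`, `budget`, and the function names are illustrative assumptions.

```python
import math
from collections import Counter


def topic_entropy(counts):
    """Shannon entropy (in bits) of a topic -> visit-count histogram."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def pollute_history(history, catalog, budget):
    """Greedily append at most `budget` decoy links to a browsing history.

    history: list of (url, topic) pairs the user actually visited.
    catalog: hypothetical dict mapping topic -> candidate decoy URLs.
    budget:  maximum number of injected links (a crude utility knob).
    Returns the polluted history and the resulting topic counts.
    """
    counts = Counter(topic for _, topic in history)
    for topic in catalog:
        counts.setdefault(topic, 0)
    used = {topic: 0 for topic in catalog}
    added = []
    for _ in range(budget):
        # Topics whose candidate decoy URLs are not yet exhausted.
        eligible = [t for t in catalog if used[t] < len(catalog[t])]
        if not eligible:
            break
        # One extra visit to the least-visited topic gives the largest entropy gain.
        topic = min(eligible, key=lambda t: (counts[t], t))
        added.append((catalog[topic][used[topic]], topic))
        used[topic] += 1
        counts[topic] += 1
    return history + added, counts
```

Topping up the least-visited topic is a natural greedy choice for this proxy, because adding one visit to the smallest count yields the largest single-step entropy increase; a real scheme would also have to weigh which decoys distort personalization the least.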
De-anonymizing Social Networks
Operators of online social networks are increasingly sharing potentially
sensitive information about users and their relationships with advertisers,
application developers, and data-mining researchers. Privacy is typically
protected by anonymization, i.e., removing names, addresses, etc.
We present a framework for analyzing privacy and anonymity in social networks
and develop a new re-identification algorithm targeting anonymized
social-network graphs. To demonstrate its effectiveness on real-world networks,
we show that a third of the users who can be verified to have accounts on both
Twitter, a popular microblogging service, and Flickr, an online photo-sharing
site, can be re-identified in the anonymous Twitter graph with only a 12% error
rate.
Our de-anonymization algorithm is based purely on the network topology, does
not require creation of a large number of dummy "sybil" nodes, is robust to
noise and all existing defenses, and works even when the overlap between the
target network and the adversary's auxiliary information is small.
Comment: Published in the 30th IEEE Symposium on Security and Privacy, 2009.
The definitive version is available at:
http://www.cs.utexas.edu/~shmat/shmat_oak09.pdf Frequently Asked Questions
are answered at: http://www.cs.utexas.edu/~shmat/socialnetworks-faq.htm
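The abstract states that the re-identification algorithm uses only the network topology. The sketch below illustrates the general seed-and-propagate idea behind such attacks under simplifying assumptions: a few auxiliary-to-anonymized node pairs are already known (seeds), a candidate's score is the number of mapped neighbors it shares, and a match is accepted only when the best score clearly stands out from the rest (an "eccentricity" test). It omits details such as degree normalization and reverse-match checks, and the threshold `theta` and all function names are hypothetical.

```python
from statistics import pstdev


def eccentricity(scores):
    """How far the best score stands out from the second best, in standard deviations."""
    if len(scores) == 1:
        return float("inf") if scores[0] > 0 else 0.0
    ordered = sorted(scores, reverse=True)
    sd = pstdev(scores)
    return (ordered[0] - ordered[1]) / sd if sd > 0 else 0.0


def propagate(aux, anon, seeds, theta=0.5, max_rounds=10):
    """Extend a seed mapping from auxiliary-graph nodes to anonymized-graph nodes.

    aux, anon: dict node -> set of neighbor nodes (undirected graphs).
    seeds:     dict aux_node -> anon_node pairs assumed to be known already.
    theta:     hypothetical minimum eccentricity required to accept a match.
    Only topology is used: no names or attributes are consulted.
    """
    mapping = dict(seeds)
    for _ in range(max_rounds):
        changed = False
        for u in aux:
            if u in mapping:
                continue
            taken = set(mapping.values())
            # Images of u's already-mapped neighbors in the anonymized graph.
            mapped_nbrs = {mapping[n] for n in aux[u] if n in mapping}
            candidates = [v for v in anon if v not in taken]
            if not candidates or not mapped_nbrs:
                continue
            # Score = number of those images that are neighbors of the candidate.
            scores = [len(anon[v] & mapped_nbrs) for v in candidates]
            if max(scores) == 0 or eccentricity(scores) < theta:
                continue
            mapping[u] = candidates[scores.index(max(scores))]
            changed = True
        if not changed:
            break
    return mapping
```

Requiring a clear gap between the best and second-best candidate trades coverage for precision: ambiguous nodes are left unmapped in one round and may become decidable later, once more of their neighbors have been mapped.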
An efficient reconciliation algorithm for social networks
People today typically use multiple online social networks (Facebook,
Twitter, Google+, LinkedIn, etc.). Each online network represents a subset of
a person's "real" ego-network. An interesting and challenging problem is to
reconcile these online networks, that is, to identify all the accounts
belonging to the same individual. Besides providing a richer understanding of
social dynamics, the problem has a number of practical applications. At first
sight, this problem appears algorithmically challenging. Fortunately, a small
fraction of individuals explicitly link their accounts across multiple
networks; our work leverages these connections to identify a very large
fraction of the network.
Our main contributions are to mathematically formalize the problem for the
first time, and to design a simple, local, and efficient parallel algorithm to
solve it. We prove strong theoretical guarantees on the algorithm's
performance on well-established network models (Random Graphs, Preferential
Attachment). We also experimentally confirm the effectiveness of the algorithm
on synthetic and real social network data sets.
Comment: 23 pages, 4 figures. To appear in VLDB 201
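One way to picture the local matching idea described above is to let every already-linked account pair vote for pairs of their neighbors; two accounts that receive many such votes (shared linked friends, sometimes called similarity witnesses) are likely to belong to the same person. The sketch below assumes undirected graphs stored as adjacency sets, a hypothetical vote `threshold`, and mutual-best-match acceptance in each round; it illustrates the idea and is not the paper's exact algorithm, nor does it carry its theoretical guarantees.

```python
from collections import Counter


def count_witnesses(g1, g2, matched):
    """Each matched pair (a, b) votes for every unmatched pair (u, v) with u ~ a in g1 and v ~ b in g2."""
    votes = Counter()
    used1, used2 = set(matched), set(matched.values())
    for a, b in matched.items():
        for u in g1[a] - used1:
            for v in g2[b] - used2:
                votes[(u, v)] += 1
    return votes


def reconcile(g1, g2, seeds, threshold=2, rounds=5):
    """Grow a cross-network account matching from seed links.

    g1, g2:    dict account -> set of neighbor accounts in each network.
    seeds:     dict g1_account -> g2_account for users who linked their accounts.
    threshold: hypothetical minimum number of witnesses needed to accept a pair.
    """
    matched = dict(seeds)
    for _ in range(rounds):
        votes = count_witnesses(g1, g2, matched)
        best1, best2 = {}, {}
        for (u, v), c in votes.items():
            if c < threshold:
                continue
            if c > best1.get(u, (0, None))[0]:
                best1[u] = (c, v)
            if c > best2.get(v, (0, None))[0]:
                best2[v] = (c, u)
        # Accept only mutual best pairs, so each account is matched at most once.
        new = {u: v for u, (_, v) in best1.items() if best2[v][1] == u}
        if not new:
            break
        matched.update(new)
    return matched
```

Each round only touches the neighborhoods of already-matched pairs, so the vote counting can be split into one independent task per matched pair, which is what makes this style of algorithm easy to parallelize.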
Matching Entities Across Online Social Networks
Online Social Networks (OSNs), such as Facebook and Twitter, have become an
integral part of our daily lives. There are hundreds of OSNs, each with its own
focus and offering particular services and functionalities. Recent
studies show that many OSN users create several accounts on multiple OSNs using
the same or different personal information. Collecting all the available data
of an individual from several OSNs and fusing it into a single profile can be
useful for many purposes. In this paper, we introduce novel machine-learning-based
methods for Entity Resolution (ER), the problem of matching user
profiles across multiple OSNs. The presented methods can match two user
profiles from two different OSNs using supervised learning techniques on
features extracted from each of the user profiles. Using these features, we
developed classifiers that can perform entity matching between two profiles in
the following scenarios: (a) matching entities across two OSNs; (b) searching for a
user by similar name; and (c) de-anonymizing a user's identity.
The constructed classifiers were tested using data collected from two
popular OSNs, Facebook and Xing. We then evaluated the classifiers'
performance using various evaluation measures, such as true and false positive
rates, accuracy, and the Area Under the Receiver Operating Characteristic curve
(AUC). The classifiers achieved strong classification performance, with an AUC
of up to 0.982 and an accuracy of up to 95.9% in identifying user profiles
across two OSNs.
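The abstract does not list the exact features or classifiers, so the following sketch is only illustrative. `pair_features` turns two profiles into a numeric feature vector using hypothetical `name` and `city` fields (name similarity via Python's standard `difflib.SequenceMatcher`), which a supervised classifier could then consume. `roc_auc` and `accuracy` show how the two reported metrics are computed from a classifier's match scores: AUC equals the probability that a randomly chosen matching pair scores higher than a randomly chosen non-matching pair, with ties counted as one half.

```python
from difflib import SequenceMatcher


def pair_features(p1, p2):
    """Feature vector for a candidate profile pair (hypothetical 'name' and 'city' fields)."""
    name_sim = SequenceMatcher(None, p1.get("name", "").lower(), p2.get("name", "").lower()).ratio()
    city1, city2 = p1.get("city", "").lower(), p2.get("city", "").lower()
    same_city = 1.0 if city1 and city1 == city2 else 0.0
    return [name_sim, same_city]


def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative), ties counting 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of pairs classified correctly when score >= threshold means 'same person'."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for a sketch; for large evaluation sets a rank-based computation gives the same value more efficiently.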
A Record Linkage Model Incorporating Relational Data
In this paper we introduce a novel Bayesian approach for linking multiple
social networks in order to discover the same real world person having
different accounts across networks. In particular, we develop a latent model
that allows us to jointly characterize the network and linkage structures
relying on both relational and profile data. In contrast to other existing
approaches in the machine learning literature, our Bayesian implementation
naturally provides uncertainty quantification via posterior probabilities for
the linkage structure itself or any function of it. Our findings clearly
suggest that our methodology can produce accurate point estimates of the
linkage structure even in the absence of profile information, and also, in an
identity resolution setting, our results confirm that including relational data
into the matching process improves the linkage accuracy. We illustrate our
methodology using real data from popular social networks such as Twitter,
Facebook, and YouTube.
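The model in the abstract is a latent Bayesian model over both the network and linkage structures, which is not reproduced here. As a much simpler stand-in, the sketch below uses the classic Fellegi-Sunter formulation with conditionally independent comparison fields to show what a posterior match probability looks like, and how an extra relational field (for example, "the two accounts share an already-linked friend") can sharpen it. The m-probabilities (agreement rate among true matches), u-probabilities (agreement rate among non-matches) and the prior are made-up illustrative values, not estimates from any data.

```python
def match_posterior(agreements, m, u, prior):
    """Posterior P(match | comparison vector) under conditionally independent fields.

    agreements: list of 0/1 flags, one per comparison field.
    m: P(field agrees | true match); u: P(field agrees | non-match).
    prior: prior probability that a random candidate pair is a match.
    """
    like_match, like_non = prior, 1.0 - prior
    for a, mi, ui in zip(agreements, m, u):
        like_match *= mi if a else 1.0 - mi
        like_non *= ui if a else 1.0 - ui
    return like_match / (like_match + like_non)


# Illustrative values only: field 0 = profile names agree,
# field 1 = the two accounts share an already-linked friend (relational evidence).
profile_only = match_posterior([1], m=[0.9], u=[0.1], prior=0.01)
with_relational = match_posterior([1, 1], m=[0.9, 0.8], u=[0.1, 0.2], prior=0.01)
```

With these illustrative numbers, the posterior rises from about 0.083 with the profile field alone to about 0.267 once the relational field also agrees, a toy version of the effect the abstract reports for adding relational data.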
A Survey on Privacy and Security in Online Social Networks
Online Social Networks (OSNs) are a permanent presence in the personal and
professional lives of a huge segment of today's population, with direct
consequences for offline activities. Built on a foundation of trust (users
connect to other users with common interests or overlapping personal
trajectories), online social networks and the associated applications extract an
unprecedented volume of personal information. Unsurprisingly, serious privacy
and security risks have emerged, falling into two main types of
attacks: attacks that exploit the implicit trust embedded in declared social
relationships, and attacks that harvest users' personal information for
ill-intended use. This article provides an overview of the privacy and security
issues that have emerged so far in OSNs. We introduce a taxonomy of privacy and
security attacks in OSNs, overview existing solutions to mitigate those
attacks, and outline challenges that remain to be overcome.
A Survey on the Security of Pervasive Online Social Networks (POSNs)
Pervasive Online Social Networks (POSNs) are extensions of Online Social
Networks (OSNs) that facilitate connectivity irrespective of the domain and
properties of users. POSNs have emerged from the convergence of a
plethora of social networking platforms, with the motivation of bridging the
gaps between them. Over the last decade, OSNs have seen a tremendous amount of
advancement in terms of both the number of users and technology enablers. A
single OSN is the property of an organization, which ensures the smooth
functioning of its services to provide a quality experience to its users.
With POSNs, however, multiple OSNs have coalesced through communities, circles,
or properties alone, which makes service provisioning tedious and difficult to
sustain. The challenges become especially severe when the focus is on the
security of cross-platform OSNs, which are an integral part of POSNs. It is
therefore paramount to highlight this requirement and understand the current
situation while discussing the available state of the art. With the
modernization of OSNs and their convergence towards POSNs, it is necessary to
understand the impact and reach of current solutions for enhancing the security
of users and their associated services. This survey addresses this need by
focusing on different sets of studies presented over the last few years and
assessing their applicability to POSNs...
Comment: 39 pages, 10 figures
De-Health: All Your Online Health Information Are Belong to Us
In this paper, we study the privacy of online health data. We present a novel
online health data De-Anonymization (DA) framework, named De-Health. De-Health
consists of two phases: Top-K DA, which identifies a candidate set for each
anonymized user, and refined DA, which de-anonymizes an anonymized user to a
user in its candidate set. By employing both candidate selection and DA
verification schemes, De-Health significantly reduces the DA space by several
orders of magnitude while achieving promising DA accuracy. Leveraging two
real-world online health datasets, WebMD (89,393 users, 506K posts) and HealthBoards
(388,398 users, 4.7M posts), we validate the efficacy of De-Health. Further,
when the training data are insufficient, De-Health can still successfully
de-anonymize a large portion of anonymized users.
We develop the first analytical framework for the soundness and effectiveness
of online health data DA. By analyzing the impact of various data features on
anonymity, we derive the conditions and probabilities for successfully
de-anonymizing one user or a group of users in exact DA and Top-K DA. Our
analysis is meaningful to both researchers and policy makers in facilitating
the development of more effective anonymization techniques and proper privacy
policies.
We present a linkage attack framework which can link online health/medical
information to real world people. Through a proof-of-concept attack, we link
347 out of 2805 WebMD users to real world people, and find the full names,
medical/health information, birthdates, phone numbers, and other sensitive
information for most of the re-identified users. This clearly illustrates the
fragility of privacy for those who use online health forums.
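The two-phase structure described above (Top-K candidate selection followed by refined de-anonymization with verification) can be sketched as follows. The features are deliberately simple stand-ins: word-count cosine similarity shortlists the `k` closest known users, then character-trigram Jaccard similarity picks one candidate, and the function returns `None` when the best score falls below a hypothetical `min_score` verification threshold. De-Health's actual features, classifiers and thresholds are not given in the abstract, so every name and parameter here is an assumption.

```python
import math
from collections import Counter


def word_vector(posts):
    """Bag-of-words counts over all of a user's posts."""
    return Counter(w for p in posts for w in p.lower().split())


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def char_trigrams(posts):
    text = " ".join(posts).lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def de_anonymize(anon_posts, known, k=2, min_score=0.2):
    """Two-phase sketch: Top-K candidate selection, then refined matching with verification.

    anon_posts: list of posts written by one anonymized user.
    known:      dict identity -> list of posts with known authorship (training data).
    Returns (best identity or None, candidate list).
    """
    # Phase 1 (Top-K): shortlist the k known users with the most similar vocabulary.
    target = word_vector(anon_posts)
    ranked = sorted(known, key=lambda who: cosine(target, word_vector(known[who])), reverse=True)
    candidates = ranked[:k]
    # Phase 2 (refined): compare character trigrams only within the shortlist.
    grams = char_trigrams(anon_posts)
    best_score, best = max((jaccard(grams, char_trigrams(known[c])), c) for c in candidates)
    # Verification: refuse to answer when even the best candidate is a weak match.
    return (best if best_score >= min_score else None), candidates
```

Shrinking the search to `k` candidates first keeps the more detailed comparison cheap, loosely mirroring the reduction of the DA space that the abstract describes.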