652 research outputs found

Securing Social Media User Data - An Adversarial Approach

Authors: Beigi Ghazaleh; Liu Huan; Shu Kai; Zhang Yanchao
Publication date: 01/05/2018

Social media users generate tremendous amounts of data. To better serve users, this user-related data must be shared among researchers, advertisers and application developers, but publishing it raises concerns about user privacy. To encourage data sharing and mitigate these concerns, a number of anonymization and de-anonymization algorithms have been developed to help protect the privacy of social media users. In this work, we propose a new adversarial attack specialized for social media data. We further provide a principled way to assess the effectiveness of anonymizing different aspects of social media data. Our work sheds light on new privacy risks in social media data, arising from the innate heterogeneity of user-generated data, which require striking a balance between sharing user data and protecting user privacy.

Comment: Published in the 29th ACM Conference on Hypertext and Social Media, Baltimore, MD, USA (HT-18)

arXiv.org e-Print Archive

Privacy in Social Media: Identification, Mitigation and Applications

Authors: Beigi Ghazaleh; Liu Huan
Publication date: 06/08/2018

The increasing popularity of social media has attracted a huge number of people to participate in numerous activities on a daily basis. This results in tremendous amounts of rich user-generated data, which gives researchers and service providers the opportunity to study and better understand users' behaviors and to further improve the quality of personalized services. Publishing user-generated data, however, risks exposing individuals' privacy. User privacy in social media is an emerging research area that has attracted increasing attention in recent years. Existing works study privacy issues in social media from two points of view: identification of vulnerabilities, and mitigation of privacy risks. Recent research has shown the vulnerability of user-generated data to two general types of attacks: identity disclosure and attribute disclosure. These privacy issues require social media data publishers to protect users' privacy by sanitizing user-generated data before publishing it. Consequently, various protection techniques have been proposed to anonymize user-generated social media data. There is a vast literature on the privacy of users in social media from many perspectives. In this survey, we review the key achievements in user privacy in social media. In particular, we review and compare the state-of-the-art algorithms in terms of privacy leakage attacks and anonymization algorithms. We overview the privacy risks from different aspects of social media and categorize the relevant works into five groups: 1) graph data anonymization and de-anonymization, 2) author identification, 3) profile attribute disclosure, 4) user location and privacy, and 5) recommender systems and privacy issues. We also discuss open problems and future research directions for user privacy issues in social media.

Comment: This survey is currently under review

arXiv.org e-Print Archive

Protecting User Privacy: An Approach for Untraceable Web Browsing History and Unambiguous User Profiles

Authors: Beigi Ghazaleh; Guo Ruocheng; Liu Huan; Nou Alexander; Zhang Yanchao
Publication date: 22/11/2018

The overturning of the Internet Privacy Rules by the Federal Communications Commission (FCC) in late March 2017 allows Internet Service Providers (ISPs) to collect, share and sell their customers' Web browsing data without their consent. With third-party trackers embedded in Web pages, this new rule has put user privacy at greater risk. Users therefore need to protect their Web browsing history from potential adversaries on their own. Although available solutions such as Tor, VPN, and HTTPS can help users conceal their online activities, their use can also significantly hamper personalized online services, i.e., degrade utility. In this paper, we design an effective Web browsing history anonymization scheme, PBooster, aiming to protect users' privacy while retaining the utility of their Web browsing history. The proposed model pollutes users' Web browsing history by automatically inferring how many and which links should be added to the history while addressing the utility-privacy trade-off. We conduct experiments to validate the quality of the manipulated Web browsing history and examine the robustness of the proposed approach for user privacy protection.

Comment: This paper is accepted in the 12th ACM International Conference on Web Search and Data Mining (WSDM-2019)

arXiv.org e-Print Archive

De-anonymizing Social Networks

Authors: Narayanan Arvind; Shmatikov Vitaly
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2009

Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc. We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate. Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy "sybil" nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.

Comment: Published in the 30th IEEE Symposium on Security and Privacy, 2009. The definitive version is available at: http://www.cs.utexas.edu/~shmat/shmat_oak09.pdf Frequently Asked Questions are answered at: http://www.cs.utexas.edu/~shmat/socialnetworks-faq.htm

arXiv.org e-Print Archive

CiteSeerX

An efficient reconciliation algorithm for social networks

Authors: Korula Nitish; Lattanzi Silvio
Publication date: 19/11/2013

People today typically use multiple online social networks (Facebook, Twitter, Google+, LinkedIn, etc.). Each online network represents a subset of their "real" ego-networks. An interesting and challenging problem is to reconcile these online networks, that is, to identify all the accounts belonging to the same individual. Besides providing a richer understanding of social dynamics, the problem has a number of practical applications. At first sight, this problem appears algorithmically challenging. Fortunately, a small fraction of individuals explicitly link their accounts across multiple networks; our work leverages these connections to identify a very large fraction of the network. Our main contributions are to mathematically formalize the problem for the first time, and to design a simple, local, and efficient parallel algorithm to solve it. We are able to prove strong theoretical guarantees on the algorithm's performance on well-established network models (Random Graphs, Preferential Attachment). We also experimentally confirm the effectiveness of the algorithm on synthetic and real social network data sets.

Comment: 23 pages, 4 figures. To appear in VLDB 201
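The idea of growing a matching outward from a few explicitly linked accounts can be illustrated with a small sketch. This is not the authors' algorithm, only a simplified illustration under stated assumptions: graphs are plain adjacency dictionaries, the function name `reconcile` and its `threshold` parameter are hypothetical, and a candidate pair is accepted when it shares at least `threshold` already-matched neighbour pairs and each side is the other's best candidate.

```python
from collections import defaultdict


def reconcile(g1, g2, seeds, threshold=2):
    """Expand a seed matching between two graphs (illustrative sketch).

    g1, g2: dict mapping node -> set of neighbouring nodes.
    seeds:  dict mapping a g1 node to the g2 node known to be the same person.
    Returns a dict extending `seeds` with newly inferred matches.
    """
    match = dict(seeds)
    matched_g2 = set(match.values())
    changed = True
    while changed:
        changed = False
        # scores[(u, v)] counts matched neighbour pairs shared by u and v.
        scores = defaultdict(int)
        for x, y in match.items():
            for u in g1[x]:
                if u in match:
                    continue
                for v in g2[y]:
                    if v not in matched_g2:
                        scores[(u, v)] += 1
        # Keep, for every node on each side, its highest-scoring candidate.
        best_for_u, best_for_v = {}, {}
        for (u, v), s in scores.items():
            if s < threshold:
                continue
            if u not in best_for_u or s > best_for_u[u][1]:
                best_for_u[u] = (v, s)
            if v not in best_for_v or s > best_for_v[v][1]:
                best_for_v[v] = (u, s)
        # Accept only pairs that are mutually best.
        for u, (v, _) in best_for_u.items():
            if best_for_v[v][0] == u:
                match[u] = v
                matched_g2.add(v)
                changed = True
    return match
```

Each round uses only local neighbourhood information, so the scoring step could in principle be computed in parallel per matched pair; the sketch runs it sequentially for readability.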

arXiv.org e-Print Archive

Matching Entities Across Online Social Networks

Authors: Elovici Yuval; Fire Michael; Peled Olga; Rokach Lior
Publication date: 04/11/2014

Online Social Networks (OSNs), such as Facebook and Twitter, have become an integral part of our daily lives. There are hundreds of OSNs, each with its own focus and its own particular services and functionalities. Recent studies show that many OSN users create several accounts on multiple OSNs using the same or different personal information. Collecting all the available data of an individual from several OSNs and fusing it into a single profile can be useful for many purposes. In this paper, we introduce novel machine-learning-based methods for solving Entity Resolution (ER), the problem of matching user profiles across multiple OSNs. The presented methods match two user profiles from two different OSNs using supervised learning techniques applied to features extracted from each of the user profiles. Using these features and techniques, we developed classifiers that can perform entity matching between two profiles in the following scenarios: (a) matching entities across two OSNs; (b) searching for a user by similar name; and (c) de-anonymizing a user's identity. The constructed classifiers were tested on data collected from two popular OSNs, Facebook and Xing. We then evaluated the classifiers' performance using various evaluation measures, such as true and false positive rates, accuracy, and the Area Under the Receiver Operating Characteristic Curve (AUC). The classifiers achieved an AUC of up to 0.982 and an accuracy of up to 95.9% in identifying user profiles across two OSNs.
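The evaluation measures named above (true and false positive rates, accuracy, AUC) can be computed from a classifier's match scores as in the following minimal sketch. The function names `confusion_rates` and `auc`, the 0.5 decision threshold, and the toy labels are assumptions for illustration; they are not taken from the paper. The AUC is computed through its rank interpretation: the probability that a randomly chosen matching pair scores higher than a randomly chosen non-matching pair, with ties counted as one half.

```python
def confusion_rates(labels, scores, threshold=0.5):
    """Return (true positive rate, false positive rate, accuracy).

    labels: 1 for a true match, 0 for a non-match.
    scores: classifier scores; a score >= threshold predicts a match.
    Assumes at least one positive and one negative label.
    """
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_match = s >= threshold
        if y == 1 and predicted_match:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_match:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), fp / (fp + tn), (tp + tn) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # A positive outranking a negative counts 1, a tie counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of examples, which is fine for a toy illustration; library implementations typically sort the scores instead.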

arXiv.org e-Print Archive

A Record Linkage Model Incorporating Relational Data

Authors: Rodriguez Abel; Sosa Juan
Publication date: 13/08/2018

In this paper we introduce a novel Bayesian approach for linking multiple social networks in order to discover the same real-world person having different accounts across networks. In particular, we develop a latent model that allows us to jointly characterize the network and linkage structures, relying on both relational and profile data. In contrast to other existing approaches in the machine learning literature, our Bayesian implementation naturally provides uncertainty quantification via posterior probabilities for the linkage structure itself or any function of it. Our findings suggest that our methodology can produce accurate point estimates of the linkage structure even in the absence of profile information. In addition, in an identity resolution setting, our results confirm that including relational data in the matching process improves the linkage accuracy. We illustrate our methodology using real data from popular social networks such as Twitter, Facebook, and YouTube.

arXiv.org e-Print Archive

A Survey on Privacy and Security in Online Social Networks

Authors: Iamnitchi Adriana; Kayes Imrul
Publication venue: Elsevier BV
Publication date: 27/01/2015

Online Social Networks (OSNs) are a permanent presence in the personal and professional lives of a huge segment of the population, with direct consequences for offline activities. Built on a foundation of trust (users connect to other users with common interests or overlapping personal trajectories), online social networks and their associated applications extract an unprecedented volume of personal information. Unsurprisingly, serious privacy and security risks have emerged, falling into two main types of attacks: attacks that exploit the implicit trust embedded in declared social relationships, and attacks that harvest users' personal information for ill-intended use. This article provides an overview of the privacy and security issues that have emerged so far in OSNs. We introduce a taxonomy of privacy and security attacks in OSNs, overview existing solutions to mitigate those attacks, and outline the challenges still to be overcome.

arXiv.org e-Print Archive

A Survey on the Security of Pervasive Online Social Networks (POSNs)

Authors: Choudhary Gaurav; Gupta Takshi; Sharma Vishal
Publication date: 19/06/2018

Pervasive Online Social Networks (POSNs) are extensions of Online Social Networks (OSNs) that facilitate connectivity irrespective of the domain and properties of users. POSNs have emerged from the convergence of a plethora of social networking platforms, motivated by bridging the gaps between them. Over the last decade, OSNs have seen tremendous advancement in both the number of users and the enabling technologies. A single OSN is the property of one organization, which ensures the smooth functioning of its services to provide a quality experience to its users. With POSNs, however, multiple OSNs have coalesced through communities, circles, or only properties, which makes service provisioning tedious and arduous to sustain. The challenges become especially severe from the security perspective of cross-platform OSNs, which are an integral part of POSNs. It is therefore of paramount importance to highlight this requirement and understand the current situation while discussing the available state of the art. With the modernization of OSNs and the convergence towards POSNs, it is necessary to understand the impact and reach of current solutions for enhancing the security of users and associated services. This survey addresses this need: it focuses on studies presented over the last few years and surveys them for their applicability to POSNs...

Comment: 39 Pages, 10 Figures

arXiv.org e-Print Archive

De-Health: All Your Online Health Information Are Belong to Us

Authors: Beyah Raheem; Gu Qinchen; He Qinming; Ji Shouling; Liu Qianjun; Wang Ting; Weng Haiqin
Publication date: 03/06/2019

In this paper, we study the privacy of online health data. We present a novel online health data De-Anonymization (DA) framework, named De-Health. De-Health consists of two phases: Top-K DA, which identifies a candidate set for each anonymized user, and refined DA, which maps each anonymized user to a user in its candidate set. By employing both candidate selection and DA verification schemes, De-Health reduces the DA space by several orders of magnitude while achieving promising DA accuracy. Leveraging two real-world online health datasets, WebMD (89,393 users, 506K posts) and HealthBoards (388,398 users, 4.7M posts), we validate the efficacy of De-Health. Further, when the training data are insufficient, De-Health can still successfully de-anonymize a large portion of anonymized users. We develop the first analytical framework on the soundness and effectiveness of online health data DA. By analyzing the impact of various data features on anonymity, we derive the conditions and probabilities for successfully de-anonymizing one user or a group of users in exact DA and Top-K DA. Our analysis is meaningful to both researchers and policy makers in facilitating the development of more effective anonymization techniques and proper privacy policies. We present a linkage attack framework which can link online health/medical information to real-world people. Through a proof-of-concept attack, we link 347 out of 2805 WebMD users to real-world people, and find the full names, medical/health information, birthdates, phone numbers, and other sensitive information for most of the re-identified users. This clearly illustrates the fragility of the notion of privacy of those who use online health forums.
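The two-phase structure described above (select a Top-K candidate set, then pick and verify one candidate) can be sketched generically. The abstract does not specify De-Health's features, similarity measure, or verification scheme, so everything concrete here is an assumption: users are represented as hypothetical numeric feature vectors, candidates are ranked by cosine similarity, and the refinement step takes a caller-supplied scoring function `score` with a rejection threshold `min_score`.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_candidates(anon_vec, known, k):
    """Phase 1: the k known users most similar to the anonymized user.

    known: dict mapping user name -> feature vector.
    """
    return heapq.nlargest(k, known, key=lambda name: cosine(anon_vec, known[name]))


def refine(candidates, score, min_score):
    """Phase 2: best candidate under `score`, or None if verification fails.

    score: callable mapping a candidate name to a confidence value.
    """
    if not candidates:
        return None
    best = max(candidates, key=score)
    return best if score(best) >= min_score else None
```

Restricting the second phase to k candidates is what shrinks the search space: the more expensive scoring function runs on k users rather than on the whole known population. Returning `None` models the case where the true user may not be among the candidates at all.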

arXiv.org e-Print Archive