727 research outputs found
User's Privacy in Recommendation Systems Applying Online Social Network Data, A Survey and Taxonomy
Recommender systems have become an integral part of many social networks and
extract knowledge from a user's personal and sensitive data both explicitly,
with the user's knowledge, and implicitly. This trend has created major privacy
concerns as users are mostly unaware of what data and how much data is being
used and how securely it is used. In this context, several works have been done
to address privacy concerns for usage in online social network data and by
recommender systems. This paper surveys the main privacy concerns, measurements
and privacy-preserving techniques used in large-scale online social networks
and recommender systems. It is based on historical works on security,
privacy-preserving, statistical modeling, and datasets to provide an overview
of the technical difficulties and problems associated with privacy preserving
in online social networks.Comment: 26 pages, IET book chapter on big data recommender system
Location Privacy in Spatial Crowdsourcing
Spatial crowdsourcing (SC) is a new platform that engages individuals in
collecting and analyzing environmental, social and other spatiotemporal
information. With SC, requesters outsource their spatiotemporal tasks to a set
of workers, who will perform the tasks by physically traveling to the tasks'
locations. This chapter identifies privacy threats toward both workers and
requesters during the two main phases of spatial crowdsourcing, tasking and
reporting. Tasking is the process of identifying which tasks should be assigned
to which workers. This process is handled by a spatial crowdsourcing server
(SC-server). The latter phase is reporting, in which workers travel to the
tasks' locations, complete the tasks and upload their reports to the SC-server.
The challenge is to enable effective and efficient tasking as well as reporting
in SC without disclosing the actual locations of workers (at least until they
agree to perform a task) and the tasks themselves (at least to workers who are
not assigned to those tasks). This chapter aims to provide an overview of the
state-of-the-art in protecting users' location privacy in spatial
crowdsourcing. We provide a comparative study of a diverse set of solutions in
terms of task publishing modes (push vs. pull), problem focuses (tasking and
reporting), threats (server, requester and worker), and underlying technical
approaches (from pseudonymity, cloaking, and perturbation to exchange-based and
encryption-based techniques). The strengths and drawbacks of the techniques are
highlighted, leading to a discussion of open problems and future work
Privacy Preserving Data Publishing
Recent years have witnessed increasing interest among researchers in protecting individual privacy in the big data era, involving social media, genomics, and Internet of Things. Recent studies have revealed numerous privacy threats and privacy protection methodologies, that vary across a broad range of applications. To date, however, there exists no powerful methodologies in addressing challenges from: high-dimension data, high-correlation data and powerful attackers.
In this dissertation, two critical problems will be investigated: the prospects and some challenges for elucidating the attack capabilities of attackers in mining individuals’ private information; and methodologies that can be used to protect against such inference attacks, while guaranteeing significant data utility.
First, this dissertation has proposed a series of works regarding inference attacks laying emphasis on protecting against powerful adversaries with auxiliary information. In the context of genomic data, data dimensions and computation feasibility is highly challenging in conducting data analysis. This dissertation proved that the proposed attack can effectively infer the values of the unknown SNPs and traits in linear complexity, which dramatically improve the computation cost compared with traditional methods with exponential computation cost.
Second, putting differential privacy guarantee into high-dimension and high-correlation data remains a challenging problem, due to high-sensitivity, output scalability and signal-to-noise ratio. Consider there are tens-of-millions of genomes in a human DNA, it is infeasible for traditional methods to introduce noise to sanitize genomic data. This dissertation has proposed a series of works and demonstrated that the proposed differentially private method satisfies differential privacy; moreover, data utility is improved compared with the states of the arts by largely lowering data sensitivity.
Third, putting privacy guarantee into social data publishing remains a challenging problem, due to tradeoff requirements between data privacy and utility. This dissertation has proposed a series of works and demonstrated that the proposed methods can effectively realize privacy-utility tradeoff in data publishing.
Finally, two future research topics are proposed. The first topic is about Privacy Preserving Data Collection and Processing for Internet of Things. The second topic is to study Privacy Preserving Big Data Aggregation. They are motivated by the newly proposed data mining, artificial intelligence and cybersecurity methods
Detecting Term Relationships to Improve Textual Document Sanitization
Nowadays, the publication of textual documents provides critical benefits to scientific research and business scenarios where information analysis plays an essential role. Nevertheless, the possible existence of identifying or confidential data in this kind of documents motivates the use of measures to sanitize sensitive information before being published, while keeping the innocuous data unmodified. Several automatic sanitization mechanisms can be found in the literature; however, most of them evaluate the sensitivity of the textual terms considering them as independent variables. At the same time, some authors have shown that there are important information disclosure risks inherent to the existence of relationships between sanitized and non-sanitized terms. Therefore, neglecting term relationships in document sanitization represents a serious privacy threat. In this paper, we present a general-purpose method to automatically detect semantically related terms that may enable disclosure of sensitive data. The foundations of Information Theory and a corpus as large as the Web are used to assess the degree relationship between textual terms according to the amount of information they provide from each other. Preliminary evaluation results show that our proposal significantly improves the detection recall of current sanitization schemes, which reduces the disclosure risk
Exploiting contextual information in attacking set-generalized transactions
Transactions are records that contain a set of items about individuals. For example, items browsed by a customer when shopping online form a transaction. Today, many activities are carried out on the Internet, resulting in a large amount of transaction data being collected. Such data are often shared and analyzed to improve business and services, but they also contain private information about individuals that must be protected. Techniques have been proposed to sanitize transaction data before their release, and set-based generalization is one such method. In this article, we study how well set-based generalization can protect transactions. We propose methods to attack set-generalized transactions by exploiting contextual information that is available within the released data. Our results show that set-based generalization may not provide adequate protection for transactions, and up to 70% of the items added into the transactions during generalization to obfuscate original data can be detected by our methods with a precision over 80%
- …