User's Privacy in Recommendation Systems Applying Online Social Network Data, A Survey and Taxonomy
Recommender systems have become an integral part of many social networks and extract knowledge from a user's personal and sensitive data, both explicitly, with the user's knowledge, and implicitly. This trend has created major privacy concerns, as users are mostly unaware of what data is being collected, how much of it is used, and how securely it is handled. In this context, several works have addressed privacy concerns in online social network data and in recommender systems. This paper surveys the main privacy concerns, measurements, and privacy-preserving techniques used in large-scale online social networks and recommender systems. It draws on prior work on security, privacy preservation, statistical modeling, and datasets to provide an overview of the technical difficulties and problems associated with privacy preservation in online social networks.
Comment: 26 pages, IET book chapter on big data recommender system
p-probabilistic k-anonymous microaggregation for the anonymization of surveys with uncertain participation
We develop a probabilistic variant of k-anonymous microaggregation, which we term p-probabilistic, resorting to a statistical model of respondent participation in order to aggregate quasi-identifiers in such a manner that k-anonymity is enforced with a parametric probabilistic guarantee. Succinctly, owing to the possibility that some respondents may not ultimately participate, sufficiently larger cells are created, striving to satisfy k-anonymity with probability at least p. The microaggregation function is designed before the respondents submit their confidential data. More precisely, a specification of the function is sent to them, which they may verify and apply to their quasi-identifying demographic variables prior to submitting the microaggregated data, along with the confidential attributes, to an authorized repository.
We propose a number of metrics to assess the performance of our probabilistic approach in terms of anonymity and distortion, which we investigate theoretically in depth and empirically with synthetic and standardized data. We stress that, in addition to constituting a functional extension of traditional microaggregation, thereby broadening its applicability to the anonymization of statistical databases in a wide variety of contexts, the relaxation of trust assumptions is arguably expected to have a considerable impact on user acceptance and, ultimately, on data utility through mere availability.
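The core cell-sizing idea behind such a probabilistic guarantee can be sketched as follows. Assuming each respondent participates independently with some probability q (a simplifying binomial model; the `min_cell_size` helper and the parameter names are illustrative assumptions, not the paper's actual construction), one can grow a cell until at least k participants remain with probability at least p:

```python
from math import comb

def min_cell_size(k: int, q: float, p: float) -> int:
    """Smallest cell size n such that, if each of n respondents
    participates independently with probability q, at least k of them
    participate with probability >= p (binomial tail >= p)."""
    n = k
    while True:
        # P(X >= k) for X ~ Binomial(n, q)
        prob = sum(comb(n, i) * q**i * (1 - q)**(n - i)
                   for i in range(k, n + 1))
        if prob >= p:
            return n
        n += 1

# Example: to guarantee 5-anonymity with probability 0.95 when
# respondents participate with probability 0.8, cells of size 9 suffice.
print(min_cell_size(5, 0.8, 0.95))  # → 9
```

Note how the required cell size grows beyond k to absorb expected no-shows; with certain participation (q = 1) it collapses back to exactly k.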
FACTS: A Framework for Anonymity towards Comparability, Transparency, and Sharing (Extended Version)
Preserving Privacy of High-Dimensional Data by l-Diverse Constrained Slicing
In the modern world of digitalization, data growth, aggregation, and sharing have escalated drastically. Users share huge amounts of data due to the widespread adoption of Internet-of-Things (IoT) and cloud-based smart devices. Such data can contain confidential attributes about various individuals. Therefore, privacy preservation has become an important concern. Many privacy-preserving data publication models have been proposed to ensure data sharing without privacy disclosures. However, publishing high-dimensional data with sufficient privacy is still a challenging task, and very little attention has been given to devising optimal privacy solutions for high-dimensional data. In this paper, we propose a novel privacy-preserving model to anonymize high-dimensional data (prone to various privacy attacks, including probabilistic, skewness, and gender-specific attacks). Our proposed model combines l-diversity with constrained slicing and vertical division. The proposed model can protect against the above-stated attacks with minimal information loss. Extensive experiments on real-world datasets show that our proposed model outperforms its counterparts.
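The l-diversity constraint at the heart of such a model can be illustrated with a minimal check (distinct l-diversity only; the `is_l_diverse` helper and the example buckets are illustrative assumptions, and the paper's constrained-slicing construction is considerably richer):

```python
def is_l_diverse(buckets, l):
    """Distinct l-diversity: every bucket of sensitive values must
    contain at least l distinct values, so an attacker who locates a
    record's bucket still cannot narrow its sensitive value below l
    candidates."""
    return all(len(set(bucket)) >= l for bucket in buckets)

# Hypothetical sliced buckets of a sensitive 'disease' attribute:
buckets = [["flu", "cancer", "flu"], ["hiv", "flu", "diabetes"]]
print(is_l_diverse(buckets, 2))  # → True (2 and 3 distinct values)
```

A slicing-based anonymizer would repeatedly re-partition records until every bucket passes a check of this kind, trading partition granularity (information loss) against the diversity guarantee.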
On the Complexity of t-Closeness Anonymization and Related Problems
An important issue in releasing individual data is to protect sensitive information from being leaked and maliciously utilized. Famous privacy-preserving principles that aim to ensure both data privacy and data integrity, such as k-anonymity and l-diversity, have been extensively studied both theoretically and empirically. Nonetheless, these widely adopted principles are still insufficient to prevent attribute disclosure if the attacker has partial knowledge about the overall sensitive data distribution. The t-closeness principle has been proposed to fix this, and it also has the benefit of supporting numerical sensitive attributes. However, in contrast to k-anonymity and l-diversity, the theoretical aspects of t-closeness have not been well investigated.
We initiate the first systematic theoretical study of the t-closeness principle under the commonly used attribute-suppression model. We prove that for every constant t such that 0 ≤ t < 1, it is NP-hard to find an optimal t-closeness generalization of a given table. The proof consists of several reductions, each of which works for a different range of t, which together cover the full range. To complement this negative result, we also provide exact and fixed-parameter algorithms. Finally, we answer some open questions regarding the complexity of k-anonymity and l-diversity left open in the literature.
Comment: An extended abstract to appear in DASFAA 201
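The t-closeness check itself is straightforward once a distance between distributions is fixed. A common choice for ordered (e.g., numerical) sensitive attributes is the Earth Mover's Distance in its cumulative-sum form; the sketch below assumes that formulation and uses made-up salary-bracket distributions for illustration:

```python
def emd_ordered(p, q):
    """Earth Mover's Distance between two distributions over the same
    ordered domain of m values: the sum of absolute cumulative
    differences, normalized by m - 1 so that the result lies in [0, 1]."""
    m = len(p)
    total, cum = 0.0, 0.0
    for i in range(m - 1):
        cum += p[i] - q[i]
        total += abs(cum)
    return total / (m - 1)

def satisfies_t_closeness(class_dists, overall, t):
    """A table satisfies t-closeness if every equivalence class's
    sensitive-value distribution is within distance t of the overall
    distribution."""
    return all(emd_ordered(d, overall) <= t for d in class_dists)

# Overall salary distribution over 3 ordered brackets vs. one class:
overall = [0.5, 0.3, 0.2]
cls = [0.8, 0.1, 0.1]
print(round(emd_ordered(cls, overall), 3))  # → 0.2
```

The hardness result above concerns finding the generalization that satisfies this check with minimal suppression, not the check itself, which is linear-time per equivalence class.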
Trajectory and Policy Aware Sender Anonymity in Location Based Services
We consider Location-Based Service (LBS) settings, where an LBS provider logs the requests sent by mobile device users over a period of time and later wants to publish or share these logs. Log sharing can be extremely valuable for advertising, data mining research, and network management, but it poses a serious threat to the privacy of LBS users. Sender anonymity solutions prevent a malicious attacker from inferring the interests of LBS users by associating them with their service requests after gaining access to the anonymized logs. With the fast-increasing adoption of smartphones and the concern that historic user trajectories are becoming more accessible, it becomes necessary for any sender anonymity solution to protect against attackers that are trajectory-aware (i.e., they have access to historic user trajectories) as well as policy-aware (i.e., they know the log anonymization policy). We call such attackers TP-aware.
This paper introduces a first privacy guarantee against TP-aware attackers, called TP-aware sender k-anonymity. It turns out that there are many possible TP-aware anonymizations for the same LBS log, each with a different utility to the consumer of the anonymized log. We investigate the problem of finding the optimal TP-aware anonymization. We show that trajectory-awareness renders the problem computationally harder than the trajectory-unaware variants found in the literature (NP-complete in the size of the log, versus PTIME). We describe a PTIME l-approximation algorithm for trajectories of length l and empirically show that it scales to large LBS logs (up to 2 million users).
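The baseline notion being strengthened here, plain sender k-anonymity, can be sketched as follows (this is not the paper's TP-aware construction; the cell-based grouping, the `anonymize_requests` helper, and the sample requests are illustrative assumptions):

```python
from collections import defaultdict

def anonymize_requests(requests, k):
    """Plain sender k-anonymity sketch: replace each sender id with its
    anonymity set, publishing only requests whose generalized location
    cell was shared by at least k distinct senders."""
    # requests: list of (user_id, cell, query) tuples
    senders_by_cell = defaultdict(set)
    for user, cell, _ in requests:
        senders_by_cell[cell].add(user)
    published = []
    for user, cell, query in requests:
        senders = senders_by_cell[cell]
        if len(senders) >= k:  # each published request has >= k candidate senders
            published.append((frozenset(senders), cell, query))
    return published

reqs = [("u1", "cellA", "cafe"), ("u2", "cellA", "gas"),
        ("u3", "cellB", "atm")]
print(len(anonymize_requests(reqs, 2)))  # → 2: only the cellA requests survive
```

A TP-aware attacker defeats this baseline by intersecting each anonymity set with the users whose historic trajectories pass through the cell at the request time, which is why the guarantee above must be strengthened as the paper proposes.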