Search CORE

8,744 research outputs found

Augmented Rotation-Based Transformation for Privacy-Preserving Data Clustering

Authors: Hong Dowon; Mohaisen Abedelaziz
Publication date: 10/06/2010

Multiple rotation-based transformation (MRBT) was introduced recently to mitigate the a priori knowledge independent component analysis (AK-ICA) attack on rotation-based transformation (RBT), which is used for privacy-preserving data clustering. MRBT is shown to mitigate the AK-ICA attack, but at the expense of data utility, since it does not allow conventional clustering. In this paper, we extend the MRBT scheme and introduce an augmented rotation-based transformation (ARBT) scheme that exploits the linearity of the transformation and that both mitigates the AK-ICA attack and enables conventional clustering on data subsets transformed using MRBT. To demonstrate the computational feasibility of ARBT, along with RBT and MRBT, we develop a toolkit and use it to empirically compare the different schemes for privacy-preserving data clustering based on data transformation in terms of their overhead and privacy.

Comment: 11 pages, 11 figures, and 6 tables

arXiv.org e-Print Archive
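Rotation-based transformation (RBT), the starting point of the ARBT paper, relies on the fact that multiplying every record by an orthogonal matrix preserves Euclidean distances, so distance-based clustering of the released data yields the same partition. The sketch below checks that property with NumPy. The helper names and the QR-based way of drawing a random rotation are assumptions for illustration, not the authors' toolkit, and the sketch does not model the AK-ICA attack, MRBT, or ARBT.

```python
import numpy as np

def random_rotation(d: int, rng: np.random.Generator) -> np.ndarray:
    """Draw a random d x d rotation matrix (hypothetical helper) via QR decomposition."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Fix column signs so q is uniformly distributed over orthogonal matrices.
    q = q * np.sign(np.diag(r))
    # Flip one column if needed so det(q) = +1, i.e. a proper rotation.
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]
    return q

def pairwise_distances(x: np.ndarray) -> np.ndarray:
    """Euclidean distance matrix between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
data = rng.standard_normal((50, 4))      # 50 records with 4 numeric attributes (toy data)
rotation = random_rotation(data.shape[1], rng)
released = data @ rotation.T             # what the data owner would publish

# Distances are unchanged, so e.g. k-means finds the same clusters on `released`.
print(np.allclose(pairwise_distances(data), pairwise_distances(released)))  # True
```

Because only distances matter to the clustering step, the data owner can publish `released` without publishing the raw attribute values; the AK-ICA attack targets exactly this single-rotation setting.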

Privacy in Social Media: Identification, Mitigation and Applications

Authors: Beigi Ghazaleh; Liu Huan
Publication date: 06/08/2018

The increasing popularity of social media has attracted a huge number of people to participate in numerous activities on a daily basis. This results in tremendous amounts of rich user-generated data. This data provides opportunities for researchers and service providers to study and better understand users' behaviors and further improve the quality of personalized services. However, publishing user-generated data risks exposing individuals' privacy. User privacy in social media is an emerging research area and has attracted increasing attention in recent years. Existing works study privacy issues in social media from two different points of view: identification of vulnerabilities and mitigation of privacy risks. Recent research has shown the vulnerability of user-generated data to two general types of attacks, identity disclosure and attribute disclosure. These privacy issues require social media data publishers to protect users' privacy by sanitizing user-generated data before publishing it. Consequently, various protection techniques have been proposed to anonymize user-generated social media data. There is a vast literature on the privacy of social media users from many perspectives. In this survey, we review the key achievements in user privacy in social media. In particular, we review and compare state-of-the-art privacy leakage attacks and anonymization algorithms. We give an overview of the privacy risks arising from different aspects of social media and categorize the relevant works into five groups: 1) graph data anonymization and de-anonymization, 2) author identification, 3) profile attribute disclosure, 4) user location and privacy, and 5) recommender systems and privacy issues. We also discuss open problems and future research directions for user privacy issues in social media.

Comment: This survey is currently under review

arXiv.org e-Print Archive
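Graph data anonymization, the first category in this survey, often targets identity disclosure through structural properties such as node degree: a node with a unique degree can be re-identified even after names are removed. As one illustration, chosen here as an example rather than taken from the survey text, the sketch checks k-degree anonymity, i.e. whether every degree value in an undirected edge list is shared by at least k nodes. The function name is hypothetical, and nodes without edges are ignored as a simplifying assumption.

```python
from collections import Counter

def is_k_degree_anonymous(edges, k):
    """Return True if every degree value in the undirected graph occurs at least k times."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # How many nodes share each degree value.
    degree_frequency = Counter(degree.values())
    return all(count >= k for count in degree_frequency.values())

cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]  # every node has degree 2
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]         # the hub's degree is unique
print(is_k_degree_anonymous(cycle, 4))  # True
print(is_k_degree_anonymous(star, 2))   # False: the hub is the only node of degree 3
```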

A Novel Framework using Elliptic Curve Cryptography for Extremely Secure Transmission in Distributed Privacy Preserving Data Mining

Authors: Kavya N. P.; Kiran P.; Kumar S Sathish
Publication date: 11/04/2012

Privacy Preserving Data Mining is a method that ensures the privacy of individual information during mining. The most important task involves retrieving information from multiple distributed databases. Once the data is in the data warehouse, mining algorithms can use it to retrieve confidential information. The proposed framework has two major tasks: secure transmission, and privacy of confidential information during mining. Secure transmission is handled using elliptic curve cryptography, and privacy is preserved using data distortion, together ensuring a highly secure environment.

Comment: 8 pages

arXiv.org e-Print Archive
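The framework above uses elliptic curve cryptography (ECC) for secure transmission between sites, but the abstract does not specify the protocol. The sketch therefore shows only a generic ECC building block, elliptic-curve Diffie-Hellman key agreement, on a tiny textbook curve. The curve parameters, generator, and secret values are illustrative assumptions and far too small to be secure, and the data-distortion step is not modeled.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None represents the point at infinity

# Toy textbook curve y^2 = x^3 + 2x + 2 over GF(17); insecure, for illustration only.
P_MOD, A = 17, 2
G = (5, 1)  # generates a cyclic group of order 19

def add(p: Point, q: Point) -> Point:
    """Add two curve points with the chord-and-tangent rule."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) is the point at infinity
    if p == q:
        slope = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        slope = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (slope * slope - x1 - x2) % P_MOD
    y3 = (slope * (x1 - x3) - y1) % P_MOD
    return (x3, y3)

def multiply(k: int, p: Point) -> Point:
    """Compute k * p with the double-and-add method."""
    result: Point = None
    addend = p
    while k > 0:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result

# ECDH: each site keeps a secret scalar and publishes secret * G.
site_a_secret, site_b_secret = 3, 7
site_a_public = multiply(site_a_secret, G)
site_b_public = multiply(site_b_secret, G)
shared_a = multiply(site_a_secret, site_b_public)
shared_b = multiply(site_b_secret, site_a_public)
print(shared_a == shared_b)  # True: both sites derive the same point
```

The shared point could then seed a symmetric key for transmitting (already distorted) data; real deployments would use a standardized curve and a vetted library rather than this hand-rolled arithmetic. `pow(x, -1, m)` requires Python 3.8 or later.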

A Unified Framework for Clustering Constrained Data without Locality Property

Authors: Ding Hu; Xu Jinhui
Publication date: 01/10/2018

In this paper, we consider a class of constrained clustering problems of points in $\mathbb{R}^{d}$, where $d$ could be rather high. A common feature of these problems is that their optimal clusterings no longer have the locality property (due to the additional constraints), which is a key property required by many algorithms for their unconstrained counterparts. To overcome the difficulty caused by the loss of locality, we present in this paper a unified framework, called {\em Peeling-and-Enclosing (PnE)}, to iteratively solve two variants of the constrained clustering problems, {\em constrained $k$-means clustering} ($k$-CMeans) and {\em constrained $k$-median clustering} ($k$-CMedian). Our framework is based on two standalone geometric techniques, called {\em Simplex Lemma} and {\em Weaker Simplex Lemma}, for $k$-CMeans and $k$-CMedian, respectively. The simplex lemma (or weaker simplex lemma) enables us to efficiently approximate the mean (or median) point of an unknown set of points by searching a small-size grid, independent of the dimensionality of the space, in a simplex (or the surrounding region of a simplex), and thus can be used to handle high-dimensional data. If $k$ and $\frac{1}{\epsilon}$ are fixed numbers, our framework generates, in nearly linear time ({\em i.e.,} $O(n(\log n)^{k+1}d)$), $O((\log n)^{k})$ $k$-tuple candidates for the $k$ mean or median points, and one of them induces a $(1+\epsilon)$-approximation for $k$-CMeans or $k$-CMedian, where $n$ is the number of points. Combining this unified framework with a problem-specific selection algorithm (which determines the best $k$-tuple candidate), we obtain a $(1+\epsilon)$-approximation for each of the constrained clustering problems. We expect that our technique will be applicable to other constrained clustering problems without locality.

arXiv.org e-Print Archive
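The key idea behind the simplex lemma is that a point inside a simplex can be approximated by searching a grid of convex combinations of the simplex's vertices, and the grid size depends on the number of vertices and the grid resolution, not on the dimension $d$. The sketch below is a toy illustration of that grid-search idea only; it is not the Simplex Lemma itself and carries none of its approximation guarantees. The function names, the resolution `m`, and the random data are assumptions.

```python
import itertools
import numpy as np

def simplex_grid(vertices: np.ndarray, m: int) -> np.ndarray:
    """All convex combinations of the rows of `vertices` whose weights are multiples of 1/m."""
    j = len(vertices)
    points = []
    # Stars and bars: each choice of j - 1 bar positions among m + j - 1 slots
    # yields j non-negative integer weights that sum to m.
    for bars in itertools.combinations(range(m + j - 1), j - 1):
        weights, prev = [], -1
        for bar in bars:
            weights.append(bar - prev - 1)
            prev = bar
        weights.append(m + j - 2 - prev)
        points.append(np.array(weights) @ vertices / m)
    return np.array(points)

def closest_grid_point(vertices: np.ndarray, m: int, target: np.ndarray) -> np.ndarray:
    """Return the grid point nearest to `target` (a stand-in for an unknown mean)."""
    grid = simplex_grid(vertices, m)
    return grid[np.argmin(np.linalg.norm(grid - target, axis=1))]

rng = np.random.default_rng(1)
vertices = rng.standard_normal((3, 1000))  # a simplex with 3 vertices in R^1000
target = vertices.mean(axis=0)             # a point inside that simplex
approx = closest_grid_point(vertices, 6, target)
print(len(simplex_grid(vertices, 6)))      # 28 grid points, independent of d = 1000
print(np.allclose(approx, target))         # True: weights (2, 2, 2)/6 lie on the grid
```

In the paper's setting the target mean is unknown, so a procedure would evaluate candidates rather than compare against `target` directly; the comparison here only shows that the grid contains a good approximation.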

Secure Mining of Association Rules in Horizontally Distributed Databases

Author: Tassa Tamir
Publication date: 25/06/2011

We propose a protocol for secure mining of association rules in horizontally distributed databases. The current leading protocol is that of Kantarcioglu and Clifton (TKDE 2004). Our protocol, like theirs, is based on the Fast Distributed Mining (FDM) algorithm of Cheung et al. (PDIS 1996), which is an unsecured distributed version of the Apriori algorithm. The main ingredients in our protocol are two novel secure multi-party algorithms: one that computes the union of private subsets that each of the interacting players holds, and another that tests the inclusion of an element held by one player in a subset held by another. Our protocol offers enhanced privacy with respect to the protocol of Kantarcioglu and Clifton. In addition, it is simpler and significantly more efficient in terms of communication rounds, communication cost, and computational cost.

arXiv.org e-Print Archive
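The FDM algorithm that this protocol builds on rests on one observation: an itemset that is frequent across all sites combined must be locally frequent at at least one site. The sketch shows that unsecured skeleton in plain Python. The union of locally frequent itemsets and the summation of local supports are the steps that need protection in a secure version; the paper's secure union and inclusion-test sub-protocols are not reproduced here. Function names, the toy data, and enumerating all itemsets of one size (instead of Apriori candidate generation) are simplifying assumptions.

```python
from collections import Counter
from itertools import combinations

def local_supports(transactions, size):
    """Count the support of every itemset of the given size in one site's transactions."""
    counts = Counter()
    for t in transactions:
        for itemset in combinations(sorted(t), size):
            counts[itemset] += 1
    return counts

def globally_frequent(sites, size, min_support):
    """Unsecured FDM-style step: candidates = union of locally frequent itemsets;
    global support = sum of local supports."""
    per_site = [local_supports(db, size) for db in sites]
    candidates = set()
    for db, counts in zip(sites, per_site):
        threshold = min_support * len(db)
        candidates |= {s for s, c in counts.items() if c >= threshold}
    total = sum(len(db) for db in sites)
    return {s for s in candidates
            if sum(c[s] for c in per_site) >= min_support * total}

sites = [
    [{"a", "b"}, {"a", "b", "c"}, {"c"}],  # site 1: ("a", "b") is locally frequent
    [{"a", "b"}, {"b", "c"}, {"a", "c"}],  # site 2: no pair reaches the local threshold
]
print(globally_frequent(sites, 2, 0.5))  # {('a', 'b')}
```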

A Survey on Software-Defined VANETs: Benefits, Challenges, and Future Directions

Authors: Conti Mauro; Jaballah Wafa Ben; Lal Chhagan
Publication date: 21/05/2019

Fifth Generation (5G) networks are becoming more readily available and are a major driver of the growth of new applications and business models. Vehicular Ad hoc Networks (VANETs) and Software Defined Networking (SDN) are key enablers of 5G technology for the development of next-generation intelligent vehicular networks and applications. In recent years, researchers have focused on the integration of SDN and VANET and have examined different topics related to the architecture, the benefits of software-defined VANET services, and the new functionalities needed to adapt them. However, the security and robustness of the complete architecture are still questionable and have been largely neglected. Moreover, the deployment and integration of novel entities and several architectural components introduce new security threats and vulnerabilities. In this paper, we first survey state-of-the-art SDN-based Vehicular Ad hoc Network (SDVN) architectures in terms of their networking infrastructure design, functionalities, benefits, and challenges. We then analyze these SDVN architectures against major security threats that violate key security services such as availability, confidentiality, authentication, and data integrity. We also propose different countermeasures to these threats. Finally, we discuss the lessons learned and directions for future research toward provisioning stringent security and privacy solutions in future SDVN architectures. To the best of our knowledge, this is the first comprehensive work that presents such a survey and analysis of SDVNs in the era of future-generation networks (e.g., 5G and Information-Centric Networking) and applications (e.g., intelligent transportation systems and IoT-enabled advertising in VANETs).

Comment: 17 pages, 2 figures

arXiv.org e-Print Archive

Holistic Collaborative Privacy Framework for Users' Privacy in Social Recommender Service

Authors: Botvich Dmitri; Elmisery Ahmed M.; Rho Seungmin
Publication date: 13/11/2014

The current business model for existing recommender services is centered around the availability of users' personal data on the service providers' side, whereas consumers have to trust that the recommender service providers will not use their data in a malicious way. With the increasing number of privacy breaches, different countries and corporations have issued privacy laws and regulations that define best practices for the protection of personal information. The data protection directive 95/46/EC and the privacy principles established by the Organization for Economic Cooperation and Development (OECD) are examples of such regulatory frameworks. In this paper, we assert that it is feasible to utilize third-party recommender services to generate accurate referrals while preserving the privacy of users' sensitive information, which resides in clear form only on each user's own device. As a result, each user who benefits from the third-party recommender service has absolute control over what to release from his/her own preferences. To achieve this, we propose a collaborative privacy middleware that executes a two-stage concealment process within a distributed data collection protocol. Additionally, the proposed solution complies in a natural and functional way with one of the common privacy regulation frameworks for fair information practice, namely the OECD privacy principles. The approach presented in this paper is easily integrated into the current business model, as it is implemented as middleware that runs on the end users' side and utilizes the social nature of content distribution services to implement a topological data collection protocol.

arXiv.org e-Print Archive

Parallel and Distributed Collaborative Filtering: A Survey

Authors: Karydi Efthalia; Margaritis Konstantinos G.
Publication date: 09/09/2014

Collaborative filtering is among the most preferred techniques for implementing recommender systems. Recently, great interest has turned towards parallel and distributed implementations of collaborative filtering algorithms. This work is a survey of parallel and distributed collaborative filtering implementations, aiming not only to provide a comprehensive presentation of the field's development, but also to offer future research directions by highlighting the issues that need further development.

Comment: 46 pages

arXiv.org e-Print Archive
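As a minimal illustration of the data-parallel idea behind many parallel collaborative filtering implementations (an example chosen here, not a specific system from the survey), item-item cosine similarity can be computed by splitting the user-item rating matrix by users, computing partial sums on each shard, and adding the shard results. A thread pool stands in for distributed workers, so the sketch shows the partition-and-reduce data flow rather than real speedup; function names and data are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def partial_stats(shard: np.ndarray):
    """Per-shard item-item dot products and per-item squared norms (rows = users)."""
    return shard.T @ shard, (shard ** 2).sum(axis=0)

def item_cosine_similarity(ratings: np.ndarray, n_workers: int = 4) -> np.ndarray:
    """Item-item cosine similarity computed from user shards processed in parallel."""
    shards = np.array_split(ratings, n_workers, axis=0)  # partition users across workers
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(partial_stats, shards))
    # Reduce: dot products and squared norms are sums over users, so shard results add up.
    dots = sum(r[0] for r in results)
    norms = np.sqrt(sum(r[1] for r in results))
    norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
    return dots / np.outer(norms, norms)

ratings = np.array([[5, 3, 0, 1],
                    [4, 0, 0, 1],
                    [1, 1, 0, 5],
                    [0, 1, 5, 4],
                    [2, 0, 4, 0]], dtype=float)  # 5 users x 4 items (toy data)
similarity = item_cosine_similarity(ratings, n_workers=3)
print(np.round(similarity, 2))
```

Because the reduction is a plain sum, the same structure maps onto MapReduce-style or message-passing frameworks, where shards live on different machines.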

Security and Privacy Issues in Deep Learning

Authors: Bae Ho; Ha Heonseok; Jang Hyemi; Jang Jaehee; Jung Dahuin; Yoon Sungroh
Publication date: 23/11/2019

With the development of machine learning (ML), expectations for artificial intelligence (AI) technology have been increasing daily. In particular, deep neural networks have shown outstanding performance in many fields. Many applications that are deeply involved in our daily lives, such as those making significant decisions based on predictions or classifications, may rely on a deep learning (DL) model. Hence, if a DL model makes mispredictions or misclassifications due to malicious external influences, it can cause serious problems in real life. Moreover, training DL models involves an enormous amount of data, and the training data often include sensitive information. Therefore, DL models should not expose the privacy of such data. In this paper, we review the vulnerabilities of DL models and the defense methods developed for model security and data privacy under the notion of secure and private AI (SPAI). We also discuss current challenges and open issues.

arXiv.org e-Print Archive
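The survey reviews attacks that make a model mispredict through malicious input manipulation. One well-known example, chosen here for illustration and not described in the abstract, is the fast gradient sign method (FGSM): perturb each input feature by a budget epsilon in the direction of the sign of the loss gradient. To keep the sketch self-contained it attacks a hand-specified logistic regression model instead of a deep network; the weights, input, and epsilon are made-up values.

```python
import numpy as np

def predict_proba(w, b, x):
    """Probability of class 1 under a logistic regression model."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def loss(w, b, x, y):
    """Binary cross-entropy loss for a single example."""
    p = predict_proba(w, b, x)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def fgsm(w, b, x, y, epsilon):
    """FGSM: step each feature by epsilon along the sign of d(loss)/dx."""
    grad_x = (predict_proba(w, b, x) - y) * w  # gradient of the loss w.r.t. the input
    return x + epsilon * np.sign(grad_x)

w, b = np.array([1.5, -2.0, 0.5]), 0.1  # hypothetical trained weights
x, y = np.array([0.2, -0.4, 1.0]), 1    # a correctly classified input
x_adv = fgsm(w, b, x, y, epsilon=0.3)
print(x_adv)                                         # approximately [-0.1 -0.1  0.7]
print(loss(w, b, x_adv, y) > loss(w, b, x, y))       # True: the attack raises the loss
```

For a deep network the gradient would come from backpropagation instead of the closed-form expression used here, but the update rule is the same.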

Scalable attribute-aware network embedding with locality

Authors: Hu Guangmin; Liu Weiyi; Liu Zhining; Suzumura Toyotaro
Publication date: 29/04/2018

Adding node attributes to network embedding improves the ability of the learned joint representation to capture features from topology and attributes simultaneously. Recent research on joint embedding has shown promising performance on a variety of tasks by jointly embedding the two spaces. However, because they require global information, existing approaches do not scale well. Here we propose \emph{SANE}, a scalable attribute-aware network embedding algorithm with locality, to learn the joint representation from topology and attributes. By enforcing the alignment of a local linear relationship between each node and its K-nearest neighbors in topology and attribute space, the joint embedding representations are more informative than a single representation from topology or attributes alone. We argue that the locality in \emph{SANE} is the key to learning the joint representation at scale. Using several real-world networks from diverse domains, we demonstrate the efficacy of \emph{SANE} in terms of performance and scalability. Overall, for label classification, SANE achieves the highest F1-score on most datasets compared with other state-of-the-art joint representation algorithms, and even comes closer to the baseline method that needs label information as extra input. Moreover, \emph{SANE} achieves up to a 71.4\% performance gain compared with the single topology-based algorithm. For scalability, we demonstrate the linear time complexity of \emph{SANE}. In addition, we observe that when the network size scales to 100,000 nodes, the "learning joint embedding" step of \emph{SANE} takes only $\approx 10$ seconds.

arXiv.org e-Print Archive
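SANE enforces the alignment of a local linear relationship between each node and its K nearest neighbors. The abstract does not give the exact formulation, so as an assumption the sketch uses the classic locally linear embedding (LLE) reconstruction weights: each node is expressed as a weighted combination of its K nearest neighbors, with weights summing to one. Function names, the regularization constant, and the toy feature matrix are illustrative only; because each node touches only K neighbors, this kind of computation scales linearly in the number of nodes for fixed K.

```python
import numpy as np

def knn_indices(features: np.ndarray, i: int, k: int) -> np.ndarray:
    """Indices of the k nearest neighbors of row i (excluding i itself)."""
    dist = np.linalg.norm(features - features[i], axis=1)
    dist[i] = np.inf
    return np.argsort(dist, kind="stable")[:k]

def local_linear_weights(x: np.ndarray, neighbors: np.ndarray, reg: float = 1e-3) -> np.ndarray:
    """LLE-style weights w (summing to 1) that best reconstruct x as w @ neighbors."""
    z = neighbors - x                      # shift so that x sits at the origin
    gram = z @ z.T                         # K x K local Gram matrix
    # Regularize: the Gram matrix can be singular (e.g. when K exceeds the dimension).
    gram += reg * max(np.trace(gram), 1e-12) * np.eye(len(neighbors))
    w = np.linalg.solve(gram, np.ones(len(neighbors)))
    return w / w.sum()

features = np.array([[1.0, 1.0, 1.0],
                     [0.0, 1.0, 1.0],
                     [2.0, 1.0, 1.0],
                     [9.0, 9.0, 9.0]])     # toy attribute vectors for 4 nodes
idx = knn_indices(features, 0, k=2)        # nearest neighbors of node 0
weights = local_linear_weights(features[0], features[idx])
print(idx, weights)                        # [1 2] [0.5 0.5]
```

An embedding method in this spirit would then ask the learned vectors to preserve these weights in both the topology and attribute views; that alignment objective is not shown here.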