Improved Technique for Preserving Privacy while Mining Real Time Big Data

Author: Chandrakar Ila
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 15/04/2022

With the evolution of big data, data owners require the assistance of a third party (e.g., the cloud) to store and analyse the data and to obtain information at a lower cost. However, maintaining privacy is a challenge in such scenarios, because outsourcing may reveal sensitive information. Existing research discusses different techniques for enforcing privacy in the original data using anonymization, randomization, and suppression. But those techniques are not scalable, suffer from information loss, and do not support real-time data, and hence are not suitable for privacy-preserving big data mining. In this research, a novel two-level privacy approach is proposed using pseudonymization and homomorphic encryption in the Spark framework. Several simulations are carried out on the collected dataset. From the results obtained, we observed that execution time is reduced by 50% and privacy is enhanced by 10%. This scheme is suitable for both privacy-preserving big data publishing and mining.
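
As a rough illustration of the pseudonymization level mentioned in this abstract, the sketch below replaces a direct identifier with a keyed hash. The abstract does not say how pseudonyms are generated, so the use of HMAC-SHA256, the function name `pseudonymize`, and the sample key and identifiers are all assumptions made for illustration; the homomorphic encryption level and the Spark integration are not shown.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Map an identifier to a stable pseudonym using a keyed hash (HMAC-SHA256).

    The same (value, key) pair always yields the same pseudonym, so records
    can still be joined, while the original value is not exposed.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical secret key held by the data owner, not by the third party.
secret_key = b"example-key-held-by-data-owner"

# Hypothetical identifiers to be replaced before outsourcing.
alice = pseudonymize("alice@example.org", secret_key)
alice_again = pseudonymize("alice@example.org", secret_key)
bob = pseudonymize("bob@example.org", secret_key)

print(alice == alice_again)  # same input and key give the same pseudonym
print(alice == bob)          # different inputs give different pseudonyms
```

A keyed hash is only one possible choice; a lookup table of random tokens would serve the same purpose at the cost of storing the mapping.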

International Journal of Communication Networks and Information Security (IJCNIS)

Balancing between data utility and privacy preservation in data mining

Author: Jain Sachin kumar
Tandon Ankit
Publication date: 12/05/2010

Data mining plays a vital role in today's information world, where it has been widely applied in various organizations. The current trend is to share data for mutual benefit. However, there has been a lot of concern over privacy in recent years. Sharing has also raised the threat of revealing an individual's sensitive data when the data is released publicly. Various methods, such as anonymization and perturbation, have been proposed to tackle the privacy preservation problem. But the natural consequence of privacy preservation is information loss. The loss of specific information about certain individuals may affect the data quality, and in the extreme case the data may become completely useless. There are methods like cryptography which completely anonymize the dataset and thereby render it useless, so the utility of the data is completely lost. We need to protect the private information and preserve the data utility as much as possible. The objective of the thesis is therefore to find an optimum balance between privacy and utility when publishing the dataset of any organization. Privacy preservation is a hard requirement that must be satisfied, and utility is the measure to be optimized. One of the methods for preserving privacy is k-anonymization, which preserves privacy to a good extent. K-anonymity demands that every tuple in the released dataset be indistinguishably related to no fewer than k respondents. We used the K-means algorithm to cluster the dataset, followed by k-anonymization. Decision stump classification is used to determine utility, and privacy is determined by firing random queries at the anonymized dataset. The balancing point is where the utility and privacy curves intersect or tend to converge. The balancing point will vary from dataset to dataset and with the choice of quasi-identifier and sensitive attribute.
For our experiment the balancing point is found to be around 50-60 percent, which is the intersection point of the privacy and utility curves.
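
The k-anonymity condition stated in this abstract can be checked mechanically. The sketch below is not the thesis's implementation: it only tests whether every combination of quasi-identifier values occurs at least k times in an already generalized table. The function name `is_k_anonymous`, the column names, and the sample records are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if each quasi-identifier combination appears in >= k records."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized records (age ranges, masked postcodes).
records = [
    {"age": "20-29", "zip": "769**", "disease": "flu"},
    {"age": "20-29", "zip": "769**", "disease": "cold"},
    {"age": "30-39", "zip": "769**", "disease": "flu"},
    {"age": "30-39", "zip": "769**", "disease": "asthma"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))
print(is_k_anonymous(records, ["age", "zip"], 3))
```

Here each (age, zip) group holds exactly two records, so the table satisfies the condition for k = 2 but not for k = 3.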

ethesis@nitr

Complementing privacy and utility trade-off with self-organising maps

Author: Ayesh Aladdin
Boiten Eerke
Mohammed Kabiru
Publication venue: 'MDPI AG'
Publication date: 01/09/2021

This research received no external funding. Peer reviewed. Publisher PD

Aberdeen University Research

Peer to Peer Information Retrieval: An Overview

Author: Hiemstra Djoerd
Tigelaar Almer S.
Trieschnigg Dolf
Publication venue: ACM
Publication date: 01/01/2012

Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.

Radboud Repository

University of Twente Research Information

Exploiting Record Similarity for Practical Vertical Federated Learning

Author: He Bingsheng
Li Qinbin
Wu Zhaomin
Publication date: 11/06/2021

As the privacy of machine learning has drawn increasing attention, federated learning is introduced to enable collaborative learning without revealing raw data. Notably, vertical federated learning (VFL), where parties share the same set of samples but only hold partial features, has a wide range of real-world applications. However, existing studies in VFL rarely study the "record linkage" process. They either design algorithms assuming the data from different parties have been linked or use simple linkage methods like exact-linkage or top1-linkage. These approaches are unsuitable for many applications, such as the GPS location and noisy titles requiring fuzzy matching. In this paper, we design a novel similarity-based VFL framework, FedSim, which is suitable for more real-world applications and achieves higher performance on traditional VFL tasks. Moreover, we theoretically analyze the privacy risk caused by sharing similarities. Our experiments on three synthetic datasets and five real-world datasets with various similarity metrics show that FedSim consistently outperforms other state-of-the-art baselines.
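
To make the contrast between exact linkage and fuzzy linkage concrete, the sketch below links each record of one party to its k most similar records of another party and keeps the similarity scores. It is not FedSim itself: the function name `top_k_links`, the use of negative Euclidean distance as the similarity, and the coordinate-like sample keys are assumptions chosen only for illustration.

```python
import math


def top_k_links(left, right, k):
    """For each key in `left`, return the k most similar keys in `right`.

    `left` and `right` map record ids to numeric linkage keys (tuples).
    Similarity is the negative Euclidean distance, so larger means closer.
    """
    links = {}
    for left_id, left_key in left.items():
        scored = sorted(
            ((-math.dist(left_key, right_key), right_id)
             for right_id, right_key in right.items()),
            reverse=True,
        )
        links[left_id] = [(right_id, sim) for sim, right_id in scored[:k]]
    return links


# Hypothetical coordinate-like linkage keys held by two parties.
party_a = {"a": (1.0, 1.0)}
party_b = {"x": (1.0, 1.1), "y": (5.0, 5.0), "z": (1.2, 1.0)}

links = top_k_links(party_a, party_b, k=2)
print([right_id for right_id, _ in links["a"]])
```

Exact linkage would find no match for record "a" here because no key in `party_b` is identical, whereas the similarity-based variant still returns the two nearest candidates together with their scores.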

arXiv.org e-Print Archive