Performance Evaluation of K-Anonymized Data
Data mining provides tools to convert large amounts of data into knowledge that is relevant to users. However, this process can expose individuals' sensitive information, compromising their privacy rights. Consequently, many privacy-protection mechanisms based on different approaches have been incorporated into data mining techniques. A widely used microdata protection concept is k-anonymity, proposed to capture the protection of a microdata table against re-identification of the respondents to whom the data refers. In this paper, the effect of k-anonymization on data mining classifiers is investigated. A Naïve Bayes classifier is used to evaluate the anonymized and non-anonymized data.
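The k-anonymity property that this abstract evaluates can be stated compactly: every combination of quasi-identifier values must occur at least k times in the released table. A minimal sketch of that check, on a toy table with hypothetical column names (the abstract does not specify a dataset):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_ids, k):
    """Check whether every quasi-identifier combination occurs at least k times."""
    counts = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(c >= k for c in counts.values())

# Toy table: ages generalized to ranges, ZIP codes truncated (illustrative only).
table = [
    {"age": "30-39", "zip": "537**", "disease": "flu"},
    {"age": "30-39", "zip": "537**", "disease": "cold"},
    {"age": "40-49", "zip": "538**", "disease": "flu"},
    {"age": "40-49", "zip": "538**", "disease": "asthma"},
]
print(is_k_anonymous(table, ["age", "zip"], 2))  # True: each group has 2 rows
```

An evaluation like the paper's would then train the classifier once on `table` and once on the raw data, comparing accuracy to quantify the utility lost to generalization.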
Abduction and Anonymity in Data Mining
This thesis investigates two new research problems that arise in modern data mining: reasoning on data mining results, and the privacy implications of data mining results.
Most data mining algorithms rely on inductive techniques, trying to infer information that is generalized from the input data. But very often this inductive step on raw data is not enough to answer the user's questions, and there is a need to process the data again using other inference methods. In order to answer high-level user needs such as explanation of results, we describe an environment able to perform abductive (hypothetical) reasoning, since the solutions of such queries can often be seen as the set of hypotheses that satisfy some requirements. By using cost-based abduction, we show how classification algorithms can be boosted by performing abductive reasoning over the data mining results, improving the quality of the output.
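The cost-based abduction mentioned here amounts to choosing the cheapest set of hypotheses that together explain all observations. A brute-force sketch of that selection (the hypothesis names and costs below are invented for illustration; the thesis does not specify this encoding):

```python
from itertools import combinations

def cost_based_abduction(hypotheses, observations):
    """Pick the cheapest subset of hypotheses whose combined explanations
    cover all observations (brute force over subsets)."""
    best, best_cost = None, float("inf")
    hyps = list(hypotheses)
    for r in range(1, len(hyps) + 1):
        for subset in combinations(hyps, r):
            covered = set().union(*(hypotheses[h]["explains"] for h in subset))
            cost = sum(hypotheses[h]["cost"] for h in subset)
            if observations <= covered and cost < best_cost:
                best, best_cost = set(subset), cost
    return best, best_cost

# Hypothetical example: candidate explanations for two misclassified records.
hypotheses = {
    "noisy_label":     {"cost": 2.0, "explains": {"o1"}},
    "missing_feature": {"cost": 1.0, "explains": {"o2"}},
    "rare_class":      {"cost": 2.5, "explains": {"o1", "o2"}},
}
print(cost_based_abduction(hypotheses, {"o1", "o2"}))  # ({'rare_class'}, 2.5)
```

Real cost-based abduction engines avoid this exponential enumeration with search heuristics; the sketch only shows the objective being optimized.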
Another growing research area in data mining is privacy-preserving data mining. Due to the availability of large amounts of data, easily collected and stored via computer systems, new applications are emerging, but unfortunately privacy concerns often make data mining unsuitable. We study the privacy implications of data mining in a mathematical and logical context, focusing on the anonymity of the people whose data are analyzed. A formal theory of anonymity-preserving data mining is given, together with a number of anonymity-preserving algorithms for pattern mining.
The post-processing improvement of data mining results (w.r.t. utility and privacy) is the central focus of the problems investigated in this thesis.
Survey on Hybrid Anonymization using k-anonymity for Privacy Preserving in Data Mining
K-anonymity is one of the popular privacy-preserving models. Among the many techniques available in data mining, k-anonymity is widely used for protecting privacy in databases. In this paper our main approach is hybrid anonymization, whose defining feature is that it mixes two techniques: we introduce hybrid anonymization with hybrid generalization, which is formed not only by generalization but also by data relocation. Data relocation serves as a trade-off between truthfulness and utility. Using hybrid anonymization we maintain privacy standards such as k-anonymity. Previous research found that k-anonymity does not work well with multiple sensitive attributes and incurs high information loss; to address this issue we apply hybrid anonymization to multiple datasets. We show that our model can decrease information loss in a minimal time period.
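The generalization-plus-relocation idea can be sketched as follows: generalize first, then move records from undersized groups into an existing large group instead of suppressing them. This is a simplified interpretation of the abstract, not the paper's actual algorithm; group choice and column names are assumptions:

```python
from collections import defaultdict

def hybrid_anonymize(rows, quasi_ids, k):
    """Group by (already generalized) quasi-identifiers, then relocate records
    from groups smaller than k into a large group. Relocation rewrites the
    quasi-identifiers, trading truthfulness for utility. Assumes at least one
    group already meets k; mutates the input rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row)
    large = {g: rs for g, rs in groups.items() if len(rs) >= k}
    small = [r for g, rs in groups.items() if len(rs) < k for r in rs]
    relocated = 0
    for row in small:
        target = max(large, key=lambda g: len(large[g]))  # simplest target choice
        for q, v in zip(quasi_ids, target):
            row[q] = v  # overwrite quasi-identifiers with the target group's values
        large[target].append(row)
        relocated += 1
    return [r for rs in large.values() for r in rs], relocated
```

The returned count of relocated records is one crude proxy for the truthfulness cost that the paper trades against information loss.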
Quantifying Privacy: A Novel Entropy-Based Measure of Disclosure Risk
It is well recognised that data mining and statistical analysis pose a serious threat to privacy. This is true for financial, medical, criminal and marketing research. Numerous techniques have been proposed to protect privacy, including restriction and data modification. Recently proposed privacy models such as differential privacy and k-anonymity have received a lot of attention, and for the latter there are now several improvements of the original scheme, each removing some security shortcomings of the previous one. However, the challenge lies in evaluating and comparing the privacy provided by various techniques. In this paper we propose a novel entropy-based security measure that can be applied to any generalisation, restriction or data modification technique. We use our measure to empirically evaluate and compare a few popular methods, namely query restriction, sampling and noise addition.
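An entropy-based disclosure measure of this kind typically scores the attacker's posterior distribution over candidate records: the closer it is to uniform, the less has been disclosed. A minimal sketch of the Shannon-entropy core (the exact measure in the paper may differ):

```python
import math

def disclosure_entropy(probs):
    """Shannon entropy (bits) of the attacker's probability distribution over
    candidate records; log2(n) bits means full anonymity within a set of n."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform posterior over a 4-record anonymity set: the maximum, 2 bits.
print(disclosure_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0
# A skewed posterior (e.g. after weak noise addition) yields fewer bits,
# indicating higher disclosure risk.
print(disclosure_entropy([0.7, 0.1, 0.1, 0.1]))
```

Comparing techniques then reduces to comparing how far each one keeps the attacker's posterior from certainty, which is what makes a single measure applicable to restriction, sampling and noise addition alike.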
A Clustering-Anonymity Approach for Trajectory Data Publishing Considering both Distance and Direction
Trajectory data contains rich spatio-temporal information about moving objects. Directly publishing it for mining and analysis will result in severe privacy disclosure problems. Most existing clustering-anonymity methods cluster trajectories according to either distance- or direction-based similarities, leading to high information loss. To bridge this gap, in this paper we present a clustering-anonymity approach considering both types of similarity. As trajectories may not be synchronized, we first design a trajectory synchronization algorithm to synchronize them. Then, two similarity metrics between trajectories are quantitatively defined, followed by a comprehensive one. Furthermore, a privacy-preserving clustering-anonymity algorithm for trajectory data publishing is proposed. It groups trajectories into clusters according to the comprehensive similarity metric, and these clusters are finally anonymized. Experimental results show that our algorithm is effective in preserving privacy with low information loss.
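A comprehensive metric of the kind described could weight a distance term against a heading-direction term. The sketch below is one plausible construction for already synchronized, equal-length trajectories; the weighting and normalization constants are assumptions, not the paper's definitions:

```python
import math

def combined_similarity(t1, t2, w=0.5):
    """Blend distance- and direction-based similarity of two synchronized
    trajectories (equal-length lists of (x, y) points, length >= 2).
    w weights the distance term; returns a value in (0, 1]."""
    # distance term: mean pointwise Euclidean distance, squashed to (0, 1]
    dist = sum(math.dist(p, q) for p, q in zip(t1, t2)) / len(t1)

    def headings(t):
        return [math.atan2(b[1] - a[1], b[0] - a[0]) for a, b in zip(t, t[1:])]

    # direction term: mean absolute heading difference, wrapped to [0, pi]
    diffs = [abs((h1 - h2 + math.pi) % (2 * math.pi) - math.pi)
             for h1, h2 in zip(headings(t1), headings(t2))]
    dir_diff = sum(diffs) / len(diffs)
    return w * (1 / (1 + dist)) + (1 - w) * (1 - dir_diff / math.pi)
```

Identical trajectories score 1.0; clustering on such a blended score is what lets nearby but oppositely-directed trajectories land in different clusters, which is the information-loss problem the paper targets.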
Trajectory and Policy Aware Sender Anonymity in Location Based Services
We consider Location-Based Service (LBS) settings, where an LBS provider logs the requests sent by mobile device users over a period of time and later wants to publish/share these logs. Log sharing can be extremely valuable for advertising, data mining research and network management, but it poses a serious threat to the privacy of LBS users. Sender anonymity solutions prevent a malicious attacker from inferring the interests of LBS users by associating them with their service requests after gaining access to the anonymized logs. With the fast-increasing adoption of smartphones and the concern that historic user trajectories are becoming more accessible, it becomes necessary for any sender anonymity solution to protect against attackers that are trajectory-aware (i.e., have access to historic user trajectories) as well as policy-aware (i.e., they know the log anonymization policy). We call such attackers TP-aware.
This paper introduces a first privacy guarantee against TP-aware attackers, called TP-aware sender k-anonymity. It turns out that there are many possible TP-aware anonymizations for the same LBS log, each with a different utility to the consumer of the anonymized log. The problem of finding the optimal TP-aware anonymization is investigated. We show that trajectory-awareness renders the problem computationally harder than the trajectory-unaware variants found in the literature (NP-complete in the size of the log, versus PTIME). We describe a PTIME l-approximation algorithm for trajectories of length l and empirically show that it scales to large LBS logs (up to 2 million users).
Are You Willing to Wait Longer for Internet Privacy?
It is becoming increasingly common for governments, service providers and specialized data aggregators to systematically collect traces of personal communication on the Internet without the user's knowledge or approval. An analysis of these personal traces by data mining algorithms can reveal sensitive personal information, such as location data, behavioral patterns, or personal profiles including preferences and dislikes. Recent studies show that this information can be used for various purposes, for example by insurance companies or banks to identify potentially risky customers, by governments to observe their citizens, and also by repressive regimes to monitor political opponents. Online anonymity software, such as Tor, can help users protect their privacy, but often comes at the price of low usability, e.g., by causing increased latency during surfing. In this exploratory study, we determine factors that influence the usage of Internet anonymity software. In particular, we show that Internet literacy, Internet privacy awareness and Internet privacy concerns are important antecedents for determining an Internet user's intention to use anonymity software, and that Internet patience has a positive moderating effect on the intention to use anonymity software, as well as on its perceived usefulness.
Review on Present State-of-the-Art of Secure and Privacy Preserving Data Mining Techniques
As people from every walk of life use the Internet for various purposes, there is growing evidence of the proliferation of sensitive information, and the security and privacy of data have become important concerns. For this reason, privacy-preserving data mining (PPDM) has been an active research area. PPDM is the process of discovering knowledge from voluminous data while protecting sensitive information. In this paper we explore the present state of the art of secure and privacy-preserving data mining algorithms and techniques, which will help their real-world usage in enterprise applications. The techniques discussed include the randomized method, k-Anonymity, l-Diversity, t-Closeness, m-Privacy and other PPDM approaches. This paper also focuses on SQL injection attacks and prevention measures. The paper provides research insights into the areas of secure and privacy-preserving data mining techniques and algorithms, besides presenting gaps in the research that can be used to plan future work.
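Of the models this survey lists, l-diversity is the natural strengthening of the k-anonymity check shown earlier: each quasi-identifier group must also contain enough distinct sensitive values. A sketch of the simplest ("distinct") variant, with illustrative column names:

```python
from collections import defaultdict

def is_l_diverse(rows, quasi_ids, sensitive, l):
    """Distinct l-diversity: every quasi-identifier group must contain at
    least l distinct values of the sensitive attribute."""
    groups = defaultdict(set)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].add(row[sensitive])
    return all(len(vals) >= l for vals in groups.values())

# A table can be k-anonymous yet fail l-diversity: the second group below
# has two rows but only one disease value, so an attacker who places a
# target in that group learns the diagnosis anyway.
rows = [
    {"zip": "537**", "disease": "flu"},
    {"zip": "537**", "disease": "cold"},
    {"zip": "538**", "disease": "flu"},
    {"zip": "538**", "disease": "flu"},
]
print(is_l_diverse(rows, ["zip"], "disease", 2))  # False
```

t-closeness tightens this further by bounding the distance between each group's sensitive-value distribution and the table-wide one, addressing skew that distinct l-diversity misses.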