53,517 research outputs found

    Performance Evaluation of K-Anonymized Data

    Data mining provides tools to convert large amounts of data into knowledge that is relevant to users. However, this process may expose individuals' sensitive information and compromise their privacy rights. Consequently, many data mining techniques incorporating privacy protection mechanisms, based on different approaches, have been developed. A widely used microdata protection concept is k-anonymity, proposed to capture the protection of a microdata table against re-identification of the respondents to whom the data refer. In this paper, the effect of k-anonymization on data mining classifiers is investigated. A Naïve Bayes classifier is used to evaluate the anonymized and non-anonymized data.
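    A hedged sketch of the kind of comparison this abstract describes: generalize quasi-identifiers (as a k-anonymization step would) and compare Naïve Bayes accuracy on raw versus generalized data. The dataset, column names, generalization rules and use of scikit-learn are illustrative assumptions, not the paper's actual setup.

```python
# Sketch: Naive Bayes accuracy on raw vs. generalized quasi-identifiers.
# All data and generalization rules below are made up for illustration.
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 400
age = rng.integers(18, 80, n)
zipcode = rng.choice(["47677", "47602", "47678", "47905", "47909"], size=n)
label = (age > 45).astype(int)
label = np.where(rng.random(n) < 0.2, 1 - label, label)   # 20% label noise

def generalize(age, zipcode):
    """Toy generalization: age -> 10-year band, zip -> 3-digit prefix."""
    age_band = ((age // 10) * 10).astype(str)
    zip_prefix = np.array([z[:3] + "**" for z in zipcode])
    return np.column_stack([age_band, zip_prefix])

raw = np.column_stack([age.astype(str), zipcode])
anon = generalize(age, zipcode)

for name, X in [("raw", raw), ("generalized", anon)]:
    enc = OrdinalEncoder()
    Xe = enc.fit_transform(X)
    # min_categories keeps predict() safe if a fold misses some category.
    clf = CategoricalNB(min_categories=[len(c) for c in enc.categories_])
    acc = cross_val_score(clf, Xe, label, cv=5).mean()
    print(f"{name:12s} accuracy: {acc:.3f}")
```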

    Abduction and Anonymity in Data Mining

    This thesis investigates two new research problems that arise in modern data mining: reasoning on data mining results, and the privacy implications of data mining results. Most data mining algorithms rely on inductive techniques, trying to infer information that is generalized from the input data. But very often this inductive step on raw data is not enough to answer the user's questions, and there is a need to process the data again using other inference methods. In order to answer high-level user needs such as explanation of results, we describe an environment able to perform abductive (hypothetical) reasoning, since often the solutions of such queries can be seen as the set of hypotheses that satisfy some requirements. By using cost-based abduction, we show how classification algorithms can be boosted by performing abductive reasoning over the data mining results, improving the quality of the output. Another growing research area in data mining is privacy-preserving data mining. Due to the availability of large amounts of data, easily collected and stored via computer systems, new applications are emerging, but unfortunately privacy concerns can make data mining unsuitable. We study the privacy implications of data mining in a mathematical and logical context, focusing on the anonymity of people whose data are analyzed. A formal theory of anonymity-preserving data mining is given, together with a number of anonymity-preserving algorithms for pattern mining. The post-processing improvement of data mining results (w.r.t. utility and privacy) is the central focus of the problems we investigated in this thesis.
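    A minimal illustration of the cost-based abduction idea mentioned above: pick a cheap set of hypotheses that together explain a set of observations. This is not the thesis's framework; exact minimum-cost covering is NP-hard, so a standard greedy heuristic is used, and all hypothesis names, costs and coverage sets are invented.

```python
# Greedy cost-based abduction sketch: choose hypotheses whose combined
# coverage explains all observations at low total cost.
def greedy_abduction(observations, hypotheses):
    """hypotheses: dict name -> (cost, set of observations it explains)."""
    remaining = set(observations)
    chosen, total_cost = [], 0.0
    while remaining:
        # best cost per newly explained observation
        best = min(
            (h for h, (c, covers) in hypotheses.items() if covers & remaining),
            key=lambda h: hypotheses[h][0] / len(hypotheses[h][1] & remaining),
            default=None,
        )
        if best is None:          # some observations cannot be explained
            break
        cost, covers = hypotheses[best]
        chosen.append(best)
        total_cost += cost
        remaining -= covers
    return chosen, total_cost, remaining

obs = {"low_income_predicted", "urban_zip", "age_under_25"}
hyps = {
    "student":      (1.0, {"low_income_predicted", "age_under_25"}),
    "part_time":    (2.0, {"low_income_predicted"}),
    "city_dweller": (0.5, {"urban_zip"}),
}
print(greedy_abduction(obs, hyps))   # (['student', 'city_dweller'], 1.5, set())
```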

    Survey on Hybrid Anonymization using k-anonymity for Privacy Preserving in Data Mining

    K-anonymity is one of the most popular privacy-preserving models. Many techniques are available in data mining; k-anonymity is one technique used to protect privacy in a database. In this paper, our main approach is hybrid anonymization. The key idea of this technique is that it mixes two techniques: we introduce hybrid anonymization with hybrid generalization, which is formed not only by generalization but also by data relocation. Data relocation provides a trade-off between truthfulness and utility. Using hybrid anonymization, we maintain privacy standards such as k-anonymity. Previous research has found that k-anonymity does not work well with multiple sensitive attributes and incurs high information loss; to address this issue, we apply hybrid anonymization to multiple datasets. We show that our model can decrease information loss within a minimal time period.
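    A toy sketch of the generalization-plus-relocation idea: generalize a quasi-identifier into bands, then relocate records from undersized groups into the nearest compliant group instead of suppressing them. The column names, the distance notion between groups and the value of k are assumptions for illustration; this is not the paper's algorithm.

```python
# Hybrid-anonymization-style sketch: generalize, then relocate small groups.
from collections import defaultdict

def hybrid_anonymize(records, k=3):
    """records: list of dicts with an integer 'age' quasi-identifier."""
    groups = defaultdict(list)
    for r in records:
        band = (r["age"] // 10) * 10          # generalization: 10-year band
        groups[band].append(r)

    small = {b: rs for b, rs in groups.items() if len(rs) < k}
    large = {b: rs for b, rs in groups.items() if len(rs) >= k}
    for band, rs in small.items():            # data relocation step
        if not large:
            break
        target = min(large, key=lambda b: abs(b - band))   # nearest big group
        large[target].extend(rs)              # untruthful but utility-preserving

    anonymized = []
    for band, rs in large.items():
        for r in rs:
            anonymized.append({**r, "age": f"{band}-{band + 9}"})
    return anonymized

people = [{"age": a, "disease": d} for a, d in
          [(23, "flu"), (25, "cold"), (27, "flu"), (29, "asthma"),
           (34, "flu"), (36, "cold"), (38, "flu"), (61, "asthma")]]
for row in hybrid_anonymize(people, k=3):
    print(row)
```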

    Quantifying Privacy: A Novel Entropy-Based Measure of Disclosure Risk

    It is well recognised that data mining and statistical analysis pose a serious threat to privacy. This is true for financial, medical, criminal and marketing research. Numerous techniques have been proposed to protect privacy, including restriction and data modification. Recently proposed privacy models such as differential privacy and k-anonymity have received a lot of attention, and for the latter there are now several improvements of the original scheme, each removing some security shortcomings of the previous one. However, the challenge lies in evaluating and comparing the privacy provided by various techniques. In this paper we propose a novel entropy-based security measure that can be applied to any generalisation, restriction or data modification technique. We use our measure to empirically evaluate and compare a few popular methods, namely query restriction, sampling and noise addition.
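    A hedged sketch of an entropy-style disclosure measure: given the attacker's posterior probabilities over candidate individuals for one released record, report the Shannon entropy in bits. Higher entropy means more attacker uncertainty and hence lower disclosure risk. The exact measure proposed in the paper may differ; the probability vectors below are invented examples.

```python
# Shannon entropy of the attacker's candidate distribution as a privacy proxy.
import math

def shannon_entropy(probabilities):
    """H(p) = -sum p_i * log2(p_i) over the attacker's candidates."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Uniform over a k-anonymous group of 4 candidates: maximal uncertainty, 2 bits.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))   # 2.0
# Skewed posterior (e.g. after background knowledge): much lower entropy.
print(shannon_entropy([0.85, 0.05, 0.05, 0.05]))   # ~0.85
```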

    Trajectory and Policy Aware Sender Anonymity in Location Based Services

    We consider Location-based Service (LBS) settings, where an LBS provider logs the requests sent by mobile device users over a period of time and later wants to publish/share these logs. Log sharing can be extremely valuable for advertising, data mining research and network management, but it poses a serious threat to the privacy of LBS users. Sender anonymity solutions prevent a malicious attacker from inferring the interests of LBS users by associating them with their service requests after gaining access to the anonymized logs. With the fast-increasing adoption of smartphones and the concern that historic user trajectories are becoming more accessible, it becomes necessary for any sender anonymity solution to protect against attackers that are trajectory-aware (i.e., have access to historic user trajectories) as well as policy-aware (i.e., they know the log anonymization policy). We call such attackers TP-aware. This paper introduces a first privacy guarantee against TP-aware attackers, called TP-aware sender k-anonymity. It turns out that there are many possible TP-aware anonymizations for the same LBS log, each with a different utility to the consumer of the anonymized log. The problem of finding the optimal TP-aware anonymization is investigated. We show that trajectory-awareness renders the problem computationally harder than the trajectory-unaware variants found in the literature (NP-complete in the size of the log, versus PTIME). We describe a PTIME l-approximation algorithm for trajectories of length l and empirically show that it scales to large LBS logs (up to 2 million users).
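    A simplified sketch of the baseline notion involved here: plain sender k-anonymity of an anonymized LBS log, i.e. every (generalized region, time slot) bucket in the published log is associated with at least k distinct senders. The paper's TP-aware guarantee additionally models trajectory- and policy-aware attackers, which this toy check does not capture; the log tuples below are invented.

```python
# Plain sender k-anonymity check over an anonymized request log.
from collections import defaultdict

def is_sender_k_anonymous(log, k):
    """log: iterable of (user_id, generalized_region, time_slot) tuples."""
    senders = defaultdict(set)
    for user, region, slot in log:
        senders[(region, slot)].add(user)
    return all(len(users) >= k for users in senders.values())

log = [
    ("u1", "cellA", 9), ("u2", "cellA", 9), ("u3", "cellA", 9),
    ("u4", "cellB", 9), ("u5", "cellB", 9),
]
print(is_sender_k_anonymous(log, k=3))   # False: cellB@9 has only 2 senders
print(is_sender_k_anonymous(log, k=2))   # True
```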

    ARE YOU WILLING TO WAIT LONGER FOR INTERNET PRIVACY?

    It becomes increasingly common for governments, service providers and specialized data aggregators to systematically collect traces of personal communication on the Internet without the user’s knowledge or approval. An analysis of these personal traces by data mining algorithms can reveal sensitive personal information, such as location data, behavioral patterns, or personal profiles including preferences and dislikes. Recent studies show that this information can be used for various purposes, for example by insurance companies or banks to identify potentially risky customers, by governments to observe their citizens, and also by repressive regimes to monitor political opponents. Online anonymity software, such as Tor, can help users to protect their privacy, but often comes at the price of low usability, e.g., by causing increased latency during surfing. In this exploratory study, we determine factors that influence the usage of Internet anonymity software. In particular, we show that Internet literacy, Internet privacy awareness and Internet privacy concerns are important antecedents for determining an Internet user’s intention to use anonymity software, and that Internet patience has a positive moderating effect on the intention to use anonymity software, as well as on its perceived usefulness.

    A Clustering-Anonymity Approach for Trajectory Data Publishing Considering both Distance and Direction

    Trajectory data contains rich spatio-temporal information about moving objects. Directly publishing it for mining and analysis will result in severe privacy disclosure problems. Most existing clustering-anonymity methods cluster trajectories according to either distance- or direction-based similarities, leading to high information loss. To bridge this gap, in this paper we present a clustering-anonymity approach that considers both types of similarity. As trajectories may not be synchronized, we first design a trajectory synchronization algorithm to synchronize them. Then, two similarity metrics between trajectories are quantitatively defined, followed by a comprehensive one. Furthermore, a clustering-anonymity algorithm for privacy-preserving trajectory data publishing is proposed. It groups trajectories into clusters according to the comprehensive similarity metric, and these clusters are finally anonymized. Experimental results show that our algorithm is effective in preserving privacy with low information loss.
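    A hedged sketch of a comprehensive similarity between two synchronized trajectories, combining a distance term and a direction term with a weight alpha. The specific metrics and weighting in the paper may differ; the sketch assumes trajectories are equal-length lists of (x, y) points sampled at the same timestamps (i.e. already synchronized), and the example trajectories are invented.

```python
# Combined distance- and direction-based trajectory similarity.
import math

def distance_similarity(t1, t2):
    """Average point-wise Euclidean distance, mapped into (0, 1]."""
    d = sum(math.dist(p, q) for p, q in zip(t1, t2)) / len(t1)
    return 1.0 / (1.0 + d)

def direction_similarity(t1, t2):
    """Average cosine similarity between consecutive movement vectors."""
    def segments(t):
        return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(t, t[1:])]
    sims = []
    for u, v in zip(segments(t1), segments(t2)):
        nu, nv = math.hypot(*u), math.hypot(*v)
        if nu == 0 or nv == 0:
            continue
        sims.append((u[0] * v[0] + u[1] * v[1]) / (nu * nv))
    return sum(sims) / len(sims) if sims else 0.0

def comprehensive_similarity(t1, t2, alpha=0.5):
    return alpha * distance_similarity(t1, t2) + (1 - alpha) * direction_similarity(t1, t2)

a = [(0, 0), (1, 1), (2, 2), (3, 3)]
b = [(0, 1), (1, 2), (2, 3), (3, 4)]      # parallel to a, offset by 1
c = [(0, 0), (1, -1), (2, -2), (3, -3)]   # diverging direction
print(comprehensive_similarity(a, b))     # high: close and same heading
print(comprehensive_similarity(a, c))     # lower: directions differ
```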

    Review on Present State-of-the-Art of Secure and Privacy Preserving Data Mining Techniques

    As people from every walk of life use the Internet for various purposes, there is growing evidence of the proliferation of sensitive information. Security and privacy of data have become important concerns. For this reason, privacy preserving data mining (PPDM) has been an active research area. PPDM is the process of discovering knowledge from voluminous data while protecting sensitive information. In this paper we explore the present state-of-the-art of secure and privacy preserving data mining algorithms and techniques, which will help in the real-world usage of enterprise applications. The techniques discussed include the randomized method, k-Anonymity, l-Diversity, t-Closeness, m-Privacy and other PPDM approaches. This paper also focuses on SQL injection attacks and prevention measures. The paper provides research insights into the areas of secure and privacy preserving data mining techniques and algorithms, besides presenting gaps in the research that can be used to plan future research.
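    An illustrative check of two of the surveyed models on a small table: k-anonymity (each quasi-identifier group has at least k rows) and l-diversity (each group contains at least l distinct sensitive values, in the simplest "distinct l-diversity" sense). Column names and example rows are made up for the sketch.

```python
# Distinct l-diversity and k-anonymity check over a generalized table.
from collections import defaultdict

def check_k_anonymity_l_diversity(rows, quasi_ids, sensitive, k, l):
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_ids)
        groups[key].append(row[sensitive])
    k_ok = all(len(vals) >= k for vals in groups.values())
    l_ok = all(len(set(vals)) >= l for vals in groups.values())
    return k_ok, l_ok

table = [
    {"age": "20-29", "zip": "476**", "disease": "flu"},
    {"age": "20-29", "zip": "476**", "disease": "cold"},
    {"age": "20-29", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "479**", "disease": "asthma"},
    {"age": "30-39", "zip": "479**", "disease": "asthma"},
    {"age": "30-39", "zip": "479**", "disease": "asthma"},
]
print(check_k_anonymity_l_diversity(table, ["age", "zip"], "disease", k=3, l=2))
# (True, False): both groups have 3 rows, but the second has only one
# distinct disease value, so the table is 3-anonymous yet not 2-diverse.
```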
    • …