18 research outputs found

    A Fuzzy Logic based Privacy Preservation Clustering method for achieving K-Anonymity using EMD in dLink Model

    Privacy preservation is a data mining technique applied to databases without violating the privacy of individuals. A sensitive attribute is selected from the numerical data and modified by a data modification technique. The modified data can then be released to any agency; if the agency applies data mining techniques such as clustering or classification for data analysis, the modification does not affect the result. In this privacy preservation technique, the sensitive data is converted into modified data using an S-shaped fuzzy membership function. K-means clustering is applied to both the original and the modified data to obtain the clusters. t-closeness requires that the distribution of a sensitive attribute in any equivalence class is close to the distribution of the attribute in the overall table. The Earth Mover's Distance (EMD) is used to measure the distance between the two distributions, which should be no more than a threshold t. Hence privacy is preserved and the accuracy of the data is maintained.
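
    The two ingredients named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the breakpoints `a` and `b` of the S-shaped function, the helper names, and the salary values are assumptions chosen for illustration, and the EMD shown is the ordered-distance form commonly used for numerical attributes in t-closeness.

```python
from collections import Counter


def s_membership(x, a, b):
    """S-shaped fuzzy membership: 0 at or below a, 1 at or above b (a < b assumed)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    mid = (a + b) / 2
    if x <= mid:
        return 2 * ((x - a) / (b - a)) ** 2
    return 1 - 2 * ((x - b) / (b - a)) ** 2


def distribution(values, domain):
    """Relative frequency of each domain value, in domain order."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in domain]


def ordered_emd(p, q):
    """EMD between two distributions over the same m ordered values.

    Uses the ordered-distance form: the sum of absolute cumulative
    differences, normalised by m - 1.
    """
    m = len(p)
    running, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        running += pi - qi
        total += abs(running)
    return total / (m - 1)


# Hypothetical sensitive attribute (salaries) and one equivalence class.
table = [3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000]
equivalence_class = [3000, 4000, 5000]
domain = sorted(set(table))

emd = ordered_emd(distribution(equivalence_class, domain),
                  distribution(table, domain))
t = 0.4  # assumed threshold
satisfies_t_closeness = emd <= t

# Masked values handed to the data recipient instead of raw salaries.
masked = [s_membership(x, min(table), max(table)) for x in table]
```

    Because the S-shaped function is monotonically non-decreasing, the masked values keep the ordering of the originals, which is one plausible reason clustering results would change little after modification.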

    Thirty years of optimization-based SDC methods for tabular data

    In 1966 Bacharach published in Management Science a work on matrix rounding problems in two-way tables of economic statistics, formulated as a network optimization problem. This is likely the first application of optimization/operations research to statistical disclosure control (SDC) in tabular data. Years later, in 1982, Cox and Ernst used the same approach in a work in INFOR for a similar problem: controlled rounding. And thirty years ago, in 1992, a paper by Kelly, Golden and Assad appeared in Networks on the solution of the cell suppression problem, also using network optimization. Cell suppression was used for years as the main SDC technique for tabular data, and it was an active field of research that resulted in several lines of work and many publications. The above are some of the seminal works on the use of optimization methods for SDC when releasing tabular data. This paper discusses some of the research done in this field since then, with a focus on the approaches that were of practical use. It also discusses their pros and cons compared to recent techniques that are not based on optimization methods. Peer Reviewed. Postprint (published version).

    A Preliminary Study on Methods for Retaining Data Quality Problems in Automatically Generated Test Data

    Data in an organisation often contains business secrets that the organisation does not want to release. However, there are occasions when an organisation must release its data, such as when outsourcing work or using the cloud for Data Quality (DQ) related tasks like data cleansing. Currently, there is no mechanism that allows organisations to release their data for DQ tasks while ensuring that business-related secrets are suitably protected. The aim of this paper is therefore to present our current progress on determining which methods are able to modify secret data while retaining DQ problems. So far we have identified the ways in which data swapping and SHA-2 hash function alteration methods can be used to preserve the DQ problems of missing data, incorrectly formatted values, and domain violations while minimising the risk of disclosing secrets.
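
    As a rough illustration of the idea, the Python sketch below masks a column while leaving missing entries where they were. The abstract does not describe the authors' actual alterations, so the helper names, the choice of SHA-256 from the SHA-2 family, and the definition of "missing" as `None` or an empty string are all assumptions.

```python
import hashlib
import random

# Assumed representation of missing data in this sketch.
MISSING = (None, "")


def hash_preserving_missing(column):
    """Replace each present value by its SHA-256 digest; keep missing cells."""
    return [
        v if v in MISSING
        else hashlib.sha256(str(v).encode("utf-8")).hexdigest()
        for v in column
    ]


def swap_preserving_missing(column, seed=0):
    """Shuffle present values among the non-missing positions only."""
    rng = random.Random(seed)
    present = [v for v in column if v not in MISSING]
    rng.shuffle(present)
    replacements = iter(present)
    return [v if v in MISSING else next(replacements) for v in column]


# Hypothetical column with a missing value, an empty string, and a
# wrongly formatted entry ("1O0" with a letter O instead of a zero).
ages = ["34", None, "27", "", "1O0"]

hashed = hash_preserving_missing(ages)
swapped = swap_preserving_missing(ages)
```

    In this sketch, swapping keeps the wrongly formatted value somewhere in the column, so a cleansing task can still find it, whereas plain hashing retains only the pattern of missing cells. Preserving format or domain violations through hashing would need further alterations that the abstract does not specify.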

    The misty crystal ball: Efficient concealment of privacy-sensitive attributes in predictive analytics

    Individuals are becoming increasingly concerned with privacy. This curtails their willingness to share sensitive attributes like age, gender or personal preferences; yet firms largely rely upon customer data in any type of predictive analytics. Hence, organizations are confronted with a dilemma in which they need to make a tradeoff between a sparse use of data and the utility from better predictive analytics. This paper proposes a masking mechanism that obscures sensitive attributes while maintaining a large degree of predictive power. More precisely, we efficiently identify data partitions that are best suited for (i) shuffling, (ii) swapping and, as a form of randomization, (iii) perturbing attributes by conditional replacement. By operating on data partitions that are derived from a predictive algorithm, we achieve the objective of masking privacy-sensitive attributes with marginal downsides for predictive modeling. The resulting trade-off between masking and predictive utility is empirically evaluated in the context of customer churn where, for instance, a stratified shuffling of attribute values rarely reduces predictive accuracy by more than a percentage point. Our proposed framework entails direct managerial implications as a growing share of firms adopts predictive analytics and thus requires mechanisms that better adhere to user demands for information privacy.
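
    Stratified shuffling, as mentioned in this abstract, can be sketched in Python as follows. This is an illustration under assumptions, not the paper's method: the function name is hypothetical, and the partition labels are taken as given (for example, leaf identifiers from a decision tree), since the abstract does not say how the partitions are derived.

```python
import random
from collections import defaultdict


def stratified_shuffle(values, partitions, seed=0):
    """Shuffle a sensitive attribute only among rows of the same partition."""
    rng = random.Random(seed)

    # Collect the row indices belonging to each partition.
    groups = defaultdict(list)
    for idx, part in enumerate(partitions):
        groups[part].append(idx)

    masked = list(values)
    for indices in groups.values():
        shuffled = [values[i] for i in indices]
        rng.shuffle(shuffled)
        for i, v in zip(indices, shuffled):
            masked[i] = v
    return masked


# Hypothetical customer ages and the partition each row falls into.
ages = [23, 31, 45, 52, 38, 64, 29, 57]
partition_ids = ["A", "A", "B", "B", "A", "B", "A", "B"]

masked_ages = stratified_shuffle(ages, partition_ids)
```

    Each partition keeps exactly the same set of attribute values, so any statistic computed per partition is unchanged, while the link between an individual row and its original value is obscured.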


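    Statistics is underpinned by Baconian logic, whereas data mining owes more to the rapid development of contemporary information technology; yet relative to the large volume of redundant data this has produced, we seem to have gained little information. If data mining (DM) studies cleaned data covering the full population, then statistics studies the sample.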


    THE NEW SAFE LATEX. Vystar Corporation announced that Revertex Malaysia has signed a letter of intent to produce Vytex, the new safe latex. Vytex natural rubber latex represents a dramatic breakthrough in the search for a natural rubber latex in which the antigenic proteins that trigger allergic reactions are deactivated without reducing elasticity. Vytex natural rubber latex is produced through a proprietary process that deactivates antigenic proteins in latex, potentially making it safe for use by most people who are allergenic-protein sensitive, the Atlanta-based company said. According to Vystar President William Doyle, the plan is to scale up to production of Vytex natural rubber latex by next year. This would mean that products made of the revolutionary all-natural rubber latex could be reaching the market in 2007. The implication is that healthcare providers and individuals who have avoided natural rubber latex products for fear of allergic reactions will now have a safe choice with all of the favorable characteristics of latex, sources were quoted as saying.