1,040 research outputs found

    Assessing Data Usefulness for Failure Analysis in Anonymized System Logs

    Full text link
    System logs are a valuable source of information for the analysis and understanding of systems behavior for the purpose of improving their performance. Such logs contain various types of information, including sensitive information. Information deemed sensitive can either directly be extracted from system log entries by correlation of several log entries, or can be inferred from the combination of the (non-sensitive) information contained within system logs with other logs and/or additional datasets. The analysis of system logs containing sensitive information compromises data privacy. Therefore, various anonymization techniques, such as generalization and suppression have been employed, over the years, by data and computing centers to protect the privacy of their users, their data, and the system as a whole. Privacy-preserving data resulting from anonymization via generalization and suppression may lead to significantly decreased data usefulness, thus, hindering the intended analysis for understanding the system behavior. Maintaining a balance between data usefulness and privacy preservation, therefore, remains an open and important challenge. Irreversible encoding of system logs using collision-resistant hashing algorithms, such as SHAKE-128, is a novel approach previously introduced by the authors to mitigate data privacy concerns. The present work describes a study of the applicability of the encoding approach from earlier work on the system logs of a production high performance computing system. Moreover, a metric is introduced to assess the data usefulness of the anonymized system logs to detect and identify the failures encountered in the system.Comment: 11 pages, 3 figures, submitted to 17th IEEE International Symposium on Parallel and Distributed Computin

    Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization

    Full text link
    In privacy-preserving data publishing, approaches using Value Generalization Hierarchies (VGHs) form an important class of anonymization algorithms. VGHs play a key role in the utility of published datasets as they dictate how the anonymization of the data occurs. For categorical attributes, it is imperative to preserve the semantics of the original data in order to achieve a higher utility. Despite this, semantics have not being formally considered in the specification of VGHs. Moreover, there are no methods that allow the users to assess the quality of their VGH. In this paper, we propose a measurement scheme, based on ontologies, to quantitatively evaluate the quality of VGHs, in terms of semantic consistency and taxonomic organization, with the aim of producing higher-quality anonymizations. We demonstrate, through a case study, how our evaluation scheme can be used to compare the quality of multiple VGHs and can help to identify faulty VGHs.Comment: 18 pages, 7 figures, presented in the Privacy in Statistical Databases Conference 2014 (Ibiza, Spain

    A Comprehensive Bibliometric Analysis on Social Network Anonymization: Current Approaches and Future Directions

    Full text link
    In recent decades, social network anonymization has become a crucial research field due to its pivotal role in preserving users' privacy. However, the high diversity of approaches introduced in relevant studies poses a challenge to gaining a profound understanding of the field. In response to this, the current study presents an exhaustive and well-structured bibliometric analysis of the social network anonymization field. To begin our research, related studies from the period of 2007-2022 were collected from the Scopus Database then pre-processed. Following this, the VOSviewer was used to visualize the network of authors' keywords. Subsequently, extensive statistical and network analyses were performed to identify the most prominent keywords and trending topics. Additionally, the application of co-word analysis through SciMAT and the Alluvial diagram allowed us to explore the themes of social network anonymization and scrutinize their evolution over time. These analyses culminated in an innovative taxonomy of the existing approaches and anticipation of potential trends in this domain. To the best of our knowledge, this is the first bibliometric analysis in the social network anonymization field, which offers a deeper understanding of the current state and an insightful roadmap for future research in this domain.Comment: 73 pages, 28 figure

    Enhanced Privacy Preserving Accesscontrol in Incremental Datausing Microaggregation

    Get PDF
    In microdata releases, main task is to protect the privacy of data subjects. Microaggregation technique use to disclose the limitation at protecting the privacy of microdata. This technique is an alternative to generalization and suppression, which use to generate k-anonymous data sets. In this dataset, identity of each subject is hidden within a group of k subjects. Microaggregation perturbs the data and additional masking allows refining data utility in many ways, like increasing data granularity, to avoid discretization of numerical data, to reduce the impact of outliers. If the variability of the private data values in a group of k subjects is too small, k-anonymity does not provide protection against attribute disclosure. In this work Role based access control is assumed. The access control policies define selection predicates to roles. Then use the concept of imprecision bound for each permission to define a threshold on the amount of imprecision that can be tolerated. So the proposed approach reduces the imprecision for each selection predicate. Anonymization is carried out only for the static relational table in the existing papers. Privacy preserving access control mechanism is applied to the incremental data

    Data Anonymization for Privacy Preservation in Big Data

    Get PDF
    Cloud computing provides capable ascendable IT edifice to provision numerous processing of a various big data applications in sectors such as healthcare and business. Mainly electronic health records data sets and in such applications generally contain privacy-sensitive data. The most popular technique for data privacy preservation is anonymizing the data through generalization. Proposal is to examine the issue against proximity privacy breaches for big data anonymization and try to recognize a scalable solution to this issue. Scalable clustering approach with two phase consisting of clustering algorithm and K-Anonymity scheme with Generalisation and suppression is intended to work on this problem. Design of the algorithms is done with MapReduce to increase high scalability by carrying out dataparallel execution in cloud. Wide-ranging researches on actual data sets substantiate that the method deliberately advances the competence of defensive proximity privacy breaks, the scalability and the efficiency of anonymization over existing methods. Anonymizing data sets through generalization to gratify some of the privacy attributes like k- Anonymity is a popularly-used type of privacy preserving methods. Currently, the gauge of data in numerous cloud surges extremely in agreement with the Big Data, making it a dare for frequently used tools to actually get, manage, and process large-scale data for a particular accepted time scale. Hence, it is a trial for prevailing anonymization approaches to attain privacy conservation for big data private information due to scalabilty issues

    Improved Technique for Preserving Privacy while Mining Real Time Big Data

    Get PDF
    With the evolution of Big data, data owners require the assistance of a third party (e.g.,cloud) to store, analyse the data and obtain information at a lower cost. However, maintaining privacy is a challenge in such scenarios. It may reveal sensitive information. The existing research discusses different techniques to implement privacy in original data using anonymization, randomization, and suppression techniques. But those techniques are not scalable, suffers from information loss, does not support real time data and hence not suitable for privacy preserving big data mining. In this research, a novel approach of two level privacy is proposed using pseudonymization and homomorphic encryption in spark framework. Several simulations are carried out on the collected dataset. Through the results obtained, we observed that execution time is reduced by 50%, privacy is enhanced by 10%. This scheme is suitable for both privacy preserving Big Data publishing and mining
    corecore