5,067 research outputs found

    Going Beyond Obscurity: Organizational Approaches to Data Anonymization

    Rise of big data – issues and challenges

    The recent rapid rise in the availability of big data, driven by Internet-based technologies such as social media platforms and mobile devices, has left many market leaders unprepared to handle very large, random, high-velocity data. Conventionally, technologies are first developed and tested in labs, reach the public through media such as press releases and advertisements, and are then adopted by the general public. In the case of big data, fast development and ready acceptance by the user community have left the academic community little time to scrutinize the technology. Although professionals and authors have published many books and electronic media articles on their big data work, fundamental work is still lacking in the academic literature. Through survey methods, this paper discusses challenges in different aspects of big data, such as data sources, content format, data staging, data processing, and prevalent data stores. It also discusses issues and challenges related to big data, specifically privacy attacks and counter-techniques such as k-anonymity, t-closeness, l-diversity and differential privacy. Tools and techniques adopted by various organizations to store different types of big data are highlighted as well. This study identifies research areas that remain to be addressed, such as the lack of anonymization techniques for unstructured big data, data traffic pattern determination for developing scalable data storage solutions, and controlling mechanisms for high-velocity data.
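Two of the counter-techniques named in this abstract, k-anonymity and l-diversity, can be checked mechanically on a small table. The sketch below is illustrative only and is not taken from the paper: the function names, the column names (`age`, `zip`, `disease`) and the sample rows are all assumptions. It uses the simplest "distinct" form of l-diversity, which counts the distinct sensitive values in each equivalence class.

```python
from collections import Counter, defaultdict


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest equivalence class (the table's k)."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def l_diversity(rows, quasi_identifiers, sensitive):
    """Return the fewest distinct sensitive values found in any class."""
    values = defaultdict(set)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        values[key].add(row[sensitive])
    return min(len(v) for v in values.values())


# Hypothetical, already-generalized records (not data from the paper).
rows = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "asthma"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]

k = k_anonymity(rows, ["age", "zip"])
l = l_diversity(rows, ["age", "zip"], "disease")
```

In this sample, every quasi-identifier combination is shared by two records, so the table is 2-anonymous. The second class contains a single disease value, so it is only 1-diverse: that is the homogeneity weakness l-diversity is meant to expose.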

    Big Data Privacy Context: Literature Effects On Secure Informational Assets

    This article aims to identify research opportunities in the current big data privacy domain by evaluating literature effects on secure informational assets. Until now, no study has analyzed this relation. Its results can foster science, technologies and businesses. To achieve these objectives, a big data privacy Systematic Literature Review (SLR) is performed on the main peer-reviewed scientific journals in the Scopus database. Bibliometrics and text mining analysis complement the SLR. This study supports big data privacy researchers by identifying the most and least researched themes, research novelty, the most cited works and authors, the evolution of themes through time, and more. In addition, TOPSIS and VIKOR ranks were developed to evaluate literature effects versus informational assets indicators. Secure Internet Servers (SIS) was chosen as the decision criterion. Results show that big data privacy literature is strongly focused on computational aspects. However, individuals, societies, organizations and governments face a technological change that has only just started to be investigated, with growing concerns about law and regulation. The TOPSIS and VIKOR ranks differed in several positions, and the only country consistent between literature and SIS adoption is the United States. Countries in the lowest ranking positions represent future research opportunities. Comment: 21 pages, 9 figures

    Advancing Data Privacy: A Novel K-Anonymity Algorithm with Dissimilarity Tree-Based Clustering and Minimal Information Loss

    Anonymization is a crucial privacy protection technique employed across various technology domains, including cloud storage, machine learning, data mining and big data, to safeguard sensitive information from unauthorized third-party access. As the significance and volume of data grow exponentially, comprehensive data protection against all threats is of utmost importance. The main objective of this paper is to provide a brief summary of techniques for data anonymization and differential privacy. We propose a new k-anonymity method, which deviates from conventional k-anonymity approaches, to address privacy protection concerns. Our paper presents a new algorithm designed to achieve k-anonymity through more efficient clustering. Most clustering algorithms require substantial computation to process data. However, by identifying initial centers that align with the data structure, a superior cluster arrangement can be obtained. Our study presents a Dissimilarity Tree-based strategy for selecting optimal starting centroids and generating more accurate clusters with reduced computing time and a lower Normalised Certainty Penalty (NCP). Compared with other methods, the graphical performance analysis shows that this one reduces the overall information lost in the dataset being anonymized by around 20% on average. In addition, the method we have designed properly handles both numerical and categorical attributes.
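The abstract measures information loss with the Normalised Certainty Penalty but does not state the formula. As a rough sketch under stated assumptions: for a numeric attribute, NCP is commonly defined as the range of the attribute within a cluster divided by its range over the whole table. The code below applies that definition to numeric attributes only and averages it over records and attributes. The function names, the sample clusters and the averaging step are assumptions for illustration, not the authors' implementation. Categorical attributes, which need a generalization hierarchy, are left out.

```python
def ncp_numeric(group_values, all_values):
    """Range of the values in one cluster divided by the table-wide range."""
    span = max(all_values) - min(all_values)
    if span == 0:
        return 0.0  # attribute is constant, so generalizing it loses nothing
    return (max(group_values) - min(group_values)) / span


def table_ncp(clusters, attributes):
    """Record-weighted mean NCP over all clusters and numeric attributes.

    Dividing by (records * attributes) is one common normalization and is
    an assumption here; it yields a score in [0, 1].
    """
    records = [r for cluster in clusters for r in cluster]
    total = 0.0
    for cluster in clusters:
        for a in attributes:
            penalty = ncp_numeric([r[a] for r in cluster],
                                  [r[a] for r in records])
            total += len(cluster) * penalty
    return total / (len(records) * len(attributes))


# Hypothetical clusters of size k = 2 (not data from the paper).
clusters = [
    [{"age": 20, "income": 30}, {"age": 30, "income": 50}],
    [{"age": 60, "income": 40}, {"age": 70, "income": 80}],
]

loss = table_ncp(clusters, ["age", "income"])
```

A lower score means each cluster spans a narrower slice of every attribute, so less detail is lost when the cluster's records are generalized to a common range. This is why the choice of starting centroids matters to the clustering approach the abstract describes.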