8,138 research outputs found

    A Classification of non-Cryptographic Anonymization Techniques Ensuring Privacy in Big Data

    Recently, Big Data processing has become crucial to most enterprise and government applications due to the rapid growth of collected data. However, this data often includes private personal information, which raises new security and privacy concerns. Moreover, it is widely agreed that the sheer scale of big data renders many privacy-preserving techniques ineffective. Therefore, to ensure privacy in big data, anonymization is suggested as one of the most efficient approaches. In this paper, we provide a new, detailed classification of the most widely used non-cryptographic anonymization techniques for big data, including generalization and randomization approaches. The paper also evaluates the presented techniques against integrity, confidentiality and credibility criteria. In addition, three relevant anonymization techniques (k-anonymity, l-diversity and t-closeness) are tested on an extract of a huge real data set.
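
    As a rough illustration of two of the properties named in this abstract, the sketch below measures k-anonymity and distinct l-diversity on a toy table. The column names, the sample rows, and the helper functions are all hypothetical and are not taken from the paper; t-closeness is left out because it requires a distance between distributions.

```python
from collections import defaultdict

def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[q] for q in quasi_identifiers)
        groups[key].append(record)
    return groups

def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest equivalence class (the k achieved)."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len(group) for group in groups.values())

def l_diversity(records, quasi_identifiers, sensitive):
    """Return the fewest distinct sensitive values found in any class."""
    groups = equivalence_classes(records, quasi_identifiers)
    return min(len({r[sensitive] for r in group}) for group in groups.values())

# Hypothetical, already-generalized toy table (not data from the paper).
records = [
    {"age": "30-39", "zip": "130**", "disease": "flu"},
    {"age": "30-39", "zip": "130**", "disease": "cancer"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
    {"age": "40-49", "zip": "148**", "disease": "flu"},
]

k = k_anonymity(records, ["age", "zip"])
l = l_diversity(records, ["age", "zip"], "disease")
```

    On this table every quasi-identifier combination appears twice, yet one class holds a single disease value, which is the kind of gap l-diversity is meant to expose.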

    Preservation of Privacy of Big Data Using Efficient Anonymization Technique

    Big data needs to be kept private as the amount of data grows. Big data is generated by social networks, organizations and various other sources, and it requires large storage as well as high computational power. The data needs to be protected at every stage. Privacy preservation plays an important role in keeping sensitive information protected and private from any attack. Data anonymization, which includes suppression, generalization, and bucketization, is one technique for keeping data private and protected. It prevents personal and private data from being known by others. But when these techniques are applied to big data, they cause a great loss of information and also fail to defend the privacy of big data. Moreover, in the big data scenario, anonymization should focus not only on hiding but also on other aspects. This paper aims to provide a technique that uses slicing, suppression, and functional encryption together to achieve better privacy of big data with data anonymization.

    Data Anonymization for Privacy Preservation in Big Data

    Cloud computing provides scalable IT infrastructure to support the processing of a variety of big data applications in sectors such as healthcare and business. Data sets in such applications, mainly electronic health records, generally contain privacy-sensitive data. The most popular technique for data privacy preservation is anonymizing the data through generalization. The proposal is to examine the problem of proximity privacy breaches in big data anonymization and to identify a scalable solution to it. A scalable two-phase clustering approach, consisting of a clustering algorithm and a k-anonymity scheme with generalization and suppression, is intended to address this problem. The algorithms are designed with MapReduce to achieve high scalability through data-parallel execution in the cloud. Extensive experiments on real data sets substantiate that the method significantly improves the ability to defend against proximity privacy breaches, as well as the scalability and efficiency of anonymization, over existing methods. Anonymizing data sets through generalization to satisfy privacy requirements such as k-anonymity is a widely used category of privacy-preserving methods. Currently, the scale of data in many cloud applications is increasing tremendously in line with the Big Data trend, making it a challenge for commonly used tools to capture, manage, and process large-scale data within an acceptable time. Hence, it is a challenge for existing anonymization approaches to achieve privacy preservation for big data because of scalability issues.
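
    The combination of generalization and suppression mentioned in this abstract can be sketched on a single machine as follows. This is only a minimal illustration under assumed column names (`age`, `zip`) and fixed generalization rules; it is not the clustering-based, MapReduce approach the abstract describes.

```python
from collections import Counter

def generalize(record, age_width=10, zip_digits=3):
    """Coarsen quasi-identifiers: age becomes a range, zip keeps a prefix."""
    low = (record["age"] // age_width) * age_width
    masked_zip = record["zip"][:zip_digits] + "*" * (len(record["zip"]) - zip_digits)
    return {**record, "age": f"{low}-{low + age_width - 1}", "zip": masked_zip}

def anonymize(records, k):
    """Generalize every record, then suppress classes smaller than k."""
    generalized = [generalize(r) for r in records]
    class_sizes = Counter((r["age"], r["zip"]) for r in generalized)
    return [r for r in generalized if class_sizes[(r["age"], r["zip"])] >= k]

# Hypothetical sample rows (not data from the paper).
sample = [
    {"age": 31, "zip": "13053", "disease": "flu"},
    {"age": 35, "zip": "13068", "disease": "cancer"},
    {"age": 38, "zip": "13012", "disease": "flu"},
    {"age": 52, "zip": "14850", "disease": "asthma"},
]

released = anonymize(sample, k=2)
```

    With k=2 the three records in their thirties form one class and are released in generalized form, while the lone record in its fifties is suppressed. Fixed rules like these trade more information loss for simplicity.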

    Improved k-Anonymize and l-Diverse Approach for Privacy Preserving Big Data Publishing Using MPSEC Dataset

    Data exposure and privacy violations may happen when data is exchanged between organizations. Data anonymization gives promising results for limiting such dangers. To maintain privacy, different methods of k-anonymization and l-diversity have been widely used, but for larger datasets the results are not very promising. The main problems with existing anonymization algorithms are high information loss and high running time. To overcome these problems, this paper proposes new models, namely Improved k-Anonymization (IKA) and Improved l-Diversity (ILD). The IKA model takes a large k-value and uses a symmetric as well as an asymmetric anonymizing algorithm. IKA is accordingly divided into Improved Symmetric k-Anonymization (ISKA) and Improved Asymmetric k-Anonymization (IAKA). After the data is anonymized using IKA, the ILD model is used to increase privacy: ILD makes the data more diverse and thereby increases privacy. This paper presents the implementation of the proposed IKA and ILD models using a real-time big candidate election dataset, acquired from the Madhya Pradesh State Election Commission, India (MPSEC), along with Apache Storm. This paper also compares the proposed models with existing algorithms, i.e. Fast clustering-based Anonymization for Data Streams (FADS), Fast Anonymization for Data Stream (FAST), Map Reduce Anonymization (MRA) and Scalable k-Anonymization (SKA). The experimental results show that the proposed IKA and ILD models achieve a remarkable improvement in information loss and significantly better running time than the existing approaches, while maintaining the privacy-utility trade-off.

    Scalable TPTDS Data Anonymization over Cloud using MapReduce

    With the rapid advancement of the big data digital age, large amounts of data are collected, mined and published. Data publishing has become a day-to-day routine activity. Cloud computing is the most suitable model to support big data applications. A large number of cloud services need users to share microdata, such as electronic health records and financial transaction data, so that this data can be analyzed. But one of the major issues in moving to the cloud is privacy threats. Data anonymization techniques are widely used to combat privacy concerns. Anonymizing data sets using generalization to achieve k-anonymity is one such privacy-preserving technique. Currently, the scale of data in many cloud applications is increasing massively in accordance with the Big Data trend, making it difficult for commonly used software tools to capture, handle, manage and process such large-scale datasets. As a result, it is a challenge for existing approaches to anonymize large-scale data sets because they do not scale. This paper presents a two-phase top-down specialization approach to anonymize large-scale datasets. This approach uses the MapReduce framework on the cloud so that it is highly scalable and efficient. Here we introduce a scheduling mechanism called Optimized Balanced Scheduling (OBS) to apply the anonymization. In OBS, each individual dataset has its own sensitive field, and priority is given to this field. Anonymization is then applied only to this sensitive field, according to the scheduling. DOI: 10.17762/ijritcc2321-8169.15077

    Big data, Bigger privacy concern?

    In light of the rapid growth of big data applications in times where the internet of things is taking over personal privacy, this paper studies the area where data analytics and privacy concerns overlap. Identifying that anonymization and consent frequently do not suffice for user data, this paper also points out the weaknesses of regulations. A survey with 200 respondents showed that the awareness of big data capabilities caused significant privacy concern and willingness for (counter-) action, thus emphasizing that big data-driven firms should take a possible shift in user perception and behavior into account when formulating their strategy

    Data Privacy for Big Data Publishing Using Newly Enhanced PASS Data Mining Mechanism

    Anonymization is one of the main techniques used in recent times to prevent privacy breaches on published data; one such technique is k-anonymization. k-anonymization is a parametric anonymization technique. Its aim is to generalize the tuples so that they cannot be identified using quasi-identifiers. In the past few years, we have seen tremendous growth in data, which ultimately led to the concept of big data. This growth made anonymization using conventional processing methods inefficient. To make the anonymization more efficient, we used the proposed PASS mechanism in the Hadoop framework to reduce the processing time of anonymization. In this work, we have divided the whole program into a map part and a reduce part. Moreover, the data types used in Hadoop provide better serialization and transport of data. We performed our experiments on a large dataset. The results demonstrated the efficiency of our implementation.
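
    To make the map/reduce split mentioned in this abstract concrete, the following single-process sketch imitates the two phases in plain Python. It does not use the Hadoop API and is not the PASS mechanism, whose details the abstract does not give; the function names, the generalization rule, and the sample rows are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(record):
    """Emit (generalized quasi-identifier key, sensitive value)."""
    key = (record["age"] // 10 * 10, record["zip"][:3])
    return key, record["disease"]

def reduce_phase(key, values, k):
    """Release a group only if it contains at least k records."""
    values = list(values)
    if len(values) < k:
        return []  # suppress groups that would break k-anonymity
    decade, zip_prefix = key
    return [(decade, zip_prefix, value) for value in values]

def run_job(records, k):
    """Simulate map, shuffle (sort and group by key), and reduce."""
    pairs = sorted((map_phase(r) for r in records), key=itemgetter(0))
    output = []
    for key, group in groupby(pairs, key=itemgetter(0)):
        output.extend(reduce_phase(key, (value for _, value in group), k))
    return output

# Hypothetical sample rows (not data from the paper).
rows = [
    {"age": 31, "zip": "13053", "disease": "flu"},
    {"age": 35, "zip": "13068", "disease": "cancer"},
    {"age": 38, "zip": "13012", "disease": "flu"},
    {"age": 52, "zip": "14850", "disease": "asthma"},
]

result = run_job(rows, k=2)
```

    Because each key can be reduced independently, a real framework could process the groups in parallel across machines.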

    Strengthening Privacy and Cybersecurity through Anonymization and Big Data

    The abstract is in the attachment.

    Improved Technique for Preserving Privacy while Mining Real Time Big Data

    With the evolution of Big Data, data owners require the assistance of a third party (e.g., the cloud) to store and analyse the data and to obtain information at a lower cost. However, maintaining privacy is a challenge in such scenarios, since outsourcing may reveal sensitive information. Existing research discusses different techniques for protecting the original data using anonymization, randomization, and suppression. But those techniques are not scalable, suffer from information loss, and do not support real-time data, and hence are not suitable for privacy-preserving big data mining. In this research, a novel two-level privacy approach is proposed using pseudonymization and homomorphic encryption in the Spark framework. Several simulations are carried out on the collected dataset. From the results obtained, we observed that execution time is reduced by 50% and privacy is enhanced by 10%. This scheme is suitable for both privacy-preserving Big Data publishing and mining.
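
    The pseudonymization level mentioned in this abstract could look roughly like the sketch below, which replaces identifier fields with keyed HMAC-SHA256 digests using only the Python standard library. The key, field names, and truncation length are assumptions, and the paper's actual scheme may differ; the homomorphic encryption level and the Spark integration are not shown.

```python
import hashlib
import hmac

def pseudonymize(value, secret_key):
    """Map an identifier to a stable pseudonym using a keyed hash."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # truncated for readability (assumption)

def pseudonymize_records(records, fields, secret_key):
    """Replace the chosen identifier fields and keep the rest unchanged."""
    return [
        {name: pseudonymize(value, secret_key) if name in fields else value
         for name, value in record.items()}
        for record in records
    ]

# Hypothetical key and rows; a real key must be generated and stored securely.
key = b"demo-key-not-for-production"
people = [
    {"name": "alice", "city": "Pune"},
    {"name": "bob", "city": "Indore"},
]

protected = pseudonymize_records(people, {"name"}, key)
```

    The same input always yields the same pseudonym under one key, so records can still be joined, while anyone without the key cannot recompute the mapping from guessed names.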