
    Toward efficient and secure public auditing for dynamic big data storage on cloud

    University of Technology Sydney, Faculty of Engineering and Information Technology.
    Cloud and Big Data are two of the most attractive ICT research topics to have emerged in recent years. Big data processing requirements are now everywhere, and the pay-as-you-go model of cloud systems is especially cost-efficient for big data applications. However, there are still concerns that hinder the proliferation of cloud, and data security/privacy is a top concern for data owners wishing to migrate their applications into the cloud environment. Compared to users of conventional systems, cloud users need to surrender local control of their data to cloud servers. Another challenge for big data is the data dynamism that exists in most big data applications; due to frequent updates, efficiency becomes a major issue in data management. As security always comes at a cost in efficiency, it is difficult but nonetheless important to investigate how to efficiently address security challenges over dynamic cloud data. Data integrity is an essential aspect of data security. In addition to server-side integrity protection mechanisms, verification by a third-party auditor is equally important because it enables users to verify the integrity of their data, through the auditor, at any time of their choosing; this type of verification is also known as 'public auditing' of data. Existing public auditing schemes allow the integrity of a dataset stored in the cloud to be verified externally without retrieving the whole original dataset. In practice, however, many challenges hinder the application of such schemes. First, the server still has to aggregate a proof at the cloud controller from data blocks that are stored and processed across distributed cloud instances, which makes the encryption and transfer of these data within the cloud time-consuming. Second, security flaws exist in current designs: the verification processes are insecure against various attacks, which raises concerns about deploying these schemes in practice. Third, when the dataset is large, auditing of dynamic data becomes costly in terms of communication and storage; this is especially the case for large numbers of small data updates and for updates on multi-replica cloud data storage. In this thesis, the research problem of public auditing of dynamic data in the cloud is systematically investigated. After analysing the research problems, we address them by developing, testing and publishing a series of security schemes and algorithms for secure and efficient public auditing of dynamic big data storage on the cloud. Specifically, our work focuses on the following aspects: cloud-internal authenticated key exchange, authorisation of the third-party auditor, fine-grained update support, index verification, and efficient multi-replica public auditing of dynamic data. To the best of our knowledge, this thesis presents the first series of works to systematically analyse and address this research problem. Experimental results and analyses show that the solutions presented in this thesis are suitable for auditing dynamic big data storage on the cloud and represent significant improvements in cloud efficiency and security.
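
    The schemes themselves are not reproduced in the abstract; as a rough illustration of what third-party auditing without full data retrieval involves, the following Python sketch shows a simple precomputed spot-checking protocol. All names and parameters here are assumptions for illustration only; the thesis's actual schemes rely on homomorphic authenticators, so the auditor needs no precomputed answers, proofs stay compact, and dynamic updates are supported.

        import hashlib, os, random

        BLOCK_SIZE = 4096

        def split_blocks(data: bytes, size: int = BLOCK_SIZE):
            return [data[i:i + size] for i in range(0, len(data), size)]

        class Owner:
            """Pre-computes challenge tokens before outsourcing the file."""
            def precompute_tokens(self, blocks, num_tokens=16):
                tokens = []
                for _ in range(num_tokens):
                    idx = random.randrange(len(blocks))
                    nonce = os.urandom(16)
                    expected = hashlib.sha256(nonce + blocks[idx]).hexdigest()
                    tokens.append((idx, nonce, expected))
                return tokens  # handed to the third-party auditor

        class CloudServer:
            """Stores the outsourced blocks and answers integrity challenges."""
            def __init__(self, blocks):
                self.blocks = blocks

            def prove(self, idx, nonce):
                return hashlib.sha256(nonce + self.blocks[idx]).hexdigest()

        class Auditor:
            """Verifies integrity by spot-checking randomly chosen blocks."""
            def audit(self, server, tokens):
                return all(server.prove(idx, nonce) == expected
                           for idx, nonce, expected in tokens)

        if __name__ == "__main__":
            blocks = split_blocks(os.urandom(10 * BLOCK_SIZE))
            tokens = Owner().precompute_tokens(blocks)
            server = CloudServer(blocks)
            print("audit passed:", Auditor().audit(server, tokens))  # True
            server.blocks[3] = os.urandom(BLOCK_SIZE)                # corrupt one block
            print("audit passed:", Auditor().audit(server, tokens))  # usually False now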

    Scalable TPTDS Data Anonymization over Cloud using MapReduce

    With the rapid advancement of the digital big data age, large amounts of data are collected, mined and published, and data publishing has become a routine day-to-day activity. Cloud computing is the model best suited to supporting big data applications. Many cloud services require users to share microdata, such as electronic health records or financial transaction data, so that it can be analysed; however, one of the major issues in moving toward the cloud is the threat to privacy. Data anonymization techniques are widely used to address these privacy concerns, and anonymizing datasets through generalization to achieve k-anonymity is one such privacy-preserving technique. Currently, the scale of data in many cloud applications is increasing massively in line with the Big Data trend, making it difficult for commonly used software tools to capture, handle, manage and process such large-scale datasets. As a result, achieving anonymization of large-scale datasets is a challenge for existing approaches, which do not scale well. This paper presents a two-phase top-down specialization (TPTDS) approach to anonymize large-scale datasets. The approach uses the MapReduce framework on the cloud, making it highly scalable and efficient. We also introduce a scheduling mechanism called Optimized Balanced Scheduling (OBS) to drive the anonymization: each dataset has its own sensitive field, that field is assigned a priority, and anonymization is then applied to the sensitive field only, according to the schedule. DOI: 10.17762/ijritcc2321-8169.15077
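
    The abstract only names the TPTDS approach; the Python sketch below (field names and generalization rules are my own, illustrative assumptions) shows the basic generalize-and-group shape that a MapReduce anonymization job takes, with groups smaller than k suppressed. It is not the TPTDS algorithm itself, which instead specializes top-down from the most general domain values across two MapReduce phases.

        from collections import defaultdict

        K = 3  # every released quasi-identifier group must contain at least K records

        def generalize(record):
            """Map step: coarsen quasi-identifiers (age -> 10-year band, zip -> prefix)."""
            lo = (record["age"] // 10) * 10
            age_band = f"{lo}-{lo + 9}"
            zip_prefix = record["zip"][:3] + "**"
            return (age_band, zip_prefix), record["diagnosis"]  # (key, sensitive value)

        def map_phase(records):
            for rec in records:
                yield generalize(rec)

        def reduce_phase(pairs):
            """Reduce step: group by generalized key, suppress groups smaller than K."""
            groups = defaultdict(list)
            for key, value in pairs:
                groups[key].append(value)
            return {key: values for key, values in groups.items() if len(values) >= K}

        if __name__ == "__main__":
            records = [
                {"age": 23, "zip": "44021", "diagnosis": "flu"},
                {"age": 27, "zip": "44022", "diagnosis": "cold"},
                {"age": 29, "zip": "44025", "diagnosis": "flu"},
                {"age": 61, "zip": "90210", "diagnosis": "asthma"},  # singleton group, suppressed
            ]
            print(reduce_phase(map_phase(records)))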

    Privacy Preserving by Anonymization Approach

    Privacy preservation has attracted much attention in data mining because, nowadays, people register on many websites every day and provide personal details such as date of birth and zip code, from which an attacker can obtain sensitive data about an individual, so privacy is breached. To address this problem, approaches such as randomization, anonymization, cryptography and partitioning have been introduced, each with its own limitations; anonymization, in particular, suffers from information loss and, in some cases, privacy can still be breached. This paper therefore introduces a new approach that decreases information loss and increases privacy.
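
    As a concrete, hedged illustration of the information loss mentioned above (using the standard Loss Metric for a numeric attribute, not anything specific to this paper's new approach), the wider the interval a value is generalized into, the closer the loss gets to 1:

        def generalization_loss(lower, upper, domain_min, domain_max):
            """Loss Metric for one generalized numeric value: 0 when the exact value
            is released, 1 when it is generalized to the full domain (suppressed)."""
            return (upper - lower) / (domain_max - domain_min)

        # Releasing age 27 as the band 20-29 over an age domain of 0-99:
        print(generalization_loss(20, 29, 0, 99))  # ~0.09, modest loss
        # Releasing it as 0-99 (attribute effectively suppressed):
        print(generalization_loss(0, 99, 0, 99))   # 1.0, all information lost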

    Privacy Preservation in Analyzing E-Health Records in Big Data Environment

    Increased use of the Internet and progress in cloud computing are creating large new datasets of increasing value to business. The data that cloud applications need to process are growing much faster than the available computing power, and Hadoop MapReduce has become a powerful computation model for addressing this problem. Nowadays, many cloud services require users to share confidential data, such as electronic health records, for research analysis or data mining, which raises privacy concerns. k-anonymity is one of the most widely used privacy models. The scale of data in cloud applications is rising sharply in line with the Big Data trend, making it a challenge for conventional software tools to process such large-scale data within a tolerable elapsed time. As a consequence, it is difficult for current anonymization techniques to preserve the privacy of confidential, ever-growing datasets because they lack scalability. In this project, we propose a scalable two-phase approach to anonymizing large datasets using a dynamic MapReduce framework, the Top-Down Specialization (TDS) algorithm and the k-anonymity privacy model. Resources are optimized in three key ways. First, the under-utilization of map and reduce tasks is reduced using Dynamic Hadoop Slot Allocation (DHSA). Second, the performance trade-off between a single job and a batch of jobs is balanced using Speculative Execution Performance Balancing (SEPB). Third, data locality is improved without any impact on fairness using Slot PreScheduling. Experimental evaluation results demonstrate that, with this project, the scalability, efficiency and privacy of datasets can be significantly improved over existing approaches. DOI: 10.17762/ijritcc2321-8169.160413
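
    The Python sketch below (record fields, k value and generalization rule are my own assumptions) illustrates only the two-phase structure of such an approach: partitions are anonymized to an intermediate level in parallel, and the merged result is then generalized further wherever a group is still smaller than k. The project's actual pipeline runs TDS on Hadoop, and DHSA, SEPB and Slot PreScheduling operate at the Hadoop scheduler level rather than in application code.

        from multiprocessing import Pool

        K = 5  # global k-anonymity requirement

        def coarsen(record, level):
            """Generalize the age quasi-identifier; higher levels give wider bands."""
            width = 10 * level
            lo = (record["age"] // width) * width
            return {**record, "age": f"{lo}-{lo + width - 1}"}

        def phase_one(partition):
            """Phase 1 (run per partition, in parallel): anonymize to an intermediate level."""
            return [coarsen(r, level=1) for r in partition]

        def phase_two(partition_results):
            """Phase 2: merge results and further generalize groups still smaller than K."""
            merged = [r for part in partition_results for r in part]
            counts = {}
            for r in merged:
                counts[r["age"]] = counts.get(r["age"], 0) + 1
            out = []
            for r in merged:
                if counts[r["age"]] >= K:
                    out.append(r)
                else:
                    lo = int(r["age"].split("-")[0])
                    out.append(coarsen({**r, "age": lo}, level=2))
            return out

        if __name__ == "__main__":
            partitions = [
                [{"age": a, "diagnosis": "flu"} for a in range(20, 30)],
                [{"age": a, "diagnosis": "cold"} for a in range(25, 35)],
            ]
            with Pool(2) as pool:
                intermediate = pool.map(phase_one, partitions)
            print(phase_two(intermediate)[:3])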

    BIG DATA ANALYTICS - AN OVERVIEW

    Big Data analytics has been gaining more attention recently as researchers in business and academia try to effectively mine and use all possible knowledge from the vast amounts of data being generated and collected. Traditional data analysis methods stumble when confronted with large amounts of data arriving in a short period of time, demanding a paradigm shift in the storage, processing and analysis of Big Data. Because of its importance, many U.S. agencies, including government bodies, have in recent years released substantial funding for research in Big Data and related fields. This paper gives a concise summary of research progress in various areas related to Big Data processing and analysis and concludes with a discussion of research directions in these areas.

    Private search over big data leveraging distributed file system and parallel processing

    In this work, we identify the security and privacy problems associated with a particular Big Data application, namely secure keyword-based search over encrypted cloud data, and emphasize the actual challenges and technical difficulties in the Big Data setting. More specifically, we provide definitions from which privacy requirements can be derived. In addition, we adapt an existing privacy-preserving keyword-based search method to the Big Data setting, in which data is not only huge but also changing and accumulating very fast. Our proposal is scalable in the sense that it can leverage distributed file systems and parallel programming techniques, such as the Hadoop Distributed File System (HDFS) and the MapReduce programming model, to work with very large datasets. We also propose a lazy idf-updating method that can efficiently handle the relevancy scores of documents in a dynamically changing, large dataset. We empirically show the efficiency and accuracy of the method through an extensive set of experiments on real data.
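
    The lazy idf-updating idea is only named in the abstract; the sketch below shows one plausible reading of it in plain Python (class and parameter names are mine, and the encrypted-index machinery of the actual method is omitted): document frequencies are tracked on every insertion, but the idf table used for ranking is recomputed only after enough new documents have accumulated, so scores may be slightly stale between refreshes.

        import math
        from collections import Counter, defaultdict

        class LazyTfIdfIndex:
            """tf-idf index that defers idf recomputation until enough documents
            have been added since the last refresh."""

            def __init__(self, refresh_fraction=0.1):
                self.refresh_fraction = refresh_fraction
                self.doc_terms = {}               # doc_id -> Counter of terms
                self.doc_freq = defaultdict(int)  # term -> number of docs containing it
                self.idf = {}                     # cached idf values (possibly stale)
                self.added_since_refresh = 0

            def add(self, doc_id, text):
                terms = Counter(text.lower().split())
                self.doc_terms[doc_id] = terms
                for term in terms:
                    self.doc_freq[term] += 1
                self.added_since_refresh += 1

            def _maybe_refresh(self):
                threshold = self.refresh_fraction * max(len(self.doc_terms), 1)
                if self.added_since_refresh > threshold:
                    n = len(self.doc_terms)
                    self.idf = {t: math.log(n / df) for t, df in self.doc_freq.items()}
                    self.added_since_refresh = 0

            def search(self, query):
                self._maybe_refresh()
                q = query.lower().split()
                scores = {doc_id: sum(terms[t] * self.idf.get(t, 0.0) for t in q)
                          for doc_id, terms in self.doc_terms.items()}
                return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

        index = LazyTfIdfIndex()
        index.add("d1", "cloud storage security audit")
        index.add("d2", "video streaming over cloud")
        print(index.search("cloud security"))  # d1 ranks first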

    Data Anonymization for Privacy Preservation in Big Data

    Cloud computing provides a capable, scalable IT infrastructure to support the processing of many different big data applications in sectors such as healthcare and business, and the electronic health record datasets used in such applications generally contain privacy-sensitive data. The most popular technique for preserving data privacy is anonymizing the data through generalization. Our proposal is to examine the issue of proximity privacy breaches in big data anonymization and to identify a scalable solution to this issue. A scalable two-phase clustering approach, consisting of a clustering algorithm and a k-anonymity scheme with generalization and suppression, is intended to address this problem. The algorithms are designed with MapReduce to achieve high scalability by carrying out data-parallel execution in the cloud. Wide-ranging experiments on real datasets confirm that the method considerably improves protection against proximity privacy breaches as well as the scalability and efficiency of anonymization over existing methods. Anonymizing datasets through generalization to satisfy privacy requirements such as k-anonymity is a popular class of privacy-preserving methods. Currently, the scale of data in many cloud applications is surging in line with Big Data, making it a challenge for commonly used tools to capture, manage and process such large-scale data within an acceptable time frame. Hence, it is difficult for existing anonymization approaches to achieve privacy preservation for big data private information due to scalability issues.
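
    As a toy illustration of the cluster-then-generalize idea (a greedy sketch over a single numeric attribute, with names and the k value assumed; the paper's method handles multidimensional quasi-identifiers and proximity constraints on the sensitive attribute, and runs as MapReduce jobs):

        def k_member_clusters(values, k):
            """Greedy toy clustering: sort the quasi-identifier values and cut them
            into consecutive groups of at least k members each."""
            ordered = sorted(values)
            clusters, current = [], []
            for v in ordered:
                current.append(v)
                if len(current) == k:
                    clusters.append(current)
                    current = []
            if current:
                if clusters:
                    clusters[-1].extend(current)  # fold leftovers into the last cluster
                else:
                    clusters.append(current)
            return clusters

        def generalize_cluster(cluster):
            """Release each cluster as the range it covers instead of exact values."""
            return f"{min(cluster)}-{max(cluster)}"

        ages = [21, 22, 23, 35, 36, 37, 38, 62, 63, 64]
        for c in k_member_clusters(ages, k=3):
            print(generalize_cluster(c), "covers", len(c), "records")

    The last group in this example comes out wide (38-64), which is precisely the kind of utility loss that a better clustering objective tries to avoid.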

    Frame Interpolation for Cloud-Based Mobile Video Streaming

    © 2016 IEEE. Cloud-based High Definition (HD) video streaming is becoming more popular by the day. On the one hand, it is important for both end users and large storage servers to store their huge amounts of data at different locations and servers; on the other hand, it is becoming a big challenge for network service providers to provide reliable connectivity to network users. There have been many studies of cloud-based video streaming and the Quality of Experience (QoE) of services such as YouTube. Packet losses and bit errors are very common in transmission networks and affect user feedback on cloud-based media services. To conceal packet losses and bit errors, Error Concealment (EC) techniques are usually applied at the decoder/receiver side to estimate the lost information. This paper proposes a time-efficient and quality-oriented EC method. The proposed method considers H.265/HEVC intra-encoded videos for the estimation of whole intra-frame loss. The main emphasis of the proposed approach is the recovery of the Motion Vectors (MVs) of a lost frame in real time. To speed up the search for the lost MVs, a larger block size and parallel searching are both used. The simulation results clearly show that our proposed method outperforms the traditional Block Matching Algorithm (BMA) by approximately 2.5 dB and Frame Copy (FC) by up to 12 dB at packet loss rates of 1%, 3% and 5% with different Quantization Parameters (QPs), and reduces computation time by approximately 1788 seconds compared to the BMA.
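
    For context, the Block Matching Algorithm used as the baseline above works roughly as follows; this Python/NumPy sketch is the classical full-search BMA, not the paper's parallel, larger-block method, and the frame data is synthetic.

        import numpy as np

        def best_motion_vector(ref, cur, top, left, block=16, search=8):
            """Full-search block matching: find the displacement (dy, dx) into the
            reference frame that minimizes the Sum of Absolute Differences (SAD)
            with the block at (top, left) of the current frame."""
            target = cur[top:top + block, left:left + block].astype(np.int32)
            best_sad, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = top + dy, left + dx
                    if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                        continue
                    candidate = ref[y:y + block, x:x + block].astype(np.int32)
                    sad = int(np.abs(target - candidate).sum())
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
            return best_mv, best_sad

        # Synthetic check: the reference frame is the current frame shifted by (2, 3),
        # so the recovered motion vector for an interior block should be (2, 3).
        cur = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
        ref = np.roll(cur, shift=(2, 3), axis=(0, 1))
        print(best_motion_vector(ref, cur, top=16, left=16))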

    State-of-the-Art in Data Integrity and Privacy-Preserving in Cloud Computing

    Cloud computing (CC) is a fast-growing technology that offers computing, networking, and storage services that can be accessed and used over the internet. Cloud services save users money because they are pay-per-use, and they save time because they are on-demand and elastic, a distinctive aspect of cloud computing. However, several security issues must be addressed before users store data in the cloud. Because users have no direct control over data outsourced to the cloud, particularly personal and sensitive data (health, finance, military, etc.), and do not know where that data is stored, they must be able to trust that the cloud stores and maintains the outsourced data appropriately. The study's primary goals are to make cloud and data security challenges more understandable, to briefly explain the techniques used to achieve privacy and data integrity, to compare various recent studies in both the pre-quantum and post-quantum settings, and to highlight current gaps in solving privacy and data integrity issues.

    Private search over big data leveraging distributed file system and parallel processing

    As new technologies have recently become widespread, enormous amounts of data have started to be generated at very high speed and stored on untrusted servers. The big data concept covers not only the exceptional size of the datasets, but also the high data generation rate and the large variety of data types. Although Big Data offers very tempting benefits, its security issues remain an open problem. In this thesis, we identify security and privacy problems associated with a particular big data application, namely secure keyword-based search over encrypted cloud data, and emphasize the actual challenges and technical difficulties in the big data setting. More specifically, we provide definitions from which privacy requirements can be derived. In addition, we adapt an existing privacy-preserving keyword-based search method, one of the fundamental operations that can be performed over encrypted data, to the big data setting, in which data is not only huge but also changing and accumulating very fast. In this setting, a secure index that allows search over encrypted data must be constructed and updated very quickly, in addition to providing an efficient and effective keyword-based search operation. Our proposal is scalable in the sense that it can leverage distributed file systems and parallel programming techniques, such as the Hadoop Distributed File System (HDFS) and the MapReduce programming model, to work with very large datasets. We also propose a lazy idf-updating method that can efficiently handle the relevancy scores of documents in dynamically changing, large datasets. We empirically show the efficiency and accuracy of the method through an extensive set of experiments on real data.
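
    To make the 'secure index' idea concrete, here is a bare-bones Python sketch of a keyed (trapdoor-based) inverted index of the kind such schemes build; it is a simplification of my own, not the thesis's construction: posting lists are left unencrypted, and there is no ranking, padding or update protocol.

        import hmac, hashlib
        from collections import defaultdict

        def trapdoor(key: bytes, keyword: str) -> str:
            """Deterministic keyed token for a keyword; the server only ever sees tokens."""
            return hmac.new(key, keyword.lower().encode(), hashlib.sha256).hexdigest()

        def build_secure_index(key, documents):
            """Client side: inverted index mapping keyword tokens to document ids."""
            index = defaultdict(set)
            for doc_id, text in documents.items():
                for word in set(text.lower().split()):
                    index[trapdoor(key, word)].add(doc_id)
            return index

        def server_search(index, token):
            """Server side: look up a token without learning the underlying keyword."""
            return index.get(token, set())

        key = b"client-secret-key"             # shared only with authorized users
        docs = {"d1": "cloud data integrity audit",
                "d2": "encrypted keyword search on cloud"}
        index = build_secure_index(key, docs)  # this index is what gets outsourced
        print(server_search(index, trapdoor(key, "cloud")))      # {'d1', 'd2'}
        print(server_search(index, trapdoor(key, "integrity")))  # {'d1'}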