
    Securing Hadoop using OAuth 2.0 and Real Time Encryption Algorithm

    Hadoop is the most widely used distributed programming framework for processing large amounts of data with the Hadoop Distributed File System (HDFS), but processing personal or sensitive data in a distributed environment demands secure computing. Hadoop was originally designed without any security model. Hadoop projects now treat data security as a top priority, which in turn requires classifying critical data items; data from applications such as finance is deemed sensitive and needs to be secured. With the growing acceptance of Hadoop, there is an increasing trend to incorporate more and more enterprise security features. Encryption and decryption are applied before data is written to or read from HDFS: the Advanced Encryption Standard (AES) protects data at each cluster node by encrypting before writes and decrypting after reads. Earlier methods do not provide data privacy because the same mechanism secures every user's data in HDFS, and they also increase the file size, so they are not suitable for real-time applications. Hadoop therefore needs an additional mechanism that gives each user unique data security and encrypts data at a compatible speed. We have implemented a method in which OAuth performs authentication and issues a unique authorization token for each user; the token is used in the encryption step, providing per-user data privacy in Hadoop. The real-time encryption algorithm used to secure data in HDFS uses a key generated from this authorization token. DOI: 10.17762/ijritcc2321-8169.15071
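
    A minimal sketch of the idea described above, assuming the token-derived key is simply a SHA-256 digest of the OAuth 2.0 authorization token and that standard AES-128/CBC stands in for the unspecified "Real Time" cipher; class and method names are illustrative, and the HDFS write itself is omitted:

        import javax.crypto.Cipher;
        import javax.crypto.spec.IvParameterSpec;
        import javax.crypto.spec.SecretKeySpec;
        import java.nio.charset.StandardCharsets;
        import java.security.MessageDigest;
        import java.security.SecureRandom;
        import java.util.Arrays;

        public class TokenKeyedEncryption {

            // Derive a per-user 128-bit AES key from the user's authorization token
            // (assumption: the token seeds the key, so every user encrypts differently).
            static SecretKeySpec keyFromToken(String oauthToken) throws Exception {
                byte[] digest = MessageDigest.getInstance("SHA-256")
                        .digest(oauthToken.getBytes(StandardCharsets.UTF_8));
                return new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
            }

            // Encrypt a buffer before it is handed to the HDFS output stream; the IV is
            // prepended so the matching decrypt-on-read step can recover it.
            static byte[] encryptForWrite(byte[] plain, String oauthToken) throws Exception {
                byte[] iv = new byte[16];
                new SecureRandom().nextBytes(iv);
                Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
                cipher.init(Cipher.ENCRYPT_MODE, keyFromToken(oauthToken), new IvParameterSpec(iv));
                byte[] cipherText = cipher.doFinal(plain);
                byte[] out = new byte[iv.length + cipherText.length];
                System.arraycopy(iv, 0, out, 0, iv.length);
                System.arraycopy(cipherText, 0, out, iv.length, cipherText.length);
                return out;
            }
        }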

    Design and Implementation of a Distributed Encryption System for the Cloud

    "Big Data" refers to voluminous and varied data from different sources; the data can be either structured or unstructured. The privacy and security of Big Data are gaining importance, since more and more technologies have come to depend on it. It is difficult to handle with most relational database management systems, desktop statistics and visualization packages, because it requires massively parallel software running on tens, hundreds or even thousands of servers. In this paper we discuss Hadoop and a method for maintaining the privacy and security of big data. Hadoop was originally built without any security model. The main goal is to propose a Hadoop system that maintains privacy and security at the client system: the Advanced Encryption Standard (AES) protects data at each cluster node by performing encryption before writes and decryption on reads, and SHA-1 is used for user authorization.
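
    A short sketch of the client-side authorization check mentioned above, assuming the system stores a SHA-1 digest of each user's credential and only proceeds to the AES encrypt/decrypt path once the digests match; the class and method names are illustrative:

        import java.nio.charset.StandardCharsets;
        import java.security.MessageDigest;

        public class ClientAuthCheck {

            // Hash the presented credential with SHA-1 (as named in the abstract) and
            // compare it in constant time to the stored digest for this user.
            static boolean isAuthorized(String credential, byte[] storedSha1Digest) throws Exception {
                byte[] presented = MessageDigest.getInstance("SHA-1")
                        .digest(credential.getBytes(StandardCharsets.UTF_8));
                return MessageDigest.isEqual(presented, storedSha1Digest);
            }
        }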

    Private search over big data leveraging distributed file system and parallel processing

    In this work, we identify the security and privacy problems associated with a particular Big Data application, namely secure keyword-based search over encrypted cloud data, and emphasize the actual challenges and technical difficulties of the Big Data setting. More specifically, we provide definitions from which privacy requirements can be derived. In addition, we adapt an existing privacy-preserving keyword-based search method to the Big Data setting, in which data is not only huge but also changing and accumulating very fast. Our proposal is scalable in the sense that it can leverage distributed file systems and parallel programming techniques, such as the Hadoop Distributed File System (HDFS) and the MapReduce programming model, to work with very large data sets. We also propose a lazy idf-updating method that can efficiently maintain the relevancy scores of documents in a dynamically changing, large data set. We empirically show the efficiency and accuracy of the method through an extensive set of experiments on real data.
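
    A minimal sketch of what a lazy idf-updating scheme could look like, assuming relevancy is plain tf-idf and that a term's cached idf is refreshed only after the collection has grown past a threshold since the last refresh; the 10% threshold and all names are illustrative, not the paper's actual update rule:

        import java.util.HashMap;
        import java.util.Map;

        public class LazyIdfIndex {
            private long totalDocs = 0;                                      // current collection size
            private final Map<String, Long> docFreq = new HashMap<>();       // term -> document frequency
            private final Map<String, Double> cachedIdf = new HashMap<>();   // term -> idf at last refresh
            private final Map<String, Long> idfComputedAt = new HashMap<>(); // collection size at refresh
            private static final double REFRESH_RATIO = 1.10;                // refresh after 10% growth

            void addDocument(Iterable<String> distinctTerms) {
                totalDocs++;
                for (String t : distinctTerms) docFreq.merge(t, 1L, Long::sum);
                // Cached idf values are deliberately NOT recomputed here.
            }

            double idf(String term) {
                long seenAt = idfComputedAt.getOrDefault(term, 0L);
                if (seenAt == 0 || totalDocs > seenAt * REFRESH_RATIO) {     // stale: recompute lazily
                    long df = Math.max(1, docFreq.getOrDefault(term, 1L));
                    cachedIdf.put(term, Math.log(Math.max(1, totalDocs) / (double) df));
                    idfComputedAt.put(term, totalDocs);
                }
                return cachedIdf.get(term);
            }

            double score(String term, long termFreqInDoc) {
                return termFreqInDoc * idf(term);                            // tf-idf relevancy score
            }
        }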

    Cloud Storage Performance and Security Analysis with Hadoop and GridFTP

    Even though cloud servers have been around for a few years, most web hosts today have not yet converted to the cloud. If the purpose of a cloud server is distributing and storing files on the internet, FTP servers did that long before the cloud, and an FTP server is sufficient to distribute content on the internet. Is it therefore worth shifting from an FTP server to a cloud server? Cloud storage providers declare high durability and availability for their users, and the ability to scale up storage space easily can save users a great deal of money. However, do they provide higher performance and better security features? Hadoop is a very popular platform for cloud computing. It is free software under the Apache License, written in Java, and supports large-scale data processing in a distributed environment. Characteristics of Hadoop include partitioning of data, computing across thousands of hosts, and executing application computations in parallel. The Hadoop Distributed File System allows rapid data transfer at scales up to thousands of terabytes and is capable of operating even in the case of node failure. GridFTP supports high-speed data transfer over wide-area networks; it is based on FTP and features multiple data channels for parallel transfers. This report describes the technology behind HDFS and the enhancement of Hadoop's security features with Kerberos. Based on the data transfer performance and security features of HDFS and a GridFTP server, we can decide whether the GridFTP server should be replaced with HDFS. According to our experimental results, we conclude that the GridFTP server provides better throughput than HDFS, and that Kerberos has minimal impact on HDFS performance. We propose a solution in which users first authenticate with HDFS and then transfer the file from the HDFS server to the client using GridFTP.
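
    The throughput comparison above ultimately comes down to timing bulk transfers; a generic probe along those lines, assuming the source and destination streams are obtained from an HDFS or GridFTP client that is not shown here, might look like this (names are illustrative):

        import java.io.IOException;
        import java.io.InputStream;
        import java.io.OutputStream;

        public class ThroughputProbe {

            // Copy src to dst and report throughput in MB/s.
            static double measure(InputStream src, OutputStream dst) throws IOException {
                byte[] buf = new byte[1 << 20];                  // 1 MiB buffer
                long bytes = 0;
                long start = System.nanoTime();
                int n;
                while ((n = src.read(buf)) != -1) {
                    dst.write(buf, 0, n);
                    bytes += n;
                }
                double seconds = (System.nanoTime() - start) / 1e9;
                return (bytes / (1024.0 * 1024.0)) / seconds;
            }
        }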

    MapReduce analysis for cloud-archived data

    Public storage clouds have become a popular choice for archiving certain classes of enterprise data, for example application and infrastructure logs. These logs contain sensitive information such as IP addresses or user logins, so regulatory and security requirements often require the data to be encrypted before it is moved to the cloud. To extract any business value from such data, analytics systems (e.g. Hadoop/MapReduce) first download it from the public cloud, decrypt it and then process it at the secure enterprise site. We propose VNCache: an efficient solution for MapReduce analysis of such cloud-archived log data that does not require an a priori data transfer and load into the local Hadoop cluster. VNCache dynamically integrates cloud-archived data into a virtual namespace at the enterprise Hadoop cluster. Through a seamless data streaming and prefetching model, Hadoop jobs can begin execution as soon as they are launched, without any a priori downloading. With VNCache's accurate prefetching and caching, jobs often run on a locally cached copy of the data block, significantly improving performance. When no longer needed, data is safely evicted from the enterprise cluster, reducing the total storage footprint. Uniquely, VNCache is implemented with no changes to the Hadoop application stack. © 2014 IEEE
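
    A toy sketch of the caching and eviction behaviour described above, assuming an LRU policy over HDFS-sized blocks and a placeholder callback for the remote (decrypting) read; VNCache's actual virtual namespace, prefetcher and streaming layers are not reproduced:

        import java.util.LinkedHashMap;
        import java.util.Map;
        import java.util.function.Function;

        public class BlockCache {
            private final LinkedHashMap<String, byte[]> cache;
            private final Function<String, byte[]> fetchFromCloud;   // placeholder for the remote read

            BlockCache(int maxBlocks, Function<String, byte[]> fetchFromCloud) {
                this.fetchFromCloud = fetchFromCloud;
                // An access-ordered LinkedHashMap gives LRU eviction almost for free.
                this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
                    @Override
                    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                        return size() > maxBlocks;   // evict once the local footprint is exceeded
                    }
                };
            }

            // A Hadoop task asks for a block by id; cached copies are served locally,
            // otherwise the block is streamed in from the archive cloud.
            synchronized byte[] read(String blockId) {
                byte[] block = cache.get(blockId);
                if (block == null) {
                    block = fetchFromCloud.apply(blockId);
                    cache.put(blockId, block);
                }
                return block;
            }
        }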

    PPS-ADS: A Framework for Privacy-Preserved and Secured Distributed System Architecture for Handling Big Data

    The exponential expansion of Big Data along its seven V's (volume, velocity, variety, veracity, value, variability and visualization) brings new challenges to the security, reliability, availability and privacy of these data sets. Traditional security techniques and algorithms fail to scale to such gigantic data. This paper aims to improve the recently proposed Atrain Distributed System (ADS) by incorporating new features that cater to the end-to-end availability and security of big data in the distributed system. The paper also integrates the concept of Software Defined Networking (SDN) into ADS to effectively control and manage the routing of data items in the ADS. The storage of data items in the ADS is decided on the basis of the type of data (structured or unstructured), the capacity of the distributed system (or coach) and the distance of the coach from the pilot computer (PC). In order to maintain the consistency of data and to prevent possible loss of data, the concept of "forward positive" and "backward positive" acknowledgments is proposed. Furthermore, we incorporate the "Twofish" cryptographic technique to encrypt the big data in the ADS. Issues like "data ownership", "data security", "data privacy" and "data reliability" are pivotal when handling big data. The current paper presents a framework for a privacy-preserved architecture for handling big data in an effective manner.
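
    A rough sketch of the forward/backward positive acknowledgment idea, under the assumption that a write is only treated as durable once the coach confirms the store (forward positive) and the pilot computer's confirmation of that receipt reaches the coach (backward positive); the interface and method names are illustrative stand-ins for the ADS components, and the Twofish encryption of the item is assumed to have happened already:

        public class PositiveAckWrite {

            interface Coach {                                       // a storage node in the ADS
                boolean store(String key, byte[] encryptedItem);    // returns the forward positive ack
                void confirm(String key);                           // delivers the backward positive ack
            }

            // Returns true only when the full forward/backward handshake has completed.
            static boolean replicate(Coach coach, String key, byte[] encryptedItem) {
                boolean forwardAck = coach.store(key, encryptedItem);   // coach acknowledges the store
                if (!forwardAck) {
                    return false;       // no forward positive ack: the pilot may retry or re-route via SDN
                }
                coach.confirm(key);     // pilot sends the backward positive acknowledgment
                return true;
            }
        }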