211 research outputs found

    HadoopSec: Sensitivity-aware Secure Data Placement Strategy for Big Data/Hadoop Platform using Prescriptive Analytics

    Hadoop has become one of the key players in providing data analytics and data processing support for organizations that handle many different kinds of data. Given Hadoop's current security offerings, companies are wary of building a single large cluster and onboarding multiple projects onto the same shared Hadoop cluster. Security vulnerabilities and privacy invasion by malicious attackers or inside users are central concerns in any Hadoop deployment. In particular, several classes of vulnerability arise from the way data is placed across a Hadoop cluster: when sensitive information is accessed by an unauthorized user, or misused by an authorized one, privacy can be compromised. In this paper, we address secure data placement across distributed DataNodes by taking the sensitivity and security requirements of the underlying data into account. Our data placement strategy adaptively distributes data across the cluster using advanced machine learning techniques to realize a more secure data infrastructure, and it is highly extensible and scalable to suit different sensitivity and security requirements.
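
    The core idea, sensitivity-aware placement of blocks onto DataNodes, can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the sensitivity scores, the two security tiers and the round-robin assignment are assumptions introduced only for the example.

        from dataclasses import dataclass
        from typing import Dict, List

        @dataclass
        class DataNode:
            name: str
            tier: str            # e.g. "restricted" (hardened) or "general"

        @dataclass
        class Block:
            block_id: str
            sensitivity: float   # 0.0 (public) .. 1.0 (highly sensitive), assumed precomputed

        def place_blocks(blocks: List[Block], nodes: List[DataNode],
                         threshold: float = 0.7) -> Dict[str, str]:
            """Map each block to a DataNode, keeping sensitive blocks on hardened nodes."""
            restricted = [n for n in nodes if n.tier == "restricted"]
            general = [n for n in nodes if n.tier == "general"]
            placement: Dict[str, str] = {}
            for i, blk in enumerate(sorted(blocks, key=lambda b: -b.sensitivity)):
                pool = restricted if blk.sensitivity >= threshold else (general or restricted)
                placement[blk.block_id] = pool[i % len(pool)].name   # round-robin within the tier
            return placement

        nodes = [DataNode("dn1", "restricted"), DataNode("dn2", "general"), DataNode("dn3", "general")]
        blocks = [Block("b1", 0.9), Block("b2", 0.2), Block("b3", 0.8)]
        print(place_blocks(blocks, nodes))   # b1 and b3 land on dn1; b2 goes to a general node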

    Static Knowledge-Based Authentication Mechanism for Hadoop Distributed Platform using Kerberos

    With the phenomenal expansion of data, storing massive amounts of data has become important and continues to grow day by day. The term big data came to describe such large data sets and the need to handle them properly in terms of three important characteristics: volume, veracity, and variety. One practical big data problem is user and service authentication. The Kerberos v5 protocol provides a solution to this problem on the Hadoop distributed platform (HDP). In this paper, we propose a scheme that adds one more level of protection and authentication security to the Kerberos v5 protocol using static knowledge-based authentication (SKBA): in the login and verification phase of the Kerberos protocol, the KDC replies with a question to the user side to check the actual presence of the user, who already answered this question during the registration phase. The scheme is useful when captured messages would otherwise enable an eavesdropper to obtain the ticket that grants access to HDFS, and it helps avoid common attacks with low computation, communication and storage costs. The proposed scheme ensures that user information is delivered safely over an insecure network during registration and stored in the KDC database for later use when accessing HDFS.
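
    The extra knowledge-based round can be sketched as below. This is only an illustration of the idea, not the paper's exact protocol or message formats; the salted-hash storage of the answer and the in-memory stand-in for the KDC database are assumptions made for the example.

        import hashlib, hmac, os

        kdc_db = {}   # principal -> (question, salt, answer_hash); stands in for the KDC database

        def register(principal: str, question: str, answer: str) -> None:
            """Registration phase: store the user's question and a salted hash of the answer."""
            salt = os.urandom(16)
            digest = hashlib.pbkdf2_hmac("sha256", answer.encode(), salt, 100_000)
            kdc_db[principal] = (question, salt, digest)

        def challenge(principal: str) -> str:
            """Login phase: the KDC replies with the stored question before issuing a ticket."""
            return kdc_db[principal][0]

        def verify_answer(principal: str, answer: str) -> bool:
            """Only if the answer matches does the KDC proceed to issue the ticket."""
            _, salt, digest = kdc_db[principal]
            candidate = hashlib.pbkdf2_hmac("sha256", answer.encode(), salt, 100_000)
            return hmac.compare_digest(candidate, digest)

        register("alice@EXAMPLE.COM", "Name of first school?", "Greenwood")
        assert challenge("alice@EXAMPLE.COM") == "Name of first school?"
        assert verify_answer("alice@EXAMPLE.COM", "Greenwood")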

    Authentic and Anonymous Data Sharing with Data Partitioning in Big Data

    Hadoop is a framework for the transformation and analysis of very large data sets. This paper presents a distributed approach to data storage using the Hadoop Distributed File System (HDFS). The scheme overcomes the drawbacks of other storage schemes because it stores data in distributed form, so there is little chance of data loss. HDFS stores data as replicas, which is advantageous when a node fails: the user can easily recover from data loss, unlike with other storage systems in which lost data cannot be recovered. We propose an ID-based ring signature scheme to provide secure data sharing across the network, so that only authorized people have access to the data. The system is made more resistant to attack with the Advanced Encryption Standard (AES): even if an attacker succeeds in obtaining the source data, they are unable to decode it.
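
    The AES layer described above can be sketched as follows: data is encrypted before it is written to HDFS, so an attacker who obtains the stored bytes cannot read them without the key. This is a minimal sketch, not the paper's pipeline; it assumes the third-party cryptography package, and the ID-based ring signature used for authenticating the sharer is outside its scope.

        import os
        from cryptography.hazmat.primitives.ciphers.aead import AESGCM

        def encrypt_for_hdfs(plaintext: bytes, key: bytes) -> bytes:
            """Return nonce || ciphertext, ready to be written to an HDFS file."""
            nonce = os.urandom(12)                       # 96-bit nonce, as recommended for GCM
            return nonce + AESGCM(key).encrypt(nonce, plaintext, None)

        def decrypt_from_hdfs(blob: bytes, key: bytes) -> bytes:
            """Split off the nonce, then authenticate and decrypt the remainder."""
            nonce, ciphertext = blob[:12], blob[12:]
            return AESGCM(key).decrypt(nonce, ciphertext, None)

        key = AESGCM.generate_key(bit_length=256)
        blob = encrypt_for_hdfs(b"patient record 42", key)
        assert decrypt_from_hdfs(blob, key) == b"patient record 42"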

    Blockchain based Access Control for Enterprise Blockchain Applications

    Access control is one of the fundamental security mechanisms of IT systems. Most existing access control schemes rely on a centralized party to manage and enforce access control policies. As blockchain technologies, especially permissioned networks, find more applicability beyond cryptocurrencies in enterprise solutions, it is expected that the security requirements will increase. Therefore, it is necessary to develop an access control system that works in a decentralized environment without compromising the unique features of a blockchain. A straightforward method to support access control is to deploy a firewall in front of the enterprise blockchain application. However, this approach does not take advantage of the desirable features of blockchain. In order to address these concerns, we propose a novel blockchain-based access control scheme, which keeps the decentralization feature for access control-related operations. The newly proposed system also provides the capability to protect users' privacy by leveraging ring signatures. We implement a prototype of the scheme using Hyperledger Fabric and assess its performance to show that it is practical for real-world applications.
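
    The kind of decision such a scheme enforces can be illustrated with a simplified policy check. The actual prototype is Hyperledger Fabric chaincode; the Python sketch below, the policy record format and the role-based matching rule are assumptions introduced only to show the shape of the evaluation shared by every peer.

        from dataclasses import dataclass
        from typing import List

        @dataclass
        class PolicyRecord:
            resource: str
            action: str
            allowed_roles: List[str]

        # Policies conceptually live on the ledger so every peer evaluates the same rules;
        # this in-memory list merely stands in for that shared state.
        ledger_policies = [
            PolicyRecord("invoice/*", "read",  ["auditor", "finance"]),
            PolicyRecord("invoice/*", "write", ["finance"]),
        ]

        def matches(pattern: str, resource: str) -> bool:
            """Exact match, or a trailing '/*' wildcard that covers a whole prefix."""
            return (pattern.endswith("/*") and resource.startswith(pattern[:-1])) or pattern == resource

        def check_access(role: str, resource: str, action: str) -> bool:
            """Grant access only if some on-ledger policy covers (resource, action, role)."""
            return any(matches(p.resource, resource) and p.action == action and role in p.allowed_roles
                       for p in ledger_policies)

        assert check_access("auditor", "invoice/2024-001", "read")
        assert not check_access("auditor", "invoice/2024-001", "write")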

    Secure Cloud Storage: A Framework for Data Protection as a Service in the Multi-cloud Environment

    This paper introduces Secure Cloud Storage (SCS), a framework that offers Data Protection as a Service (DPaaS) to cloud computing users. Compared to existing Data Encryption as a Service (DEaaS) offerings, such as those provided by Amazon and Google, DPaaS provides more flexibility for protecting data in the cloud. In addition to supporting basic data encryption as DEaaS does, DPaaS allows users to define fine-grained access control policies to protect their data. Once data is placed under an access control policy, it is automatically encrypted, and it can be decrypted and accessed by the data owner, or by anyone else specified in the policy, only if the policy is satisfied. The key idea of the SCS framework is to separate data management from security management, in addition to defining a full cycle of data security automation from encryption to decryption. As a proof of concept, we implemented a prototype of the SCS framework that works with both the BT Cloud Compute platform and Amazon EC2, and experiments on the prototype have demonstrated the efficiency of the SCS framework.
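
    The policy-bound encryption cycle can be sketched as below: placing data under a policy encrypts it immediately, and decryption succeeds only when the policy is satisfied. This is a deliberately simplified illustration, not the SCS design; the policy format, the local key handling and the cryptography package's Fernet recipe are all assumptions made for the example.

        from dataclasses import dataclass
        from cryptography.fernet import Fernet

        @dataclass
        class ProtectedObject:
            policy: dict        # e.g. {"allowed_users": [...]}; the policy format is an assumption
            token: bytes        # ciphertext produced when the object was placed under the policy

        def protect(data: bytes, policy: dict, key: bytes) -> ProtectedObject:
            """Putting data under a policy immediately encrypts it."""
            return ProtectedObject(policy=policy, token=Fernet(key).encrypt(data))

        def access(obj: ProtectedObject, requester: str, key: bytes) -> bytes:
            """Decrypt only when the policy is satisfied for the requesting identity."""
            if requester not in obj.policy.get("allowed_users", []):
                raise PermissionError(f"policy not satisfied for {requester}")
            return Fernet(key).decrypt(obj.token)

        key = Fernet.generate_key()
        obj = protect(b"sales-report.csv contents", {"allowed_users": ["alice", "bob"]}, key)
        print(access(obj, "alice", key))        # plaintext returned
        # access(obj, "mallory", key)           # would raise PermissionError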

    Using Hadoop to Support Big Data Analysis: Design and Performance Characteristics

    Today, the amount of data generated is extremely large and is growing faster than computational speeds can keep up with. Storing or processing such data in the traditional way, on a single machine, is therefore no longer practical and can take a huge amount of time. As a result, we need a better way to process data, such as distributing it over large computing clusters. Hadoop is a framework that allows the distributed processing of large data sets. It is an open-source application available under the Apache License, designed to scale from a single server to thousands of machines, each of which can store data and perform computations locally. The literature indicates that processing big data in a reasonable time frame can be a challenging task, and one of the most promising platforms is the concept of exascale computing. For this paper we created a testbed based on recommendations for big data within the exascale architecture. The testbed featured three nodes running the Hadoop Distributed File System. Data from Twitter logs was stored both in the Hadoop file system and in a traditional MySQL database, and the Hadoop file system consistently outperformed the MySQL database. Further research will use larger data sets and more complex queries to fully assess the capabilities of distributed file systems, and will also address optimizing the number of processing nodes and the intercommunication paths in the underlying infrastructure of the distributed file system. HIVE.apache.org states that the Apache Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. The paper ends with an explanation of how to install and launch Hadoop and Hive, how to configure the rules in a Hadoop ecosystem, and a few use cases to check performance.
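
    A comparison of this kind can be reproduced with a small timing harness that runs the same query against both back ends through their DB-API clients. This is a sketch of the measurement approach rather than the paper's benchmark; the client libraries, hosts, credentials, table and query in the commented usage are placeholders.

        import time

        def time_query(connection, sql: str, repeats: int = 3) -> float:
            """Run the same SQL on any DB-API 2.0 connection and return the best wall-clock time."""
            best = float("inf")
            for _ in range(repeats):
                cursor = connection.cursor()
                start = time.perf_counter()
                cursor.execute(sql)
                cursor.fetchall()                       # force the full result set to be materialized
                best = min(best, time.perf_counter() - start)
                cursor.close()
            return best

        # Hypothetical usage (placeholders, not the paper's setup):
        #   from pyhive import hive          # HiveServer2 client
        #   import mysql.connector           # MySQL client
        #   hive_conn  = hive.Connection(host="namenode", port=10000)
        #   mysql_conn = mysql.connector.connect(host="dbhost", user="u", password="p", database="tweets")
        #   sql = "SELECT COUNT(*) FROM twitter_logs WHERE lang = 'en'"
        #   print("Hive :", time_query(hive_conn, sql))
        #   print("MySQL:", time_query(mysql_conn, sql))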

    Wiki-health: from quantified self to self-understanding

    Today, healthcare providers are experiencing explosive growth in data, and medical imaging represents a significant portion of that data. Meanwhile, the pervasive use of mobile phones and the rising adoption of sensing devices, enabling people to collect data independently at any time or place, are leading to a torrent of sensor data. The scale and richness of the sensor data currently being collected and analysed are rapidly growing. The key challenges we will face are how to effectively manage and make use of this abundance of easily generated and diverse health data. This thesis investigates the challenges posed by the explosive growth of available healthcare data and proposes a number of potential solutions to the problem. As a result, a big data service platform, named Wiki-Health, is presented to provide a unified solution for collecting, storing, tagging, retrieving, searching and analysing personal health sensor data. Additionally, it allows users to reuse and remix data, along with analysis results and analysis models, to make health-related knowledge discovery more available to individual users on a massive scale. To tackle the challenge of efficiently managing the high volume and diversity of big data, Wiki-Health introduces a hybrid data storage approach capable of storing structured, semi-structured and unstructured sensor data and sensor metadata separately. A multi-tier cloud storage system, CACSS, has been developed and serves as a component of the Wiki-Health platform, allowing it to manage the storage of unstructured and semi-structured data, such as medical imaging files. CACSS enables comprehensive features such as global data de-duplication, performance awareness and data caching services. The design of such a hybrid approach allows Wiki-Health to potentially handle heterogeneous formats of sensor data. To evaluate the proposed approach, we have developed an ECG-based health monitoring service and a virtual sensing service on top of the Wiki-Health platform. The two services demonstrate the feasibility and potential of using the Wiki-Health framework to enable better utilisation and comprehension of the vast amounts of sensor data available from different sources, and both show significant potential for real-world applications.
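
    The hybrid storage idea, keeping structured, semi-structured and unstructured sensor data in separate back ends, can be illustrated with a minimal dispatcher. The classification rules and the in-memory stand-ins for the back ends are assumptions made for the sketch; they are not the Wiki-Health or CACSS implementation.

        from typing import Any, Dict

        # Stand-ins for the separate back ends (e.g. a relational store, a document store,
        # and an object store for files); real storage clients would replace these dicts.
        structured_store: Dict[str, Any] = {}       # fixed-schema sensor readings
        semi_structured_store: Dict[str, Any] = {}  # JSON-like metadata
        unstructured_store: Dict[str, bytes] = {}   # raw files, e.g. medical images

        def store(record_id: str, payload: Any) -> str:
            """Route a record to the back end that matches its shape."""
            if isinstance(payload, bytes):
                unstructured_store[record_id] = payload
                return "unstructured"
            if isinstance(payload, dict):
                semi_structured_store[record_id] = payload
                return "semi-structured"
            structured_store[record_id] = payload                  # e.g. a (timestamp, value) tuple
            return "structured"

        print(store("ecg-001", b"\x00\x01..."))                    # -> unstructured
        print(store("ecg-001-meta", {"device": "ecg", "hz": 250})) # -> semi-structured
        print(store("hr-2024-01-01", (1704067200, 72)))            # -> structured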

    New Security Definitions, Constructions and Applications of Proxy Re-Encryption

    Outsourcing the management of information is an increasingly common practice, with cloud computing as its most representative paradigm. However, this approach also raises security and privacy concerns because of the inherent loss of control over the data. Traditional solutions, mainly based on access control policies and strategies, only reduce the problem to a matter of trust, which can easily be broken by service providers, whether accidentally or intentionally. Protecting outsourced information while reducing the trust that must be placed in service providers therefore becomes an immediate goal, and cryptography-based solutions are a crucial mechanism to this end. This thesis is devoted to the study of a cryptosystem called proxy re-encryption, which constitutes a practical solution to this problem from both a functional and an efficiency standpoint. Proxy re-encryption is a type of public-key encryption that allows an entity to be delegated the ability to transform ciphertexts from one public key to another without learning anything about the underlying message. From a functional point of view, proxy re-encryption can be seen as a means of securely delegating access to encrypted information, which makes it a natural candidate for building cryptographic access control mechanisms. Beyond this, this kind of encryption is of great theoretical interest in its own right, since its security definitions must balance the security of ciphertexts against the ability to transform them through re-encryption, a stimulating dichotomy. The contributions of this thesis follow a cross-cutting approach, ranging from the security definitions of proxy re-encryption themselves, through concrete constructions, to the specific details of potential applications.
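
    The re-encryption operation described above can be illustrated with a toy version of the classic ElGamal-based scheme of Blaze, Bleumer and Strauss (BBS98). This is not one of the thesis's constructions, and with such small parameters it is not secure; it only shows the keygen / encrypt / re-key / re-encrypt / decrypt interface, with messages encoded as group elements.

        import secrets

        # Toy parameters: safe prime p = 2q + 1, generator g of the order-q subgroup.
        p, q, g = 467, 233, 4

        def keygen():
            sk = secrets.randbelow(q - 1) + 1          # secret key in [1, q-1]
            return sk, pow(g, sk, p)                   # (secret, public) key pair

        def encrypt(pk, m):
            r = secrets.randbelow(q - 1) + 1
            return (m * pow(g, r, p) % p, pow(pk, r, p))   # (m * g^r, pk^r)

        def rekey(sk_a, sk_b):
            return sk_b * pow(sk_a, -1, q) % q         # rk = b / a  (mod q)

        def reencrypt(rk, ct):
            c1, c2 = ct
            return (c1, pow(c2, rk, p))                # turns g^(a*r) into g^(b*r)

        def decrypt(sk, ct):
            c1, c2 = ct
            g_r = pow(c2, pow(sk, -1, q), p)           # recover g^r
            return c1 * pow(g_r, -1, p) % p            # m = c1 / g^r

        sk_a, pk_a = keygen(); sk_b, pk_b = keygen()
        m = pow(g, 42, p)                              # message encoded as a group element
        ct_a = encrypt(pk_a, m)
        ct_b = reencrypt(rekey(sk_a, sk_b), ct_a)      # the proxy transforms A's ciphertext for B
        assert decrypt(sk_b, ct_b) == m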