    Misusability Measure Based Sanitization of Big Data for Privacy Preserving MapReduce Programming

    Leakage and misuse of sensitive data is a challenging problem for enterprises, and it has become more serious with the advent of cloud computing and big data. The reason is the increase in outsourcing of data to public clouds and in publishing data for wider visibility. Privacy Preserving Data Publishing (PPDP), Privacy Preserving Data Mining (PPDM) and Privacy Preserving Distributed Data Mining (PPDDM) are therefore crucial in the contemporary era. PPDP and PPDM can protect privacy at the data and process levels, respectively. With big data, data privacy has become indispensable because data is stored and processed in semi-trusted environments. In this paper we propose a comprehensive methodology for effective sanitization of data, based on a misusability measure, that preserves privacy and prevents data leakage and misuse. We follow a hybrid approach that caters to the needs of privacy-preserving MapReduce programming. We propose an algorithm, the Misusability Measure-Based Privacy Preserving Algorithm (MMPP), which considers the level of misusability before choosing and applying an appropriate sanitization to big data. Our empirical study with Amazon EC2 and EMR shows that the proposed methodology is useful in realizing privacy-preserving MapReduce programming.

    Cost-Effective Resource Provisioning for MapReduce in a Cloud

    This paper presents a new MapReduce cloud service model, Cura, for provisioning cost-effective MapReduce services in a cloud. In contrast to existing MapReduce cloud services such as a generic compute cloud or a dedicated MapReduce cloud, Cura has a number of unique benefits. First, Cura is designed to provide a cost-effective solution for efficiently handling MapReduce production workloads that contain a significant number of interactive jobs. Second, unlike existing services that require customers to decide which resources to use for their jobs, Cura leverages MapReduce profiling to automatically create the best cluster configuration for the jobs. While the existing models allow only per-job resource optimization, Cura implements a globally efficient resource allocation scheme that significantly reduces the resource usage cost in the cloud. Third, Cura leverages unique optimization opportunities when dealing with workloads that can withstand some slack. By effectively multiplexing the available cloud resources among the jobs based on the job requirements, Cura achieves significantly lower resource usage costs for the jobs. Cura's core resource management schemes include cost-aware resource provisioning, VM-aware scheduling and online virtual machine reconfiguration. Our experimental results using Facebook-like workload traces show that our techniques lead to more than 80 percent reduction in the cloud compute infrastructure cost, with up to 65 percent reduction in job response times.

    Homomorphic-Encrypted Volume Rendering

    Computationally demanding tasks are typically calculated in dedicated data centers, and real-time visualizations also follow this trend. Some rendering tasks, however, require the highest level of confidentiality so that no party other than the owner can read or see the sensitive data. Here we present a direct volume rendering approach that performs volume rendering directly on encrypted volume data by using the homomorphic Paillier encryption algorithm. This approach ensures that the volume data and rendered image are uninterpretable to the rendering server. Our volume rendering pipeline introduces novel approaches for encrypted-data compositing, interpolation, and opacity modulation, as well as simple transfer function design, where each of these routines maintains the highest level of privacy. We present an analysis of the performance and memory overhead associated with our privacy-preserving scheme. Our approach is open and secure by design, as opposed to secure through obscurity. Owners of the data only have to keep their secure key confidential to guarantee the privacy of their volume data and the rendered images. Our work is, to our knowledge, the first privacy-preserving remote volume-rendering approach that does not require that any server involved be trustworthy; even in cases when the server is compromised, no sensitive data will be leaked to a foreign party. Comment: Accepted for presentation at IEEE VIS 202
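
The abstract above relies on the additive homomorphism of Paillier encryption. As a rough illustration only, the following sketch shows how a server could compute an integer-weighted sum of two encrypted sample values (the kind of arithmetic that interpolation needs) without seeing the plaintexts. It is not the authors' rendering pipeline: the function names, the toy primes, and the sample values are assumptions made for this example, and the key size is far too small to be secure.

```python
import math
import secrets

def keygen(p, q):
    """Build a toy Paillier key pair from two primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    # With generator g = n + 1, L(g^lam mod n^2) equals lam mod n, so mu = lam^-1 mod n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)

def encrypt(n, m):
    """E(m) = (1 + m*n) * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n

def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % (n * n)

def scale(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)

# Hypothetical example: blend two encrypted samples with integer weights 3 and 1.
n, priv = keygen(65521, 65537)          # toy primes, assumption for the demo
front, back = 120, 200                  # made-up sample values
c_front, c_back = encrypt(n, front), encrypt(n, back)
blended = add(n, scale(n, c_front, 3), scale(n, c_back, 1))
result = decrypt(n, priv, blended)      # equals 3*front + 1*back
```

In this sketch the server only ever handles `c_front`, `c_back`, and `blended`; the key owner decrypts the weighted sum and divides by the total weight (4 here) locally.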

    Visualization of hyperspectral images on parallel and distributed platform: Apache Spark

    The field of hyperspectral image storage and processing has evolved remarkably in recent years. Visualizing these images is challenging because they contain more than three bands, so direct display through a red, green and blue (RGB) or hue, saturation and lightness (HSL) color system is not feasible. One potential solution is to reduce the dimensionality of the image to three dimensions and then assign each dimension to a color. Conventional tools and algorithms can no longer produce results within a reasonable time. In this paper, we present a new distributed method for visualizing hyperspectral images, based on principal component analysis (PCA) and implemented in a distributed parallel environment (Apache Spark). The proposed method visualizes big hyperspectral images in less time than the classical visualization method while matching its performance.
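
The idea of reducing a hyperspectral image to three principal components and mapping them to colors can be sketched as follows. This is a minimal illustration, not the paper's Spark implementation: it assumes NumPy is available, simulates the distributed part with in-process partitions whose partial sums are merged in a map/reduce style, and uses random data and hypothetical function names.

```python
from functools import reduce

import numpy as np

def partial_stats(block):
    """Map step: pixel count, per-band sums, and sum of outer products for one partition."""
    return block.shape[0], block.sum(axis=0), block.T @ block

def merge_stats(a, b):
    """Reduce step: the partial statistics are additive across partitions."""
    return a[0] + b[0], a[1] + b[1], a[2] + b[2]

def pca_to_rgb(pixels, n_partitions=4):
    """Project (n_pixels, n_bands) data onto its top 3 principal components, scaled to [0, 1]."""
    blocks = np.array_split(pixels, n_partitions)   # stand-in for distributed partitions
    n, band_sum, outer_sum = reduce(merge_stats, map(partial_stats, blocks))
    mean = band_sum / n
    cov = outer_sum / n - np.outer(mean, mean)      # population covariance of the bands
    _, eigvecs = np.linalg.eigh(cov)                # eigenvalues come back in ascending order
    top3 = eigvecs[:, ::-1][:, :3]                  # three directions of largest variance
    scores = (pixels - mean) @ top3
    lo, hi = scores.min(axis=0), scores.max(axis=0)
    return (scores - lo) / (hi - lo)                # one component per color channel

# Hypothetical 8x8 image with 20 spectral bands (random values, for illustration only).
rng = np.random.default_rng(0)
cube = rng.random((8, 8, 20))
rgb = pca_to_rgb(cube.reshape(-1, 20)).reshape(8, 8, 3)
```

Because the count, band sums, and outer-product sums are additive, each partition can be processed independently and only small summaries need to be combined, which is what makes the covariance step easy to distribute.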

    Big Data Analytics for Earth Sciences: the EarthServer approach

    Big Data Analytics is an emerging field, made possible by the massive storage and computing capabilities that advanced e-infrastructures now provide. Earth and Environmental sciences are likely to benefit from Big Data Analytics techniques that support the processing of the large number of Earth Observation datasets currently acquired and generated through observations and simulations. However, Earth Science data and applications present specificities in terms of relevance of the geospatial information, wide heterogeneity of data models and formats, and complexity of processing. Therefore, Big Earth Data Analytics requires specifically tailored techniques and tools. The EarthServer Big Earth Data Analytics engine offers a solution for coverage-type datasets, built around a high-performance array database technology and the adoption and enhancement of standards for service interaction (OGC WCS and WCPS). The EarthServer solution, led by the collection of requirements from scientific communities and international initiatives, provides a holistic approach that ranges from query languages and scalability up to mobile access and visualization. The result is demonstrated and validated through the development of lighthouse applications in the Marine, Geology, Atmospheric, Planetary and Cryospheric science domains.

    A Unified Framework for Secure Search Over Encrypted Cloud Data

    This paper presents a unified framework that supports different types of privacy-preserving search queries over encrypted cloud data. In the framework, users can perform multi-keyword search, range search and k-nearest neighbor search operations in a privacy-preserving manner. All three types of queries are transformed into predicate-based search by leveraging bucketization, locality sensitive hashing and homomorphic encryption techniques. The proposed framework is implemented using Hadoop MapReduce, and its efficiency and accuracy are evaluated using publicly available real data sets. The implementation results show that the proposed framework can be used effectively on moderate-sized data sets and that it scales to much larger data sets provided that the number of computers in the Hadoop cluster is increased. To the best of our knowledge, the proposed framework is the first privacy-preserving solution in which three different types of search queries are effectively applied over encrypted data.
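
Of the techniques named above, bucketization is the simplest to illustrate. The sketch below shows one common way a range query can be turned into a lookup over keyed bucket tags, so that the server matches opaque tags instead of plaintext values. It is an assumption-laden toy, not the framework from the paper: the bucket width, key, record IDs, and function names are all hypothetical, record encryption is omitted, and values are assumed to be non-negative integers.

```python
import hashlib
import hmac

BUCKET_WIDTH = 10  # assumed bucket size; wider buckets leak less but return more candidates

def bucket_id(value):
    """Map a non-negative integer to the index of the bucket that contains it."""
    return value // BUCKET_WIDTH

def tag(key, bucket):
    """Keyed tag for a bucket, so the server cannot tell which value range it covers."""
    return hmac.new(key, str(bucket).encode(), hashlib.sha256).hexdigest()

def build_index(key, records):
    """Client side: group record IDs under the tag of each value's bucket."""
    index = {}
    for record_id, value in records.items():
        index.setdefault(tag(key, bucket_id(value)), []).append(record_id)
    return index

def range_query_tags(key, lo, hi):
    """Client side: translate the range [lo, hi] into the tags of all overlapping buckets."""
    return [tag(key, b) for b in range(bucket_id(lo), bucket_id(hi) + 1)]

def server_search(index, tags):
    """Server side: return every record ID stored under any requested tag."""
    return sorted(rid for t in tags for rid in index.get(t, []))

# Hypothetical data and key, for illustration only.
key = b"demo-key-not-for-production"
records = {"r1": 3, "r2": 17, "r3": 25, "r4": 42, "r5": 29}
index = build_index(key, records)

candidates = server_search(index, range_query_tags(key, 15, 27))
# The server returns a superset; the client filters false positives after decryption.
exact = [rid for rid in candidates if 15 <= records[rid] <= 27]
```

Here the query range 15 to 27 overlaps buckets 1 and 2, so the server returns `r2`, `r3`, and `r5`, and the client discards `r5` (value 29) locally. That trade-off between privacy and false positives is inherent to bucket-based indexes.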