
    A Scalable Two-Phase Top-Down Specialization Based Data Anonymization Using MapReduce on Cloud

    A large number of cloud services require users to share private data, such as electronic health records, for data analysis or mining, bringing privacy concerns. Anonymizing data sets via generalization to satisfy certain privacy requirements, such as k-anonymity, is a widely used category of privacy-preserving techniques. At present, the scale of data in many cloud applications increases tremendously in accordance with the Big Data trend, making it a challenge for commonly used software tools to capture, manage, and process such large-scale data within a tolerable elapsed time. Consequently, it is a challenge for existing anonymization approaches to achieve privacy preservation on privacy-sensitive large-scale data sets due to their insufficient scalability. In this paper, we propose a scalable two-phase top-down specialization (TDS) approach to anonymize large-scale data sets using the MapReduce framework on cloud. Experimental evaluation results demonstrate that with our approach, the scalability and efficiency of TDS can be significantly improved over existing approaches.
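    To make the idea concrete, below is a minimal single-machine sketch of top-down specialization against a toy value taxonomy: records start fully generalized and are specialized step by step only while k-anonymity still holds. The taxonomy, record layout, and function names are illustrative assumptions, not the paper's actual two-phase MapReduce implementation, which parallelizes this process across mappers and reducers.

```python
from collections import Counter

# Toy taxonomy for one quasi-identifier: leaf value -> more general parent value.
PARENT = {"Engineer": "Professional", "Lawyer": "Professional",
          "Professional": "Any-Job"}

def generalize(value, level):
    """Climb `level` steps up the taxonomy from a leaf value."""
    for _ in range(level):
        value = PARENT.get(value, value)
    return value

def is_k_anonymous(qi_tuples, k):
    """k-anonymity holds when every quasi-identifier combination occurs >= k times."""
    return all(c >= k for c in Counter(qi_tuples).values())

def top_down_specialize(records, k, max_level=2):
    """Start fully generalized, then lower the level while k-anonymity still holds."""
    level = max_level
    while level > 0:
        candidate = [tuple(generalize(v, level - 1) for v in r) for r in records]
        if not is_k_anonymous(candidate, k):
            break  # specializing further would violate k-anonymity
        level -= 1
    return [tuple(generalize(v, level) for v in r) for r in records]

# With k=2, these records can be specialized all the way to the leaf level.
data = [("Engineer",), ("Engineer",), ("Lawyer",), ("Lawyer",)]
print(top_down_specialize(data, k=2))  # [('Engineer',), ('Engineer',), ...]
```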

    Scalable TPTDS Data Anonymization over Cloud using MapReduce

    With the rapid advancement of the big data digital age, large amounts of data are collected, mined, and published; data publishing has become a routine, day-to-day activity. Cloud computing is the model best suited to supporting big data applications. Many cloud services need users to share microdata, such as electronic health records or financial transaction data, so that it can be analyzed. One of the major issues in moving toward the cloud, however, is privacy threats. Data anonymization techniques are widely used to address these privacy concerns; anonymizing data sets using generalization to achieve k-anonymity is one such privacy-preserving technique. Currently, the scale of data in many cloud applications is increasing massively in accordance with the Big Data trend, making it difficult for commonly used software tools to capture, handle, manage, and process such large-scale datasets. As a result, it is a challenge for existing approaches to anonymize large-scale data sets due to their inability to scale. This paper presents a two-phase top-down specialization approach to anonymize large-scale datasets. The approach uses the MapReduce framework on cloud, so that it is highly scalable and efficient. We also introduce a scheduling mechanism, called Optimized Balanced Scheduling (OBS), to apply the anonymization: each dataset has its own sensitive field, each sensitive field is given a priority, and anonymization is then applied to that sensitive field only, according to the schedule. DOI: 10.17762/ijritcc2321-8169.15077
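    As a rough illustration of the scheduling idea as described above, the following sketch anonymizes each dataset's declared sensitive field in priority order. The data layout, priority convention, and masking rule are assumptions made for demonstration, not the paper's implementation.

```python
# Hedged sketch of "Optimized Balanced Scheduling" as described in the abstract:
# each dataset declares its own sensitive field with a priority, and anonymization
# is applied to that field only, highest priority first.
def obs_anonymize(datasets, anonymize_field):
    """datasets: list of dicts with 'records', 'sensitive_field', 'priority'."""
    # Higher-priority sensitive fields are scheduled (anonymized) first.
    for ds in sorted(datasets, key=lambda d: d["priority"], reverse=True):
        field = ds["sensitive_field"]
        for record in ds["records"]:
            record[field] = anonymize_field(record[field])
    return datasets

# Example masking rule: suppress all but the first character of the value.
mask = lambda v: str(v)[0] + "*" * (len(str(v)) - 1)
datasets = [
    {"records": [{"disease": "flu"}], "sensitive_field": "disease", "priority": 2},
    {"records": [{"salary": "52000"}], "sensitive_field": "salary", "priority": 1},
]
print(obs_anonymize(datasets, mask))  # disease -> 'f**', salary -> '5****'
```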

    Conceptual Design and Implementation of a Cloud Computing Platform Paradigm

    In recent times, organizations all over the world have stopped expanding infrastructure and building in-house IT competencies for the sake of efficiency. Rather, they focus on their primary lines of business and “simply” connect to an existing IT cloud in the neighborhood or on the internet for their IT demands. Cloud computing is a new paradigm of large-scale distributed computing that centralizes data and computation on a virtual “super computer” with unprecedented storage and computing capabilities. This paper focuses on the design of a conceptual framework and the implementation of a cloud computing platform. The study attempts to design a platform into which users can plug anytime, from anywhere, and utilize enormous computing resources at relatively low cost. Alongside the design, the mathematical model structures that support the framework are explicitly described. The study is of paramount importance because the new framework provides an opportunity to avoid the network congestion that degrades performance, among other shortcomings experienced in some implementations. Keywords: Cloud Computing, Framework, Platform, Paradigm

    Extract, Transform, and Load data from Legacy Systems to Azure Cloud

    Internship report presented as partial requirement for obtaining the Master’s degree in Information Management, with a specialization in Knowledge Management and Business Intelligence. In a world of continuously evolving technologies and hardened competitive markets, organisations need to be continually on guard to grasp the cutting-edge technologies and tools that will help them surpass any competition that arises. Modern data platforms that incorporate cloud technologies help organisations thrive and get ahead of their competitors by providing solutions that capture and optimally use untapped data, scalable storage that adapts to ever-growing data quantities, and data processing and visualisation tools that improve the decision-making process. With many cloud providers available in the market, from small players to major technology corporations, organisations have considerable flexibility to choose the cloud technology that best aligns with their use cases and overall product and service strategy. This internship came about when one of Accenture’s significant clients in the financial industry decided to migrate from legacy systems to a cloud-based data infrastructure, namely Microsoft Azure. During the internship, the data lake, a core part of the modern data platform (MDP), was developed in order to better understand the kinds of challenges that arise when migrating data from on-premise legacy systems to a cloud-based infrastructure. This work also provides the main recommendations and guidelines for performing a large-scale data migration.
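    For orientation, here is a minimal extract-transform-load sketch in the spirit of such a migration, assuming the azure-storage-blob Python SDK and a hypothetical on-premise CSV export. The connection string, container, column names, and file paths are placeholders, not the report's actual pipeline.

```python
import csv, io, os
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

def extract(path):
    """Extract: read rows from a legacy-system CSV export."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """Transform: trim whitespace and drop rows missing the (assumed) 'id' key."""
    return [{k: v.strip() for k, v in r.items()} for r in rows if r.get("id")]

def load(rows, container="raw", blob_name="legacy/customers.csv"):
    """Load: write the cleaned rows to a blob in the data lake's raw zone."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    service = BlobServiceClient.from_connection_string(os.environ["AZURE_CONN_STR"])
    service.get_blob_client(container, blob_name).upload_blob(
        buf.getvalue(), overwrite=True)

load(transform(extract("legacy_export.csv")))
```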

    A Protection Layer over MapReduce Framework for Big Data Privacy

    In many organizations, big data analytics has become the standard way of gathering valuable insights from data. The MapReduce framework, which is generally used for this purpose, has been adopted by most organizations for its exceptional characteristics. However, because significant processing resources are readily available, dispersed privacy-sensitive details can be collected quickly, amplifying widespread privacy concerns. This article reviews existing research on the MapReduce framework's privacy issues and proposes an additional layer of privacy protection over the adopted framework. The data is split into chunks and processed in the cloud in two further steps: Hadoop splits the file into smaller chunks, and the task tracker allocates these chunks to several mappers, where the data is split into key-value pairs and intermediate data sets are generated. The efficiency of the suggested approach can then be meaningfully assessed; overall, the proposed method provides improved scalability. Reported figures compare execution time with respect to file size and the number of partitions. Because a privacy-protection technique is used, the loss of data content can be handled appropriately. It has been demonstrated that MRPL outperforms current methods in terms of CPU optimization, memory usage, and reduced information loss. The research reveals that the suggested strategy yields significant advantages for Big Data by enhancing privacy and protection; MRPL can considerably mitigate the privacy issues in Big Data.
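    The toy sketch below illustrates one plausible reading of such a protection layer: pseudonymising keys between the map step and the shuffle, so reducers never see raw identifiers. It is single-process Python rather than Hadoop, and the masking rule is an assumption for demonstration, not the article's MRPL design.

```python
from collections import defaultdict

def mapper(record):
    """Map step: split one record into (key, value) pairs."""
    user, query = record
    yield user, query

def privacy_layer(pairs):
    """Protection layer: pseudonymise keys before intermediate data is shuffled."""
    for key, value in pairs:
        yield f"user-{hash(key) % 10_000:04d}", value

def reducer(shuffled):
    """Reduce step: count values per (pseudonymised) key."""
    return {k: len(vs) for k, vs in shuffled.items()}

# Simulate the shuffle phase in-process.
records = [("alice", "q1"), ("alice", "q2"), ("bob", "q1")]
shuffled = defaultdict(list)
for rec in records:
    for k, v in privacy_layer(mapper(rec)):
        shuffled[k].append(v)
print(reducer(shuffled))  # e.g. {'user-1234': 2, 'user-5678': 1}
```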

    Scalable Reliable SD Erlang Design

    This technical report presents the design of Scalable Distributed (SD) Erlang: a set of language-level changes that aims to enable Distributed Erlang to scale for server applications on commodity hardware with at most 100,000 cores. We cover a number of aspects, specifically the anticipated architecture, anticipated failures, scalable data structures, and scalable computation. Two other components that guided the design of SD Erlang are design principles and typical Erlang applications. The design principles summarise the types of modifications we aim to make to improve Erlang scalability. Erlang exemplars help us identify the main Erlang scalability issues and hypothetically validate the SD Erlang design.