32 research outputs found

    HadoopSec: Sensitivity-aware Secure Data Placement Strategy for Big Data/Hadoop Platform using Prescriptive Analytics

    Hadoop has become one of the key players in offering data analytics and data processing support for any organization that handles different shades of data management. Considering the current security offerings of Hadoop, companies are concerned about building a single large cluster and onboarding multiple projects onto the same common Hadoop cluster. Security vulnerabilities and privacy invasion due to malicious attackers or inner users are the main points of contention in any Hadoop implementation. In particular, various types of security vulnerability occur due to the mode of data placement in a Hadoop cluster. When sensitive information is accessed by an unauthorized user or misused by an authorized person, privacy can be compromised. In this paper, we intend to address the approach of data placement across distributed DataNodes in a secure way by considering the sensitivity and security of the underlying data. Our data placement strategy aims to adaptively distribute the data across the cluster using advanced machine learning techniques to realize a more secure data/infrastructure. The data placement strategy discussed in this paper is highly extensible and scalable to suit different sorts of sensitivity/security requirements.

    Flexible, wide-area storage for distributed systems using semantic cues

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 81-87). There is a growing set of Internet-based services that are too big, or too important, to run at a single site. Examples include Web services for e-mail, video and image hosting, and social networking. Splitting such services over multiple sites can increase capacity, improve fault tolerance, and reduce network delays to clients. These services often need storage infrastructure to share data among the sites. This dissertation explores the use of a new file system (WheelFS) specifically designed to be the storage infrastructure for wide-area distributed services. WheelFS allows applications to adjust the semantics of their data via semantic cues, which provide application control over consistency, failure handling, and file and replica placement. This dissertation describes a particular set of semantic cues that reflect the specific challenges that storing data over the wide-area network entails: high-latency and low-bandwidth links, coupled with increased node and link failures, when compared to local-area networks. By augmenting a familiar POSIX interface with support for semantic cues, WheelFS provides a wide-area distributed storage system intended to help multi-site applications share data and gain fault tolerance, in the form of a distributed file system. Its design allows applications to adjust the tradeoff between prompt visibility of updates from other sites and the ability for sites to operate independently despite failures and long delays. WheelFS is implemented as a user-level file system and is deployed on PlanetLab and Emulab. Six applications (an all-pairs-pings script, a distributed Web cache, an email service, large file distribution, distributed compilation, and protein sequence alignment software) demonstrate that WheelFS's file system interface simplifies construction of distributed applications by allowing reuse of existing software. These applications would perform poorly with the strict semantics implied by a traditional file system interface, but by providing cues to WheelFS they are able to achieve good performance. Measurements show that applications built on WheelFS deliver comparable performance to services such as CoralCDN and BitTorrent that use specialized wide-area storage systems. by Jeremy Andrew Stribling. Ph.D.

    Efficient file distribution in a flexible, wide-area file system

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Cataloged from PDF version of thesis. Includes bibliographical references (p. 63-65). WheelFS is a wide-area distributed file system designed to help applications cope with the challenges of sharing data over the wide-area network. A wide range of applications can use WheelFS as a storage layer because applications can control various trade-offs in WheelFS, such as consistency versus availability, using semantic cues. One key feature that many applications require from any storage system is efficient file distribution. The storage system needs to be able to serve files quickly, even large or popular ones, and allow users and applications to quickly browse files. Wide-area links with high latency and low throughput make achieving these goals difficult for most distributed storage systems. This thesis explores using prefetching, a traditional file system optimization technique, in wide-area file systems for more efficient file distribution. This thesis focuses on Tread, a prefetcher for WheelFS. Tread includes several types of prefetching to improve the performance of reading files and directories in WheelFS: read-ahead prefetching, whole-file prefetching, directory prefetching, and a prefetching optimization for WheelFS's built-in cooperative caching. To make the best use of scarce wide-area resources, Tread adaptively rate-limits prefetching and gives applications control over what and how prefetching is done using WheelFS's semantic cues. Experiments show that Tread can reduce the time to read a 10MB file in WheelFS by 40% and the time to list a directory with 100 entries by more than 80%. In addition, experiments on PlanetLab show that using prefetching with cooperative caching to distribute a 10MB file to 270 clients reduces the average latency for each client to read the file by almost 45%. by Irene Y. Zhang. M.Eng.

    A Data Distribution Service in a hierarchical SDN architecture: implementation and evaluation

    © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Software-defined networks (SDNs) have caused a paradigm shift in communication networks as they enable network programmability using either centralized or distributed controllers. With the development of the industry and society, new verticals have emerged, such as Industry 4.0, cooperative sensing and augmented reality. These verticals require network robustness and availability, which forces the use of distributed domains to improve network scalability and resilience. To this aim, this paper proposes a new solution to distribute SDN domains by using Data Distribution Services (DDS). The DDS allows the exchange of network information, synchronization among controllers and auto-discovery. Moreover, it increases the control plane robustness, an important characteristic in 5G networks (e.g., if a controller fails, its resources and devices can be managed by other controllers in a short amount of time as they already know this information). To verify the effectiveness of the DDS, we design a testbed by integrating the DDS in SDN controllers and deploying these controllers in different regions of Spain. The communication among the controllers was evaluated in terms of latency and overhead. Postprint (author's final draft)

    OpenFlow deployment and concept analysis

    Terms such as SDN and OpenFlow (OF) are often used in the research and development of data networks. This paper analyzes the current state of OpenFlow protocol deployment options, as OpenFlow is the only real representative protocol that enables the implementation of Software Defined Networking outside the academic world. An insight into the current state of OpenFlow specification development at various levels is introduced. The possible limitations associated with this concept in conjunction with the latest version (1.3) of the specification published by ONF are also presented. In the conclusion, a demonstrative security application is presented that addresses the lack of IPv6 support in real network devices, since most of today's switches and controllers support only OF v1.0.