119 research outputs found

    Towards Data Optimization in Storages and Networks

    Thesis (Ph.D.)--School of Computing and Engineering, University of Missouri--Kansas City, 2015. Dissertation advisors: Sejun Song and Baek-Young Choi. Vita. Includes bibliographic references (pages 132-140). Title from PDF of title page, viewed on August 7, 2015.
    We are encountering an explosion of data volume: one study estimates that data will amount to 40 zettabytes by the end of 2020. This explosion places a significant burden not only on data storage space but also on access latency, manageability, and processing and network bandwidth. Large portions of this huge data volume, however, consist of massive redundancies created by users, applications, systems, and communication models. Deduplication is a technique that reduces data volume by removing these redundancies, and reliability is further improved when data is replicated after deduplication. Many deduplication studies, such as storage data deduplication and network redundancy elimination, have been proposed to reduce storage consumption and network bandwidth consumption. However, existing solutions are not efficient enough to optimize the data delivery path from clients through the network to servers. Hence we propose a holistic deduplication framework that optimizes data along this entire path. Our framework consists of three components: data sources (clients), networks, and servers. The client component removes local redundancies at the clients, the network component removes redundant transfers coming from different clients, and the server component removes redundancies coming from different networks. We designed and developed components for the proposed framework. For the server component, we developed the Hybrid Email Deduplication System (HEDS), which achieves a trade-off between space savings and overhead for email systems. For the client component, we developed Structure-Aware File and Email Deduplication for Cloud-based Storage Systems (SAFE), which is very fast and achieves good space savings by using structure-based granularity. For the network component, we developed Software-defined Deduplication as a Network and Storage service (SoftDance), an in-network deduplication system that chains storage data deduplication and network redundancy elimination functions using Software Defined Networking to achieve both storage space and network bandwidth savings with low processing time and memory footprint. We also discuss mobile deduplication for image and video files on mobile devices. Through system implementations and experiments, we show that the proposed framework effectively and efficiently optimizes data volume in a holistic manner encompassing the entire data path of clients, networks, and storage servers.
    Contents: Introduction -- Deduplication technology -- Existing deduplication approaches -- HEDS: Hybrid Email Deduplication System -- SAFE: Structure-Aware File and Email Deduplication for cloud-based storage systems -- SoftDance: Software-defined Deduplication as a Network and Storage Service -- Mobile deduplication -- Conclusion
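
    The dissertation abstract does not spell out the underlying mechanism, so the sketch below is a minimal, hypothetical illustration of hash-based chunk deduplication in Python, the general technique that storage deduplication builds on; it is not the HEDS, SAFE, or SoftDance implementation, and the fixed chunk size and SHA-256 fingerprints are illustrative assumptions.

        import hashlib

        CHUNK_SIZE = 4096  # fixed-size chunking; real systems often use content-defined chunking

        class DedupStore:
            """Toy chunk store: keeps one copy of each unique chunk and a per-file recipe."""

            def __init__(self):
                self.chunks = {}  # fingerprint -> chunk bytes
                self.files = {}   # filename -> ordered list of fingerprints

            def put(self, name, data):
                recipe = []
                for i in range(0, len(data), CHUNK_SIZE):
                    chunk = data[i:i + CHUNK_SIZE]
                    fp = hashlib.sha256(chunk).hexdigest()
                    self.chunks.setdefault(fp, chunk)  # store the chunk only if unseen
                    recipe.append(fp)
                self.files[name] = recipe

            def get(self, name):
                return b"".join(self.chunks[fp] for fp in self.files[name])

    Under such a scheme an attachment mailed to many recipients is stored once and referenced many times, which is the intuition behind the space-savings-versus-overhead trade-off that HEDS explores for email systems.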

    Service Abstractions for Scalable Deep Learning Inference at the Edge

    Deep learning driven intelligent edge has already become a reality, where millions of mobile, wearable, and IoT devices analyze real-time data and transform it into actionable insights on-device. Typical approaches for optimizing deep learning inference mostly focus on accelerating the execution of individual inference tasks, without considering the contextual correlation unique to edge environments and the statistical nature of learning-based computation. Specifically, they treat inference workloads as individual black boxes and apply canonical system optimization techniques, developed over the last few decades, to handle them as yet another type of computation-intensive application. As a result, deep learning inference on edge devices still faces the ever-increasing challenges of customization to edge device heterogeneity, fuzzy computation redundancy between inference tasks, and end-to-end deployment at scale. In this thesis, we propose the first framework that automates and scales the end-to-end process of deploying efficient deep learning inference from the cloud to heterogeneous edge devices. The framework consists of a series of service abstractions that handle DNN model tailoring, model indexing and query, and computation reuse for runtime inference, respectively. Together, these services bridge the gap between deep learning training and inference, eliminate computation redundancy during inference execution, and further lower the barrier for deep learning algorithm and system co-optimization. To build efficient and scalable services, we take a unique algorithmic approach of harnessing the semantic correlation among learning-based computations. Rather than viewing individual tasks as isolated black boxes, we optimize them collectively in a white-box approach, proposing primitives to formulate the semantics of deep learning workloads, algorithms to assess their hidden correlation (in terms of the input data, the neural network models, and the deployment trials), and methods to merge common processing steps to minimize redundancy.
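
    As a rough illustration of the computation-reuse idea, and not the thesis's actual service abstractions, the sketch below memoizes inference results for inputs that are close in a feature space; the `model` and `embed` callables and the cosine-similarity threshold are assumptions made for the example.

        import numpy as np

        class ReuseCache:
            """Reuse an earlier inference result when a new input is close enough in feature space."""

            def __init__(self, model, embed, threshold=0.95):
                self.model = model          # callable: input -> prediction (assumed)
                self.embed = embed          # callable: input -> 1-D feature vector (assumed)
                self.threshold = threshold  # cosine-similarity cutoff for "similar enough"
                self.keys, self.values = [], []

            def infer(self, x):
                v = np.asarray(self.embed(x), dtype=float)
                v = v / (np.linalg.norm(v) + 1e-12)
                for k, out in zip(self.keys, self.values):
                    if float(np.dot(k, v)) >= self.threshold:
                        return out          # reuse: skip running the full model
                out = self.model(x)
                self.keys.append(v)
                self.values.append(out)
                return out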

    An Analysis of C/C++ Datasets for Machine Learning-Assisted Software Vulnerability Detection

    As machine learning-assisted vulnerability detection research matures, it is critical to understand the datasets being used by existing papers. In this paper, we explore 7 C/C++ datasets and evaluate their suitability for machine learning-assisted vulnerability detection. We also present a new dataset, named Wild C, containing over 10.3 million individual open-source C/C++ files, a sufficiently large sample to be reasonably considered representative of typical C/C++ code. To facilitate comparison, we tokenize all of the datasets and perform the analysis at this level. We make three primary contributions. First, while all the datasets differ from our Wild C dataset, some do so to a greater degree. This includes divergence in file lengths and token usage frequency. Additionally, none of the datasets contain the entirety of the C/C++ vocabulary; the missing tokens account for up to 11% of all token usage. Second, we find that all the datasets contain duplication, some a significant amount. In the Juliet dataset, we describe augmentations of test cases that make the dataset susceptible to data leakage. This augmentation occurs with such frequency that a random 80/20 split has roughly 58% overlap between the test data and the training data. Finally, we collect and process a large dataset of C code, named Wild C. This dataset is designed to serve as a representative sample of all C/C++ code and is the basis for our analyses.
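
    The paper's exact measurement procedure is not reproduced here, but one common way to quantify the kind of train/test overlap described above is to fingerprint tokenized files and count how many test fingerprints also appear in the training split; the sketch below makes that concrete under those assumptions.

        import hashlib
        import random

        def token_fingerprint(tokens):
            """Hash a token sequence so duplicate files can be compared cheaply."""
            return hashlib.sha256(" ".join(tokens).encode()).hexdigest()

        def split_overlap(files, seed=0):
            """files: list of token lists. Returns the fraction of test files whose
            fingerprint also appears in the training split (a data-leakage signal)."""
            rng = random.Random(seed)
            shuffled = list(files)
            rng.shuffle(shuffled)
            cut = int(0.8 * len(shuffled))
            train = {token_fingerprint(t) for t in shuffled[:cut]}
            test = shuffled[cut:]
            if not test:
                return 0.0
            leaked = sum(token_fingerprint(t) in train for t in test)
            return leaked / len(test)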

    SECURING THE DATA STORAGE AND PROCESSING IN CLOUD COMPUTING ENVIRONMENT

    Organizations increasingly utilize cloud computing architectures to reduce costs and energy consumption both in the data warehouse and on mobile devices by better utilizing the computing resources available. However, the security and privacy issues with publicly available cloud computing infrastructures have not been studied to a sufficient depth for organizations and individuals to be fully informed of the risks; neither private nor public clouds are prepared to properly secure their connections as middlemen between mobile devices, which use encryption, and external data providers, which neglect to encrypt their data. Furthermore, cloud computing providers are not well informed of the risks associated with their policies or of the techniques they could implement to mitigate those risks. In this dissertation, we present a new layered understanding of public cloud computing. On the high level, we concentrate on the overall architecture and how information is processed and transmitted. The key idea is to secure information from outside attack and monitoring. We use techniques such as separating virtual machine roles, re-spawning virtual machines in rapid succession, and cryptography-based access control to achieve a high level of assurance of public cloud computing security and privacy. On the low level, we explore security and privacy issues at the memory management level. We present a mechanism for the prevention of automatic virtual machine memory guessing attacks.
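
    The abstract names cryptography-based access control only at a high level; one conventional way to realize it, shown here as a hedged sketch rather than the dissertation's design, is envelope encryption in which each role holds a key-encryption key and data keys are wrapped per authorized role (this example assumes the third-party `cryptography` package).

        from cryptography.fernet import Fernet  # pip install cryptography

        class RoleVault:
            """Each role holds its own key-encryption key (KEK); data keys are wrapped per role."""

            def __init__(self, roles):
                self.keks = {r: Fernet(Fernet.generate_key()) for r in roles}

            def encrypt(self, data, allowed_roles):
                dek = Fernet.generate_key()            # fresh data-encryption key per object
                blob = Fernet(dek).encrypt(data)
                wrapped = {r: self.keks[r].encrypt(dek) for r in allowed_roles}
                return blob, wrapped                   # ciphertext plus per-role wrapped keys

            def decrypt(self, role, blob, wrapped):
                dek = self.keks[role].decrypt(wrapped[role])  # fails if the role was not granted access
                return Fernet(dek).decrypt(blob)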

    A Comprehensive Survey on the Cooperation of Fog Computing Paradigm-Based IoT Applications: Layered Architecture, Real-Time Security Issues, and Solutions

    The Internet of Things (IoT) can enable seamless communication between millions or billions of objects. As IoT applications continue to grow, they face several challenges, including high latency, limited processing and storage capacity, and network failures. To address these challenges, the fog computing paradigm has been introduced; its purpose is to integrate the cloud computing paradigm with IoT to bring cloud resources closer to IoT devices, thereby extending computing, storage, and networking facilities toward the edge of the network. In a fog-based IoT network, data processing and storage occur at the IoT devices themselves, eliminating the need to transmit the data to the cloud and providing faster responses than the cloud. Unfortunately, the characteristics of fog-based IoT networks give rise to real-time security challenges, which may raise severe concerns for end users. This paper focuses on fog-based IoT communication, targeting real-time security challenges. We examine the layered architecture of fog-based IoT networks along with the working of IoT applications operating within the context of the fog computing paradigm. Moreover, we highlight real-time security challenges and explore several existing solutions proposed to tackle them. Finally, we investigate the research challenges that need to be addressed and explore potential future research directions for the research community.
    ©2023 The Authors. Published by IEEE. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

    RedEdge: A Novel Architecture for Big Data Processing in Mobile Edge Computing Environments

    We are witnessing the emergence of new big data processing architectures due to the convergence of the Internet of Things (IoT), edge computing, and cloud computing. Existing big data processing architectures are underpinned by the transfer of raw data streams to the cloud computing environment for processing and analysis. This operation is expensive and fails to meet the real-time processing needs of IoT applications. In this article, we present and evaluate a novel big data processing architecture named RedEdge (i.e., data reduction on the edge) that incorporates mechanisms to facilitate the processing of big data streams near the source of the data. The RedEdge model leverages mobile IoT devices, termed mobile edge devices, as primary data processing platforms. However, when computational and battery power resources are unavailable, it offloads data streams to nearby mobile edge devices or to the cloud. We evaluate the RedEdge architecture and the related mechanisms in a real-world experimental setting involving 12 mobile users. The experimental evaluation reveals that the RedEdge model is able to reduce big data streams by up to 92.86% without compromising energy and memory consumption on mobile edge devices.
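
    The offloading policy is described only qualitatively in the abstract, so the following sketch illustrates the kind of decision a mobile edge device might make; the `DeviceState` fields and the resource thresholds are invented for the example and are not taken from the paper.

        from dataclasses import dataclass

        @dataclass
        class DeviceState:
            battery_pct: float   # remaining battery, 0-100
            cpu_free_pct: float  # idle CPU, 0-100
            mem_free_mb: float   # free memory in MB

        def choose_processing_site(state, neighbors,
                                   min_battery=20.0, min_cpu=30.0, min_mem=128.0):
            """Process locally if resources allow, else offload to a capable neighbor,
            else fall back to the cloud. Thresholds are illustrative only."""
            def capable(s):
                return (s.battery_pct >= min_battery and
                        s.cpu_free_pct >= min_cpu and
                        s.mem_free_mb >= min_mem)
            if capable(state):
                return "local"
            for i, n in enumerate(neighbors):
                if capable(n):
                    return f"neighbor:{i}"
            return "cloud"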

    Cloud technology options towards Free Flow of Data

    This whitepaper collects the technology solutions that the projects in the Data Protection, Security and Privacy Cluster propose to address the challenges raised by the working areas of the Free Flow of Data initiative. The document describes the technologies, methodologies, models, and tools researched and developed by the clustered projects, mapped to the ten areas of work of the Free Flow of Data initiative. The aim is to facilitate the identification of the state of the art of technology options for solving the data security and privacy challenges posed by the Free Flow of Data initiative in Europe. The document references the Cluster, the individual projects, and the technologies produced by them.

    CoSMiC: A hierarchical cloudlet-based storage architecture for mobile clouds

    Storage capacity is a constraint for current mobile devices. Mobile Cloud Computing (MCC) has been developed to augment device capabilities, enabling mobile users to store and access large datasets in the cloud through wireless networks. However, given the limitations of network bandwidth, latency, and device battery life, new solutions are needed to extend the usage of mobile devices. This paper presents a novel design and implementation of a hierarchical cloud storage system for mobile devices based on multiple I/O caching layers. The solution relies on Memcached as the cache system, preserving its strengths in performance, scalability, and quick, portable deployment, and it targets reducing the I/O latency of current mobile cloud solutions. The proposed solution consists of a user-level library and extended Memcached back-ends, and it builds the hierarchy by deploying Memcached-based I/O cache servers across the entire I/O infrastructure datapath. Our experimental results demonstrate that CoSMiC can significantly reduce the round-trip latency, compared with a 3G connection, in the presence of low cache hit ratios, even when using a multi-level cache hierarchy.
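
    The abstract describes the hierarchy at a high level; the sketch below shows a generic multi-level, read-through cache lookup of the sort CoSMiC layers over Memcached, with plain dictionaries standing in for the Memcached servers (an assumption made for the example).

        class CacheLevel:
            """One tier in the hierarchy (in CoSMiC these would be Memcached servers)."""

            def __init__(self, name):
                self.name, self.store = name, {}

            def get(self, key):
                return self.store.get(key)

            def put(self, key, value):
                self.store[key] = value

        class HierarchicalCache:
            """Look up the nearest tier first; on a miss, try farther tiers and finally the
            cloud back-end, populating the closer tiers on the way back (read-through)."""

            def __init__(self, levels, backend_fetch):
                self.levels = levels                  # ordered nearest -> farthest
                self.backend_fetch = backend_fetch    # callable hitting the cloud store

            def get(self, key):
                for i, level in enumerate(self.levels):
                    value = level.get(key)
                    if value is not None:
                        for closer in self.levels[:i]:  # promote to closer tiers
                            closer.put(key, value)
                        return value
                value = self.backend_fetch(key)
                for level in self.levels:
                    level.put(key, value)
                return value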