
    Off-line Deduplication Method for Solid-State Disk Based on Hot and Cold Data

    Solid-state disk (SSD) deduplication refers to the identification and deletion of duplicate data stored in an SSD, and it improves SSD reliability. At present, SSD deduplication is commonly performed online with Field Programmable Gate Array (FPGA) acceleration, whose drawback is the complexity and cost of the FPGA. This study proposes an off-line deduplication method for SSDs based on hot and cold data, aiming to simplify the deduplication architecture, reduce cost, and improve both deduplication efficiency and SSD access performance. First, the SSD's wear-leveling algorithm is used to divide the data into cold and hot sets, and a fingerprint is generated for each cold data block. Next, the fingerprints are compared, and cold data with identical fingerprints are deleted. Finally, the cold and hot data are exchanged after deduplication. Results demonstrate that the duplicate recognition rate of the proposed method is 5%-38%, close to that of the online deduplication method. In terms of access performance, SSDs using the proposed method are about 20% faster than traditional SSDs and approach the access performance of SSDs using online deduplication. This study provides a reference for improving the reliability of existing SSDs.
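
    The abstract describes the off-line pipeline only at a high level. As a rough illustration, the Python sketch below fingerprints cold blocks and collapses duplicates; the block layout, the SHA-256 fingerprint, and the `blocks`/`is_cold` structures are assumptions made for this sketch, not details taken from the paper.

```python
import hashlib

def offline_dedup(blocks, is_cold):
    """Toy off-line deduplication over an SSD's cold blocks.

    blocks  : dict mapping logical block number -> bytes
    is_cold : dict mapping logical block number -> bool (from wear-leveling stats)
    Returns a mapping of duplicate logical blocks to the block they now share.
    """
    fingerprints = {}   # fingerprint -> representative logical block
    remapped = {}       # duplicate logical block -> representative logical block

    for lbn, data in blocks.items():
        if not is_cold.get(lbn, False):
            continue                          # hot data is left untouched
        fp = hashlib.sha256(data).digest()    # fingerprint of the cold block
        if fp in fingerprints:
            remapped[lbn] = fingerprints[fp]  # duplicate: remap instead of storing
        else:
            fingerprints[fp] = lbn            # first occurrence becomes the owner
    return remapped

# Example: two cold blocks with identical contents collapse into one.
blocks = {0: b"A" * 4096, 1: b"A" * 4096, 2: b"B" * 4096}
is_cold = {0: True, 1: True, 2: False}
print(offline_dedup(blocks, is_cold))   # {1: 0}
```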

    A comprehensive meta-analysis of cryptographic security mechanisms for cloud computing

    Cloud computing offers measurable computational and information resources as a service over the Internet. The major motivation behind the cloud setup is economic: it promises reduced operational and infrastructure expenditure. Before this promise can be fully realized, several impediments must be tackled, the most profound of which are security, privacy, and reliability. Because user data is revealed to the cloud, it leaves the protection sphere of the data owner, which raises partly new security and privacy concerns. This work focuses on these issues across various cloud services and deployment models by spotlighting their major challenges. While classical cryptography is an ancient discipline, modern cryptography, developed mostly in the last few decades, must be applied to ensure strong security and privacy mechanisms in today's real-world scenarios. Technological solutions and short- and long-term research goals for cloud security are described and addressed using both classical and modern cryptographic mechanisms. This work explores new directions in cloud computing security while highlighting the correct selection of these fundamental technologies from a cryptographic point of view.
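
    As a concrete instance of the kind of mechanism the survey weighs, the sketch below applies client-side authenticated encryption (AES-GCM via the `cryptography` package) before data leaves the owner's protection sphere. The key-handling and object-naming details are simplified assumptions for illustration, not recommendations drawn from the paper.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_for_cloud(plaintext: bytes, key: bytes, object_id: str) -> bytes:
    """Encrypt data client-side so the cloud provider only ever stores ciphertext."""
    nonce = os.urandom(12)                       # 96-bit nonce, unique per object
    aad = object_id.encode()                     # bind the ciphertext to its object name
    ct = AESGCM(key).encrypt(nonce, plaintext, aad)
    return nonce + ct                            # store the nonce alongside the ciphertext

def decrypt_from_cloud(blob: bytes, key: bytes, object_id: str) -> bytes:
    nonce, ct = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ct, object_id.encode())

key = AESGCM.generate_key(bit_length=256)        # in practice, kept in the owner's KMS/HSM
blob = encrypt_for_cloud(b"confidential record", key, "bucket/records/42")
assert decrypt_from_cloud(blob, key, "bucket/records/42") == b"confidential record"
```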

    A software approach to defeating side channels in last-level caches

    We present a software approach to mitigate access-driven side-channel attacks that leverage last-level caches (LLCs) shared across cores to leak information between security domains (e.g., tenants in a cloud). Our approach dynamically manages physical memory pages shared between security domains to disable sharing of LLC lines, thus preventing "Flush-Reload" side channels via LLCs. It also manages cacheability of memory pages to thwart cross-tenant "Prime-Probe" attacks in LLCs. We have implemented our approach as a memory management subsystem called CacheBar within the Linux kernel to intervene on such side channels across container boundaries, as containers are a common method for enforcing tenant isolation in Platform-as-a-Service (PaaS) clouds. Through formal verification, principled analysis, and empirical evaluation, we show that CacheBar achieves strong security with small performance overheads for PaaS workloads.
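
    CacheBar's first defense, as the abstract puts it, is to disable cross-domain sharing of the physical pages that a Flush-Reload probe needs. The toy model below mimics that policy with a copy-on-access table; the class and method names are illustrative assumptions of this sketch, not CacheBar's actual kernel interfaces.

```python
class SharedPageManager:
    """Toy model of breaking cross-domain page sharing (the Flush-Reload prerequisite).

    Identical content (e.g. a shared library page) is normally backed by one
    physical page for every tenant, which is what a Flush-Reload probe exploits;
    here each security domain gets its own private copy instead, so no cache
    lines derived from that page are shared across domains.
    """

    def __init__(self):
        self._copies = {}   # (domain, content_id) -> that domain's private copy

    def map_page(self, domain: str, content_id: str, content: bytes) -> bytearray:
        key = (domain, content_id)
        if key not in self._copies:
            # Copy-on-access: the first touch by a domain materializes a private copy.
            self._copies[key] = bytearray(content)
        return self._copies[key]

mgr = SharedPageManager()
libc_page = b"\x90" * 4096                       # stand-in for a shared library page
a = mgr.map_page("tenant-A", "libc:0x1000", libc_page)
b = mgr.map_page("tenant-B", "libc:0x1000", libc_page)
assert a == b and a is not b                     # equal contents, but no shared page to probe
```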

    Cloud Cost Optimization: A Comprehensive Review of Strategies and Case Studies

    Cloud computing has revolutionized the way organizations manage their IT infrastructure, but it has also introduced new challenges, such as managing cloud costs. This paper explores various techniques for cloud cost optimization, including cloud pricing, analysis, and strategies for resource allocation. Real-world case studies of these techniques are presented, along with a discussion of their effectiveness and key takeaways. The analysis conducted in this paper reveals that organizations can achieve significant cost savings by adopting cloud cost optimization techniques. Additionally, future research directions are proposed to advance the state of the art in this important field.
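
    As a minimal illustration of the pricing analysis such strategies rely on, the sketch below compares on-demand and committed-rate billing at different utilization levels. All rates and the billing model are hypothetical, not figures from the paper or from any provider.

```python
def monthly_cost(hourly_rate: float, hours_used: float, flat_monthly_fee: float = 0.0) -> float:
    """Cost of one instance for a month under a simple linear pricing model."""
    return hourly_rate * hours_used + flat_monthly_fee

# Hypothetical rates: on-demand $0.10/h vs. a committed plan billed at $0.06/h for all hours.
HOURS_IN_MONTH = 730
for utilization in (0.25, 0.50, 0.90):
    hours = HOURS_IN_MONTH * utilization
    on_demand = monthly_cost(0.10, hours)
    committed = monthly_cost(0.06, HOURS_IN_MONTH)   # commitment bills every hour of the month
    better = "committed" if committed < on_demand else "on-demand"
    print(f"utilization {utilization:.0%}: on-demand ${on_demand:.2f}, "
          f"committed ${committed:.2f} -> {better}")
```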

    RecD: Deduplication for End-to-End Deep Learning Recommendation Model Training Infrastructure

    We present RecD (Recommendation Deduplication), a suite of end-to-end infrastructure optimizations across the Deep Learning Recommendation Model (DLRM) training pipeline. RecD addresses immense storage, preprocessing, and training overheads caused by feature duplication inherent in industry-scale DLRM training datasets. Feature duplication arises because DLRM datasets are generated from interactions: while each user session can generate multiple training samples, many features' values do not change across these samples. We demonstrate how RecD exploits this property, end-to-end, across a deployed training pipeline. RecD optimizes data generation pipelines to decrease dataset storage and preprocessing resource demands and to maximize duplication within a training batch. RecD introduces a new tensor format, InverseKeyedJaggedTensors (IKJTs), to deduplicate feature values in each batch. We show how DLRM model architectures can leverage IKJTs to drastically increase training throughput. RecD improves training throughput, preprocessing throughput, and storage efficiency by up to 2.48x, 1.79x, and 3.71x, respectively, in an industry-scale DLRM training system. (Published in the Proceedings of the Sixth Conference on Machine Learning and Systems, MLSys 2023.)
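
    The IKJT format is only named in the abstract. The toy sketch below shows the deduplication idea it implies: store each distinct feature-value list once per batch, together with inverse indices that map samples back to their shared values. The function and field names are assumptions for illustration, not the published format.

```python
from typing import Any

def dedup_jagged_feature(per_sample_values: list[tuple[Any, ...]]):
    """Deduplicate one jagged feature within a batch.

    per_sample_values: one (possibly repeated) tuple of feature values per sample.
    Returns (unique_values, inverse) where unique_values[inverse[i]] reproduces
    the i-th sample's values, so duplicated lists are stored and processed once.
    """
    unique_values: list[tuple[Any, ...]] = []
    index_of: dict[tuple[Any, ...], int] = {}
    inverse: list[int] = []
    for values in per_sample_values:
        if values not in index_of:
            index_of[values] = len(unique_values)
            unique_values.append(values)
        inverse.append(index_of[values])
    return unique_values, inverse

# Three samples from the same user session share identical sparse-feature values.
batch = [(101, 7, 42), (101, 7, 42), (200, 3)]
uniques, inverse = dedup_jagged_feature(batch)
print(uniques)   # [(101, 7, 42), (200, 3)]
print(inverse)   # [0, 0, 1] -> downstream lookups run once per unique list
```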

    An extensive research survey on data integrity and deduplication towards privacy in cloud storage

    Owing to the highly distributed nature of cloud storage systems, incorporating a high degree of security for vulnerable data is a challenging task. Among the various security concerns, data privacy remains one of the unsolved problems in this regard. The prime reason is that existing data-privacy approaches do not offer data integrity and secure data deduplication at the same time, both of which are essential for strong resistance against all forms of dynamic threats over cloud and Internet systems. Data integrity and data deduplication are therefore closely associated phenomena that influence data privacy. This manuscript discusses the explicit research contributions toward data integrity, data privacy, and data deduplication, highlights the potential open research issues, and discusses possible future directions for addressing the existing problems.
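
    The survey does not prescribe a particular construction, but a scheme commonly discussed at the intersection of deduplication, integrity, and privacy is convergent (content-keyed) encryption. The sketch below is a simplified illustration of that idea under assumptions of its own (key and nonce derivation, SHA-256 tag); it is not a hardened design and is not taken from the paper.

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_encrypt(plaintext: bytes):
    """Content-keyed ("convergent") encryption: identical plaintexts yield
    identical ciphertexts, so the storage provider can deduplicate them
    without learning the content. Simplified sketch, not a hardened scheme."""
    key = hashlib.sha256(plaintext).digest()          # key derived from the content itself
    nonce = hashlib.sha256(key).digest()[:12]         # deterministic nonce for this key
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    tag = hashlib.sha256(ciphertext).hexdigest()      # integrity/deduplication fingerprint
    return key, ciphertext, tag

k1, c1, t1 = convergent_encrypt(b"same backup chunk")
k2, c2, t2 = convergent_encrypt(b"same backup chunk")
assert c1 == c2 and t1 == t2      # provider sees equal ciphertexts -> stores one copy
# Each owner keeps `key` privately and can later verify integrity by recomputing `tag`.
```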

    Service Abstractions for Scalable Deep Learning Inference at the Edge

    Deep-learning-driven intelligent edge has already become a reality, where millions of mobile, wearable, and IoT devices analyze real-time data and transform it into actionable insights on-device. Typical approaches to optimizing deep learning inference mostly focus on accelerating the execution of individual inference tasks, without considering the contextual correlation unique to edge environments or the statistical nature of learning-based computation. Specifically, they treat inference workloads as individual black boxes and apply canonical system optimization techniques, developed over the last few decades, to handle them as yet another type of computation-intensive application. As a result, deep learning inference on edge devices still faces the ever-increasing challenges of customization to edge-device heterogeneity, fuzzy computation redundancy between inference tasks, and end-to-end deployment at scale. In this thesis, we propose the first framework that automates and scales the end-to-end process of deploying efficient deep learning inference from the cloud to heterogeneous edge devices. The framework consists of a series of service abstractions that handle DNN model tailoring, model indexing and query, and computation reuse for runtime inference, respectively. Together, these services bridge the gap between deep learning training and inference, eliminate computation redundancy during inference execution, and further lower the barrier for deep learning algorithm and system co-optimization. To build efficient and scalable services, we take a unique algorithmic approach of harnessing the semantic correlation among learning-based computations. Rather than viewing individual tasks as isolated black boxes, we optimize them collectively in a white-box approach, proposing primitives to formulate the semantics of deep learning workloads, algorithms to assess their hidden correlation (in terms of the input data, the neural network models, and the deployment trials), and methods to merge common processing steps to minimize redundancy.
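
    Computation reuse for runtime inference is one of the services the abstract names. The sketch below illustrates the basic idea with a cache keyed by a fingerprint of the request, so semantically identical work is executed only once; the fingerprinting and the `run_model` hook are placeholders of this sketch, far simpler than the thesis's semantic-correlation primitives.

```python
import hashlib
from typing import Any, Callable

class ReuseCache:
    """Minimal computation-reuse service: skip inference when an identical
    (model, input) pair has already been evaluated."""

    def __init__(self, run_model: Callable[[str, bytes], Any]):
        self._run_model = run_model          # placeholder for the real inference call
        self._results: dict[str, Any] = {}

    def infer(self, model_id: str, input_blob: bytes) -> Any:
        key = hashlib.sha256(model_id.encode() + input_blob).hexdigest()
        if key not in self._results:         # cache miss: actually run the model
            self._results[key] = self._run_model(model_id, input_blob)
        return self._results[key]            # cache hit: reuse the earlier result

calls = []
def fake_model(model_id: str, blob: bytes):
    calls.append(model_id)
    return len(blob)                          # stand-in for a real prediction

cache = ReuseCache(fake_model)
cache.infer("mobilenet-v2", b"frame-0001")
cache.infer("mobilenet-v2", b"frame-0001")    # identical request: no second execution
assert len(calls) == 1
```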