Data Deduplication Technology for Cloud Storage
With the explosive growth of information, data storage systems have stepped into the cloud storage era. Although the core of a cloud storage system is a distributed file system that solves the problem of mass data storage, a large amount of duplicate data exists in all storage systems. File systems are designed to control how files are stored and retrieved, yet few studies focus on cloud file system deduplication at the application level, especially for the Hadoop Distributed File System (HDFS). In this paper, we design a file deduplication framework on HDFS for cloud application developers. The proposed RFD-HDFS and FD-HDFS data deduplication solutions perform deduplication online, which improves storage space utilisation and reduces redundancy. At the end of the paper, we test disk utilisation and file upload performance on RFD-HDFS and FD-HDFS, and compare the disk utilisation of the two frameworks with plain HDFS. The results show that the two frameworks not only implement the deduplication function but also effectively reduce the disk utilisation of duplicate files. The proposed framework can therefore reduce storage space by eliminating redundant HDFS files.
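The core of file-level online deduplication as described above is a fingerprint check before storage: hash the file content, and store the bytes only if the fingerprint is new. A minimal sketch follows; the `DedupStore` class and its in-memory dictionaries are illustrative stand-ins for the paper's actual RFD-HDFS/FD-HDFS mechanisms, not their implementation.

```python
import hashlib

class DedupStore:
    """File-level deduplication sketch: content is stored once per
    SHA-256 fingerprint; duplicate uploads only add a name mapping."""

    def __init__(self):
        self._blobs = {}   # fingerprint -> file content (stand-in for HDFS blocks)
        self._names = {}   # logical path -> fingerprint

    def put(self, path: str, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        if fp not in self._blobs:   # new content: store the bytes
            self._blobs[fp] = data
        self._names[path] = fp      # duplicate: record only the mapping
        return fp

    def get(self, path: str) -> bytes:
        return self._blobs[self._names[path]]

store = DedupStore()
store.put("/user/a/report.txt", b"quarterly numbers")
store.put("/user/b/copy.txt", b"quarterly numbers")  # duplicate content
print(len(store._blobs))  # the bytes are stored only once
```

The savings come from the second `put`: the upload of identical content costs one hash lookup and a small name-table entry instead of a full copy on disk.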
SEARS: Space Efficient And Reliable Storage System in the Cloud
Today's cloud storage services must offer storage reliability and fast data
retrieval for large amounts of data without sacrificing storage cost. We present
SEARS, a cloud-based storage system which integrates erasure coding and data
deduplication to support efficient and reliable data storage with fast user
response time. With proper association of data to storage server clusters,
SEARS provides flexible mixing of different configurations, suitable for
real-time and archival applications.
Our prototype implementation of SEARS over Amazon EC2 shows that it
outperforms existing storage systems in storage efficiency and file retrieval
time. For 3 MB files, SEARS delivers retrieval time of s compared to
s with existing systems.
Comment: 4 pages, IEEE LCN 201
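SEARS pairs deduplication with erasure coding for reliability. The coding half can be illustrated with a toy single-parity scheme; this is a hedged sketch only, since the paper presumably uses a stronger code such as Reed-Solomon, and the k=3 split below is an illustrative assumption.

```python
import hashlib
from functools import reduce

def encode(data: bytes, k: int = 3):
    """Toy erasure code: split data into k equal blocks plus one XOR
    parity block; any single lost block can be rebuilt from the rest."""
    size = -(-len(data) // k)  # ceiling division
    blocks = [data[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)
    return blocks + [parity]

def recover(blocks, missing: int) -> bytes:
    """Rebuild the single missing block as the XOR of the survivors."""
    survivors = [b for i, b in enumerate(blocks) if i != missing]
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), survivors)

blocks = encode(b"archival object payload")
assert recover(blocks, 1) == blocks[1]  # lost block reconstructed
```

Storing k+1 blocks on separate servers tolerates one server failure at a fraction of the cost of full replication, which is the trade-off SEARS exploits for archival data.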
Cross-Layer Fragment Indexing based File Deduplication using Hyper Spectral Hash Duplicate Filter (HSHDF) for Optimized Cloud Storage
Cloud computing and storage is a major service for maintaining large amounts of data in a centralized server, where users store and retrieve data under a pay-as-you-use service model. Because duplicate copies accumulate across different scenarios, the growing storage size leads to increased cost. To resolve this problem, we propose Cross-Layer Fragment Indexing (CLFI) based file deduplication using a Hyper Spectral Hash Duplicate Filter (HSHDF) for optimized cloud storage. Initially, file storage indexing is carried out with a Lexical Syntactic Parser (LSP) to split the files into blocks. Then a comparative sector is created based on chunk staking. Based on the file frequency weight, the relative indexing is verified through Cross-Layer Fragment Indexing (CLFI). The fragmented index is then grouped by a maximum relative threshold margin using Intra Subset Near-Duplicate Clusters (ISNDC). Hashing is applied to obtain comparative index points based on a hyper-correlation comparer using the Hyper Spectral Hash Duplicate Filter (HSHDF). This filters near-duplicate content depending on file content differences to identify the duplicates. The proposed system delivers high performance compared to other systems: it optimizes cloud storage and has a higher precision rate than other methods.
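The chunk-and-hash idea underlying this kind of near-duplicate filtering can be sketched simply: split each file into chunks, fingerprint the chunks, and score file pairs by fingerprint overlap. The fixed-size chunking, the `similarity` function, and the names below are illustrative assumptions, not the paper's CLFI/HSHDF algorithms.

```python
import hashlib

def chunks(data: bytes, size: int = 64):
    """Split file content into fixed-size chunks (a simple stand-in
    for the paper's block splitting / chunk staking)."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def fingerprints(data: bytes) -> set:
    """SHA-256 fingerprint per chunk."""
    return {hashlib.sha256(c).hexdigest() for c in chunks(data)}

def similarity(a: bytes, b: bytes) -> float:
    """Jaccard similarity of chunk fingerprints; file pairs above a
    chosen threshold are treated as near duplicates."""
    fa, fb = fingerprints(a), fingerprints(b)
    return len(fa & fb) / len(fa | fb) if fa | fb else 1.0

f1 = b"A" * 256 + b"B" * 64
f2 = b"A" * 256 + b"C" * 64  # shares the first 256 bytes with f1
print(similarity(f1, f2))
```

Because identical chunks hash identically, the filter never compares raw content for exact duplicates, and near-duplicate candidates are found without pairwise byte-level scans.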
An Analysis of Storage Virtualization
Investigating technologies and writing expansive documentation on their capabilities is like hitting a moving target. Technology is evolving, growing, and expanding what it can do each and every day. This makes it very difficult to snap a line and investigate competing technologies. Storage virtualization is one of those moving targets. Large corporations develop software and hardware solutions that try to one-up the competition by releasing firmware and patch updates to include their latest developments. Some of their latest innovations include differing RAID levels, virtualized storage, data compression, data deduplication, file deduplication, thin provisioning, new file system types, tiered storage, solid-state disks, and software updates that align these technologies with their applicable hardware. Even data center environmental considerations such as reusable energy, data center environmental characteristics, and geographic location are being used by companies both small and large to reduce operating costs and limit environmental impact. Companies are even moving to an entirely cloud-based setup to limit their environmental impact, as it can be cost prohibitive to maintain one's own corporate infrastructure. The trifecta of integrating smart storage architectures that include storage virtualization technologies, reducing footprint to promote energy savings, and migrating to cloud-based services will ensure a long-term sustainable storage subsystem.
Towards Data Optimization in Storages and Networks
Title from PDF of title page, viewed on August 7, 2015
Dissertation advisors: Sejun Song and Baek-Young Choi
Vita
Includes bibliographic references (pages 132-140)
Thesis (Ph.D.)--School of Computing and Engineering. University of Missouri--Kansas City, 2015
We are encountering an explosion of data volume, as a study estimates that data
will amount to 40 zettabytes by the end of 2020. This data explosion poses a significant
burden not only on data storage space but also on access latency, manageability,
processing, and network bandwidth. However, large portions of the huge data volume contain
massive redundancies that are created by users, applications, systems, and communication
models. Deduplication is a technique to reduce data volume by removing redundancies.
Reliability can even be improved when data is replicated after deduplication.
Many deduplication studies such as storage data deduplication and network redundancy
elimination have been proposed to reduce storage consumption and network
bandwidth consumption. However, existing solutions are not efficient enough to optimize
the data delivery path from clients to servers across the network. Hence we propose a holistic
deduplication framework to optimize data along that path. Our deduplication framework
consists of three components including data sources or clients, networks, and servers. The
client component removes local redundancies in clients, the network component removes
redundant transfers coming from different clients, and the server component removes redundancies
coming from different networks.
We designed and developed components for the proposed deduplication framework.
For the server component, we developed the Hybrid Email Deduplication System
that achieves a trade-off of space savings and overhead for email systems. For the client
component, we developed the Structure-Aware File and Email Deduplication for Cloud-based
Storage Systems, which is very fast while achieving good space savings by using
structure-based granularity. For the network component, we developed a system called
Software-defined Deduplication as a Network and Storage service, an in-network deduplication
approach that chains storage data deduplication and network redundancy elimination
functions by using Software-Defined Networking to achieve both storage space and network
bandwidth savings with low processing time and memory usage. We also discuss mobile
deduplication for image and video files in mobile devices. Through system implementations
and experiments, we show that the proposed framework effectively and efficiently
optimizes data volume in a holistic manner encompassing the entire data path of clients,
networks, and storage servers.
Introduction -- Deduplication technology -- Existing deduplication approaches -- HEDS: Hybrid Email Deduplication System -- SAFE: Structure-Aware File and Email Deduplication for Cloud-based Storage Systems -- SoftDance: Software-Defined Deduplication as a Network and Storage Service -- Mobile deduplication -- Conclusion
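The network component's redundancy elimination can be sketched as a pair of synchronized chunk caches at the two ends of a link: a chunk that has already crossed the link travels as a short fingerprint instead of the full payload. The `REChannel` class and its message format are illustrative assumptions, not the SoftDance design.

```python
import hashlib

class REChannel:
    """Network redundancy-elimination sketch: sender and receiver keep
    synchronized chunk caches; repeated chunks cross the link as short
    fingerprints rather than full payloads."""

    def __init__(self):
        self.sender_cache = set()    # fingerprints the receiver already holds
        self.receiver_cache = {}     # fingerprint -> chunk bytes

    def send(self, chunk: bytes):
        fp = hashlib.sha256(chunk).hexdigest()
        if fp in self.sender_cache:
            return ("ref", fp)              # chunk already crossed the link
        self.sender_cache.add(fp)
        return ("raw", fp, chunk)           # first transfer: send full bytes

    def receive(self, msg) -> bytes:
        if msg[0] == "raw":
            _, fp, chunk = msg
            self.receiver_cache[fp] = chunk
            return chunk
        return self.receiver_cache[msg[1]]  # resolve fingerprint locally

ch = REChannel()
wire = [ch.send(b"hello"), ch.send(b"hello")]
print(wire[1][0])  # the repeat travels as a reference, not a payload
```

Chaining this link-level elimination with server-side storage deduplication is what lets one pass over the data save both bandwidth and disk space, which is the thesis of the framework described above.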