
    SEARS: Space Efficient And Reliable Storage System in the Cloud

    Today's cloud storage services must offer storage reliability and fast data retrieval for large amounts of data without sacrificing storage cost. We present SEARS, a cloud-based storage system which integrates erasure coding and data deduplication to support efficient and reliable data storage with fast user response time. With proper association of data to storage server clusters, SEARS provides flexible mixing of different configurations, suitable for real-time and archival applications. Our prototype implementation of SEARS over Amazon EC2 shows that it outperforms existing storage systems in storage efficiency and file retrieval time. For 3 MB files, SEARS delivers a retrieval time of 2.5 s, compared to 7 s with existing systems. Comment: 4 pages, IEEE LCN 201
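
    SEARS pairs deduplication with erasure coding, but no implementation details are given here, so the following is only a minimal sketch of how the two techniques can combine, assuming content-hash deduplication and a simple single-parity code in place of whatever erasure code SEARS actually uses; all names are hypothetical.

        import hashlib
        from functools import reduce
        from operator import xor

        seen = set()       # content hashes already stored
        clusters = {}      # hypothetical stand-in for SEARS's storage-server clusters

        def store(file_bytes: bytes, k: int = 4) -> str:
            """Deduplicate by content hash, then erasure-code new content."""
            digest = hashlib.sha256(file_bytes).hexdigest()
            if digest in seen:
                return digest                        # duplicate: keep only a reference
            seen.add(digest)
            shard_len = -(-len(file_bytes) // k)     # ceiling division
            shards = [file_bytes[i * shard_len:(i + 1) * shard_len].ljust(shard_len, b"\0")
                      for i in range(k)]
            parity = bytes(reduce(xor, col) for col in zip(*shards))
            clusters[digest] = shards + [parity]     # k data shards + 1 parity shard
            return digest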

    Exploiting Blockchains to improve Data Upload and Storage in the Cloud

    Cloud computing is an information technology that enables different users to access a shared pool of configurable system resources and services without physically acquiring them. Most industries nowadays, such as banking, healthcare and education, are migrating to the cloud because of the efficiency of its services, especially when it comes to data security and integrity. Cloud platforms face numerous challenges, such as data deduplication, data transmission, data integrity, VM security, data availability and bandwidth usage. In this paper we adopt blockchain technology, a relatively new technology that first emerged as the cryptocurrency Bitcoin and has proved its efficiency in securing data and assuring data integrity. In Bitcoin, the blockchain is essentially a distributed public ledger that holds transaction data; in our work, blockchains are adopted in a different way than their regular use in Bitcoin. Three of the major challenges in cloud computing and cloud services, namely data deduplication, storage and bandwidth usage, are discussed in this paper
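
    The mechanism by which the blockchain is consulted is not described here, so the sketch below only illustrates the general idea of checking a shared, append-only record of content hashes before uploading, with a hypothetical in-memory list standing in for a real blockchain and a dictionary standing in for the cloud store.

        import hashlib
        import time

        ledger = []   # hypothetical append-only ledger of {hash, owner, timestamp} records
        cloud = {}    # stand-in for the cloud object store

        def upload(owner: str, data: bytes) -> str:
            """Skip the transfer (saving bandwidth and storage) if the hash is already recorded."""
            digest = hashlib.sha256(data).hexdigest()
            if any(rec["hash"] == digest for rec in ledger):
                return "deduplicated"                # only a new ownership record would be added
            ledger.append({"hash": digest, "owner": owner, "ts": time.time()})
            cloud[digest] = data
            return "stored"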

    A Survey on Data Deduplication

    Nowadays, the demand for data storage capacity is increasing drastically. Because of this growing demand, the computing community is turning toward cloud storage. Data security and cost are important challenges in cloud storage. A duplicate file not only wastes storage, it also increases access time, so the detection and removal of duplicate data is an essential task. Data deduplication, an efficient approach to data reduction, has gained increasing attention and popularity in large-scale storage systems. It eliminates redundant data at the file or subfile level and identifies duplicate content by its cryptographically secure hash signature. This is tricky because duplicate files neither share a common key nor contain errors. There are several approaches to identifying and removing redundant data at the file and chunk levels. This paper covers the background and key features of data deduplication, then summarizes and classifies the data deduplication process according to its key workflow
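
    As a concrete illustration of the file-level approach described above, where duplicate content is identified by a cryptographically secure hash signature, the following minimal sketch indexes files by their SHA-256 digest; chunk-level deduplication would hash sub-file chunks instead.

        import hashlib
        from pathlib import Path

        def deduplicate(paths):
            """Return an index {digest: first path seen} and a list of detected duplicates."""
            index, duplicates = {}, []
            for p in map(Path, paths):
                digest = hashlib.sha256(p.read_bytes()).hexdigest()
                if digest in index:
                    duplicates.append((p, index[digest]))   # same content already indexed
                else:
                    index[digest] = p
            return index, duplicates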

    Influence of Wireless Novel Routing Protocol by Using MDPC Algorithm

    This research study investigates the influence of a novel wireless routing protocol that incorporates the MDPC (Multiplicative-Divisive Probabilistic Congestion Control) algorithm. The background of the research stems from the increasing demand for efficient and reliable routing protocols in wireless networks, which face challenges such as limited bandwidth, variable network topologies, and dynamic environmental conditions. The purpose of this study is to evaluate the performance of the proposed routing protocol and assess its effectiveness in addressing these challenges. To achieve this objective, a series of methodologies was employed. First, an in-depth analysis of existing routing protocols was conducted to identify their limitations and areas for improvement; the proposed protocol was then evaluated through simulation. These simulations were conducted in controlled environments, however, and real-world deployment scenarios may introduce additional challenges that need to be addressed. Furthermore, practical implications of implementing the protocol, such as hardware and software compatibility, scalability, and security considerations, should be thoroughly investigated before widespread adoption

    Mining of Frequent Item with BSW Chunking

    Apriori, an algorithm for finding frequent patterns in transactional databases, addresses one of the most important data mining problems and is a cornerstone algorithm of association rule mining. The algorithm nevertheless has constraints, which gives the opportunity for this research. The increasing availability of multicore processors is forcing us to redesign algorithms and applications to exploit the computational power of multiple cores, and finding frequent item sets is expensive in terms of computing resource utilization and CPU power. The quality of parallel Apriori algorithms therefore rests on how the process of finding frequent item sets is parallelized, and parallel frequent item set mining algorithms must address the issue of distributing the candidates among processors. Efficient algorithms to discover frequent patterns are important in data mining research. Many algorithms for mining association rules and their variations have been proposed on the basis of the Apriori algorithm, but traditional algorithms are not efficient. The objective of the Apriori algorithm is to find associations between different sets of data; it is occasionally referred to as "Market Basket Analysis". Each set of data has a number of items and is called a transaction. The output of Apriori is a set of rules that tell us how often items are contained in sets of data. In order to find more valuable rules, our basic aim is to implement the Apriori algorithm using a multithreading approach that can utilize our system's hardware power; the improved algorithm is reasonable and effective and can extract more valuable information
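
    Since the stated aim is to parallelize Apriori's support-counting step across threads, the sketch below distributes candidate counting over transaction partitions with a thread pool; it is only an illustration under that assumption (note that CPython's GIL limits true CPU parallelism, so a process pool would normally be needed for real speedups).

        from collections import Counter
        from concurrent.futures import ThreadPoolExecutor

        def count_partition(transactions, candidates):
            """Count how many transactions in this partition contain each candidate itemset."""
            counts = Counter()
            for t in transactions:
                for c in candidates:
                    if c <= t:               # candidate is a subset of the transaction
                        counts[c] += 1
            return counts

        def apriori(transactions, min_support, workers=4):
            transactions = [frozenset(t) for t in transactions]
            candidates = {frozenset([i]) for t in transactions for i in t}
            frequent = {}
            while candidates:
                # distribute the transactions among worker threads, then merge the counts
                parts = [transactions[i::workers] for i in range(workers)]
                with ThreadPoolExecutor(max_workers=workers) as pool:
                    partials = pool.map(count_partition, parts, [candidates] * workers)
                counts = sum(partials, Counter())
                level = {c: n for c, n in counts.items() if n >= min_support}
                frequent.update(level)
                # join step: build candidates one item larger from this level's frequent sets
                candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
            return frequent

    For example, apriori([{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}], min_support=2) returns the itemsets that appear in at least two transactions together with their support counts.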

    Data De-Duplication in NoSQL Databases

    With the popularity and expansion of Cloud Computing, NoSQL databases (DBs) are becoming the preferred choice for storing data in the Cloud. Because they are highly de-normalized, these DBs tend to store significant amounts of redundant data. Data de-duplication (DD) has an important role in reducing storage consumption to make it affordable to manage in today’s explosive data growth. Numerous DD methodologies, like chunking and delta encoding, are available today to optimize the use of storage. These technologies approach DD at the file and/or sub-file level, but this approach has never been optimal for NoSQL DBs. This research proposes data De-Duplication in NoSQL Databases (DDNSDB), which makes use of a DD approach at a higher level of abstraction, namely at the DB level. It makes use of the structural information about the data (metadata), exploiting its granularity to identify and remove duplicates. The main goals of this research are: to maximally reduce the amount of duplicates in one type of NoSQL DB, namely the key-value store; to maximally increase the process performance such that the backup window is only marginally affected; and to design with horizontal scaling in mind such that it would run competitively on a Cloud Platform. Additionally, this research presents an analysis of the various types of NoSQL DBs (such as key-value, tabular/columnar, and document DBs) to understand the data model required for the design and implementation of DDNSDB. Preliminary experiments have demonstrated that DDNSDB can further reduce NoSQL DB storage space compared with current archiving methods (from 17% to near 69% as more structural information is available). Also, by following an optimized adapted MapReduce architecture, DDNSDB proves to have a competitive performance advantage in a horizontal scaling cloud environment compared with a vertical scaling environment (from 28.8 milliseconds to 34.9 milliseconds as the number of parallel Virtual Machines grows)
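
    The actual DDNSDB design is not spelled out here, so the following is only a minimal sketch of database-level deduplication in a key-value store: each distinct value is kept once, indexed by its content hash, and keys hold references to that hash.

        import hashlib

        values = {}   # value store: digest -> value, each distinct value kept once
        keys = {}     # key store:   user key -> digest reference

        def put(key: str, value: bytes) -> None:
            """Duplicate values stored under different keys collapse into one physical copy."""
            digest = hashlib.sha256(value).hexdigest()
            values.setdefault(digest, value)
            keys[key] = digest

        def get(key: str) -> bytes:
            return values[keys[key]]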

    THE USE OF ROUGH CLASSIFICATION AND TWO THRESHOLD TWO DIVISORS FOR DEDUPLICATION

    The data deduplication technique efficiently reduces and removes redundant data in big data storage systems. The main issue is that data deduplication requires expensive computational effort to remove duplicate data because of the vast size of big data. This paper attempts to reduce the time and computation required for the data deduplication stages; the chunking and hashing stage in particular often requires a lot of calculation and time. The paper first proposes an efficient new method that exploits the parallel processing of deduplication systems for the best performance, with the proposed system designed to use multicore computing efficiently. First, the proposed method removes redundant data by making a rough classification of the input into several classes using histogram similarity and the k-means algorithm. Next, a new method for calculating the divisor list for each class is introduced to improve the chunking method and increase the data deduplication ratio. Finally, the performance of the proposed method was evaluated using three datasets as test examples. The proposed method proves that data deduplication based on classes and a multicore processor is much faster than on a single-core processor. Moreover, the experimental results showed that the proposed method significantly improved the performance of the Two Threshold Two Divisors (TTTD) and Basic Sliding Window (BSW) algorithms
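
    The sketch below mirrors only the shape of the workflow described above, assuming 256-bin byte histograms as features, a tiny k-means for the rough classification, and one process per class; the per-class divisor calculation and TTTD chunking are replaced by a placeholder, and all names are hypothetical.

        import numpy as np
        from multiprocessing import Pool

        def histogram(data: bytes) -> np.ndarray:
            """Normalised 256-bin byte histogram used as the similarity feature."""
            h = np.bincount(np.frombuffer(data, dtype=np.uint8), minlength=256)
            return h / max(len(data), 1)

        def kmeans(features: np.ndarray, k: int = 3, iters: int = 20) -> np.ndarray:
            """A tiny k-means for the rough classification step."""
            rng = np.random.default_rng(0)
            centroids = features[rng.choice(len(features), k, replace=False)]
            for _ in range(iters):
                labels = np.argmin(((features[:, None] - centroids) ** 2).sum(-1), axis=1)
                centroids = np.array([features[labels == j].mean(axis=0)
                                      if np.any(labels == j) else centroids[j]
                                      for j in range(k)])
            return labels

        def dedup_class(files):
            """Placeholder for the per-class divisor list, chunking and hashing."""
            return len(files)

        def rough_classify_and_dedup(files, k=3, workers=3):
            feats = np.stack([histogram(f) for f in files])
            labels = kmeans(feats, k)
            classes = [[f for f, l in zip(files, labels) if l == j] for j in range(k)]
            with Pool(workers) as pool:              # one core per class
                return pool.map(dedup_class, classes)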

    A survey and classification of storage deduplication systems

    The automatic elimination of duplicate data in a storage system, commonly known as deduplication, is increasingly accepted as an effective technique to reduce storage costs. It has thus been applied to different storage types, including archives and backups, primary storage, within solid state disks, and even to random access memory. Although the general approach to deduplication is shared by all storage types, each poses specific challenges and leads to different trade-offs and solutions. This diversity is often misunderstood, thus underestimating the relevance of new research and development. The first contribution of this paper is a classification of deduplication systems according to six criteria that correspond to key design decisions: granularity, locality, timing, indexing, technique, and scope. This classification identifies and describes the different approaches used for each of them. As a second contribution, we describe which combinations of these design decisions have been proposed and found more useful for challenges in each storage type. Finally, outstanding research challenges and unexplored design points are identified and discussed. This work is funded by the European Regional Development Fund (ERDF) through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the Fundação para a Ciência e a Tecnologia (FCT; Portuguese Foundation for Science and Technology) within project RED FCOMP-01-0124-FEDER-010156 and by the FCT through PhD scholarship SFRH-BD-71372-2010
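
    As a compact restatement of the taxonomy, the sketch below records the six criteria named above as fields of a small record type; the example values in the comments are common options from the deduplication literature and are illustrative assumptions, not quotations from the survey.

        from dataclasses import dataclass

        @dataclass
        class DedupSystem:
            granularity: str   # e.g. whole-file, fixed-size block, content-defined chunk
            locality: str      # e.g. temporal, spatial
            timing: str        # e.g. inline, post-process
            indexing: str      # e.g. full index, sparse index, Bloom-filter assisted
            technique: str     # e.g. identity (hash-based), delta encoding
            scope: str         # e.g. local (single node), global (cluster-wide)

        # Hypothetical classification of a backup deduplicator under this taxonomy:
        example = DedupSystem("content-defined chunk", "temporal", "inline",
                              "sparse index", "identity (hash-based)", "global (cluster-wide)")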

    COMPRESSION OF WEARABLE BODY SENSOR NETWORK DATA USING IMPROVED TWO-THRESHOLD-TWO-DIVISOR DATA CHUNKING ALGORITHM

    Compression plays a significant role for Body Sensor Network (BSN) data, since the sensors in BSNs have limited battery power and memory. Data also needs to be transmitted quickly and losslessly to provide near real-time feedback. The paper evaluates lossless data compression algorithms such as Run Length Encoding (RLE), Lempel-Ziv-Welch (LZW) and Huffman coding on data from wearable devices and compares them in terms of Compression Ratio, Compression Factor, Savings Percentage and Compression Time. It also evaluates a data deduplication technique used in Low Bandwidth File Systems (LBFS), the Two Thresholds Two Divisors (TTTD) algorithm, to determine whether it could be used for BSN data. By changing the parameters and running the algorithm multiple times on the data, it arrives at a set of values that give a compression ratio greater than 50 on BSN data. This is the first contribution of the paper. Based on these performance evaluation results for TTTD and the various classical compression algorithms, it proposes a technique to combine multiple algorithms in sequence. Upon comparison of the performance, it has been found that the new algorithm, TTTD-H, which runs TTTD and Huffman in sequence, improves the Savings Percentage by 23 percent over TTTD and 31 percent over Huffman when each is executed independently. The Compression Factor improved by 142 percent over TTTD, 52 percent over LZW and 178 percent over Huffman for a file of 3.5 MB. These significant results are the second important contribution of the project
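
    TTTD-H runs chunk-level deduplication and Huffman coding in sequence; the sketch below illustrates only that two-stage idea, with simple fixed-size chunking standing in for TTTD and with the chunk index, chunk references and Huffman table overheads ignored, so its numbers are not comparable to the paper's results.

        import hashlib
        import heapq
        import itertools
        from collections import Counter

        def huffman_code_lengths(data: bytes) -> dict:
            """Code length in bits per byte value, via the standard heap-based Huffman build."""
            freq = Counter(data)
            if not freq:
                return {}
            if len(freq) == 1:                       # degenerate single-symbol input
                return {next(iter(freq)): 1}
            tie = itertools.count()                  # tie-breaker so the heap never compares dicts
            heap = [(n, next(tie), {sym: 0}) for sym, n in freq.items()]
            heapq.heapify(heap)
            while len(heap) > 1:
                n1, _, c1 = heapq.heappop(heap)
                n2, _, c2 = heapq.heappop(heap)
                merged = {s: d + 1 for s, d in {**c1, **c2}.items()}
                heapq.heappush(heap, (n1 + n2, next(tie), merged))
            return heap[0][2]

        def dedup_then_huffman_size(data: bytes, chunk: int = 64) -> int:
            """Approximate output size in bytes after chunk deduplication plus Huffman coding."""
            seen, unique = set(), bytearray()
            for i in range(0, len(data), chunk):
                c = data[i:i + chunk]
                h = hashlib.sha256(c).digest()
                if h not in seen:                    # keep only the first copy of each chunk
                    seen.add(h)
                    unique.extend(c)
            lengths = huffman_code_lengths(bytes(unique))
            bits = sum(lengths[b] for b in unique)
            return -(-bits // 8)                     # round up to whole bytes

        def savings_percentage(original_size: int, compressed_size: int) -> float:
            return 100.0 * (1 - compressed_size / original_size)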