
    Data Replication Strategies in Cloud Computing

    Data replication is a widely used technique in many systems. For example, it can be employed in large-scale distributed file systems to increase data availability and system reliability, or in network models such as data grids and Amazon CloudFront to reduce access latency and network bandwidth consumption. I study a series of problems related to the data replication methods used in the Hadoop Distributed File System (HDFS) and in the Amazon CloudFront service. Data failure, caused by hardware failure or malfunction, software errors, or human error, is the greatest threat to a file storage system. I present a set of schemes that enhance the efficiency of the current data replication strategy in HDFS, thereby improving system reliability and performance. I also study the application replica placement problem based on an Original-Front server model and propose a novel strategy that aims to maximize the profit of application providers.
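
    As a purely illustrative aside (not the author's schemes), the sketch below mimics the kind of placement decision discussed above: HDFS-style rack-aware replica placement, where one copy stays on the writer's node and the remaining copies go to a different rack. All node and rack names are hypothetical.

        import random

        def place_replicas(block_id, writer_node, nodes_by_rack, replication=3):
            """Toy rack-aware placement in the spirit of HDFS defaults:
            replica 1 on the writer's node, the remaining replicas on
            nodes in a single remote rack when one is available.
            nodes_by_rack maps rack id -> list of node names."""
            writer_rack = next(r for r, ns in nodes_by_rack.items() if writer_node in ns)
            placement = [writer_node]
            remote_racks = [r for r in nodes_by_rack if r != writer_rack]
            if remote_racks and replication > 1:
                remote = random.choice(remote_racks)
                free = [n for n in nodes_by_rack[remote] if n not in placement]
                placement += random.sample(free, min(replication - 1, len(free)))
            # Fall back to any remaining node if we still need replicas.
            leftovers = [n for ns in nodes_by_rack.values() for n in ns if n not in placement]
            placement += leftovers[:replication - len(placement)]
            return {block_id: placement}

        # Example: two racks with three nodes each (hypothetical cluster).
        racks = {"r1": ["n1", "n2", "n3"], "r2": ["n4", "n5", "n6"]}
        print(place_replicas("blk_0001", "n2", racks))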

    A cost-efficient QoS-aware analytical model of future software content delivery networks

    Freelance, part-time, work-at-home, and other flexible jobs are changing the concept of the workplace and bringing information and content exchange problems to companies. Geographically spread corporations may use remote distribution of software and data to meet employees' demands by exploiting emerging delivery technologies. In this context, cost-efficient software distribution is crucial to allow business evolution and make IT infrastructures more agile. On the other hand, container-based virtualization technology is shaping the new trends in software deployment and infrastructure design. We envision current and future enterprise IT management trends evolving towards container-based software delivery over Hybrid CDNs. This paper presents a novel cost-efficient, QoS-aware analytical model and a Hybrid CDN-P2P architecture for enterprise software distribution. The model would allow delivery cost minimization for a wide range of companies, from big multinationals to SMEs, using CDN-P2P distribution under various hypothetical industrial scenarios. Model constraints guarantee acceptable deployment times and keep the amount of exchanged content below the network bandwidth and storage limits in our scenarios. Indeed, key model parameters account for network bandwidth, storage limits, and rental prices, which are determined empirically from the values offered by the commercial delivery networks KeyCDN, MaxCDN, CDN77, and BunnyCDN. This preliminary study indicates that MaxCDN offers the best cost-QoS trade-off. The model is implemented in the network simulation tool PeerSim and then applied to diverse testing scenarios by varying the company type, the number and profile (technical or administrative) of employees, and the number and size of content requests. Hybrid simulation results show overall economic savings between 5% and 20%, compared to just hiring resources from a commercial CDN, while guaranteeing satisfactory QoS levels in terms of deployment times and number of served requests. This work was partially supported by Generalitat de Catalunya under the SGR Program (2017-SGR-962) and the RIS3CAT DRAC Project (001-P-001723). We have also received funding from the Ministry of Science and Innovation (Spain) under project EQC2019-005653-P.
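
    To make the cost/QoS trade-off above concrete, here is a minimal back-of-the-envelope sketch, not the paper's analytical model: all prices, volumes, and deadlines are invented placeholders, and P2P traffic is assumed to be free on the corporate network.

        def delivery_cost(total_gb, cdn_price_gb, p2p_fraction=0.0):
            """Cost of pushing total_gb of container images when p2p_fraction
            of the traffic is offloaded to employee peers (assumed free) and
            the rest is pulled from a commercial CDN at cdn_price_gb USD/GB."""
            return total_gb * cdn_price_gb * (1.0 - p2p_fraction)

        def meets_deadline(total_gb, bandwidth_mbps, deadline_s):
            """Constraint sketch: the whole deployment must finish in time."""
            seconds = total_gb * 8_000 / bandwidth_mbps  # GB -> megabits, then / Mbps
            return seconds <= deadline_s

        total, price = 500, 0.04   # 500 GB at $0.04/GB (placeholder, not a quoted CDN price)
        for share in (0.0, 0.2, 0.5):
            print(f"P2P share {share:.0%}: ${delivery_cost(total, price, share):.2f}")
        print("meets 6 h deadline:", meets_deadline(total, bandwidth_mbps=1000, deadline_s=6 * 3600))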

    Document replication strategies for geographically distributed web search engines

    Large-scale web search engines are composed of multiple data centers that are geographically distant from each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times because the network latencies between users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on the index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.
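
    The selective replication idea lends itself to a simple illustration. The greedy sketch below, which is not one of the paper's three strategies, replicates the documents most requested by a region's users until a per-data-center capacity limit is reached; the hit counts are invented.

        def select_replicas(doc_hits, capacity):
            """Greedy sketch: rank documents by regional request count and
            replicate the top `capacity` of them at the regional data center."""
            ranked = sorted(doc_hits, key=doc_hits.get, reverse=True)
            return set(ranked[:capacity])

        # Hypothetical per-document query counts from one region's query log.
        hits_eu = {"d1": 120, "d2": 15, "d3": 87, "d4": 3, "d5": 64}
        print(select_replicas(hits_eu, capacity=3))   # {'d1', 'd3', 'd5'}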

    Economy-based data replication broker

    Data replication is one of the key components of data grid architecture, as it enhances data access and reliability and minimises the cost of data transmission. In this paper, we address the problem of reducing the overheads of the replication mechanisms that drive the data management components of a data grid. We propose an approach that extends the resource broker with policies that factor in user quality of service as well as service costs when replicating and transferring data. A realistic model of the data grid was created to simulate and explore the performance of the proposed policy. The policy proved an effective means of improving the performance of grid network traffic, as indicated by the improvement in the speed and cost of transfers achieved by the brokers.
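
    As a rough illustration of such a cost- and QoS-aware broker policy (a sketch under invented prices and bandwidths, not the policy proposed in the paper), the snippet below discards replica sites that would miss the user's deadline and then picks the cheapest remaining source.

        def pick_replica(sources, file_gb, max_seconds):
            """Broker policy sketch: each source is (name, bandwidth_mbps,
            price_per_gb). Keep only sources that satisfy the QoS deadline,
            then return the cheapest as (cost, transfer_seconds, name)."""
            feasible = []
            for name, bw_mbps, price_gb in sources:
                transfer_s = file_gb * 8_000 / bw_mbps   # GB -> megabits, then / Mbps
                if transfer_s <= max_seconds:
                    feasible.append((price_gb * file_gb, transfer_s, name))
            return min(feasible, default=None)

        # Hypothetical replica sites: (name, bandwidth in Mbps, price in USD/GB).
        sites = [("site_a", 200, 0.05), ("site_b", 1000, 0.09), ("site_c", 50, 0.01)]
        print(pick_replica(sites, file_gb=10, max_seconds=900))   # -> (0.5, 400.0, 'site_a')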