
    A Distributed Algorithm for Web Content Replication

    Abstract—Web caching and replication techniques increase the accessibility of Web content and reduce Internet bandwidth requirements. In this paper, we consider the replica placement problem in a distributed replication group. The replication group consists of servers that each dedicate a certain amount of memory to replicating objects. The replica placement problem is to place replicas at the servers within the replication group such that the access time over all objects and servers is minimized. We design a distributed 2-approximation algorithm that solves this optimization problem. We show that the communication and computational complexity of the algorithm is polynomial in the number of servers and objects. We perform simulation experiments to investigate the performance of our algorithm.
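
    The abstract frames replica placement as a constrained cost-minimization problem. As a point of reference only, here is a minimal centralized greedy sketch of that optimization in Python; it is not the paper's distributed 2-approximation, and the cost model, the MISS_COST penalty, and all input names are assumptions.

        MISS_COST = 100.0  # assumed penalty for fetching an object from outside the group

        def greedy_placement(servers, objects, size, capacity, demand, cost):
            """servers, objects: lists of ids.
            size[o]: memory object o occupies; capacity[s]: server s's budget.
            demand[s][o]: request rate of s for o; cost[s][t]: access cost s -> t.
            Returns a set of (server, object) replica pairs."""
            placement = set()
            free = dict(capacity)

            def fetch_cost(s, o):
                # cost for server s to fetch o from the nearest current replica
                holders = [t for (t, p) in placement if p == o]
                return min((cost[s][t] for t in holders), default=MISS_COST)

            while True:
                # add the single replica yielding the largest total cost saving
                best, best_gain = None, 0.0
                for s in servers:
                    for o in objects:
                        if (s, o) in placement or size[o] > free[s]:
                            continue
                        gain = sum(demand[t][o] * max(0.0, fetch_cost(t, o) - cost[t][s])
                                   for t in servers)
                        if gain > best_gain:
                            best, best_gain = (s, o), gain
                if best is None:
                    return placement
                s, o = best
                placement.add(best)
                free[s] -= size[o]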

    Proactive caching placement for arbitrary topology with multi-hop forwarding in ICN

    With the rapid growth of network traffic and rising user expectations for quality of experience, Information-Centric Networking (ICN), a content-centric network architecture with named-data caching and routing, has been proposed to improve the efficiency of multimedia content distribution. In an arbitrary topology, cache nodes and users are randomly distributed and connected, so achieving an optimal caching placement is challenging. In this paper, we propose a caching placement algorithm for arbitrary topologies in ICN. We formulate an optimization problem of proactive caching placement for arbitrary topology combined with multi-hop forwarding, with the objective of simultaneously optimizing user delay and the load-balancing level of the nodes. Since the original problem is NP-hard, we decompose it into two sub-problems: the content replica allocation sub-problem and the content replica placement sub-problem. First, in the content replica allocation sub-problem, the number of replicas of each content is obtained using auction theory. Second, this replica count is used as a constraint in the content replica placement sub-problem, which is solved with matching theory. The caching placement algorithm, combined with multi-hop NRR forwarding, maximizes the utilization of cache resources to achieve better caching performance. The numerical results show that significant hop-count savings and load-balancing improvements are attainable with the proposed algorithm.
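
    The two-stage decomposition can be illustrated with a small sketch. The version below substitutes simple stand-ins for the paper's machinery: a proportional-to-popularity rule in place of auction theory for the allocation stage, and a greedy least-loaded rule in place of matching theory for the placement stage; all inputs and names are assumptions.

        def allocate_replicas(popularity, total_budget):
            """Stage 1: split a total replica budget across contents by popularity."""
            total = sum(popularity.values())
            return {c: max(1, round(total_budget * p / total))
                    for c, p in popularity.items()}

        def place_replicas(alloc, node_capacity):
            """Stage 2: greedily assign each replica to the least-loaded node."""
            load = {n: 0 for n in node_capacity}
            placement = {c: [] for c in alloc}
            # place popular contents first so they get the best spots
            for content in sorted(alloc, key=alloc.get, reverse=True):
                for _ in range(alloc[content]):
                    candidates = [n for n in load
                                  if load[n] < node_capacity[n]
                                  and n not in placement[content]]
                    if not candidates:
                        break  # cache space exhausted
                    node = min(candidates, key=load.get)
                    placement[content].append(node)
                    load[node] += 1
            return placement

        popularity = {"a": 50, "b": 30, "c": 20}   # assumed request rates
        alloc = allocate_replicas(popularity, total_budget=6)
        print(place_replicas(alloc, {"n1": 2, "n2": 2, "n3": 2}))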

    Data Replication in Distributed Systems Using Olympiad Optimization Algorithm

    Achieving timely access to data objects is a major challenge in large distributed systems such as Internet of Things (IoT) platforms, so minimizing data read and write times has become a high priority for system designers. Replication, together with the appropriate placement of replicas on the most accessible data servers, is an NP-complete optimization problem. The key objectives of the current study are to minimize data access time, reduce the number of replicas, and improve data availability. This paper employs the Olympiad Optimization Algorithm (OOA), a novel population-based, discrete heuristic algorithm, to solve the replica placement problem; the algorithm is also applicable to other fields such as mechanical and computer engineering design problems. This discrete algorithm was inspired by the learning process of student groups preparing for Olympiad exams. The proposed algorithm, which is divide-and-conquer based with local and global search strategies, was used to solve the replica placement problem in a standard simulated distributed system. It was evaluated on the 'European Union Database' (EUData), which contains 28 server nodes connected in a complete-graph topology. The results reveal that the proposed technique reduces data access time by 39% with around six replicas, substantially outperforming earlier methods. Moreover, the standard deviation across different executions of the algorithm is approximately 0.0062, lower than that of the other techniques in the same experiments.
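
    OOA itself is not reproduced here; the sketch below shows only the generic shape the abstract describes: a population divided into groups, each improved by local moves toward a group leader, with the global best shared across groups. The solution encoding (one server index per replica), the move rule, and the toy fitness are all assumptions.

        import random

        def search(num_replicas, num_servers, fitness, groups=4, group_size=5,
                   iters=200, seed=0):
            rng = random.Random(seed)
            rand = lambda: tuple(rng.randrange(num_servers) for _ in range(num_replicas))
            population = [[rand() for _ in range(group_size)] for _ in range(groups)]
            best = min((s for g in population for s in g), key=fitness)
            for _ in range(iters):
                for group in population:
                    leader = min(group, key=fitness)   # local search target
                    for i, sol in enumerate(group):
                        # copy one replica position from the leader or the global best
                        new = list(sol)
                        j = rng.randrange(num_replicas)
                        new[j] = (leader if rng.random() < 0.5 else best)[j]
                        new = tuple(new)
                        if fitness(new) < fitness(sol):
                            group[i] = new
                    gbest = min(group, key=fitness)
                    if fitness(gbest) < fitness(best):
                        best = gbest
            return best

        # toy fitness: prefer placements that spread replicas over distinct servers
        fit = lambda sol: -len(set(sol))
        print(search(num_replicas=6, num_servers=28, fitness=fit))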

    Unifying data and replica placement for data-intensive services in geographically distributed clouds

    The increased reliance of data management applications on cloud computing technologies has made research into the data placement problem of paramount importance. The objective of the classical data placement problem is to optimally partition the set of data-items into distributed data centers, while also allowing for replication, so as to minimize the overall network communication cost. Despite significant advances in data placement research, replica placement has seldom been studied in unison with data placement. More specifically, most existing solutions employ a two-phase approach: 1) data placement, followed by 2) replication. Replication should, however, be seen as an integral part of data placement and studied jointly with it as a single optimization problem. In this paper, we propose a unified paradigm, called CPR, which combines the placement and replication of data-intensive services into geographically distributed clouds as a joint optimization problem. Underneath CPR lies an overlapping correlation clustering algorithm capable of assigning a data-item to multiple data centers, thereby enabling us to solve data placement and replication jointly. Experiments on a real-world trace-based online social network dataset show that CPR is effective and scalable. Empirically, it is approximately 35% better on the evaluated efficacy metrics, while being up to 8 times faster in execution time compared to state-of-the-art techniques.
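
    The enabling idea, overlapping cluster assignment, can be shown in a few lines. The affinity scores and threshold rule below are illustrative assumptions, not the paper's correlation-clustering objective; they merely show how one data-item can land in several data centers in a single decision, which is what unifies placement and replication.

        def overlapping_assign(affinity, threshold):
            """affinity[item][dc]: how strongly an item's readers co-locate with
            a data center. Each item goes to its best center, plus every other
            center within `threshold` of that best score (those are replicas)."""
            placement = {}
            for item, scores in affinity.items():
                best = max(scores.values())
                placement[item] = [dc for dc, s in scores.items()
                                   if s >= best - threshold]
            return placement

        affinity = {
            "photo_42": {"us-east": 0.9, "eu-west": 0.8, "ap-south": 0.2},
            "post_17":  {"us-east": 0.3, "eu-west": 0.9, "ap-south": 0.1},
        }
        # photo_42 is replicated to us-east and eu-west; post_17 stays in eu-west
        print(overlapping_assign(affinity, threshold=0.15))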

    Improvement of Data-Intensive Applications Running on Cloud Computing Clusters

    MapReduce, designed by Google, is the most widely used distributed programming model in cloud environments. Hadoop, an open-source implementation of MapReduce, is a data management framework for handling data-intensive applications on large clusters of commodity machines. Many prominent enterprises, including Facebook, Twitter, and Adobe, use Hadoop for their data-intensive processing needs. Straggler tasks in MapReduce jobs dramatically impede job execution on massive datasets in cloud computing systems. This impedance is due to the uneven distribution of input data and computation load among cluster nodes, heterogeneous data nodes, data skew in the reduce phase, resource contention, and network configurations. All of these factors can cause delays, failures, and violations of job completion time. One of the key issues that can significantly affect the performance of cloud computing is computation load balancing among cluster nodes. Replica placement in the Hadoop Distributed File System (HDFS) plays a significant role in data availability and balanced cluster utilization. Under the current replica placement policy (RPP) of HDFS, the replicas of data blocks cannot be evenly distributed across the cluster's nodes, so HDFS must rely on a load-balancing utility, which incurs extra overhead in time and resources. This dissertation addresses the data load balancing problem and presents an innovative replica placement policy for HDFS that balances the data load evenly among the cluster's nodes. Because the heterogeneity of cluster nodes exacerbates computational load imbalance, a second replica placement algorithm is proposed for heterogeneous cluster environments. The timing of identifying straggler map tasks is critical for straggler mitigation in data-intensive cloud computing. To mitigate straggler map tasks, the Present progress and Feedback based Speculative Execution (PFSE) algorithm is proposed: a new straggler identification scheme that identifies straggler map tasks based on feedback from completed tasks as well as the progress of the currently running task. Straggler reduce tasks aggravate violations of MapReduce job completion time and are typically the result of bad data partitioning in the reduce phase: the hash partitioner employed by Hadoop can cause intermediate data skew, which results in straggler reduce tasks. This dissertation therefore proposes a new partitioning scheme, named Balanced Data Clusters Partitioner (BDCP), based on sampling of the input data and feedback about the currently processing task; BDCP assists in straggler mitigation during the reduce phase and minimizes job completion time. Extensive experiments corroborate that the algorithms and policies proposed in this dissertation improve the performance of data-intensive applications running on cloud platforms.
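
    As a flavor of the load-balancing idea behind the proposed policy (not its actual mechanism), the sketch below picks the least-loaded datanodes for each new block, so per-node replica counts stay within one of each other without a separate balancer pass; node names and the replication factor of 3 are assumptions.

        import heapq

        def place_block(block_id, node_load, replicas=3):
            """Pick the `replicas` least-loaded distinct nodes for one block."""
            targets = heapq.nsmallest(replicas, node_load, key=node_load.get)
            for n in targets:
                node_load[n] += 1   # one block replica now stored on n
            return block_id, targets

        load = {f"dn{i}": 0 for i in range(1, 6)}   # five empty datanodes
        for b in range(7):
            print(place_block(b, load))
        print(load)   # loads differ by at most one replica across nodes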

    Optimal Replica Placement in Tree Networks with QoS and Bandwidth Constraints and the Closest Allocation Policy

    This paper deals with the replica placement problem on fully homogeneous tree networks, known as the Replica Placement optimization problem. The client requests are known beforehand, while the number and location of the servers are to be determined. We investigate this problem under the Closest access policy with the addition of QoS and bandwidth constraints, and we propose an optimal two-pass dynamic programming algorithm.
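
    The paper's exact two-pass dynamic program is not reproduced here; the following much-simplified bottom-up sketch conveys the flavor of server placement on a tree: accumulate client demand from the leaves toward the root and open a server wherever the accumulated demand would exceed a server capacity W. The QoS distance constraints are omitted, and the tree shape and W are assumptions.

        def place_servers(children, requests, root, W):
            """children[v]: list of v's children; requests[v]: client demand at v.
            Returns (list of server nodes, leftover demand reaching the root)."""
            servers = []

            def up(v):
                demand = requests.get(v, 0) + sum(up(c) for c in children.get(v, []))
                if demand > W:            # too much traffic to forward upward:
                    servers.append(v)     # open a server here, absorb the demand
                    return 0
                return demand

            rest = up(root)
            if rest > 0:
                servers.append(root)      # whatever remains is served at the root
            return servers, rest

        children = {"r": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
        requests = {"a1": 3, "a2": 4, "b1": 2}
        print(place_servers(children, requests, root="r", W=5))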

    A Lightweight Distributed Solution to Content Replication in Mobile Networks

    Performance and reliability of content access in mobile networks are conditioned by the number and location of content replicas deployed at the network nodes. Facility location theory has been the traditional, centralized approach to studying content replication: computing the number and placement of replicas in a network can be cast as an uncapacitated facility location problem. The endeavour of this work is to design a distributed, lightweight solution to the above joint optimization problem while taking network dynamics into account. In particular, we devise a mechanism that lets nodes share the burden of storing and providing content, so as to achieve load balancing, and decide whether to replicate or drop the information so as to adapt to a dynamic content demand and time-varying topology. We evaluate our mechanism through simulation, exploring a wide range of settings and studying realistic content access mechanisms that go beyond the traditional assumption of matching demand points to their closest content replica. Results show that our mechanism, which uses local measurements only, is: (i) extremely precise in approximating an optimal solution to content placement and replication; (ii) robust against network mobility; (iii) flexible in accommodating various content access patterns, including variation in time and space of the content demand.
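
    A local replicate-or-drop rule of the kind the abstract describes might look like the following sketch, in which a node compares its measured serving load against two thresholds and acts without any global knowledge; the thresholds and names are assumptions, not the paper's mechanism.

        def decide(my_load, neighbour_loads, hi=0.8, lo=0.2):
            """my_load, neighbour_loads: fraction of serving capacity in use."""
            if my_load > hi:
                # overloaded: hand a new replica to the least-loaded neighbour
                target = min(neighbour_loads, key=neighbour_loads.get)
                return ("replicate", target)
            if my_load < lo and any(l < hi for l in neighbour_loads.values()):
                # barely used and neighbours can absorb the demand: drop the copy
                return ("drop", None)
            return ("keep", None)

        print(decide(0.9, {"n2": 0.4, "n3": 0.1}))   # ('replicate', 'n3')
        print(decide(0.1, {"n2": 0.4, "n3": 0.3}))   # ('drop', None)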