2,477 research outputs found

    Optimal Data Placement on Networks With Constant Number of Clients

    Full text link
    We introduce optimal algorithms for the problems of data placement (DP) and page placement (PP) in networks with a constant number of clients each of which has limited storage availability and issues requests for data objects. The objective for both problems is to efficiently utilize each client's storage (deciding where to place replicas of objects) so that the total incurred access and installation cost over all clients is minimized. In the PP problem an extra constraint on the maximum number of clients served by a single client must be satisfied. Our algorithms solve both problems optimally when all objects have uniform lengths. When objects lengths are non-uniform we also find the optimal solution, albeit a small, asymptotically tight violation of each client's storage size by ϵ\epsilonlmax where lmax is the maximum length of the objects and ϵ\epsilon some arbitrarily small positive constant. We make no assumption on the underlying topology of the network (metric, ultrametric etc.), thus obtaining the first non-trivial results for non-metric data placement problems

    DCCast: Efficient Point to Multipoint Transfers Across Datacenters

    Full text link
    Using multiple datacenters allows for higher availability, load balancing and reduced latency to customers of cloud services. To distribute multiple copies of data, cloud providers depend on inter-datacenter WANs that ought to be used efficiently considering their limited capacity and the ever-increasing data demands. In this paper, we focus on applications that transfer objects from one datacenter to several datacenters over dedicated inter-datacenter networks. We present DCCast, a centralized Point to Multi-Point (P2MP) algorithm that uses forwarding trees to efficiently deliver an object from a source datacenter to required destination datacenters. With low computational overhead, DCCast selects forwarding trees that minimize bandwidth usage and balance load across all links. With simulation experiments on Google's GScale network, we show that DCCast can reduce total bandwidth usage and tail Transfer Completion Times (TCT) by up to 50%50\% compared to delivering the same objects via independent point-to-point (P2P) transfers.Comment: 9th USENIX Workshop on Hot Topics in Cloud Computing, https://www.usenix.org/conference/hotcloud17/program/presentation/noormohammadpou

    QuickCast: Fast and Efficient Inter-Datacenter Transfers using Forwarding Tree Cohorts

    Full text link
    Large inter-datacenter transfers are crucial for cloud service efficiency and are increasingly used by organizations that have dedicated wide area networks between datacenters. A recent work uses multicast forwarding trees to reduce the bandwidth needs and improve completion times of point-to-multipoint transfers. Using a single forwarding tree per transfer, however, leads to poor performance because the slowest receiver dictates the completion time for all receivers. Using multiple forwarding trees per transfer alleviates this concern--the average receiver could finish early; however, if done naively, bandwidth usage would also increase and it is apriori unclear how best to partition receivers, how to construct the multiple trees and how to determine the rate and schedule of flows on these trees. This paper presents QuickCast, a first solution to these problems. Using simulations on real-world network topologies, we see that QuickCast can speed up the average receiver's completion time by as much as 10×10\times while only using 1.04×1.04\times more bandwidth; further, the completion time for all receivers also improves by as much as 1.6×1.6\times faster at high loads.Comment: [Extended Version] Accepted for presentation in IEEE INFOCOM 2018, Honolulu, H

    Minimum cost mirror sites using network coding: Replication vs. coding at the source nodes

    Get PDF
    Content distribution over networks is often achieved by using mirror sites that hold copies of files or portions thereof to avoid congestion and delay issues arising from excessive demands to a single location. Accordingly, there are distributed storage solutions that divide the file into pieces and place copies of the pieces (replication) or coded versions of the pieces (coding) at multiple source nodes. We consider a network which uses network coding for multicasting the file. There is a set of source nodes that contains either subsets or coded versions of the pieces of the file. The cost of a given storage solution is defined as the sum of the storage cost and the cost of the flows required to support the multicast. Our interest is in finding the storage capacities and flows at minimum combined cost. We formulate the corresponding optimization problems by using the theory of information measures. In particular, we show that when there are two source nodes, there is no loss in considering subset sources. For three source nodes, we derive a tight upper bound on the cost gap between the coded and uncoded cases. We also present algorithms for determining the content of the source nodes.Comment: IEEE Trans. on Information Theory (to appear), 201

    On the Communication Cost of MDS Erasure Codes in Distributed Storage Systems

    Get PDF
    Distributed storage systems store some redundant data to keep the degree of availability of the stored data constant and also to increase the system's resistance against failures. This type of systems usually use pure replication or methods based on RAID systems as redundancy schemes. In this paper, we study the communication cost of a distributed data storage system using Maximum Distance Separable (MDS) erasure codes. Our focus is reduction of the cost of one-to-many communication used in data reconstruction/repair initialization and update operations. We propose the use of two different communication approaches on the area of distributed storage systems for the above operations; Steiner tree approach and multi-shortest path approach. We also analyse these two communication approaches empirically and theoretically. Our theoretical results indicate that Steiner tree approach has lower message usage, whereas, multi-shortest path approach has lower time usage for data reconstruction/repair initialization operations. On the other hand, Steiner tree approach has better message and time metrics for the data update process. Furthermore, our experimental results support these theoretical results. Thus, users can choose between the two approaches depending on their needs and priorities
    corecore