Optimal Data Placement on Networks With Constant Number of Clients
We introduce optimal algorithms for the problems of data placement (DP) and
page placement (PP) in networks with a constant number of clients each of which
has limited storage availability and issues requests for data objects. The
objective for both problems is to efficiently utilize each client's storage
(deciding where to place replicas of objects) so that the total incurred access
and installation cost over all clients is minimized. In the PP problem an extra
constraint on the maximum number of clients served by a single client must be
satisfied. Our algorithms solve both problems optimally when all objects have
uniform lengths. When object lengths are non-uniform we also find the optimal
solution, albeit with a small, asymptotically tight violation of each client's
storage size by ε·lmax, where lmax is the maximum length of the objects
and ε is some arbitrarily small positive constant. We make no assumption
on the underlying topology of the network (metric, ultrametric etc.), thus
obtaining the first non-trivial results for non-metric data placement problems.
DCCast: Efficient Point to Multipoint Transfers Across Datacenters
Using multiple datacenters allows for higher availability, load balancing and
reduced latency to customers of cloud services. To distribute multiple copies
of data, cloud providers depend on inter-datacenter WANs that ought to be used
efficiently considering their limited capacity and the ever-increasing data
demands. In this paper, we focus on applications that transfer objects from one
datacenter to several datacenters over dedicated inter-datacenter networks. We
present DCCast, a centralized Point to Multi-Point (P2MP) algorithm that uses
forwarding trees to efficiently deliver an object from a source datacenter to
required destination datacenters. With low computational overhead, DCCast
selects forwarding trees that minimize bandwidth usage and balance load across
all links. With simulation experiments on Google's GScale network, we show that
DCCast can reduce total bandwidth usage and tail Transfer Completion Times
(TCT) by up to compared to delivering the same objects via independent
point-to-point (P2P) transfers.
Comment: 9th USENIX Workshop on Hot Topics in Cloud Computing,
https://www.usenix.org/conference/hotcloud17/program/presentation/noormohammadpou
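As a rough illustration of selecting a forwarding tree that saves bandwidth while steering around loaded links, the sketch below grows a tree greedily from the source. It is not taken from the paper: the edge weight (current load plus object volume), the greedy nearest-destination heuristic, and the names forwarding_tree and nearest_destination_path are all assumptions made for this example.

```python
import heapq


def nearest_destination_path(adj, weight, tree_nodes, targets):
    """Multi-source Dijkstra from the current tree; returns the edge path
    from the tree to the closest remaining destination."""
    dist = {n: 0.0 for n in tree_nodes}
    prev = {}
    heap = [(0.0, n) for n in sorted(tree_nodes)]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in targets:
            path = []
            while u in prev:  # walk back until we hit a tree node
                path.append((prev[u], u))
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            nd = d + weight[frozenset((u, v))]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    raise ValueError("some destination is unreachable")


def forwarding_tree(edges, load, source, dests, volume):
    """Greedy tree construction (hypothetical heuristic, not DCCast itself).
    Assumed edge weight: bytes already scheduled on the link plus the new
    object's volume, so busy links look more expensive."""
    adj, weight = {}, {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
        key = frozenset((u, v))
        weight[key] = load.get(key, 0.0) + volume
    for n in adj:
        adj[n].sort()  # deterministic neighbor order
    tree_nodes, tree_edges = {source}, []
    remaining = set(dests) - tree_nodes
    while remaining:
        for u, v in nearest_destination_path(adj, weight, tree_nodes, remaining):
            tree_edges.append((u, v))
            tree_nodes.update((u, v))
        remaining -= tree_nodes
    return tree_edges


# Toy topology: the direct links A-C and A-D are already heavily loaded.
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("A", "C"), ("A", "D")]
load = {frozenset(("A", "C")): 10.0, frozenset(("A", "D")): 10.0}
tree = forwarding_tree(edges, load, "A", ["C", "D"], volume=1.0)
print(tree)
```

In this toy case the tree uses three links and sends the object over A-B only once, whereas two independent point-to-point transfers along the same lightly loaded routes would cross A-B twice.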
QuickCast: Fast and Efficient Inter-Datacenter Transfers using Forwarding Tree Cohorts
Large inter-datacenter transfers are crucial for cloud service efficiency and
are increasingly used by organizations that have dedicated wide area networks
between datacenters. A recent work uses multicast forwarding trees to reduce
the bandwidth needs and improve completion times of point-to-multipoint
transfers. Using a single forwarding tree per transfer, however, leads to poor
performance because the slowest receiver dictates the completion time for all
receivers. Using multiple forwarding trees per transfer alleviates this
concern--the average receiver could finish early; however, if done naively,
bandwidth usage would also increase and it is a priori unclear how best to
partition receivers, how to construct the multiple trees and how to determine
the rate and schedule of flows on these trees. This paper presents QuickCast, a
first solution to these problems. Using simulations on real-world network
topologies, we see that QuickCast can speed up the average receiver's
completion time by as much as while only using more
bandwidth; further, the completion time for all receivers also improves by as
much as faster at high loads.
Comment: [Extended Version] Accepted for presentation in IEEE INFOCOM 2018,
Honolulu, H
Minimum cost mirror sites using network coding: Replication vs. coding at the source nodes
Content distribution over networks is often achieved by using mirror sites
that hold copies of files or portions thereof to avoid congestion and delay
issues arising from excessive demands to a single location. Accordingly, there
are distributed storage solutions that divide the file into pieces and place
copies of the pieces (replication) or coded versions of the pieces (coding) at
multiple source nodes. We consider a network which uses network coding for
multicasting the file. There is a set of source nodes that contains either
subsets or coded versions of the pieces of the file. The cost of a given
storage solution is defined as the sum of the storage cost and the cost of the
flows required to support the multicast. Our interest is in finding the storage
capacities and flows at minimum combined cost. We formulate the corresponding
optimization problems by using the theory of information measures. In
particular, we show that when there are two source nodes, there is no loss in
considering subset sources. For three source nodes, we derive a tight upper
bound on the cost gap between the coded and uncoded cases. We also present
algorithms for determining the content of the source nodes.
Comment: IEEE Trans. on Information Theory (to appear), 201
On the Communication Cost of MDS Erasure Codes in Distributed Storage Systems
Distributed storage systems store redundant data to keep the availability of the stored data constant and to increase the system's resistance to failures. Systems of this type usually use pure replication or RAID-based methods as redundancy schemes. In this paper, we study the communication cost of a distributed data storage system that uses Maximum Distance Separable (MDS) erasure codes. Our focus is on reducing the cost of the one-to-many communication used in data reconstruction/repair initialization and update operations. We propose two different communication approaches for these operations in distributed storage systems: a Steiner tree approach and a multi-shortest path approach. We also analyse these two approaches empirically and theoretically. Our theoretical results indicate that the Steiner tree approach has lower message usage, whereas the multi-shortest path approach has lower time usage for data reconstruction/repair initialization operations. On the other hand, the Steiner tree approach has better message and time metrics for the data update process. Our experimental results support these theoretical results. Thus, users can choose between the two approaches depending on their needs and priorities.
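The message/time trade-off described above can be made concrete on a toy graph. The sketch below is only an illustration under assumed cost definitions that do not come from the paper: one message per link traversal, and time measured as the hop count to the farthest destination. The Steiner tree is supplied by hand rather than computed, and all function names are hypothetical.

```python
from collections import deque


def bfs_hops(adj, source):
    """Hop distance from source to every reachable node."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def multi_shortest_path_cost(adj, source, dests):
    """Each destination gets its own copy along a shortest path.
    Returns (messages, time) under the assumed cost model."""
    hops = bfs_hops(adj, source)
    return sum(hops[d] for d in dests), max(hops[d] for d in dests)


def tree_cost(tree_edges, source, dests):
    """One copy crosses each tree link; time is the depth of the
    deepest destination in the tree. Returns (messages, time)."""
    tree_adj = {}
    for u, v in tree_edges:
        tree_adj.setdefault(u, []).append(v)
        tree_adj.setdefault(v, []).append(u)
    hops = bfs_hops(tree_adj, source)
    return len(tree_edges), max(hops[d] for d in dests)


# Toy network: S is the sender, a/b/c hold coded blocks, x is a relay.
adj = {
    "S": ["a", "x"],
    "a": ["S", "b"],
    "b": ["a", "c"],
    "c": ["b", "x"],
    "x": ["S", "c"],
}
dests = ["a", "b", "c"]
steiner_tree = [("S", "a"), ("a", "b"), ("b", "c")]  # hand-picked minimum tree

sp_messages, sp_time = multi_shortest_path_cost(adj, "S", dests)
st_messages, st_time = tree_cost(steiner_tree, "S", dests)
print("multi-shortest path:", sp_messages, "messages,", sp_time, "hops")
print("Steiner tree:       ", st_messages, "messages,", st_time, "hops")
```

On this graph the tree needs fewer messages but reaches the farthest destination later, which mirrors the direction of the trade-off reported for reconstruction/repair initialization.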