Adaptive Replication in Distributed Content Delivery Networks
We address the problem of content replication in large distributed content
delivery networks, composed of a data center assisted by many small servers
with limited capabilities and located at the edge of the network. The objective
is to optimize the placement of contents on the servers to offload as much as
possible the data center. We model the system constituted by the small servers
as a loss network, each loss corresponding to a request to the data center.
Based on large system / storage behavior, we obtain an asymptotic formula for
the optimal replication of contents and propose adaptive schemes related to
those encountered in cache networks but reacting here to loss events, and
faster algorithms generating virtual events at higher rate while keeping the
same target replication. We show through simulations that our adaptive schemes
significantly outperform standard replication strategies, both in terms of loss
rates and adaptation speed. Comment: 10 pages, 5 figures
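The loss-reactive idea in the abstract above can be illustrated with a toy single-process simulation. This is a hedged sketch, not the paper's algorithm or loss-network model: the function name, the Zipf-like popularity weights, the serve-probability model, and the most-replicated-victim eviction rule are all simplifying assumptions made for illustration. On each loss event (a request the edge servers cannot serve, which therefore hits the data center), the scheme adds a replica of the missed content:

```python
import random
from collections import defaultdict

def simulate_loss_driven_replication(num_contents=20, num_servers=10,
                                     slots_per_server=3, requests=5000, seed=0):
    """Toy sketch (not the paper's scheme): on each loss event, add a
    replica of the missed content, evicting a replica of the currently
    most-replicated content when total edge storage is full."""
    rng = random.Random(seed)
    # Zipf-like popularity: content c is requested with weight 1/(c+1).
    weights = [1.0 / (c + 1) for c in range(num_contents)]
    replicas = defaultdict(int)               # content -> replica count
    capacity = num_servers * slots_per_server  # total edge storage slots
    # Seed storage with one replica of each content that fits.
    for c in range(min(capacity, num_contents)):
        replicas[c] = 1
    losses = 0
    for _ in range(requests):
        c = rng.choices(range(num_contents), weights)[0]
        # Crude model: more replicas -> higher chance an edge copy is free.
        served = replicas[c] > 0 and rng.random() < min(1.0, replicas[c] / 2)
        if not served:
            losses += 1                       # request falls to the data center
            if sum(replicas.values()) >= capacity:
                victim = max(replicas, key=replicas.get)
                replicas[victim] -= 1         # evict one replica to make room
            replicas[c] += 1                  # react to the loss event
    return losses / requests
```

Over time, popular contents accumulate replicas at the expense of rarely missed ones, which is the qualitative behavior the adaptive schemes in the abstract exploit.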
Sparse Allreduce: Efficient Scalable Communication for Power-Law Data
Many large datasets exhibit power-law statistics: The web graph, social
networks, text data, click through data etc. Their adjacency graphs are termed
natural graphs, and are known to be difficult to partition. As a consequence
most distributed algorithms on these graphs are communication intensive. Many
algorithms on natural graphs involve an Allreduce: a sum or average of
partitioned data which is then shared back to the cluster nodes. Examples
include PageRank, spectral partitioning, and many machine learning algorithms
including regression, factor (topic) models, and clustering. In this paper we
describe an efficient and scalable Allreduce primitive for power-law data. We
point out scaling problems with existing butterfly and round-robin networks for
Sparse Allreduce, and show that a hybrid approach improves on both.
Furthermore, we show that Sparse Allreduce stages should be nested instead of
cascaded (as in the dense case), and that the optimum-throughput Allreduce
network should be a butterfly of heterogeneous degree, where the degree
decreases with depth into the network. Finally, a simple replication scheme is
introduced to deal with node failures. We present experiments showing
significant improvements over existing systems such as PowerGraph and Hadoop.
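The Allreduce primitive defined in the abstract, a sum of partitioned data shared back to all nodes, can be sketched for sparse inputs as two phases: a reduce-scatter that partitions indices across nodes, then an allgather of the partial sums. This is a minimal single-process illustration under assumed conventions (hash partitioning by index modulo node count, dict-based sparse vectors), not the paper's hybrid butterfly/round-robin network:

```python
def sparse_allreduce(node_vectors, num_nodes):
    """Toy sparse Allreduce: each node holds a sparse vector
    (dict of index -> value); every node ends with the full sum."""
    # Phase 1 (reduce-scatter): node i "owns" indices with idx % num_nodes == i
    # and sums the contributions to those indices from every node.
    partials = [dict() for _ in range(num_nodes)]
    for vec in node_vectors:
        for idx, val in vec.items():
            owner = idx % num_nodes
            partials[owner][idx] = partials[owner].get(idx, 0) + val
    # Phase 2 (allgather): merge the owned partial sums and give the
    # complete summed vector back to every node.
    total = {}
    for part in partials:
        total.update(part)
    return [dict(total) for _ in range(num_nodes)]
```

With power-law data the per-index ownership sets are highly skewed, which is what makes the communication schedule (butterfly vs. round-robin vs. the paper's hybrid) matter at scale.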
Optimal Data Placement on Networks With Constant Number of Clients
We introduce optimal algorithms for the problems of data placement (DP) and
page placement (PP) in networks with a constant number of clients each of which
has limited storage availability and issues requests for data objects. The
objective for both problems is to efficiently utilize each client's storage
(deciding where to place replicas of objects) so that the total incurred access
and installation cost over all clients is minimized. In the PP problem an extra
constraint on the maximum number of clients served by a single client must be
satisfied. Our algorithms solve both problems optimally when all objects have
uniform lengths. When object lengths are non-uniform we also find the optimal
solution, albeit with a small, asymptotically tight violation of each client's
storage size by εl_max, where l_max is the maximum length of the objects
and ε is an arbitrarily small positive constant. We make no assumption
on the underlying topology of the network (metric, ultrametric etc.), thus
obtaining the first non-trivial results for non-metric data placement problems
- …
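Because the number of clients is constant, even a brute-force search over placements is tractable, which conveys the cost structure of the DP problem. The sketch below is an illustration under simplifying assumptions (unit-length objects, one request per client per object, a flat data-center fetch cost), not the paper's optimal algorithm; all parameter names are hypothetical:

```python
from itertools import combinations, product

def optimal_placement(num_clients, num_objects, capacity,
                      access, install, dc_cost):
    """Brute-force placement search, feasible only because the number of
    clients is constant. access[i][j] = cost for client i to read from
    client j; install = cost per replica placed; dc_cost = cost of
    fetching an object from the data center when no client holds it."""
    objects = range(num_objects)
    # All storage choices for one client: subsets of objects up to capacity.
    per_client = [set(c) for r in range(capacity + 1)
                  for c in combinations(objects, r)]
    best = (float("inf"), None)
    for placement in product(per_client, repeat=num_clients):
        cost = install * sum(len(s) for s in placement)  # installation cost
        for i in range(num_clients):
            for o in objects:
                # Each client fetches each object from its cheapest holder,
                # or from the data center if no client stores it.
                holders = [access[i][j] for j in range(num_clients)
                           if o in placement[j]]
                cost += min(holders) if holders else dc_cost
        if cost < best[0]:
            best = (cost, placement)
    return best
```

For example, with two clients, two objects, one slot each, symmetric inter-client access cost 2, install cost 1, and data-center cost 10, the search places one object on each client for a total cost of 6.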