A Low-Complexity Approach to Distributed Cooperative Caching with Geographic Constraints
We consider caching in cellular networks in which each base station is
equipped with a cache that can store a limited number of files. The popularity
of the files is known and the goal is to place files in the caches such that
the probability that a user at an arbitrary location in the plane will find the
file that she requires in one of the covering caches is maximized.
We develop distributed asynchronous algorithms for deciding which contents to
store in which cache. Such cooperative algorithms require communication only
between caches with overlapping coverage areas and can operate
asynchronously. The development of the algorithms is principally based on the
observation that the problem can be viewed as a potential game. Our basic
algorithm is derived from the best response dynamics. We demonstrate that the
complexity of each best response step is independent of the number of files,
linear in the cache capacity and linear in the maximum number of base stations
that cover a certain area. Then, we show that the overall algorithm complexity
for a discrete cache placement is polynomial in both network size and catalog
size. In practical examples, the algorithm converges in just a few iterations.
Also, in most cases of interest, the basic algorithm finds the best Nash
equilibrium corresponding to the global optimum. We provide two extensions of
our basic algorithm based on stochastic and deterministic simulated annealing
which find the global optimum.
Finally, we demonstrate the hit probability evolution on real and synthetic
networks numerically and show that our distributed caching algorithm performs
significantly better than storing the most popular content, the probabilistic
content placement policy, and Multi-LRU caching policies.
Comment: 24 pages, 9 figures, presented at SIGMETRICS'1
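The best response dynamics mentioned in this abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes the plane has been discretised into a finite list of regions, each with a probability weight and a set of covering caches, and all names (`marginal_gains`, `best_response_dynamics`, `regions`, `popularity`) are hypothetical. Unlike the step described in the abstract, this naive version scans the whole catalog on every update.

```python
def hit_probability(placement, regions, popularity):
    """Total hit probability; regions is a list of (weight, covering cache ids)."""
    total = 0.0
    for weight, caches in regions:
        stored = set().union(*(placement[c] for c in caches))
        total += weight * sum(popularity[f] for f in stored)
    return total


def marginal_gains(cache, placement, regions, popularity):
    """Gain of storing each file at `cache`, given what the other caches hold."""
    gain = [0.0] * len(popularity)
    for weight, caches in regions:
        if cache not in caches:
            continue
        others = set()
        for c in caches:
            if c != cache:
                others |= placement[c]
        for f, a in enumerate(popularity):
            if f not in others:  # file not yet available in this region
                gain[f] += weight * a
    return gain


def best_response_dynamics(regions, popularity, capacity, n_caches):
    """Caches update in turn until no cache can strictly improve."""
    placement = {c: set() for c in range(n_caches)}
    improved = True
    while improved:
        improved = False
        for c in range(n_caches):
            gain = marginal_gains(c, placement, regions, popularity)
            ranked = sorted(range(len(gain)), key=lambda f: (-gain[f], f))
            best = set(ranked[:capacity])
            current = sum(gain[f] for f in placement[c])
            # Switch only on strict improvement, so the potential
            # (the total hit probability) strictly increases each time.
            if sum(gain[f] for f in best) > current + 1e-12:
                placement[c] = best
                improved = True
    return placement


# Assumed toy instance: two caches whose coverage overlaps on half the area.
regions = [(0.25, {0}), (0.5, {0, 1}), (0.25, {1})]
popularity = [0.5, 0.3, 0.2]
placement = best_response_dynamics(regions, popularity, capacity=1, n_caches=2)
```

In this toy instance the caches end up storing different files, which gives a higher hit probability than both caches storing the most popular file.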
RDGC: A Reuse Distance-Based Approach to GPU Cache Performance Analysis
In this paper, we propose RDGC, a reuse distance-based performance analysis approach for the GPU cache hierarchy. RDGC models the thread-level parallelism in GPUs to generate an appropriate cache reference sequence. Further, reuse distance analysis is extended to model multi-partition/multi-port parallel caches and is employed by RDGC to analyze GPU cache memories. RDGC can be used for architectural space exploration and parallel application development by providing hit ratios and transaction counts. The results demonstrate that the proposed model has an average error of 3.72% and 4.5% for L1 and L2 hit ratios, respectively. The results also indicate that RDGC is 47,000 times slower than hardware execution, while it is 59 times faster than the GPGPU-Sim simulator.
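For readers unfamiliar with reuse distance analysis, the following is a minimal sketch of the textbook single-threaded form: the reuse (LRU stack) distance of a reference is the number of distinct addresses touched since the previous access to the same address. This is not RDGC itself, which extends the idea to thread-level parallelism and multi-partition caches. The function names and the sample trace are illustrative assumptions.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return the LRU stack distance of each reference (None on first use)."""
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Distinct addresses accessed since the last use of addr.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)
        else:
            distances.append(None)
            stack[addr] = True
    return distances


def hit_ratio(trace, capacity):
    """Hit ratio of a fully associative LRU cache with `capacity` lines."""
    dists = reuse_distances(trace)
    hits = sum(1 for d in dists if d is not None and d < capacity)
    return hits / len(dists)


trace = ["a", "b", "c", "a", "b", "b"]  # assumed sample reference sequence
distances = reuse_distances(trace)
```

A reference hits in a fully associative LRU cache exactly when its reuse distance is smaller than the cache capacity, so a single pass over the trace yields hit ratios for every cache size.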
Jointly Optimal Routing and Caching for Arbitrary Network Topologies
We study a problem of fundamental importance to ICNs, namely, minimizing
routing costs by jointly optimizing caching and routing decisions over an
arbitrary network topology. We consider both source routing and hop-by-hop
routing settings. The respective offline problems are NP-hard. Nevertheless, we
show that there exist polynomial-time approximation algorithms producing
solutions within a constant approximation factor of the optimal. We also produce
distributed, adaptive algorithms with the same approximation guarantees. We
simulate our adaptive algorithms over a broad array of different topologies.
Our algorithms reduce routing costs by several orders of magnitude compared to
prior art, including algorithms optimizing caching under fixed routing.
Comment: This is the extended version of the paper "Jointly Optimal Routing
and Caching for Arbitrary Network Topologies", appearing in the 4th ACM
Conference on Information-Centric Networking (ICN 2017), Berlin, Sep. 26-28,
201
Automatic Loop Kernel Analysis and Performance Modeling With Kerncraft
Analytic performance models are essential for understanding the performance
characteristics of loop kernels, which consume a major part of CPU cycles in
computational science. Starting from a validated performance model one can
infer the relevant hardware bottlenecks and promising optimization
opportunities. Unfortunately, analytic performance modeling is often tedious
even for experienced developers since it requires in-depth knowledge about the
hardware and how it interacts with the software. We present the "Kerncraft"
tool, which eases the construction of analytic performance models for streaming
kernels and stencil loop nests. Starting from the loop source code, the problem
size, and a description of the underlying hardware, Kerncraft can ideally
predict the single-core performance and scaling behavior of loops on multicore
processors using the Roofline or the Execution-Cache-Memory (ECM) model. We
describe the operating principles of Kerncraft with its capabilities and
limitations, and we show how it may be used to quickly gain insights by
accelerated analytic modeling.
Comment: 11 pages, 4 figures, 8 listing
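The Roofline model named in this abstract bounds attainable performance by the smaller of the peak compute rate and the product of arithmetic intensity and memory bandwidth. The sketch below shows only that basic formula. It is not Kerncraft's interface, and the machine numbers and the function name `roofline` are made-up assumptions for illustration.

```python
def roofline(peak_flops, mem_bandwidth, flops_per_iter, bytes_per_iter):
    """Attainable performance (FLOP/s) under the basic Roofline model."""
    intensity = flops_per_iter / bytes_per_iter  # FLOP per byte of traffic
    return min(peak_flops, intensity * mem_bandwidth)


# Hypothetical machine: 100 GFLOP/s peak, 50 GB/s memory bandwidth.
# Triad-like loop a[i] = b[i] + s * c[i]: 2 FLOPs and three 8-byte
# accesses per iteration (write-allocate traffic ignored here).
triad = roofline(100e9, 50e9, flops_per_iter=2, bytes_per_iter=24)
```

With these assumed numbers the loop is memory bound: its intensity of 1/12 FLOP per byte caps performance well below the compute peak.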