1,726 research outputs found
Online Coded Caching
We consider a basic content distribution scenario consisting of a single
origin server connected through a shared bottleneck link to a number of users
each equipped with a cache of finite memory. The users issue a sequence of
content requests from a set of popular files, and the goal is to operate the
caches as well as the server such that these requests are satisfied with the
minimum number of bits sent over the shared link. Assuming a basic Markov model
for renewing the set of popular files, we characterize approximately the
optimal long-term average rate of the shared link. We further prove that the
optimal online scheme has approximately the same performance as the optimal
offline scheme, in which the cache contents can be updated based on the entire
set of popular files before each new request. To support these theoretical
results, we propose an online coded caching scheme termed coded least-recently
sent (LRS) and simulate it for a demand time series derived from the dataset
made available by Netflix for the Netflix Prize. For this time series, we show
that the proposed coded LRS algorithm significantly outperforms the popular
least-recently used (LRU) caching algorithm.Comment: 15 page
Mobile IP: state of the art report
Due to roaming, a mobile device may change its network attachment each time it moves to a new link. This might cause a disruption for the Internet data packets that have to reach the mobile node. Mobile IP is a protocol, developed by the Mobile IP Internet Engineering Task Force (IETF) working group, that is able to inform the network about this change in network attachment such that the Internet data packets will be delivered in a seamless way to the new point of attachment. This document presents current developments and research activities in the Mobile IP area
The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework
Computers continue to diversify with respect to system designs, emerging
memory technologies, and application memory demands. Unfortunately, continually
adapting the conventional virtual memory framework to each possible system
configuration is challenging, and often results in performance loss or requires
non-trivial workarounds. To address these challenges, we propose a new virtual
memory framework, the Virtual Block Interface (VBI). We design VBI based on the
key idea that delegating memory management duties to hardware can reduce the
overheads and software complexity associated with virtual memory. VBI
introduces a set of variable-sized virtual blocks (VBs) to applications. Each
VB is a contiguous region of the globally-visible VBI address space, and an
application can allocate each semantically meaningful unit of information
(e.g., a data structure) in a separate VB. VBI decouples access protection from
memory allocation and address translation. While the OS controls which programs
have access to which VBs, dedicated hardware in the memory controller manages
the physical memory allocation and address translation of the VBs. This
approach enables several architectural optimizations to (1) efficiently and
flexibly cater to different and increasingly diverse system configurations, and
(2) eliminate key inefficiencies of conventional virtual memory. We demonstrate
the benefits of VBI with two important use cases: (1) reducing the overheads of
address translation (for both native execution and virtual machine
environments), as VBI reduces the number of translation requests and associated
memory accesses; and (2) two heterogeneous main memory architectures, where VBI
increases the effectiveness of managing fast memory regions. For both cases,
VBI significanttly improves performance over conventional virtual memory
Near-Memory Address Translation
Memory and logic integration on the same chip is becoming increasingly cost
effective, creating the opportunity to offload data-intensive functionality to
processing units placed inside memory chips. The introduction of memory-side
processing units (MPUs) into conventional systems faces virtual memory as the
first big showstopper: without efficient hardware support for address
translation MPUs have highly limited applicability. Unfortunately, conventional
translation mechanisms fall short of providing fast translations as
contemporary memories exceed the reach of TLBs, making expensive page walks
common.
In this paper, we are the first to show that the historically important
flexibility to map any virtual page to any page frame is unnecessary in today's
servers. We find that while limiting the associativity of the
virtual-to-physical mapping incurs no penalty, it can break the
translate-then-fetch serialization if combined with careful data placement in
the MPU's memory, allowing for translation and data fetch to proceed
independently and in parallel. We propose the Distributed Inverted Page Table
(DIPTA), a near-memory structure in which the smallest memory partition keeps
the translation information for its data share, ensuring that the translation
completes together with the data fetch. DIPTA completely eliminates the
performance overhead of translation, achieving speedups of up to 3.81x and
2.13x over conventional translation using 4KB and 1GB pages respectively.Comment: 15 pages, 9 figure
Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency
Persistent memory provides high-performance data persistence at main memory.
Memory writes need to be performed in strict order to satisfy storage
consistency requirements and enable correct recovery from system crashes.
Unfortunately, adhering to such a strict order significantly degrades system
performance and persistent memory endurance. This paper introduces a new
mechanism, Loose-Ordering Consistency (LOC), that satisfies the ordering
requirements at significantly lower performance and endurance loss. LOC
consists of two key techniques. First, Eager Commit eliminates the need to
perform a persistent commit record write within a transaction. We do so by
ensuring that we can determine the status of all committed transactions during
recovery by storing necessary metadata information statically with blocks of
data written to memory. Second, Speculative Persistence relaxes the write
ordering between transactions by allowing writes to be speculatively written to
persistent memory. A speculative write is made visible to software only after
its associated transaction commits. To enable this, our mechanism supports the
tracking of committed transaction ID and multi-versioning in the CPU cache. Our
evaluations show that LOC reduces the average performance overhead of memory
persistence from 66.9% to 34.9% and the memory write traffic overhead from
17.1% to 3.4% on a variety of workloads.Comment: This paper has been accepted by IEEE Transactions on Parallel and
Distributed System
- …