5,728 research outputs found
zCap: a zero configuration adaptive paging and mobility management mechanism
Today, cellular networks rely on fixed collections of cells (tracking areas) for user equipment localisation. Locating users within these areas involves broadcast search (paging), which consumes radio bandwidth but reduces the user equipment signalling required for mobility management. Tracking areas are today manually configured, hard to adapt to local mobility and influence the load on several key resources in the network. We propose a decentralised and self-adaptive approach to mobility management based on a probabilistic model of local mobility. By estimating the parameters of this model from observations of user mobility collected online, we obtain a dynamic model from which we construct local neighbourhoods of cells where we are most likely to locate user equipment. We propose to replace the static tracking areas of current systems with neighbourhoods local to each cell. The model is also used to derive a multi-phase paging scheme, where the division of neighbourhood cells into consecutive phases balances response times and paging cost. The complete mechanism requires no manual tracking area configuration and performs localisation efficiently in terms of signalling and response times. Detailed simulations show that significant potential gains in localisation effi- ciency are possible while eliminating manual configuration of mobility management parameters. Variants of the proposal can be implemented within current (LTE) standards
A scheme for supporting distributed data structures on multicomputers
A data migration mechanism is proposed that allows an explicit and controlled mapping of data to memory. While read or write copies of each data element can be assigned to any processor's memory, longer term storage of each data element is assigned to a specific location in the memory of a particular processor. The proposed integration of a data migration scheme with a compiler is able to eliminate the migration of unneeded data that can occur in multiprocessor paging or caching. The overhead of adjudicating multiple concurrent writes to the same page or cache line is also eliminated. Data is presented that suggests that the scheme may be a pratical method for efficiently supporting data migration
Recommended from our members
Technical Review of Residential Programmable Communicating Thermostat Implementation for Title 24-2008
LightBox: Full-stack Protected Stateful Middlebox at Lightning Speed
Running off-site software middleboxes at third-party service providers has
been a popular practice. However, routing large volumes of raw traffic, which
may carry sensitive information, to a remote site for processing raises severe
security concerns. Prior solutions often abstract away important factors
pertinent to real-world deployment. In particular, they overlook the
significance of metadata protection and stateful processing. Unprotected
traffic metadata like low-level headers, size and count, can be exploited to
learn supposedly encrypted application contents. Meanwhile, tracking the states
of 100,000s of flows concurrently is often indispensable in production-level
middleboxes deployed at real networks.
We present LightBox, the first system that can drive off-site middleboxes at
near-native speed with stateful processing and the most comprehensive
protection to date. Built upon commodity trusted hardware, Intel SGX, LightBox
is the product of our systematic investigation of how to overcome the inherent
limitations of secure enclaves using domain knowledge and customization. First,
we introduce an elegant virtual network interface that allows convenient access
to fully protected packets at line rate without leaving the enclave, as if from
the trusted source network. Second, we provide complete flow state management
for efficient stateful processing, by tailoring a set of data structures and
algorithms optimized for the highly constrained enclave space. Extensive
evaluations demonstrate that LightBox, with all security benefits, can achieve
10Gbps packet I/O, and that with case studies on three stateful middleboxes, it
can operate at near-native speed.Comment: Accepted at ACM CCS 201
An Associativity Threshold Phenomenon in Set-Associative Caches
In an -way set-associative cache, the cache is partitioned into
disjoint sets of size , and each item can only be cached in one set,
typically selected via a hash function. Set-associative caches are widely used
and have many benefits, e.g., in terms of latency or concurrency, over fully
associative caches, but they often incur more cache misses. As the set size
decreases, the benefits increase, but the paging costs worsen.
In this paper we characterize the performance of an -way
set-associative LRU cache of total size , as a function of . We prove the following, assuming that sets are selected using a
fully random hash function:
- For , the paging cost of an -way
set-associative LRU cache is within additive of that a fully-associative
LRU cache of size , with probability ,
for all request sequences of length .
- For , and for all and , the paging
cost of an -way set-associative LRU cache is not within a factor of
that a fully-associative LRU cache of size , for some request sequence of
length .
- For , if the hash function can be occasionally
changed, the paging cost of an -way set-associative LRU cache is within
a factor of that a fully-associative LRU cache of size ,
with probability , for request sequences of
arbitrary (e.g., super-polynomial) length.
Some of our results generalize to other paging algorithms besides LRU, such
as least-frequently used (LFU)
Contextual Bandit Modeling for Dynamic Runtime Control in Computer Systems
Modern operating systems and microarchitectures provide a myriad of mechanisms for monitoring and affecting system operation and resource utilization at runtime. Dynamic runtime control of these mechanisms can tailor system operation to the characteristics and behavior of the current workload, resulting in improved performance. However, developing effective models for system control can be challenging. Existing methods often require extensive manual effort, computation time, and domain knowledge to identify relevant low-level performance metrics, relate low-level performance metrics and high-level control decisions to workload performance, and to evaluate the resulting control models.
This dissertation develops a general framework, based on the contextual bandit, for describing and learning effective models for runtime system control. Random profiling is used to characterize the relationship between workload behavior, system configuration, and performance. The framework is evaluated in the context of two applications of progressive complexity; first, the selection of paging modes (Shadow Paging, Hardware-Assisted Page) in the Xen virtual machine memory manager; second, the utilization of hardware memory prefetching for multi-core, multi-tenant workloads with cross-core contention for shared memory resources, such as the last-level cache and memory bandwidth. The resulting models for both applications are competitive in comparison to existing runtime control approaches. For paging mode selection, the resulting model provides equivalent performance to the state of the art while substantially reducing the computation requirements of profiling. For hardware memory prefetcher utilization, the resulting models are the first to provide dynamic control for hardware prefetchers using workload statistics. Finally, a correlation-based feature selection method is evaluated for identifying relevant low-level performance metrics related to hardware memory prefetching
POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging
Fine-tuning models on edge devices like mobile phones would enable
privacy-preserving personalization over sensitive data. However, edge training
has historically been limited to relatively small models with simple
architectures because training is both memory and energy intensive. We present
POET, an algorithm to enable training large neural networks on memory-scarce
battery-operated edge devices. POET jointly optimizes the integrated search
search spaces of rematerialization and paging, two algorithms to reduce the
memory consumption of backpropagation. Given a memory budget and a run-time
constraint, we formulate a mixed-integer linear program (MILP) for
energy-optimal training. Our approach enables training significantly larger
models on embedded devices while reducing energy consumption while not
modifying mathematical correctness of backpropagation. We demonstrate that it
is possible to fine-tune both ResNet-18 and BERT within the memory constraints
of a Cortex-M class embedded device while outperforming current edge training
methods in energy efficiency. POET is an open-source project available at
https://github.com/ShishirPatil/poetComment: Proceedings of the 39th International Conference on Machine Learning
2022 (ICML 2022
VIRTUAL MEMORY ON A MANY-CORE NOC
Many-core devices are likely to become increasingly common in real-time and embedded systems as computational demands grow and as expectations for higher performance can generally only be met by by increasing core numbers rather than relying on higher clock speeds.
Network-on-chip devices, where multiple cores share a single slice of silicon and employ packetised communications, are a widely-deployed many-core option for system designers. As NoCs are expected to run larger and more complex programs, the small amount of fast, on-chip memory available to each core is unlikely to be sufficient for all but the simplest of tasks, and it is necessary to find an efficient, effective, and time-bounded, means of accessing resources stored in off-chip memory, such as DRAM or Flash storage.
The abstraction of paged virtual memory is a familiar technique to manage similar tasks in general computing but has often been shunned by real-time developers because of concern about time predictability. We show it can be a poor choice for a many-core NoC system as, unmodified, it typically uses page sizes optimised for interaction with spinning disks and not solid state media, and transports significant volumes of subsequently unused data across already congested links.
In this work we outline and simulate an efficient partial paging algorithm where only those memory resources that are locally accessed are transported between global and local storage. We further show that smaller page sizes add to efficiency. We examine the factors that lead to timing delays in such systems, and show we can predict worst case execution times at even safety-critical thresholds by using statistical methods from extreme value theory. We also show these results are applicable to systems with a variety of connections to memory
- …