5,728 research outputs found

    zCap: a zero configuration adaptive paging and mobility management mechanism

    Today, cellular networks rely on fixed collections of cells (tracking areas) for user equipment localisation. Locating users within these areas involves broadcast search (paging), which consumes radio bandwidth but reduces the user equipment signalling required for mobility management. Tracking areas are currently configured manually, are hard to adapt to local mobility, and influence the load on several key resources in the network. We propose a decentralised and self-adaptive approach to mobility management based on a probabilistic model of local mobility. By estimating the parameters of this model from observations of user mobility collected online, we obtain a dynamic model from which we construct local neighbourhoods of cells where we are most likely to locate user equipment. We propose to replace the static tracking areas of current systems with neighbourhoods local to each cell. The model is also used to derive a multi-phase paging scheme, where the division of neighbourhood cells into consecutive phases balances response times and paging cost. The complete mechanism requires no manual tracking area configuration and performs localisation efficiently in terms of signalling and response times. Detailed simulations show that significant potential gains in localisation efficiency are possible while eliminating manual configuration of mobility management parameters. Variants of the proposal can be implemented within current (LTE) standards.
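
    The trade-off between paging cost and response time in a multi-phase scheme can be illustrated with a small sketch. It is not the zCap algorithm itself: it assumes the textbook sequential-paging cost model, assumes the user is certainly somewhere in the neighbourhood (probabilities sum to 1), and the function name `paging_plan_cost` and the example probabilities are hypothetical.

```python
def paging_plan_cost(probs, phase_sizes):
    """Expected cost of paging cells in phases, most likely cells first.

    probs       -- per-cell probability that the user is in that cell
                   (assumed to sum to 1; an assumption of this sketch)
    phase_sizes -- number of cells paged in each consecutive phase

    Returns (expected number of cells paged, expected number of phases).
    """
    ordered = sorted(probs, reverse=True)  # page the likeliest cells first
    if sum(phase_sizes) != len(ordered):
        raise ValueError("phase sizes must cover every cell exactly once")

    expected_cells = 0.0
    expected_phases = 0.0
    paged_so_far = 0
    start = 0
    for phase_no, size in enumerate(phase_sizes, start=1):
        # Probability that the user is found in this phase.
        p_found = sum(ordered[start:start + size])
        paged_so_far += size
        # If found here, every cell up to and including this phase was paged.
        expected_cells += paged_so_far * p_found
        expected_phases += phase_no * p_found
        start += size
    return expected_cells, expected_phases


# Hypothetical neighbourhood of four cells.
probs = [0.5, 0.25, 0.125, 0.125]
one_phase = paging_plan_cost(probs, [4])     # broadcast to all cells at once
two_phase = paging_plan_cost(probs, [1, 3])  # likeliest cell first, then the rest
```

    In this example the single-phase plan always pages all four cells in one round, while the two-phase plan pages fewer cells on average at the price of a longer expected response time.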

    A scheme for supporting distributed data structures on multicomputers

    A data migration mechanism is proposed that allows an explicit and controlled mapping of data to memory. While read or write copies of each data element can be assigned to any processor's memory, longer-term storage of each data element is assigned to a specific location in the memory of a particular processor. The proposed integration of a data migration scheme with a compiler is able to eliminate the migration of unneeded data that can occur in multiprocessor paging or caching. The overhead of adjudicating multiple concurrent writes to the same page or cache line is also eliminated. Data is presented that suggests that the scheme may be a practical method for efficiently supporting data migration.

    LightBox: Full-stack Protected Stateful Middlebox at Lightning Speed

    Running off-site software middleboxes at third-party service providers has been a popular practice. However, routing large volumes of raw traffic, which may carry sensitive information, to a remote site for processing raises severe security concerns. Prior solutions often abstract away important factors pertinent to real-world deployment. In particular, they overlook the significance of metadata protection and stateful processing. Unprotected traffic metadata, such as low-level headers, packet size, and packet count, can be exploited to learn supposedly encrypted application contents. Meanwhile, tracking the states of hundreds of thousands of flows concurrently is often indispensable in production-level middleboxes deployed in real networks. We present LightBox, the first system that can drive off-site middleboxes at near-native speed with stateful processing and the most comprehensive protection to date. Built upon commodity trusted hardware, Intel SGX, LightBox is the product of our systematic investigation of how to overcome the inherent limitations of secure enclaves using domain knowledge and customization. First, we introduce an elegant virtual network interface that allows convenient access to fully protected packets at line rate without leaving the enclave, as if from the trusted source network. Second, we provide complete flow state management for efficient stateful processing, by tailoring a set of data structures and algorithms optimized for the highly constrained enclave space. Extensive evaluations demonstrate that LightBox, with all security benefits, can achieve 10 Gbps packet I/O, and that with case studies on three stateful middleboxes, it can operate at near-native speed. Comment: Accepted at ACM CCS 201

    An Associativity Threshold Phenomenon in Set-Associative Caches

    In an $\alpha$-way set-associative cache, the cache is partitioned into disjoint sets of size $\alpha$, and each item can only be cached in one set, typically selected via a hash function. Set-associative caches are widely used and have many benefits, e.g., in terms of latency or concurrency, over fully associative caches, but they often incur more cache misses. As the set size $\alpha$ decreases, the benefits increase, but the paging costs worsen. In this paper we characterize the performance of an $\alpha$-way set-associative LRU cache of total size $k$, as a function of $\alpha = \alpha(k)$. We prove the following, assuming that sets are selected using a fully random hash function:
    - For $\alpha = \omega(\log k)$, the paging cost of an $\alpha$-way set-associative LRU cache is within additive $O(1)$ of that of a fully-associative LRU cache of size $(1-o(1))k$, with probability $1 - 1/\operatorname{poly}(k)$, for all request sequences of length $\operatorname{poly}(k)$.
    - For $\alpha = o(\log k)$, and for all $c = O(1)$ and $r = O(1)$, the paging cost of an $\alpha$-way set-associative LRU cache is not within a factor $c$ of that of a fully-associative LRU cache of size $k/r$, for some request sequence of length $O(k^{1.01})$.
    - For $\alpha = \omega(\log k)$, if the hash function can be occasionally changed, the paging cost of an $\alpha$-way set-associative LRU cache is within a factor $1 + o(1)$ of that of a fully-associative LRU cache of size $(1-o(1))k$, with probability $1 - 1/\operatorname{poly}(k)$, for request sequences of arbitrary (e.g., super-polynomial) length.
    Some of our results generalize to other paging algorithms besides LRU, such as least-frequently used (LFU).
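
    To make the object of study concrete, here is a minimal simulation sketch of an alpha-way set-associative LRU cache next to a fully associative one. It is an illustration only, not code from the paper: the class and function names are hypothetical, and the "fully random hash function" is modelled by lazily sampling a random set for each key.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that counts misses (paging cost)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # least recently used key comes first
        self.misses = 0

    def access(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        self.misses += 1
        self.items[key] = None
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used
        return False


class SetAssociativeLRU:
    """Cache of total size k split into k/alpha independent LRU sets."""

    def __init__(self, total_size, alpha, seed=0):
        if total_size % alpha != 0:
            raise ValueError("alpha must divide the total cache size")
        self.num_sets = total_size // alpha
        self.sets = [LRUCache(alpha) for _ in range(self.num_sets)]
        self._rng = random.Random(seed)
        self._set_of = {}  # lazily sampled random hash: key -> set index

    def access(self, key):
        if key not in self._set_of:
            self._set_of[key] = self._rng.randrange(self.num_sets)
        return self.sets[self._set_of[key]].access(key)

    @property
    def misses(self):
        return sum(s.misses for s in self.sets)


def count_misses(cache, requests):
    """Replay a request sequence and return the total number of misses."""
    for key in requests:
        cache.access(key)
    return cache.misses
```

    Replaying the same request sequence through `SetAssociativeLRU(k, alpha)` for several values of `alpha` and through `LRUCache(k)` gives an empirical feel for how the miss count changes with the set size; with `alpha` equal to `k` the two caches behave identically.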

    Contextual Bandit Modeling for Dynamic Runtime Control in Computer Systems

    Modern operating systems and microarchitectures provide a myriad of mechanisms for monitoring and affecting system operation and resource utilization at runtime. Dynamic runtime control of these mechanisms can tailor system operation to the characteristics and behavior of the current workload, resulting in improved performance. However, developing effective models for system control can be challenging. Existing methods often require extensive manual effort, computation time, and domain knowledge to identify relevant low-level performance metrics, to relate low-level performance metrics and high-level control decisions to workload performance, and to evaluate the resulting control models. This dissertation develops a general framework, based on the contextual bandit, for describing and learning effective models for runtime system control. Random profiling is used to characterize the relationship between workload behavior, system configuration, and performance. The framework is evaluated in the context of two applications of progressive complexity: first, the selection of paging modes (Shadow Paging, Hardware-Assisted Paging) in the Xen virtual machine memory manager; second, the utilization of hardware memory prefetching for multi-core, multi-tenant workloads with cross-core contention for shared memory resources, such as the last-level cache and memory bandwidth. The resulting models for both applications are competitive in comparison to existing runtime control approaches. For paging mode selection, the resulting model provides equivalent performance to the state of the art while substantially reducing the computation requirements of profiling. For hardware memory prefetcher utilization, the resulting models are the first to provide dynamic control for hardware prefetchers using workload statistics. Finally, a correlation-based feature selection method is evaluated for identifying relevant low-level performance metrics related to hardware memory prefetching.
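
    The contextual-bandit framing can be sketched in a few lines. This is a deliberately simplified, tabular illustration and not the dissertation's model: it assumes a small discrete set of workload contexts, and the class name, the context labels, the arm labels, and the reward numbers are all invented for the example.

```python
from collections import defaultdict


class TabularContextualBandit:
    """Per-(context, arm) average reward, learned from profiled samples."""

    def __init__(self, arms):
        self.arms = list(arms)
        self._total = defaultdict(float)  # summed reward per (context, arm)
        self._count = defaultdict(int)    # sample count per (context, arm)

    def record(self, context, arm, reward):
        """Store one profiling observation (e.g. from a randomly chosen arm)."""
        self._total[(context, arm)] += reward
        self._count[(context, arm)] += 1

    def mean_reward(self, context, arm):
        n = self._count.get((context, arm), 0)
        # Unobserved pairs rank last so that observed arms are preferred.
        return self._total[(context, arm)] / n if n else float("-inf")

    def choose(self, context):
        """Pick the arm with the best estimated reward for this context."""
        return max(self.arms, key=lambda arm: self.mean_reward(context, arm))


# Invented profiling data: two workload contexts, two paging-mode arms.
bandit = TabularContextualBandit(arms=["shadow", "hap"])
for context, arm, reward in [
    ("ctx_a", "shadow", 0.9), ("ctx_a", "hap", 0.6),
    ("ctx_a", "shadow", 0.8), ("ctx_b", "shadow", 0.4),
    ("ctx_b", "hap", 0.7), ("ctx_b", "hap", 0.9),
]:
    bandit.record(context, arm, reward)
```

    After profiling, `bandit.choose(context)` returns the configuration with the highest observed average reward for that context. A realistic model would generalize across continuous performance-counter features instead of using a lookup table.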

    POET: Training Neural Networks on Tiny Devices with Integrated Rematerialization and Paging

    Fine-tuning models on edge devices like mobile phones would enable privacy-preserving personalization over sensitive data. However, edge training has historically been limited to relatively small models with simple architectures because training is both memory and energy intensive. We present POET, an algorithm to enable training large neural networks on memory-scarce battery-operated edge devices. POET jointly optimizes the integrated search spaces of rematerialization and paging, two algorithms to reduce the memory consumption of backpropagation. Given a memory budget and a run-time constraint, we formulate a mixed-integer linear program (MILP) for energy-optimal training. Our approach enables training significantly larger models on embedded devices while reducing energy consumption and without modifying the mathematical correctness of backpropagation. We demonstrate that it is possible to fine-tune both ResNet-18 and BERT within the memory constraints of a Cortex-M class embedded device while outperforming current edge training methods in energy efficiency. POET is an open-source project available at https://github.com/ShishirPatil/poet. Comment: Proceedings of the 39th International Conference on Machine Learning 2022 (ICML 2022)

    VIRTUAL MEMORY ON A MANY-CORE NOC

    Many-core devices are likely to become increasingly common in real-time and embedded systems as computational demands grow and as expectations for higher performance can generally only be met by increasing core numbers rather than relying on higher clock speeds. Network-on-chip devices, where multiple cores share a single slice of silicon and employ packetised communications, are a widely deployed many-core option for system designers. As NoCs are expected to run larger and more complex programs, the small amount of fast, on-chip memory available to each core is unlikely to be sufficient for all but the simplest of tasks, and it is necessary to find an efficient, effective, and time-bounded means of accessing resources stored in off-chip memory, such as DRAM or Flash storage. The abstraction of paged virtual memory is a familiar technique to manage similar tasks in general computing but has often been shunned by real-time developers because of concern about time predictability. We show it can be a poor choice for a many-core NoC system as, unmodified, it typically uses page sizes optimised for interaction with spinning disks and not solid state media, and transports significant volumes of subsequently unused data across already congested links. In this work we outline and simulate an efficient partial paging algorithm where only those memory resources that are locally accessed are transported between global and local storage. We further show that smaller page sizes add to efficiency. We examine the factors that lead to timing delays in such systems, and show we can predict worst-case execution times at even safety-critical thresholds by using statistical methods from extreme value theory. We also show these results are applicable to systems with a variety of connections to memory.