4,296 research outputs found

    Cache Equalizer: A Cache Pressure Aware Block Placement Scheme for Large-Scale Chip Multiprocessors

    This paper describes Cache Equalizer (CE), a novel distributed cache management scheme for large-scale chip multiprocessors (CMPs). Our work is motivated by the large asymmetry in cache set usage. CE decouples the physical locations of cache blocks from their addresses in order to reduce misses caused by destructive interference. Temporal pressure at the on-chip last-level cache is continuously collected at group granularity (a group comprises several cache sets) and periodically recorded at the memory controller to guide the placement process. An incoming block is then placed in the cache group that exhibits the minimum pressure. CE provides Quality of Service (QoS) by robustly offering better performance than the baseline shared NUCA cache. Results from a full-system simulator demonstrate that CE outperforms shared NUCA caches by an average of 15.5% and by as much as 28.5% for the benchmark programs we examined. Furthermore, evaluations show that CE outperforms related CMP cache designs.
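    The placement rule in this abstract (collect pressure per group, snapshot it periodically, place each incoming block in the least-pressured group) can be illustrated with a minimal sketch. The class and method names are hypothetical, and counting misses as the pressure metric is an assumption; the abstract does not say how CE measures pressure or how its hardware tables are organised.

```python
class PressureTable:
    """Hypothetical per-group pressure table; not CE's actual hardware structure."""

    def __init__(self, num_groups):
        self.num_groups = num_groups
        self.live = [0] * num_groups      # counters updated continuously
        self.snapshot = [0] * num_groups  # periodic copy used for placement

    def record_miss(self, group):
        # Assumption: pressure is approximated by a miss count per group.
        self.live[group] += 1

    def end_epoch(self):
        # Periodically publish the collected pressure and start a new epoch.
        self.snapshot = list(self.live)
        self.live = [0] * self.num_groups

    def place(self):
        # Choose the group with minimum recorded pressure (lowest index on ties).
        return min(range(self.num_groups), key=lambda g: self.snapshot[g])


table = PressureTable(num_groups=4)
for group in [0, 0, 1, 3, 3, 3]:
    table.record_miss(group)
table.end_epoch()
target_group = table.place()
```

    Placement reads only the snapshot, so decisions within an epoch stay stable while new pressure is still being collected.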

    Jigsaw: Scalable Software-Defined Caches (Extended Version)

    Shared last-level caches, widely used in chip-multiprocessors (CMPs), face two fundamental limitations. First, the latency and energy of shared caches degrade as the system scales up. Second, when multiple workloads share the CMP, they suffer from interference in shared cache accesses. Unfortunately, prior research addressing one issue either ignores or worsens the other: NUCA techniques reduce access latency but are prone to hotspots and interference, and cache partitioning techniques only provide isolation but do not reduce access latency. We present Jigsaw, a technique that jointly addresses the scalability and interference problems of shared caches. Hardware lets software define shares, collections of cache bank partitions that act as virtual caches, and map data to shares. Shares give software full control over both data placement and capacity allocation. Jigsaw implements efficient hardware support for share management, monitoring, and adaptation. We propose novel resource-management algorithms and use them to develop a system-level runtime that leverages Jigsaw to both maximize cache utilization and place data close to where it is used. We evaluate Jigsaw using extensive simulations of 16- and 64-core tiled CMPs. Jigsaw improves performance by up to 2.2x (18% avg) over a conventional shared cache, and significantly outperforms state-of-the-art NUCA and partitioning techniques. This work was supported in part by DARPA PERFECT contract HR0011-13-2-0005 and Quanta Computer.

    Parallel stereo vision algorithm

    Integrating a stereo-photogrammetric robot head into a real-time system requires software solutions that rapidly resolve the stereo correspondence problem. The stereo matcher presented in this paper therefore uses code parallelisation and was tested on three different processors with x87 and AVX. The results show that a 5-megapixel colour image can be matched in 5.55 seconds, or in 3.3 seconds as monochrome.

    Dynamic Hierarchical Cache Management for Cloud RAN and Multi-Access Edge Computing in 5G Networks

    Cloud Radio Access Networks (CRAN) and Multi-Access Edge Computing (MEC) are two of the many emerging technologies proposed for 5G mobile networks. CRAN provides scalability, flexibility, and better resource utilization to support the dramatic increase in Internet of Things (IoT) and mobile devices. MEC aims to provide low-latency, high-bandwidth, real-time access to radio networks. CRAN builds a cloud architecture on top of traditional Radio Access Networks (RAN), while MEC brings cloud computing services close to users to improve the user experience. A cache is added in both CRAN and MEC architectures to speed up mobile network services. This research focuses on cache management in CRAN and MEC because this limited cache resource must be managed and utilized efficiently. First, a new cache management algorithm, H-EXD-AHP (Hierarchical Exponential Decay and Analytical Hierarchy Process), is proposed to improve the existing EXD-AHP algorithm. Next, this paper designs three dynamic cache management algorithms and implements them on both the proposed H-EXD-AHP algorithm and an existing algorithm, H-PBPS (Hierarchical Probability Based Popularity Scoring). In these designs, the cache sizes allocated to users under different Service Level Agreements (SLAs) are adjusted dynamically to meet the cache hit rate guaranteed by each SLA. The minimum guarantee of cache hit rate is for our setting. Net neutrality, prioritized treatment will be in common practice. Finally, performance evaluation results show that these designs achieve the guaranteed cache hit rate for differentiated users according to their SLA.

    Fast Differentially Private Matrix Factorization

    Differentially private collaborative filtering is a challenging task, both in terms of accuracy and speed. We present a simple algorithm that is provably differentially private, while offering good performance, using a novel connection of differential privacy to Bayesian posterior sampling via Stochastic Gradient Langevin Dynamics. Due to its simplicity, the algorithm lends itself to efficient implementation. By careful systems design and by exploiting the power-law behavior of the data to maximize CPU cache bandwidth, we are able to generate 1024-dimensional models at a rate of 8.5 million recommendations per second on a single PC.
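    For readers unfamiliar with Stochastic Gradient Langevin Dynamics, the generic update is a half-step along a (stochastic) gradient of the log-posterior plus Gaussian noise whose variance equals the step size. The sketch below shows that textbook update for a single scalar parameter. It is not the paper's algorithm: the function name and arguments are hypothetical, and it performs no privacy calibration and no matrix factorization.

```python
import math
import random


def sgld_step(theta, grad_log_post, step_size, rng):
    """One generic SGLD update (textbook form, not the paper's implementation).

    theta         -- current parameter value (a scalar here for simplicity)
    grad_log_post -- (stochastic) gradient of the log-posterior at theta
    step_size     -- step size; the injected noise has variance step_size
    rng           -- any object with a gauss(mu, sigma) method, e.g. random.Random
    """
    noise = rng.gauss(0.0, math.sqrt(step_size))
    return theta + 0.5 * step_size * grad_log_post + noise


# Illustrative use: sample from a standard normal "posterior",
# whose log-density has gradient -theta.
rng = random.Random(0)
theta = 5.0
samples = []
for _ in range(1000):
    theta = sgld_step(theta, -theta, 0.1, rng)
    samples.append(theta)
```

    Because every update injects noise, the iterates behave as approximate posterior samples rather than converging to a point estimate; this sampling view is what the abstract connects to differential privacy.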