34 research outputs found

    A Novel Function Complexity-Based Code Migration Policy for Reducing Power Consumption

    Embedded system design has changed greatly owing to rapid developments in both hardware and software technology. A typical design must account for hardware limitations such as size, weight, and battery capacity; in other words, designs depend heavily on the hardware components. Since hardware can deteriorate and degenerate, hardware-aware software design is needed to achieve power-efficient embedded systems. Studies of power optimization usually focus on the microprocessor. Besides computation, however, a system also consumes power through the many memory accesses made while a program executes, and these accesses should be minimized for more efficient designs. Modern embedded systems often use heterogeneous memory to benefit from the different characteristics of memory devices. This study aims to optimize the power efficiency of heterogeneous memory in embedded systems. We propose a detailed function complexity concept to identify the function units in a program that consume less power when migrated to a different memory. Using function complexity, a function selection algorithm is proposed that selects the single function that improves most after migration. Experiments and quantitative analyses with various benchmarks were performed to validate the proposed algorithm. Power consumption is successfully reduced by migrating a selected function of a program to low-power memory.
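The selection step described in this abstract can be pictured with a minimal sketch. It is not the authors' algorithm: the abstract does not define function complexity, so the access count below is only a stand-in, and `FunctionProfile`, `estimated_saving`, `select_function`, the capacity limit, and the per-access energy values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    name: str
    accesses: int    # memory accesses attributed to the function (assumed profile data)
    size_bytes: int  # footprint that must fit in the low-power memory


def estimated_saving(profile, energy_default, energy_low_power):
    """Energy saved if every access of this function hit the low-power memory."""
    return profile.accesses * (energy_default - energy_low_power)


def select_function(profiles, capacity_bytes, energy_default, energy_low_power):
    """Pick the single function with the largest estimated saving that still fits."""
    candidates = [p for p in profiles if p.size_bytes <= capacity_bytes]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda p: estimated_saving(p, energy_default, energy_low_power),
    )
```

The sketch picks exactly one function, matching the abstract's "unique function which improves most"; a real policy would replace the access-count proxy with the paper's complexity metric.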

    Worst Case Delay Analysis of a DRAM Memory Request for COTS Multicore Architectures

    Dynamic RAM (DRAM) is a source of memory contention and interference problems on commercial off-the-shelf (COTS) multicore architectures. Because its access time is variable, it can greatly influence a task's worst-case execution time (WCET) and lead to unpredictability. In this paper, we provide a worst-case delay analysis for a DRAM memory request to safely bound memory contention on multicore architectures. We derive a worst-case service time for a single memory request and then combine it with the per-request memory interference that can be generated by tasks executing on the same or different cores in order to generate the delay bound.
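As a rough illustration of how such a bound could be composed, the sketch below adds a request's own worst-case service time to a fixed interference cost per competing request. The additive form, the function name `worst_case_delay`, and all parameters are assumptions made for illustration; the abstract does not give the paper's actual formula.

```python
def worst_case_delay(service_time_ns, interference_ns_per_request, interfering_requests_per_core):
    """Upper bound on one request's delay under an assumed additive model.

    service_time_ns: worst-case service time of the request itself.
    interference_ns_per_request: assumed delay added by each competing request.
    interfering_requests_per_core: competing requests that may be served first, per core.
    """
    if service_time_ns < 0 or interference_ns_per_request < 0:
        raise ValueError("times must be non-negative")
    if any(n < 0 for n in interfering_requests_per_core):
        raise ValueError("request counts must be non-negative")
    total_interfering = sum(interfering_requests_per_core)
    return service_time_ns + interference_ns_per_request * total_interfering
```

With no competing requests the bound reduces to the service time alone, which is the single-request case the abstract starts from.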

    A Swap-based Cache Set Index Scheme to Leverage both Superpage and Page Coloring Optimizations


    SQUASH: Simple QoS-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators

    Modern SoCs integrate multiple CPU cores and Hardware Accelerators (HWAs) that share the same main memory system, causing interference among memory requests from different agents. The result of this interference, if not controlled well, is missed deadlines for HWAs and low CPU performance. State-of-the-art mechanisms designed for CPU-GPU systems strive to meet a target frame rate for GPUs by prioritizing the GPU close to the time when it has to complete a frame. We observe two major problems when such an approach is adapted to a heterogeneous CPU-HWA system. First, HWAs miss deadlines because they are prioritized only close to their deadlines. Second, such an approach does not consider the diverse memory access characteristics of different applications running on CPUs and HWAs, leading to low performance for latency-sensitive CPU applications and deadline misses for some HWAs, including GPUs. In this paper, we propose a Simple Quality of service Aware memory Scheduler for Heterogeneous systems (SQUASH), that overcomes these problems using three key ideas, with the goal of meeting deadlines of HWAs while providing high CPU performance. First, SQUASH prioritizes a HWA when it is not on track to meet its deadline any time during a deadline period. Second, SQUASH prioritizes HWAs over memory-intensive CPU applications based on the observation that the performance of memory-intensive applications is not sensitive to memory latency. Third, SQUASH treats short-deadline HWAs differently as they are more likely to miss their deadlines and schedules their requests based on worst-case memory access time estimates. Extensive evaluations across a wide variety of different workloads and systems show that SQUASH achieves significantly better CPU performance than the best previous scheduler while always meeting the deadlines for all HWAs, including GPUs, thereby largely improving frame rates
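The first two ideas in this abstract can be sketched as a priority classification. This is a simplified illustration, not the SQUASH implementation: the progress test (completed fraction of requests versus elapsed fraction of the deadline period), the placement of latency-sensitive CPU applications above on-track accelerators, and the names `Accelerator`, `on_track`, and `priority_class` are assumptions. The special handling of short-deadline HWAs is left out.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    name: str
    requests_done: int   # requests completed so far in this period
    requests_total: int  # requests needed before the deadline (assumed > 0)
    elapsed: float       # time elapsed in the current deadline period
    period: float        # length of the deadline period (assumed > 0)


def on_track(hwa):
    """Assumed test: progress made is at least the fraction of the period used."""
    expected = hwa.elapsed / hwa.period
    current = hwa.requests_done / hwa.requests_total
    return current >= expected


def priority_class(kind, hwa=None, memory_intensive=False):
    """Return a scheduling class; a lower number is served first."""
    if kind == "hwa" and not on_track(hwa):
        return 0  # HWA that is falling behind its deadline
    if kind == "cpu" and not memory_intensive:
        return 1  # latency-sensitive CPU application (placement assumed)
    if kind == "hwa":
        return 2  # on-track HWA, still ahead of memory-intensive CPU apps
    return 3      # memory-intensive CPU application
```

Checking progress throughout the period, rather than only near the deadline, is what lets a lagging accelerator be boosted early.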

    The Blacklisting Memory Scheduler: Achieving high performance and fairness at low cost

    In a multicore system, applications running on different cores interfere at main memory. This inter-application interference degrades overall system performance and unfairly slows down applications. Prior works have developed application-aware memory request schedulers to tackle this problem. State-of-the-art application-aware memory request schedulers prioritize memory requests of applications that are vulnerable to interference, by ranking individual applications based on their memory access characteristics and enforcing a total rank order. In this paper, we observe that state-of-the-art application-aware memory schedulers have two major shortcomings. First, ranking applications individually with a total order based on memory access characteristics leads to high hardware cost and complexity. Second, ranking can unfairly slow down applications that are at the bottom of the ranking stack. To overcome thes

    Coordinate Channel-Aware Page Mapping Policy and Memory Scheduling for Reducing Memory Interference Among Multimedia Applications

    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
    In a modern multicore system, memory is shared among more and more concurrently running multimedia applications. As a result, memory contention and interference grow more serious, causing significant system performance degradation, uneven slowdown across threads, unfairness in resource sharing, priority inversion, and even starvation. In this paper, we propose an approach that coordinates a channel-aware page mapping policy with memory scheduling (CCPS) to reduce interference among multimedia applications in a memory system. The idea is to map the data of different threads to different channels, together with memory scheduling. The key principles of the policies of page mapping and memory scheduling are: 1) the memory address space, the thread priority, and the load balance; and 2) prioritizing a low-memory request thread, a row-buffer hit access, and an older request. We evaluate CCPS on a variety of mixed single-thread and multithread benchmarks and system configurations, and we compare it with four previously proposed state-of-the-art interference-reducing policies. Experimental results demonstrate that CCPS improves performance while significantly reducing energy consumption; moreover, CCPS incurs a much lower hardware overhead than existing policies.
    This work was supported in part by the Qing Lan Project; by the National Science Foundation of China under Grant 61003077, Grant 61100193, and Grant 61401147; and by the Zhejiang Provincial Natural Science Foundation under Grant LQ14F020011.
    Jia, G.; Han, G.; Li, A.; Lloret, J. (2017). Coordinate Channel-Aware Page Mapping Policy and Memory Scheduling for Reducing Memory Interference Among Multimedia Applications. IEEE Systems Journal. 11(4):2839-2851. https://doi.org/10.1109/JSYST.2015.2430522
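The scheduling principle stated in the CCPS abstract (favor a thread with few memory requests, then a row-buffer hit, then the older request) can be sketched as a lexicographic ordering. Applying the three criteria in exactly that order is an assumption, as are the names `Request` and `pick_next` and the way low-request threads are identified.

```python
from dataclasses import dataclass


@dataclass
class Request:
    thread_id: int
    row_hit: bool  # True if the request targets the currently open row
    arrival: int   # arrival time; smaller means older


def pick_next(queue, low_request_threads):
    """Choose the next request to serve from a non-empty queue.

    Tuples compare element by element and False sorts before True, so the
    key prefers low-request threads, then row-buffer hits, then older requests.
    """
    return min(
        queue,
        key=lambda r: (
            r.thread_id not in low_request_threads,
            not r.row_hit,
            r.arrival,
        ),
    )
```

When no thread is classified as low-request, the ordering falls back to row hits first and then age.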

    Memory Access Pattern-Based DRAM Controller Design

    Thesis (Master's) -- Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, August 2017. 이창건.
    Mixed-criticality systems integrate tasks with various levels of criticality onto the same hardware platform. Critical tasks require tight bounding of worst-case latency at any cost, while for non-critical tasks it is important to provide as much performance as possible. From this, a tough design concern arises: how to achieve the conflicting demands of performance isolation for critical tasks and efficient sharing for non-critical tasks in terms of shared DRAM bandwidth and capacity? Modern mixed-criticality systems are facing rapid changes in workloads. One of the biggest challenges among these changes is the advent of memory-intensive workloads in line with the migration to multicore. Memory-intensive workloads significantly exacerbate contention and interference problems in the shared memory resources of multicore architectures. This not only endangers tight bounding of the worst-case latency of critical tasks but, if not properly addressed, can also lead to significant performance penalties and unfairness among non-critical tasks. In this paper, we take a workload-driven approach and propose a novel workload-aware memory controller design for mixed-criticality systems that can achieve both of the conflicting demands in the presence of memory-intensive workloads. Based on the key observation that the memory access pattern of an application captures its major memory requirements, our memory controller manages shared DRAM as a set of memory access pattern-aware partitions: latency sensitive, locality sensitive, and bandwidth sensitive. Our design allocates bandwidth and capacity customized to each partition's needs. By using bank partitioning and request batching with prioritization, we guarantee short worst-case latency for critical tasks and high performance and fairness for non-critical tasks.
    Contents: I. Introduction; II. Background on DRAM Basics (2.1 DRAM Architecture and Characteristics; 2.2 DRAM Memory Controller; 2.3 Bank Partitioning; 2.4 Memory Access Patterns); III. Observation; IV. Memory Access Pattern-Centric Memory Controller Design (4.1 Memory Controller Architecture; 4.1.1 Memory access pattern-aware bank partitioning; 4.1.2 Partition-based prioritization and request batching; 4.2 Worst-Case Interference Delay Analysis); V. Evaluation (5.1 Experiment Setup; 5.2 Performance result of non-critical tasks); VI. Related Work; VII. Conclusion; References.