2 research outputs found

    SQUASH: Simple QoS-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators

    Modern SoCs integrate multiple CPU cores and Hardware Accelerators (HWAs) that share the same main memory system, causing interference among memory requests from different agents. The result of this interference, if not controlled well, is missed deadlines for HWAs and low CPU performance. State-of-the-art mechanisms designed for CPU-GPU systems strive to meet a target frame rate for GPUs by prioritizing the GPU close to the time when it has to complete a frame. We observe two major problems when such an approach is adapted to a heterogeneous CPU-HWA system. First, HWAs miss deadlines because they are prioritized only close to their deadlines. Second, such an approach does not consider the diverse memory access characteristics of different applications running on CPUs and HWAs, leading to low performance for latency-sensitive CPU applications and deadline misses for some HWAs, including GPUs. In this paper, we propose a Simple Quality of service Aware memory Scheduler for Heterogeneous systems (SQUASH) that overcomes these problems using three key ideas, with the goal of meeting deadlines of HWAs while providing high CPU performance. First, SQUASH prioritizes a HWA when it is not on track to meet its deadline at any time during a deadline period. Second, SQUASH prioritizes HWAs over memory-intensive CPU applications, based on the observation that the performance of memory-intensive applications is not sensitive to memory latency. Third, SQUASH treats short-deadline HWAs differently, as they are more likely to miss their deadlines, and schedules their requests based on worst-case memory access time estimates. Extensive evaluations across a wide variety of workloads and systems show that SQUASH achieves significantly better CPU performance than the best previous scheduler while always meeting the deadlines for all HWAs, including GPUs, thereby largely improving frame rates.
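
    The first two ideas in the abstract above can be illustrated with a minimal sketch. It is not the scheduler from the paper. The proportional progress test, the three priority classes, and all names (`Accelerator`, `is_on_track`, `priority_order`) are assumptions made for illustration. In particular, the abstract does not say where an on-track HWA ranks relative to latency-sensitive CPU applications, so that ordering is a guess, and the handling of short-deadline HWAs is omitted.

```python
from dataclasses import dataclass


@dataclass
class Accelerator:
    """Hypothetical per-HWA bookkeeping for one deadline period."""
    name: str
    period_cycles: int    # length of one deadline period, in cycles
    total_requests: int   # memory requests that must finish within the period
    completed: int = 0    # requests finished so far in the current period


def is_on_track(hwa: Accelerator, elapsed_cycles: int) -> bool:
    """Assumed test: the fraction of work done is at least the fraction of
    the period that has elapsed. Cross-multiplied to stay in integers."""
    return hwa.completed * hwa.period_cycles >= hwa.total_requests * elapsed_cycles


def priority_order(hwa: Accelerator, elapsed_cycles: int) -> list[str]:
    """Return request classes from highest to lowest priority (assumed classes)."""
    if not is_on_track(hwa, elapsed_cycles):
        # Idea 1: a HWA that has fallen behind is served first,
        # at any point in the period, not only near the deadline.
        return ["hwa", "cpu_latency_sensitive", "cpu_memory_intensive"]
    # Idea 2: an on-track HWA still outranks memory-intensive CPU
    # applications, which tolerate extra memory latency.
    return ["cpu_latency_sensitive", "hwa", "cpu_memory_intensive"]


if __name__ == "__main__":
    hwa = Accelerator("example_hwa", period_cycles=1000, total_requests=100, completed=40)
    print(priority_order(hwa, elapsed_cycles=400))
    print(priority_order(hwa, elapsed_cycles=500))
```

    With 40 of 100 requests completed, the hypothetical accelerator is exactly on track after 400 of 1000 cycles and behind schedule after 500, so it moves to the top of the order in the second call.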

    AMBA bus hardware accelerator IP for Viola-Jones face detection

    Face detection is an important aspect of biometrics, video surveillance and human-computer interaction. Owing to the complexity of the detection algorithms, any biometric system requires a huge amount of computational and memory resources. A direct software-like implementation of any detection algorithm on a low-speed, low-resource, low-power system on chip (SoC) is not feasible. Instead, a software-hardware codesign approach can be used to build hardware accelerators for the most computationally intensive parts of the detection algorithms. Therefore the authors propose a hardware IP compliant with the advanced microcontroller bus architecture (AMBA) bus: a modularised, highly configurable, low-power and technology-independent core written in a hardware description language (HDL). The IP core accelerates the Viola-Jones algorithm, considered to be one of the most widely used algorithms for face detection. The hardware accelerator IP is used in an embedded face detection system built around the LEON3 SPARC V8 processor. The authors present the methodology, challenges and performance results for software, hardware and system-level design. For this system, the authors obtained an acceleration factor of 10-12 when using the hardware accelerator, compared with the traditional software-only approach. © The Institution of Engineering and Technology 2013. Peer Reviewed