369 research outputs found

    Analytical Evaluation of Energy and Throughput for Multilevel Caches

    Get PDF
    With the increase of processor-memory performance gap, it has become important to gauge the performance of cache architectures so as to evaluate their impact on energy requirement and throughput of the system. Multilevel caches are found to be increasingly prevalent in the high-end processors. Additionally, the recent drive towards multicore systems has necessitated the use of multilevel cache hierarchies for shared memory architectures. This paper presents simplified and accurate mathematical models to estimate the energy consumption and the impact on throughput for multilevel caches for single core systems

    Data Cache-Energy and Throughput Models: Design Exploration for Embedded Processors

    Get PDF
    Most modern 16-bit and 32-bit embedded processors contain cache memories to further increase instruction throughput of the device. Embedded processors that contain cache memories open an opportunity for the low-power research community to model the impact of cache energy consumption and throughput gains. For optimal cache memory configuration mathematical models have been proposed in the past. Most of these models are complex enough to be adapted for modern applications like run-time cache reconfiguration. This paper improves and validates previously proposed energy and throughput models for a data cache, which could be used for overhead analysis for various cache types with relatively small amount of inputs. These models analyze the energy and throughput of a data cache on an application basis, thus providing the hardware and software designer with the feedback vital to tune the cache or application for a given energy budget. The models are suitable for use at design time in the cache optimization process for embedded processors considering time and energy overhead or could be employed at runtime for reconfigurable architectures

    A fuzzy logic based dynamic reconfiguration scheme for optimal energy and throughput in symmetric chip multiprocessors

    Get PDF
    Embedded systems architectures have traditionally often been investigated and designed in order to achieve a greater throughput combined with minimum energy consumption. With the advent of reconfigurable architectures it is now possible to support algorithms to find optimal solutions for an improved energy and throughput balance. As a result of ongoing research several online and offline techniques and algorithm have been proposed for hardware adaptation. This paper presents a novel coarse-grained reconfigurable symmetric chip multiprocessor (SCMP) architecture managed by a fuzzy logic engine that balances performance and energy consumption. The architecture incorporates reconfigurable level 1 (L1) caches, power gated cores and adaptive on-chip network routers to allow minimizing leakage energy effects for inactive components. A coarse grained architecture was selected as to be a focus for this study as it typically allows for fast reconfiguration as compared to the fine-grained architectures, thus making it more feasible to be used for runtime adaption schemes. The presented architecture is analyzed using a set of OpenMP based parallel benchmarks and the results show significant improvements in performance while maintaining minimum energy consumption

    Fuzzy logic based energy and throughput aware design space exploration for MPSoCs

    Get PDF
    Multicore architectures were introduced to mitigate the issue of increase in power dissipation with clock frequency. Introduction of deeper pipelines, speculative threading etc. for single core systems were not able to bring much increase in performance as compared to their associated power overhead. However for multicore architectures performance scaling with number of cores has always been a challenge. The Amdahl's law shows that the theoretical maximum speedup of a multicore architecture is not even close to the multiple of number of cores. With less amount of code in parallel having more number of cores for an application might just contribute in greater power dissipation instead of bringing some performance advantage. Therefore there is a need of an adaptive multicore architecture that can be tailored for the application in use for higher energy efficiency. In this paper a fuzzy logic based design space exploration technique is presented that is targeted to optimize a multicore architecture according to the workload requirements in order to achieve optimum balance between throughput and energy of the system

    Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0

    Get PDF
    Journal ArticleA significant part of future microprocessor real estate will be dedicated to L2 or L3 caches. These on-chip caches will heavily impact processor performance, power dissipation, and thermal management strategies. There are a number of interconnect design considerations that influence power/performance/area characteristics of large caches, such as wire models (width/spacing/repeaters), signaling strategy (RC/differential/transmission), router design, etc. Yet, to date, there exists no analytical tool that takes all of these parameters into account to carry out a design space exploration for large caches and estimate an optimal organization. In this work, we implement two major extensions to the CACTI cache modeling tool that focus on interconnect design for a large cache. First, we add the ability to model different types of wires, such as RC-based wires with different power/delay characteristics and differential low-swing buses. Second, we add the ability to model Non-uniform Cache Access (NUCA). We not only adopt state-of-the-art design space exploration strategies for NUCA, we also enhance this exploration by considering on-chip network contention and a wider spectrum of wiring and routing choices. We present a validation analysis of the new tool (to be released as CACTI 6.0) and present a case study to showcase how the tool can improve architecture research methodologies

    Optimizing NUCA organizations and wiring alternatives for large caches with CACTI 6.0

    Get PDF
    Journal ArticleA significant part of future microprocessor real estate will be dedicated to L2 or L3 caches. These on-chip caches will heavily impact processor performance, power dissipation, and thermal management strategies. There are a number of interconnect design considerations that influence power/performance/area characteristics of large caches, such as wire models (width/spacing/repeaters), signaling strategy (RC/differential/transmission), router design, etc. Yet, to date, there exists no analytical tool that takes all of these parameters into account to carry out a design space exploration for large caches and estimate an optimal organization. In this work, we implement two major extensions to the CACTI cache modeling tool that focus on interconnect design for a large cache. First, we add the ability to model different types of wires, such as RC-based wires with different power/delay characteristics and differential low-swing buses. Second, we add the ability to model Non-uniform Cache Access (NUCA). We not only adopt state-of-the-art design space exploration strategies for NUCA, we also enhance this exploration by considering on-chip network contention and a wider spectrum of wiring and routing choices. We present a validation analysis of the new tool (to be released as CACTI 6.0) and present a case study to showcase how the tool can improve architecture research methodologies

    Computational Physics on Graphics Processing Units

    Full text link
    The use of graphics processing units for scientific computations is an emerging strategy that can significantly speed up various different algorithms. In this review, we discuss advances made in the field of computational physics, focusing on classical molecular dynamics, and on quantum simulations for electronic structure calculations using the density functional theory, wave function techniques, and quantum field theory.Comment: Proceedings of the 11th International Conference, PARA 2012, Helsinki, Finland, June 10-13, 201

    Multilevel simulation-based co-design of next generation HPC microprocessors

    Get PDF
    This paper demonstrates the combined use of three simulation tools in support of a co-design methodology for an HPC-focused System-on-a-Chip (SoC) design. The simulation tools make different trade-offs between simulation speed, accuracy and model abstraction level, and are shown to be complementary. We apply the MUSA trace-based simulator for the initial sizing of vector register length, system-level cache (SLC) size and memory bandwidth. It has proven to be very efficient at pruning the design space, as its models enable sufficient accuracy without having to resort to highly detailed simulations. Then we apply gem5, a cycle-accurate micro-architecture simulator, for a more refined analysis of the performance potential of our reference SoC architecture, with models able to capture detailed hardware behavior at the cost of simulation speed. Furthermore, we study the network-on-chip (NoC) topology and IP placements using both gem5 for representative small- to medium-scale configurations and SESAM/VPSim, a transaction-level emulator for larger scale systems with good simulation speed and sufficient architectural details. Overall, we consider several system design concerns, such as processor subsystem sizing and NoC settings. We apply the selected simulation tools, focusing on different levels of abstraction, to study several configurations with various design concerns and evaluate them to guide architectural design and optimization decisions. Performance analysis is carried out with a number of representative benchmarks. The obtained numerical results provide guidance and hints to designers regarding SIMD instruction width, SLC sizing, memory bandwidth as well as the best placement of memory controllers and NoC form factor. Thus, we provide critical insights for efficient design of future HPC microprocessors.This work has been performed in the context of the European Processor Initiative (EPI) project, which has received funding from the European Union’s Horizon 2020 research and innovation program under Grant Agreement № 826647. A special thanks to Amir Charif and Arief Wicaksana for their invaluable contributions to the SESAM/VPSim tool in the initial phases of the EPI project.Peer ReviewedPostprint (author's final draft
    • …
    corecore