Search CORE

5 research outputs found

Best of both latency and throughput

Author: Ed Grochowski
Hong Wang
John Shen
Ronny Ronen
Publication venue
Publication date: 01/01/2004
Field of study

Abstract

CiteSeerX

A Survey of Techniques for Architecting TLBs

Author: Sparsh Mittal
Publication venue: Wiley
Publication date: 01/01/2016
Field of study

The “translation lookaside buffer” (TLB) caches virtual-to-physical address translation information and is used in systems ranging from embedded devices to high-end servers. Since the TLB is accessed very frequently and a TLB miss is extremely costly, prudent management of the TLB is important for improving the performance and energy efficiency of processors. In this paper, we present a survey of techniques for architecting and managing TLBs. We characterize the techniques across several dimensions to highlight their similarities and distinctions. We believe that this paper will be useful for chip designers, computer architects and system engineers.

Research Archive of Indian Institute of Technology Hyderabad
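
As a rough illustration of the translation caching described in the TLB survey abstract above, the following minimal Python sketch models a TLB as a small cache in front of a page table. It is not taken from the survey: the class name `TinyTLB`, the 4 KiB page size, the fully associative organization, and the LRU replacement policy are all assumptions chosen for brevity.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB pages (illustrative choice)


class TinyTLB:
    """Toy fully associative TLB with LRU replacement (hypothetical model)."""

    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table  # virtual page number -> physical frame number
        self.entries = OrderedDict()  # cached translations, least recently used first
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:
            self.hits += 1
            self.entries.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            pfn = self.page_table[vpn]  # stands in for the costly page-table walk
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict the least recently used entry
            self.entries[vpn] = pfn
        return self.entries[vpn] * PAGE_SIZE + offset


page_table = {0: 7, 1: 3, 2: 9}
tlb = TinyTLB(capacity=2, page_table=page_table)
addrs = [0x0010, 0x0020, 0x1004, 0x0030, 0x2008, 0x1000]
paddrs = [tlb.translate(a) for a in addrs]
print(tlb.hits, tlb.misses)
```

With only two entries, the third distinct page forces an eviction, so the final access to page 1 misses again. This is the kind of capacity effect that TLB management techniques aim to reduce.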

STATISTICAL MACHINE LEARNING BASED MODELING FRAMEWORK FOR DESIGN SPACE EXPLORATION AND RUN-TIME CROSS-STACK ENERGY OPTIMIZATION FOR MANY-CORE PROCESSORS

Author: NC DOCKS at The University of North Carolina at Charlotte
Zhang Changshu
Publication venue
Publication date: 01/01/2013
Field of study

The complexity of many-core processors continues to grow as a larger number of heterogeneous cores are integrated on a single chip. Such systems-on-chip contain computing structures ranging from complex out-of-order cores, simple in-order cores, digital signal processors (DSPs), graphics processing units (GPUs), application-specific processors, hardware accelerators, I/O subsystems, and network-on-chip interconnects to large caches arranged in complex hierarchies. While the industry focus is on putting a higher number of cores on a single chip, the key challenge is to optimally architect these many-core processors such that performance, energy and area constraints are satisfied. The traditional approach to processor design through extensive cycle-accurate simulations is ill-suited for designing many-core processors due to the large microarchitecture design space that must be explored. Additionally, it is hard to optimize such complex processors and the applications that run on them statically at design time such that performance and energy constraints are met under dynamically changing operating conditions. The dissertation establishes a statistical machine learning based modeling framework that enables the efficient design and operation of many-core processors that meet performance, energy and area constraints. We apply the proposed framework to rapidly design the microarchitecture of a many-core processor for multimedia, computer graphics rendering, finance, and data mining applications derived from the PARSEC benchmark. We further demonstrate the application of the framework in the joint run-time adaptation of both the application and microarchitecture such that energy availability constraints are met.

The University of North Carolina at Greensboro

Reducing dTLB Energy Through Dynamic Resizing

Author: A. Sivasubramaniam
M. J. Irwin
N. Vijaykrishnan
V. Delaluz
Publication venue
Publication date
Field of study

The Translation Look-aside Buffer (TLB), a small Content Addressable Memory (CAM) structure used to translate virtual addresses to physical addresses, can consume significant energy in some architectures. In addition, its power density is high due to its small area. Consequently, reducing the power consumption of the TLB is important for both high-end and low-end systems. While a large TLB might be preferable from the performance angle, it can also lead to excessive dynamic energy consumption. This paper focuses on the data TLB (dTLB) and proposes an architectural solution to this problem based on dynamically resizing the dTLB according to application execution behavior. Our objective is to give the application the minimum dTLB size (at any point) without significantly degrading its performance. We present two different implementations of this idea and give experimental data demonstrating that it is very effective in practice.

CiteSeerX