Search CORE

36 research outputs found

Light NUCA: a proposal for bridging the inter-cache latency gap

Author: Beivide Palacio Julio Ramón
Monreal Arnal Teresa
Suárez Dario
Vallejo Fernando
Viñals Yufera Víctor
Publication venue: IEEE Computer Society
Publication date: 01/01/2009
Field of study

To deal with the “memory wall” problem, microprocessors include large secondary on-chip caches. But as these caches enlarge, they originate a new latency gap between them and fast L1 caches (inter-cache latency gap). Recently, Non-Uniform Cache Architectures (NUCAs) have been proposed to sustain the size growth trend of secondary caches that is threatened by wire-delay problems. NUCAs are size-oriented, and they were not conceived to close the inter-cache latency gap. To tackle this problem, we propose Light NUCAs (L-NUCAs) leveraging on-chip wire density to interconnect small tiles through specialized networks, which convey packets with distributed and dynamic routing. Our design reduces the tile delay (cache access plus one-hop routing) to a single processor cycle and places cache lines at a finer-granularity than conventional caches reducing cache latency. Our evaluations show that in general, L-NUCA improves simultaneously performance, energy, and area when integrated into both conventional or D-NUCA hierarchies.Postprint (author’s final draft

UPCommons. Portal del coneixement obert de la UPC

Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

Author: Gujarathi Hemal S
McDonald-Maier Klaus D
Qadri Muhammad Yasir
Publication venue: 'Academy Publisher'
Publication date: 01/01/2009
Field of study

The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

University of Essex Research Repository

CiteSeerX

Crossref

CCNoC: On-Chip Interconnects for Cache-Coherent Manycore Server Chips

Author: De Micheli Giovanni
Falsafi Babak
Khosro Pour Naser
Seiculescu Ciprian
Volos Stavros
Publication venue
Publication date: 22/07/2011
Field of study

Manycore chips are emerging as the architecture of choice to provide power-scalability and improve performance while riding the Moore’s law. On-chip interconnects are increasingly playing a pivotal role in power- and performance- scalability of such microarchitectures. As supply voltages begin to level off in future technologies, chip designs in general and interconnects in particular are resorting to specialization to provide power- and performance-scalability. In this paper, we make the observation that cache-coherent manycore chips exhibit a duality in on-chip network traffic. Request traffic typically consists of control packets requiring narrow low-power switches, while response traffic often carries cache block-sized payloads that require wider and higher-power switches. We present Cache-Coherence Network-on-Chip (CCNoC), a design to capitalize on this duality in traffic and provide a pair of asymmetric switches that optimize power and performance over conventional onchip interconnects. Cycle-accurate simulation results for a 4x4 chip multiprocessor with a shared last-level cache running commercial server workloads indicate 22% improvement in power over a torus and 38% improvement in power over a mesh with larger channel width, while providing similar performance

Infoscience - École polytechnique fédérale de Lausanne

Reliability aware NoC router architecture using input channel buffer sharing

Author: M. H. Neishaburi
Zeljko Zilic
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009
Field of study

To address the increasing demand for reliability in on-chip networks, we proposed a novel Reliability Aware Virtual channel (RAVC) NoC router micro-architecture that enables both dynamic virtual channel allocations and the rational sharing among the buffers of different input channels. In particular, in the case of failure in routers, the virtual channels of routers surrounding the faulty routers can be totally recaptured and reassigned to other input ports. Moreover, our proposed RAVC router isolates the faulty router from occupying network bandwidth. Experimental result shows that proposed micro-architecture provides 7.1 % and 3.1 % average latency decrease under uniform and transpose traffic pattern. Considering the existence of failures in routers of on-chip network, RAVC provides 28 % and 16 % decrease in the average packet latency under the uniform and transpose traffic pattern respectively

CiteSeerX

Crossref

Throttling Control for Bufferless Routing in On-Chip Networks

Author: Ahmadou Dit ADI Cisse
Guan Yicheng
Irie Hidetsugu
Koibuchi Michihiro
Miyoshi Takefumi
Yoshinaga Tsutomu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/09/2012
Field of study

As the number of core integration on a single die grows, buffers consume significant energy, and occupy chip area. A bufferless deflection outing that eliminates router’s input port buffers can considerably help saving energy and chip area while providing similar performance of xisting buffered routing, especially for low-to-medium network loads. However when congestion increases, the bufferless frequently causes flits deflections, and misrouting leading to a degradation of network performance. In this paper, we propose IRT(Injection Rate Throttling), a ocal throttling mechanism that reduces deflection and misrouting for high-load bufferless networks. IRT provides injection rate control independently for each network node, allowing to reduce network congestion. Our simulation results based on a cycle-accurate simulator show that using IRT, IRT reduces average transmission latency by 8.65% compared to traditional bufferless routing

Crossref

Creative Repository of Electro-Communications

Microarchitectural wire management for performance and power in partitioned architectures

Author: Balasubramonian Rajeev
Muralimanohar Naveen
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

Journal ArticleFuture high-performance billion-transistor processors are likely to employ partitioned architectures to achieve high clock speeds, high parallelism, low design complexity, and low power. In such architectures, inter-partition communication over global wires has a significant impact on overall processor performance and power consumption. VLSI techniques allow a variety of wire implementations, but these wire properties have previously never been exposed to the microarchitecture. This paper advocates global wire management at the microarchitecture level and proposes a heterogeneous interconnect that is comprised of wires with varying latency, bandwidth, and energy characteristics. We propose and evaluate microarchitectural techniques that can exploit such a heterogeneous interconnect to improve performance and reduce energy consumption. These techniques include a novel cache pipeline design, the identification of narrow bit-width operands, the classification of non-critical data, and the detection of interconnect load imbalance. For a dynamically scheduled partitioned architecture, our results demonstrate that the proposed innovations result in up to 11% reductions in overall processor ED2, compared to a baseline processor that employs a homogeneous interconnect

The University of Utah: J. Willard Marriott Digital Library

Power and Area Efficient Design of Network-on-Chip Router through Utilization

Author: Hannu Tenhunen
Khalid Latif
Tiberiu Seceleanu
Publication venue
Publication date: 01/01/2010
Field of study

Abstract Network-on-Chip (NoC) is the interconnection platform that answers the requirements of the modern on-Chip design. Small optimizations in NoC router architecture can show a significant improvement in the overall performance of NoC based systems. Power consumption, area overhead and the entire NoC performance is influenced by the router buffers. Resource sharing for on-chip network is critical to reduce the chip area and power consumption. Virtual channel buffer sharing by other router ports has been proposed to enhance the performance of on-chip communication. We approach the router architecture optimization by utilizing the idle buffers instead of increasing the number and size of buffers for desired throughput

CiteSeerX

Analysis of Error Recovery Schemes for Networks on Chips

Author: G. De Micheli
L. Benini
M.J. Irwin
N. Vijaykrishnan
S. Murali
T. Theocharides
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref