47,557 research outputs found

Low-complexity distributed issue queue

Author: Abella Ferrer Jaume
González Colás Antonio María
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004

As technology evolves, power density increases significantly and cooling systems become more complex and expensive. The issue logic is one of the processor hotspots and, at the same time, its latency is crucial for processor performance. We present a low-complexity FP issue logic (MB_distr) that achieves high performance with small energy requirements. The MB_distr scheme is based on classifying instructions and dispatching them into a set of queues depending on their data dependences. These instructions are selected for issuing based on an estimation of when their operands will be available, so the conventional wakeup activity is not required. Additionally, the functional units are distributed across the different queues. The energy required by the proposed scheme is substantially lower than that required by a conventional issue design, even if the latter is able to wake up only unready operands. The MB_distr scheme reduces the energy-delay product by 35% and the energy-delay² product by 18% with respect to a state-of-the-art approach. Peer Reviewed. Postprint (published version)

UPCommons. Portal del coneixement obert de la UPC

Multi-chip multicast schedulers in input queued switches

Author: Bianco Andrea
Scicchitano A.
Publication venue: IEEE
Publication date: 01/01/2008

Crossref

PORTO Publications Open Repository TOrino

Distributed scheduling in input queued switches

Author: Bianco Andrea
Giaccone Paolo
Leonardi Emilio
Schiattarella E.
Scicchitano A.
Publication venue: IEEE
Publication date: 01/01/2007

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Multicast Support in Multi-Chip Centralized Schedulers in Input Queued Switches

Author: Ajmone Marsan
Alessandra Scicchitano
Andrea Bianco
Fraleigh
McKeown
Prabhakar
Smiljanic
Publication venue: Elsevier
Publication date: 01/01/2009

Crossref

PORTO Publications Open Repository TOrino

Utilization-Aware Adaptive Back-Pressure Traffic Signal Control

Author: Annaswamy Anuradha
Chakraborty Samarjit
Chang Wanli
Publication date: 07/10/2015

Back-pressure control of traffic signals, which computes the control phase to apply based on the real-time queue lengths, has been proposed recently. Its features include (i) provably maximum stability, (ii) low computational complexity, (iii) no requirement of prior knowledge of traffic demand, and (iv) requirement of only local information at each intersection. The latter three points enable it to be completely distributed over intersections. However, one major issue preventing back-pressure control from being used in practice is the utilization of the intersection, especially if the control phase period is fixed, as is considered in existing works. In this paper, we propose a utilization-aware adaptive algorithm for back-pressure traffic signal control, which makes the duration of the control phase adaptively dependent on the real-time queue lengths and strives for high utilization of the intersection. While the advantages embedded in back-pressure control are kept, we prove that this algorithm is work-conserving and achieves the maximum utilization. Simulation results on an isolated intersection show that the proposed adaptive algorithm has better control performance than the fixed-period back-pressure control presented in previous works.
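
The phase-selection idea mentioned in this abstract can be illustrated with a minimal sketch of generic fixed-period back-pressure control. This is not the adaptive algorithm of the paper, whose phase-duration rule is not given in the abstract. The names `phase_pressure` and `select_phase`, the link labels, and the default service rate of 1.0 are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# A movement sends vehicles from an incoming link to an outgoing link.
Movement = Tuple[str, str]


def phase_pressure(
    movements: List[Movement],
    queues: Dict[str, float],
    rates: Dict[Movement, float],
) -> float:
    """Sum of rate-weighted positive queue differences over one phase."""
    total = 0.0
    for src, dst in movements:
        diff = queues.get(src, 0.0) - queues.get(dst, 0.0)
        # Assumed default service rate of 1.0 when none is supplied.
        total += rates.get((src, dst), 1.0) * max(diff, 0.0)
    return total


def select_phase(
    phases: Dict[str, List[Movement]],
    queues: Dict[str, float],
    rates: Optional[Dict[Movement, float]] = None,
) -> str:
    """Pick the phase with the largest pressure, using only local queues."""
    rates = rates or {}
    return max(phases, key=lambda name: phase_pressure(phases[name], queues, rates))


# Hypothetical four-approach intersection with two phases.
phases = {
    "NS": [("N_in", "S_out"), ("S_in", "N_out")],
    "EW": [("E_in", "W_out"), ("W_in", "E_out")],
}
queues = {
    "N_in": 5, "S_in": 3, "E_in": 10, "W_in": 1,
    "S_out": 0, "N_out": 2, "E_out": 0, "W_out": 4,
}
chosen = select_phase(phases, queues)
```

Only the queue lengths on links adjacent to the intersection are read, which is what allows the decision to be made locally at each intersection.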

DSpace@MIT

An energy-efficient memory unit for clustered microarchitectures

Author: Bieschewski Stefan
González Colás Antonio María
Parcerisa Bundó Joan Manuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Whereas clustered microarchitectures themselves have been extensively studied, the memory units for these clustered microarchitectures have received relatively little attention. This article discusses some of the inherent challenges of clustered memory units and shows how these can be overcome. Clustered memory pipelines work well with the late allocation of load/store queue entries and physically unordered queues. Yet this approach has characteristic problems such as queue overflows and allocation patterns that lead to deadlocks. We propose techniques to solve each of these problems and show that a distributed memory unit can offer significant energy savings and speedups over a centralized unit. For instance, compared to a centralized cache with a load/store queue of 64/24 entries, our four-cluster distributed memory unit with load/store queues of 16/8 entries each consumes 31 percent less energy and performs 4.7 percent better on SPECint and consumes 36 percent less energy and performs 7 percent better for SPECfp. Peer Reviewed. Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Inherently workload-balanced clustered microarchitecture

Author: Abella Ferrer Jaume
González Colás Antonio María
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005

The performance of clustered microarchitectures relies on steering schemes that try to find the best trade-off between workload balance and inter-cluster communication penalties. In previously proposed clustered processors, reducing communication penalties and balancing the workload are opposite targets, since improving one usually implies a detriment in the other. In this paper we propose a new clustered microarchitecture that can minimize communication penalties without compromising workload balance. The key idea is to arrange the clusters in a ring topology in such a way that results of one cluster can be forwarded to the neighbor cluster with a very short latency. In this way, minimizing communication penalties is favored when the producer of a value and its consumer are placed in adjacent clusters, which also favors workload balance. The proposed microarchitecture is shown to outperform a state-of-the-art clustered processor. For instance, for an 8-cluster configuration and just one fully pipelined unidirectional bus, 15% speedup is achieved on average for FP programs. Peer Reviewed. Postprint (published version)
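
As a rough illustration of why adjacent placement is cheap in a unidirectional ring, the sketch below counts the hops a value would travel from producer to consumer. The function name `ring_hops`, the zero-based cluster numbering, and the assumption that values move only from cluster i to cluster i+1 are hypothetical and are not taken from the paper.

```python
def ring_hops(producer: int, consumer: int, n_clusters: int) -> int:
    """Hops from producer to consumer on a unidirectional ring (assumed model)."""
    if n_clusters <= 0:
        raise ValueError("n_clusters must be positive")
    # Python's % always returns a non-negative result for a positive modulus,
    # so wrapping from the last cluster back to cluster 0 is handled directly.
    return (consumer - producer) % n_clusters


# Hypothetical 8-cluster ring: the next neighbor is one hop away,
# while the previous neighbor requires going almost all the way around.
next_neighbor = ring_hops(7, 0, 8)
previous_neighbor = ring_hops(0, 7, 8)
```

Under this model, steering a consumer to the cluster immediately after its producer keeps the distance at one hop.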

UPCommons. Portal del coneixement obert de la UPC