
    Selective value prediction


    Harnessing ISA diversity

    Heterogeneous multicore architectures have the potential for high performance and energy efficiency. These architectures may be composed of small power-efficient cores, large high-performance cores, and/or specialized cores that accelerate the performance of a particular class of computation. Architects have explored multiple dimensions of heterogeneity, both in terms of micro-architecture and specialization. While early work constrained the cores to share a single ISA, this work shows that allowing heterogeneous ISAs further extends the effectiveness of such architectures. It exploits the diversity offered by three modern ISAs: Thumb, x86-64, and Alpha. The resulting architecture has the potential to outperform the best single-ISA heterogeneous architecture by as much as 21%, with 23% energy savings and a reduction of 32% in Energy Delay Product.
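
    As a rough illustration only (hypothetical per-phase numbers, not the paper's methodology or data), the sketch below picks, for each program phase, whichever of the three ISAs minimizes energy-delay product:

```python
# Toy illustration (not the paper's methodology or data): given hypothetical
# per-phase runtime and energy estimates on each ISA, pick the core whose
# energy-delay product (EDP) is lowest for every phase.

PHASES = {
    # phase name: {ISA: (runtime_seconds, energy_joules)} -- made-up numbers
    "startup": {"Thumb": (1.2, 0.6), "x86-64": (0.8, 1.1), "Alpha": (0.9, 0.9)},
    "kernel":  {"Thumb": (3.0, 1.5), "x86-64": (1.6, 2.4), "Alpha": (1.8, 2.0)},
    "cleanup": {"Thumb": (0.5, 0.2), "x86-64": (0.4, 0.5), "Alpha": (0.4, 0.4)},
}

def edp(runtime: float, energy: float) -> float:
    """Energy-delay product: lower is better."""
    return runtime * energy

def best_isa_per_phase(phases):
    """Return the EDP-minimizing ISA for each phase."""
    return {phase: min(options, key=lambda isa: edp(*options[isa]))
            for phase, options in phases.items()}

if __name__ == "__main__":
    for phase, isa in best_isa_per_phase(PHASES).items():
        print(f"{phase}: run on {isa}")
```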

    Core architecture optimization for heterogeneous chip multiprocessors

    Previous studies have demonstrated the advantages of single-ISA heterogeneous multi-core architectures for power and performance. However, none of those studies examined how to design such a processor; instead, they started with an assumed combination of pre-existing cores. This work assumes the flexibility to design a multi-core architecture from the ground up and seeks to answer the following question: what should the characteristics of the cores of a heterogeneous multiprocessor be to achieve the highest area or power efficiency? The study is done for varying degrees of thread-level parallelism and for different area and power budgets. The most efficient chip multiprocessors are shown to be heterogeneous, with each core customized to a different subset of application characteristics – no single core is necessarily well suited to all applications. The performance ordering of cores on such processors is different for different applications; there is only a partial ordering among cores in terms of resources and complexity. This methodology produces performance gains as high as 40%. The performance improvements come with the added cost of customization.
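
    A minimal sketch of this kind of design-space search, under assumed core parameters rather than the paper's models: enumerate core mixes that fit an area budget and keep the highest-throughput combination.

```python
# Hypothetical sketch of a heterogeneous-CMP design-space search: enumerate
# combinations of candidate cores under an area budget and keep the mix with
# the highest aggregate throughput. Core parameters are illustrative only.
from itertools import combinations_with_replacement

# (name, area_mm2, relative_throughput) -- made-up values
CANDIDATE_CORES = [
    ("tiny",    5.0, 1.0),
    ("medium", 12.0, 2.2),
    ("big",    30.0, 4.0),
]

def best_mix(area_budget: float, max_cores: int = 8):
    """Exhaustively search core mixes that fit within the area budget."""
    best = (0.0, ())
    for n in range(1, max_cores + 1):
        for mix in combinations_with_replacement(CANDIDATE_CORES, n):
            area = sum(core[1] for core in mix)
            if area > area_budget:
                continue
            throughput = sum(core[2] for core in mix)
            if throughput > best[0]:
                best = (throughput, tuple(core[0] for core in mix))
    return best

if __name__ == "__main__":
    perf, mix = best_mix(area_budget=60.0)
    print(f"best mix under budget: {mix} (throughput {perf:.1f})")
```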

    Handling Long-latency Loads in a Simultaneous Multithreading Processor

    Simultaneous multithreading architectures have been defined previously with fully shared execution resources. When one thread in such an architecture experiences a very long-latency operation, such as a load miss, the thread will eventually stall, potentially holding resources that other threads could be using to make forward progress. This paper shows that in many cases it is better to free the resources associated with a stalled thread rather than keep that thread ready to immediately begin execution upon return of the loaded data. Several possible architectures are examined, and some simple solutions are shown to be very effective, achieving speedups close to 6.0 in some cases, and averaging 15% speedup with four threads and over 100% speedup with two threads running. Response times are cut in half for several workloads in open system experiments.
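
    A loose sketch of the resource-freeing idea, not the paper's simulator or policy details: when one of a thread's loads has been outstanding past an assumed threshold, its entries in a shared instruction queue are released so other threads can use them.

```python
# Loose illustration (not the paper's simulator): when a thread's load has been
# outstanding longer than a threshold, flush that thread's instructions from the
# shared instruction queue so other threads can use the entries.

MISS_THRESHOLD_CYCLES = 100   # assumed trigger for "long latency" (illustrative)

class Thread:
    def __init__(self, tid: int):
        self.tid = tid
        self.queue_entries = 0            # entries held in the shared queue
        self.outstanding_load_cycles = 0  # cycles the oldest load has been pending

class SharedQueue:
    def __init__(self, size: int):
        self.free = size

    def flush_if_stalled(self, thread: Thread) -> None:
        """Release a stalled thread's entries back to the shared pool."""
        if thread.outstanding_load_cycles > MISS_THRESHOLD_CYCLES:
            self.free += thread.queue_entries
            thread.queue_entries = 0   # thread refetches once the load returns

if __name__ == "__main__":
    q = SharedQueue(size=64)
    t0 = Thread(0)
    t0.queue_entries = 40
    t0.outstanding_load_cycles = 250
    q.free -= t0.queue_entries           # t0 currently occupies 40 entries
    q.flush_if_stalled(t0)
    print(f"free entries after flush: {q.free}")   # back to 64
```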

    Compiling for Instruction Cache Performance on a Multithreaded Architecture

    Instruction-cache-aware compilation seeks to lay out a program in memory in such a way that cache conflicts between procedures are minimized. It does this through profile-driven knowledge of procedure invocation patterns. On a multithreaded architecture, however, more conflicts may arise between threads than between procedures on the same thread. This research examines opportunities for the compiler to optimize instruction cache layout on a multithreaded architecture. We examine scenarios where (1) the compiler has knowledge about multiple programs that will be or are likely to be co-scheduled, and where (2) the compiler has no knowledge at compile time of which applications will be co-scheduled. We present solutions for both environments.
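
    A small sketch of the profile-driven flavor of this idea, with assumed cache parameters and a hypothetical greedy placement rather than the paper's algorithm: procedures from two co-scheduled programs are placed hottest-first at addresses whose cache sets do not collide with already-placed hot code.

```python
# Rough sketch of profile-driven layout across co-scheduled programs (assumed
# parameters, not the paper's algorithm): place procedures hottest-first at the
# next address whose cache sets don't overlap hot code placed so far.

CACHE_SETS = 64
BLOCK_BYTES = 64
CACHE_BYTES = CACHE_SETS * BLOCK_BYTES

def cache_sets(addr: int, size: int) -> set:
    """Set indices touched by code of the given size placed at addr."""
    first = addr // BLOCK_BYTES
    last = (addr + size - 1) // BLOCK_BYTES
    return {b % CACHE_SETS for b in range(first, last + 1)}

def layout(procedures):
    """procedures: list of (name, size_bytes, call_frequency) from both programs."""
    placed, used_sets, addr = {}, set(), 0
    for name, size, _freq in sorted(procedures, key=lambda p: -p[2]):
        start = addr
        # slide forward until this procedure avoids already-used sets
        # (give up after one full cache span and accept the overlap)
        while cache_sets(start, size) & used_sets and start < addr + CACHE_BYTES:
            start += BLOCK_BYTES
        placed[name] = start
        used_sets |= cache_sets(start, size)
        addr = start + size
    return placed

if __name__ == "__main__":
    # hypothetical profiles for procedures from programs A and B
    procs = [("A.main", 512, 90), ("B.loop", 768, 80), ("A.init", 256, 5)]
    print(layout(procs))
```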