17 research outputs found

Encoding Mini-Graphs With Handle Prefix Outlining

Author: Anne W. Bracy
Amir Roth
Publication venue: ScholarlyCommons
Publication date: 01/01/2008

Recently proposed techniques like mini-graphs, CCA-subgraphs, and static strands exploit application-specific compound or fused instructions to reduce execution time, energy consumption, and/or processor complexity. To achieve their full potential, these techniques rely on static tools to identify common instruction sequences that make good fusion candidates. As a result, they also rely on ISA extension facilities that can encode these chosen instruction groups in a way that supports efficient execution on fusion-enabled hardware as well as compatibility across different implementations, including fusion-agnostic implementations. This paper describes handle prefix outlining, the ISA extension scheme used by mini-graph processors. Handle prefix outlining can be thought of as a hybrid of the encoding schemes used by three previous instruction aggregation techniques: PRISC, static strands, and CCA-subgraphs. It combines the best features of each scheme to deliver both full compatibility and execution efficiency on fusion-enabled processors.

ScholarlyCommons@Penn

Mini-Graph Processing

Author: Anne Weinberger Bracy
Publication date: 01/01/2008

For years, single-thread performance was the dominant force driving processor development. In recent years, however, the poor scaling of single-thread superscalar performance and power concerns, coupled with the ever-increasing number of transistors available on chip, have shifted the focus from single-thread performance to thread-level parallelism running on multi-core designs. The trend is for these cores to be narrower, with smaller windows. This dissertation addresses the question of how to maintain (and, ideally, improve) single-thread performance under such constraints. Mini-graph processing is a form of instruction fusion (the grouping of multiple operations into a single processing unit) that efficiently increases the instructions-per-cycle (IPC) throughput of dynamically scheduled superscalar processors. Mini-graphs are compiler-identified aggregates of multiple instructions that look and behave like singleton instructions at every pipeline stage except execute, where the constituent operations are retrieved and performed serially, microcode-style. A mini-graph processor exploits instruction fusion to increase the efficiency of pipeline stages and structures that perform instruction book-keeping. This dissertation describes a mini-graph architecture and evaluates it using cycle-level simulation. A superscalar processor enhanced with mini-graphs can match the performance otherwise achieved with a wider, deeper superscalar processor. Experiments show that across four benchmark suites, the addition of mini-graph processing allows a dynamically scheduled 3-wide superscalar processor to match the IPC of a 4-wide superscalar machine.

CiteSeerX

ScholarlyCommons@Penn

Dataflow mini-graphs: Amplifying superscalar capacity and bandwidth

Author: Amir Roth
Anne Bracy
Prashant Prahlad
Publication date: 01/01/2004

A mini-graph is a dataflow graph that has an arbitrary internal size and shape but the interface of a singleton instruction: two register inputs, one register output, a maximum of one memory operation, and a maximum of one (terminal) control transfer. Previous work has exploited dataflow sub-graphs whose execution latency can be reduced via programmable FPGA-style hardware. In this paper we show that mini-graphs can improve performance by amplifying the bandwidths of a superscalar processor’s stages and the capacities of many of its structures without custom latency-reduction hardware. Amplification is achieved because the processor deals with a complete mini-graph via a single quasi-instruction, the handle. By constraining mini-graph structure and forcing handles to behave as much like singleton instructions as possible, the number and scope of the modifications over a conventional superscalar microarchitecture are kept to a minimum. This paper describes mini-graphs, a simple algorithm for extracting them from basic block frequency profiles, and a microarchitecture for exploiting them. Cycle-level simulation of several benchmark suites shows that mini-graphs can provide average performance gains of 2–12% over an aggressive baseline, with peak gains exceeding 40%. Alternatively, they can compensate for substantial reductions in register file and scheduler size, and in pipeline bandwidth.
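The interface constraint stated in the abstract can be illustrated with a small check. This is only a sketch, not the paper's extraction algorithm: the `Op` record, the `fits_minigraph_interface` function, and the `live_out` parameter are hypothetical names, and the limits on register inputs and outputs are read here as upper bounds.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Op:
    """Hypothetical record for one operation in a candidate instruction group."""
    dest: Optional[str]                 # register written, or None
    srcs: List[str] = field(default_factory=list)  # registers read
    is_mem: bool = False                # load or store
    is_branch: bool = False             # control transfer


def fits_minigraph_interface(ops: List[Op], live_out: Set[str]) -> bool:
    """Check the singleton-instruction interface described in the abstract.

    Assumption: `live_out` names the registers that are read after the group.
    """
    written: Set[str] = set()
    external_inputs: Set[str] = set()
    for op in ops:
        for reg in op.srcs:
            if reg not in written:      # value produced outside the group
                external_inputs.add(reg)
        if op.dest is not None:
            written.add(op.dest)

    external_outputs = written & live_out
    mem_ops = sum(1 for op in ops if op.is_mem)
    branch_positions = [i for i, op in enumerate(ops) if op.is_branch]
    # At most one control transfer, and it must be the last operation.
    branch_ok = branch_positions in ([], [len(ops) - 1])

    return (len(external_inputs) <= 2
            and len(external_outputs) <= 1
            and mem_ops <= 1
            and branch_ok)


candidate = [
    Op(dest="t1", srcs=["r1", "r2"]),
    Op(dest="t2", srcs=["t1", "r1"]),
    Op(dest=None, srcs=["t2"], is_branch=True),
]
print(fits_minigraph_interface(candidate, live_out={"t2"}))
```

In this example the intermediate value `t1` never leaves the group, so it does not count against the single-output limit.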

CiteSeerX

Scipedia

ScholarlyCommons@Penn

Three Extensions to Register Integration

Author: Amir Roth
Anne Bracy
Vlad Petric
Publication date: 01/01/2002

Register integration (or just integration) is a register renaming discipline that implements instruction reuse via physical register sharing. Initially developed to perform squash reuse, the integration mechanism is a powerful reuse tool that can exploit more reuse scenarios. In this paper, we describe three extensions to the initial integration mechanism that expand its applicability and boost its performance impact. First, we extend squash reuse to general reuse. Whereas squash reuse maintains the superscalar concept of an instruction instance "owning" its output physical register, we allow multiple instructions to simultaneously and seamlessly share a single physical register. Next, we replace the PC-indexing scheme used by squash reuse with an opcode-based indexing scheme that exposes more integration opportunities. Finally, we introduce an extension called reverse integration in which we speculatively create integration entries for the inverses of operations: for instance, when renaming an add, we create an entry for the inverse subtract. Reverse integration allows us to reuse operations that were not specified by the original program. We use reverse integration to obtain a free implementation of speculative memory bypassing for stack-pointer-based loads (register fills and restores).
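The idea of reverse integration with opcode-based indexing can be sketched as a toy rename-time table. This is a simplified illustration, not the paper's design: the `IntegrationTable` class, its key layout (opcode, source physical register, immediate), and the register names are all assumptions, and the sketch ignores the validation and register-reclamation that a speculative mechanism would need.

```python
from typing import Dict, Tuple


class IntegrationTable:
    """Toy integration table indexed by opcode and inputs (hypothetical layout)."""

    INVERSE = {"add": "sub", "sub": "add"}

    def __init__(self) -> None:
        self.entries: Dict[Tuple[str, str, int], str] = {}
        self.next_preg = 0

    def _alloc(self) -> str:
        preg = f"p{self.next_preg}"
        self.next_preg += 1
        return preg

    def rename(self, opcode: str, src_preg: str, imm: int) -> Tuple[str, bool]:
        """Return (output physical register, whether an existing result was reused)."""
        key = (opcode, src_preg, imm)
        if key in self.entries:
            # Integration: share the existing physical register, skip execution.
            return self.entries[key], True

        dest = self._alloc()
        self.entries[key] = dest
        inverse = self.INVERSE.get(opcode)
        if inverse is not None:
            # Reverse integration: applying the inverse operation to `dest`
            # with the same immediate would reproduce `src_preg`.
            self.entries[(inverse, dest, imm)] = src_preg
        return dest, False


table = IntegrationTable()
# Function prologue: sp = sp - 16 (not seen before, so it executes normally).
new_sp, reused = table.rename("sub", "sp0", 16)
# Function epilogue: sp = sp + 16 hits the reverse entry and reuses "sp0".
restored_sp, reused_again = table.rename("add", new_sp, 16)
print(new_sp, reused, restored_sp, reused_again)
```

The epilogue add was never executed before, yet it finds a result because the prologue subtract created the inverse entry.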

CiteSeerX

ScholarlyCommons@Penn

Disintermediated Active Communication

Author: Anne Bracy
Kshitij Doshi
Quinn Jacobson
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Information states in a multimodal dialogue system for human-robot conversation

Author: Anne Bracy
Alexander Gruenstein
Oliver Lemon
Stanley Peters
Publication date: 01/01/2001

Heriot Watt Pure

The WITAS multi-modal dialogue system I

Author: Anne Bracy
Alexander Gruenstein
Oliver Lemon
Stanley Peters
Publication date: 01/01/2001

Heriot Watt Pure