POOR MAN’S TRACE CACHE: A VARIABLE DELAY SLOT ARCHITECTURE

Author: Moore Tino C.
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2022

We introduce a novel fetch architecture called Poor Man’s Trace Cache (PMTC). PMTC constructs taken-path instruction traces by replicating instructions in static code and inserts them after unconditional direct and select conditional direct control transfer instructions. These traces extend to the end of the cache line. Since the space available for trace insertion varies with the position of the control transfer instruction within the line, we refer to these fetch slots as variable delay slots. This approach ensures that traces are fetched along with the control transfer instruction that initiated the trace. Branch, jump, and return instruction semantics, as well as the fetch unit, are modified to utilize traces in delay slots. PMTC yields the following benefits: 1. Average fetch bandwidth increases, as the front end can fetch across taken control transfer instructions in a single cycle. 2. The dynamic number of instruction cache lines fetched by the processor is reduced, as multiple noncontiguous basic blocks along a given path are encountered in one fetch cycle. 3. Replication of a branch instruction along multiple paths provides path separability for branches, which positively impacts branch prediction accuracy. The PMTC mechanism requires minimal modifications to the processor’s fetch unit, and the trace insertion algorithm can easily be implemented within the assembler without compiler support.
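The idea of variable delay slots can be illustrated with a toy model. The sketch below is not the author's trace insertion algorithm. It assumes a cache line holds a fixed number of equally sized instruction slots and that the taken-path instructions are already known. The names `delay_slots`, `insert_trace`, and `line_slots` are hypothetical.

```python
from typing import List


def delay_slots(cti_index: int, line_slots: int) -> int:
    """Slots left in the cache line after the control transfer instruction (CTI).

    Assumption: the line holds `line_slots` fixed-size instructions and the
    CTI sits at position `cti_index` (0-based) within that line.
    """
    if not 0 <= cti_index < line_slots:
        raise ValueError("cti_index must lie inside the cache line")
    return line_slots - 1 - cti_index


def insert_trace(prefix: List[str], target_block: List[str], line_slots: int) -> List[str]:
    """Fill the slots after the CTI with replicated taken-path instructions.

    `prefix` is the line content up to and including the CTI (its last item).
    `target_block` lists the instructions along the taken path, in order.
    """
    free = delay_slots(len(prefix) - 1, line_slots)
    # Replicate only as many taken-path instructions as fit before the line ends.
    return prefix + target_block[:free]


line = insert_trace(["add", "cmp", "jmp T"], ["t0", "t1", "t2", "t3", "t4", "t5", "t6"], 8)
print(line)
```

In this model, a control transfer instruction early in the line leaves many delay slots and one near the end leaves few, which is why the slot count is variable.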

Michigan Technological University

Recommended from our members

Combined branch target and predicate prediction

Author: Douglas Burger
Stephen W. Keckler
Publication venue: United States Patent and Trademark Office
Publication date: 25/03/2015

Embodiments provide methods, apparatus, systems, and computer readable media associated with predicting predicates and branch targets during execution of programs using combined branch target and predicate predictions. The predictions may be made using one or more prediction control flow graphs which represent predicates in instruction blocks and branches between blocks in a program. The prediction control flow graphs may be structured as trees such that each node in the graphs is associated with a predicate instruction, and each leaf associated with a branch target which jumps to another block. During execution of a block, a prediction generator may take a control point history and generate a prediction. Following the path suggested by the prediction through the tree, both predicate values and branch targets may be predicted. Other embodiments may be described and claimed. Board of Regents, University of Texas Syste
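The tree walk described in this abstract can be sketched as follows. This is a minimal illustration, not the patented method. It assumes the prediction arrives as a sequence of true/false bits, one per predicate encountered, and it leaves open how the prediction generator derives those bits from the control point history. The names `Node`, `Leaf`, and `follow_prediction` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple, Union


@dataclass
class Leaf:
    target: str  # branch target: the next block to jump to


@dataclass
class Node:
    predicate: str  # name of the predicate instruction at this node
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]


def follow_prediction(tree: Union[Node, Leaf], bits: Sequence[bool]) -> Tuple[Dict[str, bool], str]:
    """Walk the tree along the predicted bits.

    Returns the predicted value of each predicate on the path and the
    branch target stored at the leaf that the path reaches.
    """
    predicted: Dict[str, bool] = {}
    node = tree
    for bit in bits:
        if isinstance(node, Leaf):
            break  # extra bits are ignored once a leaf is reached
        predicted[node.predicate] = bit
        node = node.if_true if bit else node.if_false
    if not isinstance(node, Leaf):
        raise ValueError("prediction too short to reach a branch target")
    return predicted, node.target


tree = Node("p0", Node("p1", Leaf("B1"), Leaf("B2")), Leaf("B3"))
print(follow_prediction(tree, [True, False]))
```

A single walk yields both the predicate values along the path and the branch target, which is the sense in which the two predictions are combined.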

Texas ScholarWorks

Design of testbed and emulation tools

Author: Flynn M. J.
Lundstrom S. F.

The research summarized was concerned with the design of testbed and emulation tools suitable to assist in projecting, with reasonable accuracy, the expected performance of highly concurrent computing systems on large, complete applications. Such testbed and emulation tools are intended for the eventual use of those exploring new concurrent system architectures and organizations, either as users or as designers of such systems. While a range of alternatives was considered, a software-based set of hierarchical tools was chosen to provide maximum flexibility, to ease the move to new computers as technology improves, and to take advantage of the inherent reliability and availability of commercially available computing systems.

NASA Technical Reports Server

Context flow architecture

Author: Lees Timothy
Publication venue: The University of Edinburgh
Publication date: 01/01/1990

Edinburgh Research Archive

An evaluation of the TRIPS computer system

Author: Behnam R
Burger D
Burril J
Diammond J
Gebhart M
Grattz P
Keckler S
Koons C
Maher B
Marino MD
McKinley K
Ranganathan N
Smith A
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2009

The TRIPS system employs a new instruction set architecture (ISA) called Explicit Data Graph Execution (EDGE) that renegotiates the boundary between hardware and software to expose and exploit concurrency. EDGE ISAs use a block-atomic execution model in which blocks are composed of dataflow instructions. The goal of the TRIPS design is to mine concurrency for high performance while tolerating emerging technology scaling challenges, such as increasing wire delays and power consumption. This paper evaluates how well TRIPS meets this goal through a detailed ISA and performance analysis. We compare performance, using cycle counts, to commercial processors. On SPEC CPU2000, the Intel Core 2 outperforms compiled TRIPS code in most cases, although TRIPS matches a Pentium 4. On simple benchmarks, compiled TRIPS code outperforms the Core 2 by 10% and hand-optimized TRIPS code outperforms it by a factor of 3. Compared to conventional ISAs, the block-atomic model provides a larger instruction window, increases concurrency at a cost of more instructions executed, and replaces register and memory accesses with more efficient direct instruction-to-instruction communication. Our analysis suggests ISA, microarchitecture, and compiler enhancements for addressing weaknesses in TRIPS and indicates that EDGE architectures have the potential to exploit greater concurrency in future technologies.

CiteSeerX

Crossref

Leeds Beckett Repository

Design and implementation of an efficient spatial locality predictor for GPUs

Author: Agarwal N.
Publication date: 29/11/2018

Pure OAI Repository

Parallel Processor Architecture with a New Algorithm for Simultaneous Processing of MIPS-Based Series Instructions

Author: Hadizadeh Ali
Tanghatari Ehsan
Publication venue: Ital Publication
Publication date: 01/12/2017

Processors are the main part of a system's calculation and decision making. Today, because industry and technology increasingly need faster and more accurate computing power, the design and manufacture of parallel processing units have received considerable attention. One of the most important processor families used in various devices is the MIPS family. This processor family has been considered a reasonable choice in the telecom and control industries. In this paper, a new architecture based on this processor, with a new parallel processing design, is presented to allow instructions to be executed in parallel dynamically. As a result, processor efficiency is increased severalfold. In this architecture, new ideas for issuing instructions in parallel, intelligent detection of conditional jumps, and memory management are presented.

Emerging Science Journal (ESJ)

Directory of Open Access Journals

Grounding Language to Autonomously-Acquired Skills via Goal Generation

Author: Akakzia Ahmed
Chetouani Mohamed
Colas Cédric
Oudeyer Pierre-Yves
Sigaud Olivier
Publication date: 25/01/2021

We are interested in the autonomous acquisition of repertoires of skills. Language-conditioned reinforcement learning (LC-RL) approaches are great tools in this quest, as they allow abstract goals to be expressed as sets of constraints on the states. However, most LC-RL agents are not autonomous and cannot learn without external instructions and feedback. Besides, their direct language condition cannot account for the goal-directed behavior of pre-verbal infants and strongly limits the expression of behavioral diversity for a given language input. To resolve these issues, we propose a new conceptual approach to language-conditioned RL: the Language-Goal-Behavior architecture (LGB). LGB decouples skill learning and language grounding via an intermediate semantic representation of the world. To showcase the properties of LGB, we present a specific implementation called DECSTR. DECSTR is an intrinsically motivated learning agent endowed with an innate semantic representation describing spatial relations between physical objects. In a first stage (G -> B), it freely explores its environment and targets self-generated semantic configurations. In a second stage (L -> G), it trains a language-conditioned goal generator to generate semantic goals that match the constraints expressed in language-based inputs. We showcase the additional properties of LGB w.r.t. both an end-to-end LC-RL approach and a similar approach leveraging non-semantic, continuous intermediate representations. Intermediate semantic representations help satisfy language commands in a diversity of ways, enable strategy switching after a failure, and facilitate language grounding. Comment: Published at ICLR 202

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server