Search CORE

47 research outputs found

Hardware schemes for early register release

Author: González Colás Antonio María
Monreal Arnal Teresa
Valero Cortés Mateo
Viñals Yufera Víctor
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002
Field of study

Register files are becoming one of the critical components of current out-of-order processors in terms of delay and power consumption, since their potential to exploit instruction-level parallelism is quite related to the size and number of ports of the register file. In conventional register renaming schemes, register releasing is conservatively done only after the instruction that redefines the same register is committed. Instead, we propose a scheme that releases registers as soon as the processor knows that there will be no further use of them. We present two early releasing hardware implementations with different performance/complexity trade-offs. Detailed cycle-level simulations show either a significant speedup for a given register file size, or a reduction in register file size for a given performance level.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Late allocation and early release of physical registers

Author: González Colás Antonio María
González González José
Monreal Arnal Teresa
Valero Cortés Mateo
Viñals Yufera Víctor
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2004
Field of study

The register file is one of the critical components of current processors in terms of access time and power consumption. Among other things, the potential to exploit instruction-level parallelism is closely related to the size and number of ports of the register file. In conventional register renaming schemes, both register allocation and releasing are conservatively done, the former at the rename stage, before registers are loaded with values, and the latter at the commit stage of the instruction redefining the same register, once registers are not used any more. We introduce VP-LAER, a renaming scheme that allocates registers later and releases them earlier than conventional schemes. Specifically, physical registers are allocated at the end of the execution stage and released as soon as the processor realizes that there will be no further use of them. VP-LAER enhances register utilization, that is, the fraction of allocated registers having a value to be read in the future. Detailed cycle-level simulations show either a significant speedup for a given register file size or a reduction in the register file size for a given performance level, especially for floating-point codes, where the register file pressure is usually high.Peer ReviewedPostprint (published version

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

DESIGN SPACE EXPLORATION AND OPTIMIZATION OF SUPER SCALAR PROCESSOR

Author: MURTHY PROF.G. KRISHNA
NAVATHA K.
Publication venue: Institute for Project Management Pvt. Ltd
Publication date: 21/08/2020
Field of study

Designing a microprocessor involves determining the optimal microarchitecture for a given objective function and a given set of constraints. Superscalar processing is the latest in along series of innovations aimed at producing ever-faster microprocessors. By exploiting instruction-level parallelism, superscalar processors[1] are capable of executing more than one instruction in a clock cycle.The architectural design of super scalar processor involves a lot of trade off issues when selecting parameter values for instruction level parallelism.The use of critical quantitative analysis based upon the Simple Scalar simulations is necessary to select optimal parameter values for the processor aimed at specific target environment. This paper aims at finding optimal values for the super scalar processor and determines which processor parameters have the greatest impact on the simulated execution time

Interscience Research Network

Dynamic Instruction Scheduling For Microprocessors Having Out Of Order Execution

Author: Gupta Vishal
Kumar Suresh
Tamta Vivek Kumar
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 31/03/2012
Field of study

Dynamic Instruction Scheduling is very much needed for fast working of multiprocessor and reduction of overhead by the processor. The Instruction scheduling logic mainly depends on associative searching of the entries to the dynamic wakeup instructions for the execution. We also describes the scheduler concept which also the concern for scalability and complexity of the multiprocessor. We have different Dynamic Instruction Scheduling Logic highlighting the objectives, goals, advantages and challenges facing during scheduling logic like energy issues and complexity issues as well as full description of dynamic instruction Scheduling logic. In this paper, we will be presented in a comprehensive analysis to reschedule the execution order of instructions for improve the performance of microprocessor. General Terms: Design, Performance, Measurement Keywords: Dynamic Instruction Scheduling, Instruction Grouping, Issue Queue

International Institute for Science, Technology and Education (IISTE): E-Journals

Dynamic instruction scheduling and data forwarding in asynchronous superscalar processors

Author: Mullins Robert D.
Publication venue: The University of Edinburgh
Publication date: 01/01/2001
Field of study

Edinburgh Research Archive

Recommended from our members

Tolerating processor-memory performance gap

Author: Lai Shih-Chang
Publication venue: 'Oregon State University'
Publication date
Field of study

While the performance gap between microprocessors and main memory is ever increasing each year, cache memory has been a bridge to alleviate this discrepancy. In this thesis proposal, we introduce three techniques to tolerate this processor and memory speed imbalance. First, we propose the bloom filter scheme to identify which load operant could cause cache miss. Second, we explore a new fault-tolerant microarchitecture to detect transient error occurs. Third, we proposed a novel hardware-only mechanism to solve pointer-chasing problem in Link-list Data Structure application. The simulation shows that the bloom filter may filter out 99% of cache miss. The new fault-tolerant microarchitecture reduce the penalty caused by detecting instruction error about 1.8-13%. The hardware-only data prefetch mechanism accurate predict over 80% of irregular address pattern and improve the performance by 7%

ScholarsArchive@OSU

StressTest: an automatic approach to test generation via activity monitors

Author: I. Wagner
T. Austin
V. Bertacco
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

Crossref

A Comparison of x86 Computer Architecture Simulators

Author: Akram Ayaz
Sawalha Lina
Publication venue: ScholarWorks at WMU
Publication date: 01/10/2016
Field of study

The signiﬁcance of computer architecture simulators in advancing computer architecture research is widely acknowledged. Computer architects have developed numerous simulators in the past few decades and their number continues to rise. This paper explores different simulation techniques and surveys many simulators. Comparing simulators with each other and validating their correctness has been a challenging task. In this paper, we compare and contrast x86 simulators in terms of ﬂexibility, level of details, user friendliness and simulation models. In addition, we measure the experimental error and compare the speed of four contemporary x86 simulators: gem5, Sniper, Multi2sim and PTLsim. We also discuss the strengths and limitations of the different simulators. We believe that this paper provides insights into different simulation strategies and aims to help computer architects understand the differences among the existing simulation tools

ScholarWorks at WMU

Power considerations for memory-related microarchitecture designs

Author: Zhu Zhichun
Publication venue: W&M ScholarWorks
Publication date: 01/01/2003
Field of study

The fast performance improvement of computer systems in the last decade comes with the consistent increase on power consumption. In recent years, power dissipation is becoming a design constraint even for high-performance systems. Higher power dissipation means higher packaging and cooling cost, and lower reliability. This Ph.D. dissertation will investigate several memory-related design and optimization issues of general-purpose computer microarchitectures, aiming at reducing the power consumption without sacrificing the performance. The memory system consumes a large percentage of the system\u27s power. In addition, its behavior affects the processor power consumption significantly. In this dissertation, we propose two schemes to address the power-aware architecture issues related to memory: (1) We develop and evaluate low-power techniques for high-associativity caches. By dynamically applying different access modes for cache hits and misses, our proposed cache structure can achieve nearly lowest power consumption with minimal performance penalty. (2) We propose and evaluate look-ahead architectural adaptation techniques to reduce power consumption in processor pipelines based on the memory access information. The scheme can significantly reduce the power consumption of memory-intensive applications. Combined with other adaptation techniques, our schemes can effectively reduce the power consumption for both computer- and memory-intensive applications. The significance, potential impacts, and contributions of this dissertation are: (1) Academia and industry R & D has solely targeted the objective of high performance in both hardware and software designs since the beginning stage of building computer systems. However, the pursuit of high performance without considering energy consumption will inevitably lead to increased power dissipation and thus will eventually limit the development and progress of increasingly demanded mobile, portable, and high-performance computing systems. (2) Since our proposed method adaptively combines the merits of existing low-power cache designs, it approaches the optimum in terms of both retaining performance and saving energy. This low power solution for highly associative caches can be easily deployed with a low cost. (3) Using a cache miss , a common program execution event, as a triggering signal to slow down the processor issue rate, our scheme can effectively reduce processor power consumption. This design can be easily and practically deployed in many processor architectures with a low cost

College of William & Mary: W&M Publish