
8 research outputs found

Memory disambiguation hardware: a review

Author: Castro Fernando
Chaver Daniel
Piñuel Luis
Prieto Manuel
Tirado Fernández Francisco
Publication date: 01/10/2008

One of the main challenges of modern processor design is implementing scalable and efficient mechanisms to detect memory access order violations caused by out-of-order execution. Conventional structures that perform this task are complex, inefficient and power-hungry. This has generated a large body of work on optimizing address-based memory disambiguation logic, namely the load-store queue. In this paper we review the most significant proposals in this research field, focusing on our own contributions. Facultad de Informátic

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Servicio de Difusión de la Creación Intelectual

Memory disambiguation hardware: a review

Author: Castro Fernando
Chaver Daniel
Piñuel Luis
Prieto Manuel
Tirado Fernández Francisco
Publication date: 13/04/2009

Servicio de Difusión de la Creación Intelectual

A Study on Highly Efficient Memory Order Violation Detection Mechanisms

Author: Kurata Naruki
倉田成己
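Publication venue: Department of Information and Communication Engineering, Graduate School of Information Science and Technology (情報理工学系研究科電子情報学専攻)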
Publication date: 24/03/2015

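Degree type: Doctorate by coursework. Examination committee: (Chair) Professor Tohru Asami (浅見徹), University of Tokyo; Professor Shuichi Sakai (坂井修一), University of Tokyo; Associate Professor Kenjiro Taura (田浦健次朗), University of Tokyo; Associate Professor Masashi Toyoda (豊田正史), University of Tokyo; Professor Masahiro Goshima (五島正裕), National Institute of Informatics. University of Tokyo(東京大学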

A Study on Highly Efficient Memory Order Violation Detection Mechanisms

Author: Bijlsma R.J.
Bobbink Roland
Lamers Leon P.M.
Siepel H.
Verberk W.C.E.P.
Vogels J.
Weijters Maaike
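Publication venue: Department of Information and Communication Engineering, Graduate School of Information Science and Technology (情報理工学系研究科電子情報学専攻)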
Publication date: 24/03/2015

Wageningen University & Research Publications

System Support for Implicitly Parallel Programming

Author: Frank Matthew I.
Publication venue: Coordinated Science Laboratory, University of Illinois at Urbana-Champaign
Publication date: 01/10/2007

Coordinated Science Laboratory was formerly known as Control Systems Laborator

Illinois Digital Environment for Access to Learning and Scholarship Repository

Address-Indexed Memory Disambiguation and Store-to-Load Forwarding

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2005

Crossref

Design of a distributed memory unit for clustered microarchitectures

Author: Bieschewski Stefan
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013

Power constraints ended the exponential growth in single-processor performance that characterized the semiconductor industry for many years. Single-chip multiprocessors have allowed performance growth to continue so far. Yet Amdahl's law asserts that the overall performance of future single-chip multiprocessors will depend crucially on single-processor performance. In a multiprocessor, a small gain in single-processor performance can justify the use of significant resources. Partitioning the layout of critical components can improve the energy efficiency and ultimately the performance of a single processor. In a clustered microarchitecture, parts of these components form clusters. Instructions are processed locally in the clusters and benefit from the smaller size and complexity of the clusters' components. Because the clusters together process a single instruction stream, communication between clusters is necessary and introduces an additional cost. This thesis proposes the design of a distributed memory unit and first-level cache in the context of a clustered microarchitecture. While the partitioning of other parts of the microarchitecture has been well studied, the distribution of the memory unit and the cache has received comparatively little attention. The first proposal consists of a set of cache bank predictors. Eight different predictor designs are compared based on cost and accuracy. The second proposal is the distributed memory unit. The load and store queues are split into smaller queues for distributed disambiguation. The mapping of memory instructions to cache banks is delayed until addresses have been calculated. We show how disambiguation can be implemented efficiently with unordered queues. A bank predictor is used to map instructions that consume memory data near the data origin. We show that this organization significantly reduces both energy usage and latency. The third proposal introduces Dispatch Throttling and Pre-Access Queues. These mechanisms avoid load/store queue overflows that result from the late allocation of entries. The fourth proposal introduces Memory Issue Queues, which add to the memory unit the functionality to select instructions for execution and re-execution. The fifth proposal introduces Conservative Deadlock Aware Entry Allocation, a deadlock-safe issue policy for the Memory Issue Queues. Deadlocks can result from certain queue allocations because entries are allocated out of order rather than in order as in traditional architectures. The sixth proposal is the Early Release of Load Queue Entries. Architectures with weak memory ordering, such as Alpha, PowerPC or ARMv7, can take advantage of this mechanism to release load queue entries before the commit stage. Together, these proposals allow significantly smaller and more energy-efficient load queues without the need for energy-hungry recovery mechanisms and without performance penalties. Finally, we present a detailed study that compares the proposed distributed memory unit to a centralized memory unit and confirms its advantages of reduced energy usage and improved performance.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura