28 research outputs found

Aggressive Memory Speculation in HW/SW Co-Designed Machines

Author: Derrien Steven
Rohou Erven
Rokicki Simon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 25/03/2019

Single-ISA heterogeneous systems (such as ARM big.LITTLE) are an attractive solution for embedded platforms because they expose performance/energy trade-offs directly to the operating system. Recent work has shown that their efficiency can be increased by using VLIW cores, supported through Dynamic Binary Translation (DBT) to maintain the illusion of a single-ISA system. However, VLIW cores cannot rival Out-of-Order (OoO) cores in performance, mainly because they do not use speculative execution. In this work, we study how memory dependency speculation can be used during the DBT process. Our approach enables fine-grained speculation optimizations through a combination of hardware and software. Our results show that it yields a geo-mean speed-up of 10% at the price of a 7% area overhead.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Recursos anchos: una técnica de bajo coste para explotar paralelismo agresivo en códigos numéricos

Author: López Álvarez David
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/1998

Loops are the main time-consuming part of numerical applications. The performance of loops is limited either by the resources offered by the architecture or by recurrences in the computation. To execute more operations per cycle, current processors are designed with growing degrees of resource replication (replication technique) for memory ports and functional units. However, the high cost of this technique in terms of area and cycle time precludes high degrees of replication: a long cycle time may offset any gain from the reduced number of execution cycles, and a large area may lead to an unimplementable configuration. An alternative to resource replication is resource widening (widening technique), which has been used in some recent designs. With this technique the width of the resources is increased, so that a single operation is performed over multiple data. Moreover, several general-purpose superscalar microprocessors have been implemented with fused multiply-add floating-point units (fusion technique), which reduces both the latency of the combined operation and the number of resources used. In this thesis, we evaluate a broad set of VLIW processor design alternatives that combine the three techniques. We perform a technological projection for the next processor generations in order to foresee which alternatives will be implementable. From this study, we conclude that, when cost is taken into account, combining certain degrees of replication and widening in the hardware resources is more effective than applying replication alone. We also confirm that fused multiply-add units can have a significant impact on the performance of future processor architectures at a reasonable increase in cost.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Streamlining data cache access with fast address calculation

Author: Aho Alfred V.
Dionisios N. Pnevmatikatos
Golden Michael
Gurindar S. Sohi
John
Steven
Todd M. Austin
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Recommended from our members

Combined branch target and predicate prediction

Author: Douglas Burger
Stephen W. Keckler
Publication venue: United States Patent and Trademark Office
Publication date: 25/03/2015

Embodiments provide methods, apparatus, systems, and computer readable media associated with predicting predicates and branch targets during execution of programs using combined branch target and predicate predictions. The predictions may be made using one or more prediction control flow graphs which represent predicates in instruction blocks and branches between blocks in a program. The prediction control flow graphs may be structured as trees such that each node in the graphs is associated with a predicate instruction, and each leaf is associated with a branch target which jumps to another block. During execution of a block, a prediction generator may take a control point history and generate a prediction. Following the path suggested by the prediction through the tree, both predicate values and branch targets may be predicted. Other embodiments may be described and claimed.

Board of Regents, University of Texas Syste
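The tree walk described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Node`, `Leaf`, and `predict`, the block labels, and the encoding of a prediction as one boolean per predicate node are all assumptions made for illustration, and the prediction generator that would map a control point history to those booleans is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple, Union


@dataclass
class Leaf:
    target: str  # hypothetical label of the block this branch jumps to


@dataclass
class Node:
    predicate: str          # hypothetical name of the predicate instruction
    if_true: "Tree"         # subtree followed when the predicate is predicted true
    if_false: "Tree"        # subtree followed when the predicate is predicted false


Tree = Union[Node, Leaf]


def predict(tree: Tree, prediction: Iterable[bool]) -> Tuple[Dict[str, bool], str]:
    """Follow the predicted path; return predicate values and the branch target."""
    bits = iter(prediction)
    predicates: Dict[str, bool] = {}
    node = tree
    while isinstance(node, Node):
        value = next(bits, False)  # assumption: missing bits default to false
        predicates[node.predicate] = value
        node = node.if_true if value else node.if_false
    return predicates, node.target


# One block with two predicates and three possible exits.
block = Node("p0", Node("p1", Leaf("B1"), Leaf("B2")), Leaf("B3"))
print(predict(block, [True, False]))
```

A single pass over the tree yields both kinds of prediction at once: the predicate values collected on the way down and the branch target stored at the leaf that is reached.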

Texas ScholarWorks

Scaling to the end of silicon with EDGE architectures

Author: C. Lin
C.R. Moore
D. Burger
J. Burrill
K.S. McKinley
L.K. John
M. Dahlin
R.G. McDonald
S.W. Keckler
W. Yoder
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

TRIPS Architecture for Parallel Computation

Author: Krutek Roland
Publication venue: Vysoká škola báňská - Technická univerzita Ostrava
Publication date: 01/01/2011

The thesis introduces TRIPS, a new technology in the field of microprocessor architecture. The first, theoretical part describes the properties and technical parameters of the new architecture. Next, we briefly present the options the platform offers to developers. The practical part verifies the knowledge gained from studying the new technology by implementing our own algorithms. Finally, we attempt to evaluate the results obtained and, on the basis of this information, compare them with the competition.

460 - Katedra informatiky

DSpace at VSB Technical University of Ostrava

The program decision logic approach to predicated execution

Author: Daniel A. Connors
David I. August
Jean-Michel Puiatti
John W. Sias
Kevin M. Crozier
Scott A. Mahlke
Wakerly J. E
Wen-mei W. Hwu
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref