Search CORE

7 research outputs found

Algorithms for Improving the Automatically Synthesized Instruction Set of an Extensible Processor

Author: Sovietov Peter
Publication venue
Publication date: 01/01/2024
Field of study

Processors with extensible instruction sets are often used today as programmable hardware accelerators for various domains. When extending RISC-V and other similar extensible processor architectures, the task of designing specialized instructions arises. This task can be solved automatically by using instruction synthesis algorithms. In this paper, we consider algorithms that can be used in addition to the known approaches and improve the synthesized instruction sets by recomputing common operations (the result of which is consumed by multiple operations) of a program inside clustered synthesized instructions (common operations clustering algorithm), and by identifying redundant (which have equivalents among the other instructions) synthesized instructions (subsuming functions algorithm). Experimental evaluations of the developed algorithms are presented for the tests from the domains of cryptography and three-dimensional graphics. For Magma cipher test, the common operations clustering algorithm allows reducing the size of the compiled code by 9%, and the subsuming functions algorithm allows reducing the synthesized instruction set extension size by 2 times. For AES cipher test, the common operations clustering algorithm allows reducing the size of the compiled code by 10%, and the subsuming functions algorithm allows reducing the synthesized instruction set extension size by 2.5 times. Finally, for the instruction set extension from Volume Ray-Casting test, the additional use of subsuming functions algorithm allows reducing problem-specific instruction extension set size from 5 to only 2 instructions without losing its functionality

arXiv.org e-Print Archive

Recommended from our members

Modifying Instruction Sets In The Gem5 Simulator To Support Fault Tolerant Designs

Author: Zhang Chuan
Publication venue: ScholarWorks@UMass Amherst
Publication date: 23/11/2015
Field of study

Traditional fault tolerant techniques such as hardware or time redundancy incur high overhead and are inefficient for checking arithmetic operations. Our objective is to study an alternative approach of adding new instructions to check arithmetic operations. These checking instructions either rely on error detecting code or calculate approximate results and consequently, consume much less execution time. To evaluate the effectiveness of such an approach we wish to modify several benchmarks to use checking instructions and run simulation experiments to find out their execution time and memory usage. However, the checking instructions are not included in the instruction set and as a result, are not supported by current architecture simulators. Therefore, another objective of this thesis is to develop a method for inserting new instructions in the Gem5 simulator and cross compiler. The insertion process is integrated into a software tool called Gtool. Gtool can add an error checking capability to C programs by using the new instructions

ScholarWorks@UMass Amherst

Sélection automatique d'instructions et ordonnancement d'applications basés sur la programmation par contraintes

Author: Charot François
Floch Antoine
Kuchcinski Krzysztof
Martin Kevin
Wolinski Christophe
Publication venue: HAL CCSD
Publication date: 09/09/2009
Field of study

National audienceCe papier présente une nouvelle méthode, basée sur la programmation par contraintes, pour la sélection de motifs de calcul, le placement et l'ordonnancement d'applications sur des extensions de processeurs conﬁgurables. Cette méthode est intégrée dans l'environnement DURASE (Generic Environment for Design and Utilization of Reconﬁgurable Application-Speciﬁc Processors Extensions). Les extensions du proces- seur, qui mettent en œuvre les motifs de calcul et qui sont accessibles via des instructions spécialisées, sont fortement couplées au chemin de données du processeur. Ces instructions spécialisées sont géné- rées et sélectionnées à partir du graphe de l'application. Notre méthode supporte un ordonnancement sous contrainte de ressources ou sous contrainte de temps. Les résultats expérimentaux obtenus sur les benchmarks MediaBench et MiBench montrent une accélération de l'exécution des applications d'un facteur de 2,3 en moyenne

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Automatic Instruction Set Extension and Utilization for Embedded Processors

Author: De Micheli Giovanni
Ienne Paolo
Peymandoust Armita
Pozzi Laura
Publication venue
Publication date: 08/08/2005
Field of study

Infoscience - École polytechnique fédérale de Lausanne

Χρήση μοντέλου παράλληλου προγραμματισμού για σύνθεση αρχιτεκτονικών

Author: Owaida Muhsen
Publication venue
Publication date: 01/01/2012
Field of study

The problem of automatically generating hardware modules from high level application representations has been at the forefront of EDA research during the last few years. In this Dissertation we introduce a methodology to automatically synthesize hardware accelerators from OpenCL applications. OpenCL is a recent industry supported standard for writing programs that execute on multicore platforms and accelerators such as GPUs. Our methodology maps OpenCL kernels into hardware accelerators based on architectural templates that explicitly decouple computation from memory communication whenever this is possible. The templates can be tuned to provide a wide repertoire of accelerators that meet user performance requirements and FPGA device characteristics. Furthermore a set of high- and low-level compiler optimizations is applied to generate optimized accelerators. Our experimental evaluation shows that the generated accelerators are tuned efficiently to match the applications memory access pattern and computational complexity and to achieve user performance requirements. An important objective of our tool is to expand the FPGA development user base to software engineers thereby expanding the scope of FPGAs beyond the realm of hardware design.To πρόβλημα της αυτόματης δημιουργίας μονάδων υλικό από παραστάσεις υψηλού επιπέδου εφαρμογής είναι στην πρώτη γραμμή της EDA έρευνας κατά τη διάρκεια των τελευταίων ετών. Σε αυτή την διατριβή παρουσιάζουμε μια μεθοδολογία για τη αυτόματη σύνθεση επιταχυντές υλικού από εφαρμογές OpenCL. OpenCL είναι ένα πρόσφατο πρότυπο για τη σύνταξη των προγραμμάτων που εκτελούνται σε πλατφόρμες πολλαπλών πυρήνων και επιταχυντές όπως GPUs. Η μεθοδολογία μας μετατρέπει προγράμματα OpenCL σε επιταχυντές υλικού με βάση αρχιτεκτονικά πρότυπα που ρητά αποσυνδέει τους υπολογισμούς από την μεταφορά δεδομένων από/προς την μνήμη όποτε αυτό είναι δυνατό. Τα πρότυπα μπορούν να συντονιστούν ώστε να παρέχουν ένα ευρύ ρεπερτόριο από επιταχυντές που πληρούν τις απαιτήσεις απόδοσης των χρηστών και τα χαρακτηριστικά της συσκευής FPGA. Επιπλέον ένα σύνολο υψηλής και χαμηλής στάθμης βελτιστοποιήσεις μεταγλωττιστή εφαρμόζεται για να παράγει βελτιστοποιημένα επιταχυντές. Η πειραματική αξιολόγηση δείχνει ότι οι επιταχυντές που δημιουργούνται αποτελεσματικά συντονισμένοι για να ταιριάζει με το μοτίβο πρόσβασης στην μνήμη κάθε εφαρμογής και την υπολογιστική πολυπλοκότητα και να επιτύχουν τις απαιτήσεις απόδοσης των χρηστών. Ένας σημαντικός στόχος του εργαλείου μας είναι η επέκταση της βάσης χρηστών πλατφόρμες FPGA για μηχανικούς λογισμικού ώστε να γίνει ανάπτυξη FPGA συστήματα από μηχανικούς λογισμικού χωρίς την ανάγκη για εμπειρία σχεδιασμού υλικού

Hellenic National Archive of Doctoral Dissertations

University of Thessaly Institutional Repository