162 research outputs found

    Cross-Layer Rapid Prototyping and Synthesis of Application-Specific and Reconfigurable Many-accelerator Platforms

    Recent technological advances have laid the foundation for the informatisation of society, with impact on its economic, political, cultural and social dimensions. Today more and more everyday devices are connected to the web, giving rise to the term "Internet of Things". The future holds the full connection and interaction of IT and communication systems with the natural world, marking the transition to cyber-physical systems and offering meta-services in the physical world, such as personalized medical care, autonomous transportation and smart energy cities. To meet the needs of this dynamically evolving market, computer engineers must implement computing platforms that both incorporate increased systemic complexity and cover a wide range of meta-characteristics, such as design cost, design time, reliability and reuse, which are prescribed by a conflicting set of functional, technical and construction constraints. This thesis addresses these design challenges by developing methodologies and hardware/software co-design tools that enable the rapid implementation and efficient synthesis of architectural solutions exhibiting the operating meta-features required by the modern market. Specifically, it presents a) methodologies to accelerate the design flow for both reconfigurable and application-specific architectures, b) coarse-grain heterogeneous architectural templates for processing and communication acceleration, and c) efficient multi-objective synthesis techniques at both the high abstraction level of programming and the physical silicon level. Regarding the acceleration of the design flow, the proposed methodology employs virtual platforms to hide architectural details and drastically reduce simulation time. An extension of this framework introduces system-level co-simulation that uses reconfigurable acceleration platforms as intermediate co-emulation platforms. The development cycle of a hardware/software product is thus accelerated by moving from a vertical serial flow to a circular interactive loop. Moreover, the simulation capabilities are enriched with efficient techniques for detecting and correcting design errors, as well as methods for checking the system's performance metrics against the desired specifications during all phases of system development. Orthogonally to this methodological framework, a new architectural template is proposed that aims to bridge the gap between design complexity and technological productivity by using specialized hardware accelerators in heterogeneous system-on-chip and network-on-chip platforms. A novel co-design methodology is presented for the hardware accelerators and their programming software, including the allocation of tasks to the available resources of the system/network. The framework provides implementation techniques for the accelerators using either conventional programming flows with a hardware description language or abstract programming-model flows based on high-level synthesis. In either case, the designer can optimize system-level metrics such as processing speed, throughput, reliability, power consumption and silicon area. Finally, to address the increased complexity of design tools for reconfigurable systems, novel multi-objective evolutionary optimization algorithms are proposed. They exploit modern multicore processors and the coarse-grain nature of multithreaded programming environments (e.g. OpenMP) to reduce the time needed to place logic resources onto physical ones, while grouping the applications by their intrinsic characteristics so that the design space is explored more effectively. The efficiency of the proposed architectural templates, design tools and methodology flows is evaluated against existing state-of-the-art solutions, both with stand-alone applications from typical computing domains, such as digital signal processing, multimedia and problems of arithmetic complexity, and in systemic heterogeneous environments, such as a computer vision system for autonomous robotic space navigation and a many-accelerator system for workstations and datacenters targeting high-performance computing (HPC) applications. The results strengthen the belief of the author that this thesis provides competitive expertise to address complex modern and projected future design challenges.

    Implementation of RISC Processor for DSP Accelerator Architecture Exploiting Carry Save Arithmetic

    Hardware acceleration has been proved an extremely promising implementation strategy for the digital signal processing (DSP) domain. Rather than adopting a monolithic application-specific integrated circuit design approach, in this brief, we present a novel accelerator architecture comprising flexible computational units that support the execution of a large set of operation templates found in DSP kernels. We differentiate from previous works on flexible accelerators by enabling computations to be aggressively performed with carry-save (CS) formatted data. Advanced arithmetic design concepts, i.e., recoding techniques, are utilized, enabling CS optimizations to be performed in a larger scope than in previous approaches. Extensive experimental evaluations show that the proposed accelerator architecture delivers average gains of up to 61.91% in area-delay product and 54.43% in energy consumption compared with the state-of-art flexible datapaths. That work concentrates on 16-bit operations, whereas the proposed scheme focuses on 32-bit operations. Hardware acceleration refers to the use of computer hardware to perform some functions faster than is possible in software running on a general-purpose CPU. RISC (Reduced Instruction Set Computer) is a design philosophy that has become mainstream in scientific and engineering applications. The main objective of this paper is to design and implement a 32-bit RISC processor for a flexible DSP accelerator architecture. The design helps to improve the speed and performance of the processor. The most important feature of the RISC processor is that it is very simple and supports a load/store architecture. Its important components include the arithmetic logic unit, shifter, rotator and control unit. The module functionality and performance issues such as area, power dissipation and propagation delay are analyzed. The design thereby addresses constraints such as the complexity of the instruction set: reducing that complexity reduces the space, time, cost, power, heat and other resources needed to implement the instruction-set part of a processor. As the execution time decreases, the speed of execution increases.
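    The carry-save (CS) format mentioned in the abstract above keeps a running total as a (sum, carry) pair, so that each addition avoids carry propagation. The following is a minimal Python sketch of that general idea, not the accelerator described in the paper. The function names are hypothetical, and the sketch assumes non-negative integers of unbounded width rather than a fixed 32-bit datapath.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three non-negative integers to a (sum, carry) pair.

    Each bit position is handled independently, so no carry ripples
    across the word; a + b + c == partial_sum + carry always holds.
    """
    partial_sum = a ^ b ^ c                      # bitwise sum without carries
    carry = ((a & b) | (b & c) | (a & c)) << 1   # majority bits, shifted one place left
    return partial_sum, carry


def cs_accumulate(values):
    """Accumulate a sequence while staying in carry-save form (hypothetical helper)."""
    s, c = 0, 0
    for v in values:
        s, c = carry_save_add(s, c, v)
    return s, c


def resolve(s, c):
    """Final carry-propagate addition, done once at the end of the chain."""
    return s + c


if __name__ == "__main__":
    s, c = cs_accumulate([13, 7, 25, 100])
    print(resolve(s, c))
```

    The design point is that only `resolve` needs a carry-propagate adder, and it runs once per chain rather than once per addition.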

    Floating-Point Support for Reconfigurable Computing Architectures (재구성형 연산 구조를 위한 부동소수점 지원)

    Doctoral thesis (Ph.D.), Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2014. 최기영.
    With a huge increase in demand for various kinds of compute-intensive applications in electronic systems, researchers have focused on coarse-grained reconfigurable architectures (CGRAs) because of their advantages: high performance and flexibility. Supporting floating-point operations on a CGRA has also become essential as demand grows for applications that include floating-point operations, such as multimedia processing, 3D graphics, augmented reality and object recognition. This thesis presents FloRA, a coarse-grained reconfigurable architecture with floating-point support. The two-dimensional array of integer processing elements (PEs) in FloRA is configured at run-time to perform floating-point operations as well as integer operations. More specifically, each floating-point operation is performed by two integer processing elements, one for the mantissa and the other for the exponent. With the chip fabricated in a 130nm process, the total area overhead of the additional hardware for floating-point operations is about 7.4% compared to the previous architecture, which does not support floating-point operations. The fabricated chip runs at a 125MHz clock frequency with a 1.2V power supply. Experiments show an 11.6x speedup on average compared to an ARM9 with a vector floating-point unit, for integer-only benchmark programs as well as programs containing floating-point operations. Compared with similar approaches, including XPP and Butter, the proposed architecture shows much higher performance for integer applications while maintaining about half the performance of Butter for floating-point applications.
    This thesis also proposes novel techniques to enhance the utilization of integer units for high-throughput floating-point operations on a CGRA. The approach to implementing floating-point operations presented in this thesis provides floating-point functionality with less area overhead than the traditional approach of employing separate floating-point units (FPUs). However, the total latency of a floating-point operation is larger than in the traditional approach, and the data dependency between the split integer operations restricts further improvement in the utilization of the integer functional units within an operation. Two techniques are proposed to overcome this inefficiency. One is overlapping two distinct floating-point operations, which increases the utilization of the integer functional units in the architecture: functional units left free by one floating-point operation can be used for another. The other is forwarding between two data-dependent floating-point operations, which decreases the effective latency of the floating-point operations. The basic idea is to remove unnecessary calculations, such as formatting, that are normally done between the two data-dependent floating-point operations. To implement overlapping and forwarding, the FSMs and control paths in each PE are modified and temporal/communication registers are added. Lightweight sub-modules, such as increment units and registers for intermediate values, are added to resolve resource conflicts. Experiments are done with several arithmetic functions that are widely used in floating-point applications. The base architecture and the new architecture implementing the proposed techniques are compared in terms of throughput and area overhead. The experimental results show that the proposed techniques increase the throughput by 33.9% on average with 20.9% area overhead.
    Contents:
    Abstract
    Contents
    List of Figures
    List of Tables
    Chapter 1 INTRODUCTION
    Chapter 2 TARGET ARCHITECTURE
        2.1 Overall Architecture
        2.2 Reconfigurable Computing Module
    Chapter 3 DESIGN OF FLOATING-POINT OPERATIONS
        3.1 Floating-point Numbers
            3.1.1 Representation of floating-point numbers
            3.1.2 Floating-point operations
        3.2 FPU-PE Cluster
            3.2.1 Construction of FPU-PE Cluster
            3.2.2 Construction of Array of FPU-PE Clusters
            3.2.3 Comparing Different FPU-PE Clusters
        3.3 Implementation of Multi-Cycle Operations
        3.4 Implementation of Floating-Point Operations
        3.5 Implementation of Floating-Point Operations Using Shared Modules
    Chapter 4 Chip Implementation
        4.1 Specification of Chip Implementation
        4.2 Experimental Setup
        4.3 Experimental Results
            4.3.1 Performance Comparison
            4.3.2 Power Consumption Comparison
    Chapter 5 Comparison with Other Architectures
        5.1 Preparation for the comparison
        5.2 Comparison with PACT XPP
        5.3 Comparison with Butter Architecture
        5.4 Implication of the proposed architecture
    Chapter 6 Enhancement Techniques
        6.1 Introduction
        6.2 Conventional Approach
            6.2.1 Base Architecture
            6.2.2 Utilization of Floating-Point Operations
        6.3 Proposed Enhancement Techniques
            6.3.1 Overlapping Technique
            6.3.2 Forwarding Technique
        6.4 Experiments
            6.4.1 Performance Comparison
            6.4.2 Hardware Cost of the Proposed Techniques
            6.4.3 Utilization Enhancement by the Proposed Techniques
        6.5 Comparison with Other Architecture
    Chapter 7 Conclusion
    Bibliography
    Abstract in Korean (국문초록)
    Acknowledgements (감사의 글)
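    The abstract above states that each floating-point operation is carried out by two integer processing elements, one handling the mantissa and one the exponent. The Python sketch below illustrates that split for a single-precision multiply using only integer operations. It is an assumption-laden illustration, not FloRA's actual datapath: the function names are hypothetical, only normal numbers are handled (no zero, subnormals, infinities or NaN), the result is truncated rather than rounded, and exponent overflow is not checked.

```python
import struct


def unpack(x):
    """Split an IEEE-754 single into (sign, biased exponent, 24-bit mantissa)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    mantissa = (bits & 0x7FFFFF) | 0x800000  # restore the implicit leading 1
    return sign, exponent, mantissa


def pack(sign, exponent, mantissa):
    """Reassemble the three fields into a Python float."""
    bits = (sign << 31) | ((exponent & 0xFF) << 23) | (mantissa & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def mantissa_unit_mul(ma, mb):
    """'Mantissa PE': integer multiply plus normalization.

    Returns the 24-bit result mantissa and an exponent adjustment (0 or 1).
    """
    product = ma * mb                # 48-bit product of two 24-bit mantissas
    if product & (1 << 47):          # product >= 2.0, shift one extra place
        return product >> 24, 1
    return product >> 23, 0


def exponent_unit_mul(ea, eb, adjust):
    """'Exponent PE': integer add, remove one bias, apply the adjustment."""
    return ea + eb - 127 + adjust


def fp_mul(a, b):
    sa, ea, ma = unpack(a)
    sb, eb, mb = unpack(b)
    m, adjust = mantissa_unit_mul(ma, mb)
    e = exponent_unit_mul(ea, eb, adjust)
    return pack(sa ^ sb, e, m)


if __name__ == "__main__":
    print(fp_mul(1.5, 1.5))
```

    Note that the exponent unit depends on the adjustment produced by the mantissa unit. This is the kind of data dependency between split integer operations that the abstract identifies as limiting utilization.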

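    Design of Reliable and Self-Tuning Embedded Systems: Applications to Railway Transportation Systems (Conception de systèmes embarqués fiables et auto-réglables : applications sur les systèmes de transport ferroviaire)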

    During the last few decades, tremendous progress has been made in the performance of semiconductor devices. In this emerging era of high-performance applications, machines need to be not only efficient but also dependable at the circuit and system levels. Several works have been proposed to increase the efficiency of embedded systems by reducing the gap between software flexibility and hardware performance. Owing to their reconfigurable nature, Field Programmable Gate Arrays (FPGAs) represent a significant step towards bridging this performance/flexibility gap. Nevertheless, Dynamic Reconfiguration (DR) has continuously suffered from a bottleneck: long reconfiguration time. In this thesis, we propose a novel medium-grained high-speed dynamic reconfiguration technique for DSP48E1-based circuits. The idea is to take advantage of the runtime reprogrammability of the DSP48E1 slices, coupled with a re-routable interconnection block, to change the overall circuit functionality in one clock cycle. In addition to embedded systems efficiency, this thesis deals with the reliability challenges of new sub-micron electronic systems. As new technologies rely on reduced transistor size and lower supply voltages to improve performance, electronic circuits are becoming remarkably sensitive and increasingly susceptible to transient errors. The system-level impact of these errors can be far-reaching, and Single Event Transients (SETs) have become a serious threat to embedded systems reliability, especially for safety-critical applications such as transportation systems. Reliability enhancement techniques that are based on overestimated soft error rates (SERs) can lead to unnecessary resource overheads as well as high power consumption. Considering error masking phenomena is fundamental to an accurate estimation of SERs. This thesis proposes a new cross-layer model of circuit vulnerability based on a combined modeling of Transistor Level Masking (TLM) and System Level Masking (SLM) mechanisms. We then use this model to build a self-adaptive fault-tolerant architecture that evaluates the circuit's effective vulnerability at runtime. Accordingly, the reliability enhancement strategy is adapted to protect only the vulnerable parts of the system, leading to a reliable circuit with optimized overheads. Experiments performed on a radar-based obstacle detection system for railway transportation show that the proposed approach allows relevant reliability/resource utilization tradeoffs.

    High-Level Synthesis Hardware Design for FPGA-Based Accelerators: Models, Methodologies, and Frameworks

    Hardware accelerators based on field programmable gate array (FPGA) and system on chip (SoC) devices have gained attention in recent years. One of the main reasons is that these devices contain reconfigurable logic, which makes them feasible for boosting the performance of applications. High-level synthesis (HLS) tools facilitate the creation of FPGA code from a high level of abstraction using different directives to obtain an optimized hardware design based on performance metrics. However, the complexity of the design space depends on different factors such as the number of directives used in the source code, the available resources in the device, and the clock frequency. Design space exploration (DSE) techniques comprise the evaluation of multiple implementations with different combinations of directives to obtain a design with a good compromise between different metrics. This paper presents a survey of models, methodologies, and frameworks proposed for metric estimation, FPGA-based DSE, and power consumption estimation on FPGA/SoC. The main features, limitations, and trade-offs of these approaches are described. We also present the integration of existing models and frameworks in diverse research areas and identify the different challenges to be addressed
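    The design space exploration (DSE) described above can be sketched as enumerating directive combinations, estimating metrics for each, and keeping the Pareto-optimal designs. The Python sketch below is illustrative only. The cost model in `estimate` is an invented toy, not taken from any surveyed tool, and the directive names (an unroll factor and a pipeline switch) are generic assumptions rather than any specific HLS tool's syntax.

```python
from itertools import product


def estimate(unroll, pipeline, trip_count=1024):
    """Toy cost model (an assumption): returns (latency_cycles, area_units)."""
    iterations = trip_count // unroll
    cycles_per_iteration = 1 if pipeline else 4
    latency = iterations * cycles_per_iteration + (4 if pipeline else 0)
    area = 100 * unroll + (50 * unroll if pipeline else 0)
    return latency, area


def dominates(p, q):
    """True if p is no worse than q in every metric and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Keep the configurations whose metrics no other configuration dominates."""
    return {
        config: metrics
        for config, metrics in points.items()
        if not any(dominates(other, metrics) for other in points.values())
    }


def explore(unrolls=(1, 2, 4, 8), pipelines=(False, True)):
    """Exhaustively evaluate every directive combination and return the front."""
    points = {(u, p): estimate(u, p) for u, p in product(unrolls, pipelines)}
    return pareto_front(points)


if __name__ == "__main__":
    for config, metrics in sorted(explore().items()):
        print(config, metrics)
```

    Exhaustive enumeration is feasible here only because the toy space has eight points. The complexity noted in the abstract (many directives, device resources, clock frequency) is what makes estimation models and smarter search necessary in practice.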

    Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

    In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows. Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

    Automatic pipelining and vectorization of scientific code for FPGAs

    There is a large body of legacy scientific code in use today that could benefit from execution on accelerator devices like GPUs and FPGAs. Manual translation of such legacy code into device-specific parallel code requires significant effort and is a major obstacle to wider FPGA adoption. We are developing an automated optimizing compiler, TyTra, to overcome this obstacle. The TyTra flow aims to compile legacy Fortran code automatically for FPGA-based acceleration, while applying suitable optimizations. We present the flow with a focus on two key optimizations, automatic pipelining and vectorization. Our compiler frontend extracts patterns from legacy Fortran code that can be pipelined and vectorized. The backend first creates fine and coarse-grained pipelines and then automatically vectorizes both the memory access and the datapath based on a cost model, generating an OpenCL-HDL hybrid working solution for FPGA targets on the Amazon cloud. Our results show up to 4.2× performance improvement over baseline OpenCL code.