Search CORE

376 research outputs found

Méthodologie de génération de plateforme de prototypage à base de multi-fpga

Author: Tang Qingshan
Publication venue: HAL CCSD
Publication date: 13/01/2015
Field of study

Multi-FPGA based prototyping is no longer optional for hardware/software integration. We can classify multi-FPGA prototyping platforms in three categories: off-the-shelf, custom and cabling. The cabling platform is semi off-the-shelf and semi custom. Nevertheless, crafting a custom and a cabling platform is today a manual process, which is time-consuming. The performance and the cost of the platform lie on the FPGA expertise and SoC DUT knowledge of the engineers. Compared to OTS platforms, the added value, in terms of performance, of cabling or custom platforms can be heavily impaired by an inefficient board design. Moreover, FPGA I/Os are becoming a scarce resource, worsening the inter-FPGA bandwidth generation after generation. Therefore, it becomes more and more difficult to prototype an SoC/ASIC design at proper performance. The contributions of the manuscript are: (1). An automatic implementation flow for an OTS platform is proposed. (2). An automatic design flow for creating a custom platform is proposed, thus increasing the productivity, enabling the board exploration, and optimizing cost and performance. (3). The cabling platform is proposed where one board is composed of one FPGA and several connectors, with an algorithm to automatically find a solution for the cable distribution. (4). Thanks to the developed automatic tools, the three different multi-FPGA platforms are compared. The custom platform always achieves better performance and lower deployment cost, but still with 3-5 months in time of availability. If the performance or the deployment cost are not rigorous constraints, the cabling platform offers an attractive alternative compared to others.Face à la difficulté de l’intégration matériel/logiciel, le prototypage à base de multi-FPGA devient obligatoire dans la vérification pré-silicium. Les plateformes de prototypage peuvent être classées en trois catégories: OTS, sur mesure et câblées. La plateforme câblée est semi OTS et semi sur mesure. Néanmoins, la création d’une plateforme sur mesure et câblée est un processus manuel et chronophage. La performance et le coût de la plateforme dépend de l'expérience de concepteurs en expertise de FPGA et connaissance du système sur puce. Par rapport à des plateformes OTS, la valeur ajoutée, en terme de performance, des plateformes câblées ou sur mesure peuvent être fortement dégradée par une carte inefficace. En plus, FPGA E/S devient une ressource rare, aggravant la bande passante inter-FPGA. Par conséquent, il devient de plus en plus difficile de prototyper un design à une performance satisfaisante. Les contributions sont: (1). Un flot de implémentation automatique pour une plateforme OTS. (2). Un flot de conception automatique pour créer une plateforme sur mesure, ainsi augmentant la productivité, permettant l’exploration de carte et optimisant le coût et la performance. (3). La plateforme câblée avec un algorithme permettant automatiquement de trouver une solution pour la distribution des câbles. (4). Grâce aux flots automatique, les trois plateformes sont comparées. La plateforme sur mesure toujours réalise plus de performance et moins de coût de déploiement, mais encore avec 3-5 mois en temps de disponibilité. Si la performance ou le coût de déploiement ne sont pas les contraintes strictes, la plateforme câblée est une alternative intéressante par rapport aux autres

Thèses en Ligne

Prototyping Methodologies and Design of Communication-centric Heterogeneous Many-core Architectures

Author: Masing Leonard Jannik
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2020
Field of study

KITopen

Automated Hardware Prototyping for 3D Network on Chips

Author: Friederich Stephanie
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2017
Field of study

Vor mehr als 50 Jahren stellte Intel® Mitbegründer Gordon Moore eine Prognose zum Entwicklungsprozess der Transistortechnologie auf. Er prognostizierte, dass sich die Zahl der Transistoren in integrierten Schaltungen alle zwei Jahre verdoppeln wird. Seine Aussage ist immer noch gültig, aber ein Ende von Moores Gesetz ist in Sicht. Mit dem Ende von Moore’s Gesetz müssen neue Aspekte untersucht werden, um weiterhin die Leistung von integrierten Schaltungen zu steigern. Zwei mögliche Ansätze für "More than Moore” sind 3D-Integrationsverfahren und heterogene Systeme. Gleichzeitig entwickelt sich ein Trend hin zu Multi-Core Prozessoren, basierend auf Networks on chips (NoCs). Neben dem Ende des Mooreschen Gesetzes ergeben sich bei immer kleiner werdenden Technologiegrößen, vor allem jenseits der 60 nm, neue Herausforderungen. Eine Schwierigkeit ist die Wärmeableitung in großskalierten integrierten Schaltkreisen und die daraus resultierende Überhitzung des Chips. Um diesem Problem in modernen Multi-Core Architekturen zu begegnen, muss auch die Verlustleistung der Netzwerkressourcen stark reduziert werden. Diese Arbeit umfasst eine durch Hardware gesteuerte Kombination aus Frequenzskalierung und Power Gating für 3D On-Chip Netzwerke, einschließlich eines FPGA Prototypen. Dafür wurde ein Takt-synchrones 2D Netzwerk auf ein dreidimensionales asynchrones Netzwerk mit mehreren Frequenzbereichen erweitert. Zusätzlich wurde ein skalierbares Online-Power-Management System mit geringem Ressourcenaufwand entwickelt. Die Verifikation neuer Hardwarekomponenten ist einer der zeitaufwendigsten Schritte im Entwicklungsprozess hochintegrierter digitaler Schaltkreise. Um diese Aufgabe zu beschleunigen und um eine parallele Softwareentwicklung zu ermöglichen, wurde im Rahmen dieser Arbeit ein automatisiertes und benutzerfreundliches Tool für den Entwurf neuer Hardware Projekte entwickelt. Eine grafische Benutzeroberfläche zum Erstellen des gesamten Designablaufs, vom Erstellen der Architektur, Parameter Deklaration, Simulation, Synthese und Test ist Teil dieses Werkzeugs. Zudem stellt die Größe der Architektur für die Erstellung eines Prototypen eine besondere Herausforderung dar. Frühere Arbeiten haben es versäumt, eine schnelles und unkompliziertes Prototyping, insbesondere von Architekturen mit mehr als 50 Prozessorkernen, zu realisieren. Diese Arbeit umfasst eine Design Space Exploration und FPGA-basierte Prototypen von verschiedenen 3D-NoC Implementierungen mit mehr als 80 Prozessoren

KITopen

A Scalable and Adaptive Network on Chip for Many-Core Architectures

Author: Heißwolf Jan
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2014
Field of study

In this work, a scalable network on chip (NoC) for future many-core architectures is proposed and investigated. It supports different QoS mechanisms to ensure predictable communication. Self-optimization is introduced to adapt the energy footprint and the performance of the network to the communication requirements. A fault tolerance concept allows to deal with permanent errors. Moreover, a template-based automated evaluation and design methodology and a synthesis flow for NoCs is introduced

KITopen

A study on hardware design for high performance artificial neural network by using FPGA and NoC

Author: Dong Yiping
Publication venue
Publication date: 01/01/2011
Field of study

制度:新 ; 報告番号:甲3421号 ; 学位の種類:博士(工学) ; 授与年月日:2011/9/15 ; 早大学位記番号:新574

Waseda University Repository

Self-timed field programmmable gate array architectures

Author: Payne Robert
Publication venue: The University of Edinburgh
Publication date: 01/01/1997
Field of study

Edinburgh Research Archive

Cycle-accurate multicore performance models on FPGAs

Author: Pellauer Michael (Michael Ignatius)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 159-165).The goal of this project is to improve computer architecture by accelerating cycle-accurate performance modeling of multicore processors using FPGAs. Contributions include a distributed technique controlling simulation on a highly-parallel substrate, hardware design techniques to reduce development effort, and a specific framework for modeling shared-memory multicore processors paired with realistic On-Chip Networks.by Michael Pellauer.Ph.D

DSpace@MIT

Trigger and Timing Distributions using the TTC-PON and GBT Bridge Connection in ALICE for the LHC Run 3 Upgrade

Author: Baron Sophie
David Erno
Khan Shuaib Ahmad
Kiss Tivadar
Kluge Alex
Mendez Eduardo
Mitra Jubin
Nayak Tapan
Publication venue: 'Elsevier BV'
Publication date: 04/06/2018
Field of study

The ALICE experiment at CERN is preparing for a major upgrade for the third phase of data taking run (Run 3), when the high luminosity phase of the Large Hadron Collider (LHC) starts. The increase in the beam luminosity will result in high interaction rate causing the data acquisition rate to exceed 3 TB/sec. In order to acquire data for all the events and to handle the increased data rate, a transition in the readout electronics architecture from the triggered to the trigger-less acquisition mode is required. In this new architecture, a dedicated electronics block called the Common Readout Unit (CRU) is defined to act as a nodal communication point for detector data aggregation and as a distribution point for timing, trigger and control (TTC) information. TTC information in the upgraded triggerless readout architecture uses two asynchronous high-speed serial links connections: the TTC-PON and the GBT. We have carried out a study to evaluate the quality of the embedded timing signals forwarded by the CRU to the connected electronics using the TTC-PON and GBT bridge connection. We have used four performance metrics to characterize the communication bridge: (a)the latency added by the firmware logic, (b)the jitter cleaning effect of the PLL on the timing signal, (c)BER analysis for quantitative measurement of signal quality, and (d)the effect of optical transceivers parameter settings on the signal strength. Reliability study of the bridge connection in maintaining the phase consistency of timing signals is conducted by performing multiple iterations of power on/off cycle, firmware upgrade and reset assertion/de-assertion cycle (PFR cycle). The test results are presented and discussed concerning the performance of the TTC-PON and GBT bridge communication chain using the CRU prototype and its compliance with the ALICE timing requirements

arXiv.org e-Print Archive

CERN Document Server

Scalable reconfigurable computing leveraging latency-insensitive channels

Author: Fleming Kermin Elliott, Jr
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2013
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (p. 190-197).Traditionally, FPGAs have been confined to the limited role of small, low-volume ASIC replacements and as circuit emulators. However, continued Moore's law scaling has given FPGAs new life as accelerators for applications that map well to fine-grained parallel substrates. Examples of such applications include processor modelling, compression, and digital signal processing. Although FPGAs continue to increase in size, some interesting designs still fail to fit in to a single FPGA. Many tools exist that partition RTL descriptions across FPGAs. Unfortunately, existing tools have low performance due to the inefficiency of maintaining the cycle-by-cycle behavior of RTL among discrete FPGAs. These tools are unsuitable for use in FPGA program acceleration, as the purpose of an accelerator is to make applications run faster. This thesis presents latency-insensitive channels, a language-level mechanism by which programmers express points in their their design at which the cycle-by-cycle behavior of the design may be modified by the compiler. By decoupling the timing of portions of the RTL from the high-level function of the program, designs may be mapped to multiple FPGAs without suffering the performance degradation observed in existing tools. This thesis demonstrates, using a diverse set of large designs, that FPGA programs described in terms of latency-insensitive channels obtain significant gains in design feasibility, compilation time, and run-time when mapped to multiple FPGAs.by Kermin Elliott Fleming, Jr.Ph.D

DSpace@MIT