372 research outputs found

    Méthodologie de génération de plateforme de prototypage à base de multi-fpga

    Get PDF
    Multi-FPGA based prototyping is no longer optional for hardware/software integration. We can classify multi-FPGA prototyping platforms in three categories: off-the-shelf, custom and cabling. The cabling platform is semi off-the-shelf and semi custom. Nevertheless, crafting a custom and a cabling platform is today a manual process, which is time-consuming. The performance and the cost of the platform lie on the FPGA expertise and SoC DUT knowledge of the engineers. Compared to OTS platforms, the added value, in terms of performance, of cabling or custom platforms can be heavily impaired by an inefficient board design. Moreover, FPGA I/Os are becoming a scarce resource, worsening the inter-FPGA bandwidth generation after generation. Therefore, it becomes more and more difficult to prototype an SoC/ASIC design at proper performance. The contributions of the manuscript are: (1). An automatic implementation flow for an OTS platform is proposed. (2). An automatic design flow for creating a custom platform is proposed, thus increasing the productivity, enabling the board exploration, and optimizing cost and performance. (3). The cabling platform is proposed where one board is composed of one FPGA and several connectors, with an algorithm to automatically find a solution for the cable distribution. (4). Thanks to the developed automatic tools, the three different multi-FPGA platforms are compared. The custom platform always achieves better performance and lower deployment cost, but still with 3-5 months in time of availability. If the performance or the deployment cost are not rigorous constraints, the cabling platform offers an attractive alternative compared to others.Face Ă  la difficultĂ© de l’intĂ©gration matĂ©riel/logiciel, le prototypage Ă  base de multi-FPGA devient obligatoire dans la vĂ©rification prĂ©-silicium. Les plateformes de prototypage peuvent ĂȘtre classĂ©es en trois catĂ©gories: OTS, sur mesure et cĂąblĂ©es. La plateforme cĂąblĂ©e est semi OTS et semi sur mesure. NĂ©anmoins, la crĂ©ation d’une plateforme sur mesure et cĂąblĂ©e est un processus manuel et chronophage. La performance et le coĂ»t de la plateforme dĂ©pend de l'expĂ©rience de concepteurs en expertise de FPGA et connaissance du systĂšme sur puce. Par rapport Ă  des plateformes OTS, la valeur ajoutĂ©e, en terme de performance, des plateformes cĂąblĂ©es ou sur mesure peuvent ĂȘtre fortement dĂ©gradĂ©e par une carte inefficace. En plus, FPGA E/S devient une ressource rare, aggravant la bande passante inter-FPGA. Par consĂ©quent, il devient de plus en plus difficile de prototyper un design Ă  une performance satisfaisante. Les contributions sont: (1). Un flot de implĂ©mentation automatique pour une plateforme OTS. (2). Un flot de conception automatique pour crĂ©er une plateforme sur mesure, ainsi augmentant la productivitĂ©, permettant l’exploration de carte et optimisant le coĂ»t et la performance. (3). La plateforme cĂąblĂ©e avec un algorithme permettant automatiquement de trouver une solution pour la distribution des cĂąbles. (4). GrĂące aux flots automatique, les trois plateformes sont comparĂ©es. La plateforme sur mesure toujours rĂ©alise plus de performance et moins de coĂ»t de dĂ©ploiement, mais encore avec 3-5 mois en temps de disponibilitĂ©. Si la performance ou le coĂ»t de dĂ©ploiement ne sont pas les contraintes strictes, la plateforme cĂąblĂ©e est une alternative intĂ©ressante par rapport aux autres

    Automated Hardware Prototyping for 3D Network on Chips

    Get PDF
    Vor mehr als 50 Jahren stellte IntelÂź MitbegrĂŒnder Gordon Moore eine Prognose zum Entwicklungsprozess der Transistortechnologie auf. Er prognostizierte, dass sich die Zahl der Transistoren in integrierten Schaltungen alle zwei Jahre verdoppeln wird. Seine Aussage ist immer noch gĂŒltig, aber ein Ende von Moores Gesetz ist in Sicht. Mit dem Ende von Moore’s Gesetz mĂŒssen neue Aspekte untersucht werden, um weiterhin die Leistung von integrierten Schaltungen zu steigern. Zwei mögliche AnsĂ€tze fĂŒr "More than Moore” sind 3D-Integrationsverfahren und heterogene Systeme. Gleichzeitig entwickelt sich ein Trend hin zu Multi-Core Prozessoren, basierend auf Networks on chips (NoCs). Neben dem Ende des Mooreschen Gesetzes ergeben sich bei immer kleiner werdenden TechnologiegrĂ¶ĂŸen, vor allem jenseits der 60 nm, neue Herausforderungen. Eine Schwierigkeit ist die WĂ€rmeableitung in großskalierten integrierten Schaltkreisen und die daraus resultierende Überhitzung des Chips. Um diesem Problem in modernen Multi-Core Architekturen zu begegnen, muss auch die Verlustleistung der Netzwerkressourcen stark reduziert werden. Diese Arbeit umfasst eine durch Hardware gesteuerte Kombination aus Frequenzskalierung und Power Gating fĂŒr 3D On-Chip Netzwerke, einschließlich eines FPGA Prototypen. DafĂŒr wurde ein Takt-synchrones 2D Netzwerk auf ein dreidimensionales asynchrones Netzwerk mit mehreren Frequenzbereichen erweitert. ZusĂ€tzlich wurde ein skalierbares Online-Power-Management System mit geringem Ressourcenaufwand entwickelt. Die Verifikation neuer Hardwarekomponenten ist einer der zeitaufwendigsten Schritte im Entwicklungsprozess hochintegrierter digitaler Schaltkreise. Um diese Aufgabe zu beschleunigen und um eine parallele Softwareentwicklung zu ermöglichen, wurde im Rahmen dieser Arbeit ein automatisiertes und benutzerfreundliches Tool fĂŒr den Entwurf neuer Hardware Projekte entwickelt. Eine grafische BenutzeroberflĂ€che zum Erstellen des gesamten Designablaufs, vom Erstellen der Architektur, Parameter Deklaration, Simulation, Synthese und Test ist Teil dieses Werkzeugs. Zudem stellt die GrĂ¶ĂŸe der Architektur fĂŒr die Erstellung eines Prototypen eine besondere Herausforderung dar. FrĂŒhere Arbeiten haben es versĂ€umt, eine schnelles und unkompliziertes Prototyping, insbesondere von Architekturen mit mehr als 50 Prozessorkernen, zu realisieren. Diese Arbeit umfasst eine Design Space Exploration und FPGA-basierte Prototypen von verschiedenen 3D-NoC Implementierungen mit mehr als 80 Prozessoren

    A Scalable and Adaptive Network on Chip for Many-Core Architectures

    Get PDF
    In this work, a scalable network on chip (NoC) for future many-core architectures is proposed and investigated. It supports different QoS mechanisms to ensure predictable communication. Self-optimization is introduced to adapt the energy footprint and the performance of the network to the communication requirements. A fault tolerance concept allows to deal with permanent errors. Moreover, a template-based automated evaluation and design methodology and a synthesis flow for NoCs is introduced

    A study on hardware design for high performance artificial neural network by using FPGA and NoC

    Get PDF
    戶ćșŠ:新 ; 栱摊ç•Șć·:ç”Č3421ć· ; ć­ŠäœăźçšźéĄž:ćšćŁ«(ć·„ć­Š) ; 授䞎ćčŽæœˆæ—„:2011/9/15 ; æ—©ć€§ć­Šäœèš˜ç•Șć·:新574

    Self-timed field programmmable gate array architectures

    Get PDF

    Cycle-accurate multicore performance models on FPGAs

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 159-165).The goal of this project is to improve computer architecture by accelerating cycle-accurate performance modeling of multicore processors using FPGAs. Contributions include a distributed technique controlling simulation on a highly-parallel substrate, hardware design techniques to reduce development effort, and a specific framework for modeling shared-memory multicore processors paired with realistic On-Chip Networks.by Michael Pellauer.Ph.D

    Trigger and Timing Distributions using the TTC-PON and GBT Bridge Connection in ALICE for the LHC Run 3 Upgrade

    Full text link
    The ALICE experiment at CERN is preparing for a major upgrade for the third phase of data taking run (Run 3), when the high luminosity phase of the Large Hadron Collider (LHC) starts. The increase in the beam luminosity will result in high interaction rate causing the data acquisition rate to exceed 3 TB/sec. In order to acquire data for all the events and to handle the increased data rate, a transition in the readout electronics architecture from the triggered to the trigger-less acquisition mode is required. In this new architecture, a dedicated electronics block called the Common Readout Unit (CRU) is defined to act as a nodal communication point for detector data aggregation and as a distribution point for timing, trigger and control (TTC) information. TTC information in the upgraded triggerless readout architecture uses two asynchronous high-speed serial links connections: the TTC-PON and the GBT. We have carried out a study to evaluate the quality of the embedded timing signals forwarded by the CRU to the connected electronics using the TTC-PON and GBT bridge connection. We have used four performance metrics to characterize the communication bridge: (a)the latency added by the firmware logic, (b)the jitter cleaning effect of the PLL on the timing signal, (c)BER analysis for quantitative measurement of signal quality, and (d)the effect of optical transceivers parameter settings on the signal strength. Reliability study of the bridge connection in maintaining the phase consistency of timing signals is conducted by performing multiple iterations of power on/off cycle, firmware upgrade and reset assertion/de-assertion cycle (PFR cycle). The test results are presented and discussed concerning the performance of the TTC-PON and GBT bridge communication chain using the CRU prototype and its compliance with the ALICE timing requirements

    Scalable reconfigurable computing leveraging latency-insensitive channels

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (p. 190-197).Traditionally, FPGAs have been confined to the limited role of small, low-volume ASIC replacements and as circuit emulators. However, continued Moore's law scaling has given FPGAs new life as accelerators for applications that map well to fine-grained parallel substrates. Examples of such applications include processor modelling, compression, and digital signal processing. Although FPGAs continue to increase in size, some interesting designs still fail to fit in to a single FPGA. Many tools exist that partition RTL descriptions across FPGAs. Unfortunately, existing tools have low performance due to the inefficiency of maintaining the cycle-by-cycle behavior of RTL among discrete FPGAs. These tools are unsuitable for use in FPGA program acceleration, as the purpose of an accelerator is to make applications run faster. This thesis presents latency-insensitive channels, a language-level mechanism by which programmers express points in their their design at which the cycle-by-cycle behavior of the design may be modified by the compiler. By decoupling the timing of portions of the RTL from the high-level function of the program, designs may be mapped to multiple FPGAs without suffering the performance degradation observed in existing tools. This thesis demonstrates, using a diverse set of large designs, that FPGA programs described in terms of latency-insensitive channels obtain significant gains in design feasibility, compilation time, and run-time when mapped to multiple FPGAs.by Kermin Elliott Fleming, Jr.Ph.D
    • 

    corecore