147 research outputs found

    High-Level Synthesis Hardware Design for FPGA-Based Accelerators: Models, Methodologies, and Frameworks

    Hardware accelerators based on field-programmable gate array (FPGA) and system-on-chip (SoC) devices have gained attention in recent years, largely because their reconfigurable logic makes them well suited to boosting application performance. High-level synthesis (HLS) tools facilitate the creation of FPGA code from a high level of abstraction, using directives to obtain a hardware design optimized with respect to performance metrics. However, the complexity of the design space depends on factors such as the number of directives applied to the source code, the resources available on the device, and the clock frequency. Design space exploration (DSE) techniques evaluate multiple implementations with different combinations of directives to obtain a design with a good compromise between competing metrics. This paper presents a survey of models, methodologies, and frameworks proposed for metric estimation, FPGA-based DSE, and power-consumption estimation on FPGA/SoC devices. The main features, limitations, and trade-offs of these approaches are described. We also present the integration of existing models and frameworks in diverse research areas and identify the remaining challenges to be addressed.
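    To make the DSE idea above concrete, the sketch below enumerates directive combinations, estimates latency and resource usage for each, and keeps the Pareto-optimal designs. The directive names and the estimate_design() cost model are purely illustrative assumptions, not any specific HLS tool's API or results from the survey.

# Minimal sketch of directive-based design space exploration (DSE):
# enumerate directive combinations, score each with a toy cost model,
# and keep the Pareto-optimal (latency, area) points.
from itertools import product

# Hypothetical directive knobs for a single loop nest.
DIRECTIVES = {
    "unroll_factor": [1, 2, 4, 8],
    "pipeline_ii": [1, 2, 4],
    "array_partition": ["none", "cyclic", "block"],
}

def estimate_design(cfg):
    """Toy analytical model: returns (latency_cycles, lut_usage)."""
    base_latency, base_luts = 1024, 500
    latency = base_latency / cfg["unroll_factor"] * cfg["pipeline_ii"]
    luts = base_luts * cfg["unroll_factor"]
    if cfg["array_partition"] != "none":
        latency *= 0.8   # fewer memory stalls
        luts *= 1.3      # extra multiplexing logic
    return latency, luts

def pareto_front(points):
    """Keep configurations not dominated in both latency and area."""
    front = []
    for cfg, (lat, area) in points:
        dominated = any(l <= lat and a <= area and (l, a) != (lat, area)
                        for _, (l, a) in points)
        if not dominated:
            front.append((cfg, (lat, area)))
    return front

if __name__ == "__main__":
    keys = list(DIRECTIVES)
    space = [dict(zip(keys, vals)) for vals in product(*DIRECTIVES.values())]
    evaluated = [(cfg, estimate_design(cfg)) for cfg in space]
    for cfg, (lat, area) in pareto_front(evaluated):
        print(f"{cfg} -> latency={lat:.0f} cycles, LUTs={area:.0f}")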

    Real-time implementation of 3D LiDAR point cloud semantic segmentation in an FPGA

    Master's dissertation in Informatics Engineering. In the last few years, the automotive industry has relied heavily on deep learning for perception solutions. With data-heavy sensors such as LiDAR becoming standard, developing low-power, real-time applications has become increasingly challenging. To obtain maximum computational efficiency, one can no longer focus solely on the software side of such applications while disregarding the underlying hardware. In this thesis, a hardware-software co-design approach is used to implement an inference application leveraging SqueezeSegV3, a LiDAR-based convolutional neural network, on the Versal ACAP VCK190 FPGA. Automotive requirements drive the development of the proposed solution, with real-time performance and low power consumption as the target metrics. A first experiment validates the suitability of Xilinx's Vitis-AI tool for deploying deep convolutional neural networks on FPGAs. Both the ResNet-18 and SqueezeNet networks are deployed to the Zynq UltraScale+ MPSoC ZCU104 and Versal ACAP VCK190 FPGAs. The results show that both networks far exceed the real-time requirements while consuming little power. Compared to an NVIDIA RTX 3090 GPU, the performance per watt during inference of the two networks is 12x and 47.8x higher on the Zynq UltraScale+ MPSoC ZCU104 and 15.1x and 26.6x higher on the Versal ACAP VCK190, with no accuracy loss in the quantization step. A second experiment builds on these results by deploying a real-time application containing the SqueezeSegV3 model on the Semantic-KITTI dataset. A frame rate of 11 Hz is achieved with a peak power consumption of 78 W, and quantization causes only minimal degradation of 0.7 accuracy points and 1.5 IoU points. A smaller version of the same model is also deployed, achieving a frame rate of 19 Hz with a peak power consumption of 76 W. The application performs semantic segmentation over the entire point cloud with a 360° field of view.
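    The efficiency metric quoted above reduces to frames per second per watt. The small sketch below recomputes it from the FPGA figures reported in the abstract (11 Hz at 78 W peak, 19 Hz at 76 W peak); the GPU baseline numbers are illustrative placeholders, not measured values from the thesis.

# Frames-per-second-per-watt comparison against a GPU baseline.
def perf_per_watt(fps: float, power_w: float) -> float:
    return fps / power_w

fpga_runs = {
    "SqueezeSegV3 (full)": (11.0, 78.0),   # Hz, peak W (from the abstract)
    "SqueezeSegV3 (small)": (19.0, 76.0),
}
gpu_baseline = (60.0, 350.0)  # hypothetical RTX 3090 fps and board power

gpu_eff = perf_per_watt(*gpu_baseline)
for name, (fps, watts) in fpga_runs.items():
    eff = perf_per_watt(fps, watts)
    print(f"{name}: {eff:.3f} fps/W ({eff / gpu_eff:.1f}x the GPU baseline)")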

    Optimization of Energy- and Power-Driven Architecture Exploration for Multicore and Heterogeneous Systems-on-Chip

    The contribution of this work builds on established virtual prototype platforms to improve both SoC design quality and productivity. First, an automatic system-level power estimation framework was developed to address the critical issue of early power estimation in SoC design. The framework models the static and dynamic power consumption of the hardware components; these models are built from normalized values for the basic SoC design components, obtained through a one-time power simulation of the RTL hardware models. The framework allows dynamic reconfiguration of the technology node for the power estimation models, and its instantaneous power reporting aids the detection of possible hotspots early in the design process. Even with this additional data, the steadily growing design space of complex heterogeneous SoCs makes finding the right parameter configuration a challenging and laborious task for a system-level designer. This work addresses that bottleneck by optimizing the design space exploration (DSE) process for MPSoC design. An automatic DSE framework for virtual platforms (VPs) was developed that is flexible and allows the selection of an optimal parameter configuration without pre-existing knowledge. To reduce exploration time, the framework is equipped with several multi-objective optimization techniques based on simulated annealing and a genetic algorithm. Finally, to aid HW/SW partitioning at system level, a flexible and automated workflow (SW2TLM) is presented. It allows the designer to explore various partitioning scenarios without delving into the complexity of the hardware architecture and software integration. The framework generates system-level hardware accelerators from the corresponding functionality encoded in the software and integrates them into the VP. Power consumption and acceleration speedups are reported to the designer, which further increases the quality and productivity of the development process towards the final architecture. The presented tools are evaluated using a state-of-the-art VP for a range of single- and multi-core applications. Considering the energy-delay product, a reduction in exploration time of approximately 62% (worst case) was recorded while maintaining 90% accuracy in finding optimal parameters compared to previous techniques. SW2TLM further increases exploration versatility by combining modern high-level synthesis with system-level architectural exploration.
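    The sketch below illustrates, under simplifying assumptions, the two ideas described above: a system-level power model that sums per-component static power and activity-scaled dynamic power, and a simulated-annealing search over MPSoC parameters that minimizes the energy-delay product (EDP). The component table, the Amdahl-style delay model, and the parameter ranges are illustrative assumptions, not values from the thesis.

# Toy system-level power model plus simulated-annealing DSE over (cores, frequency).
import math
import random

# Hypothetical normalized per-component power (mW): (static, dynamic at 1 GHz, full activity).
COMPONENTS = {"cpu_core": (5.0, 40.0), "l2_cache": (3.0, 12.0), "dram_ctrl": (2.0, 8.0)}

def power_mw(n_cores, freq_ghz, activity):
    total = 0.0
    for name, (p_stat, p_dyn) in COMPONENTS.items():
        count = n_cores if name == "cpu_core" else 1
        total += count * (p_stat + p_dyn * freq_ghz * activity)
    return total

def delay_s(n_cores, freq_ghz, workload_gcycles=2.0, parallel_frac=0.8):
    # Amdahl-style toy delay model for a partially parallel workload.
    speedup = 1.0 / ((1 - parallel_frac) + parallel_frac / n_cores)
    return workload_gcycles / (freq_ghz * speedup)

def edp(cfg):
    n_cores, freq_ghz = cfg
    d = delay_s(n_cores, freq_ghz)
    e = power_mw(n_cores, freq_ghz, activity=0.5) * 1e-3 * d  # joules
    return e * d

def simulated_annealing(steps=2000, t0=1.0, cooling=0.995):
    cfg = (2, 1.0)  # starting point: (cores, GHz)
    best, best_cost, temp = cfg, edp(cfg), t0
    for _ in range(steps):
        cand = (random.choice([1, 2, 4, 8]), random.choice([0.5, 1.0, 1.5, 2.0]))
        delta = edp(cand) - edp(cfg)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            cfg = cand
            if edp(cfg) < best_cost:
                best, best_cost = cfg, edp(cfg)
        temp *= cooling
    return best, best_cost

if __name__ == "__main__":
    (cores, ghz), cost = simulated_annealing()
    print(f"best config: {cores} cores @ {ghz} GHz, EDP = {cost:.4f} J*s")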

    Temperature-Aware Design and Management for 3D Multi-Core Architectures

    Vertically integrated 3D multiprocessor systems-on-chip (3D MPSoCs) provide the means to continue integrating more functionality within a unit area while improving manufacturing yields and runtime performance. However, 3D MPSoCs incur amplified thermal challenges that undermine their reliability. To address these issues, several advanced cooling technologies, along with temperature-aware design-time optimizations and run-time management schemes, have been proposed. In this monograph, we provide a survey of recent advances in temperature-aware 3D MPSoC design. We explore the advanced cooling strategies, thermal modeling frameworks, design-time optimizations, and run-time thermal management schemes that primarily target 3D MPSoCs. The aim of this survey is to provide a global perspective, highlighting the advances and drawbacks of the recent state of the art.
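    As a minimal illustration of the lumped thermal modeling that such frameworks build on, the sketch below treats each layer of a 3D stack as an RC node coupled to the heat sink and integrates its temperature over time from a per-layer power trace. Inter-layer coupling is omitted for brevity, and all resistance, capacitance, and power values are illustrative placeholders, not data from the monograph.

# Explicit-Euler integration of a per-layer lumped thermal-RC model.
AMBIENT_C = 45.0           # heat-sink/ambient temperature (deg C)
R_TH = [0.8, 1.4, 2.0]     # thermal resistance of each layer to ambient (K/W)
C_TH = [0.05, 0.05, 0.05]  # thermal capacitance of each layer (J/K)

def step_temperature(temps, powers, dt=0.01):
    """One Euler step of dT/dt = (P - (T - T_amb)/R) / C for every layer."""
    return [t + dt * (p - (t - AMBIENT_C) / r) / c
            for t, p, r, c in zip(temps, powers, R_TH, C_TH)]

if __name__ == "__main__":
    temps = [AMBIENT_C] * 3
    powers = [3.0, 2.0, 4.0]   # watts dissipated in each layer
    for _ in range(1000):      # simulate 10 s of steady load
        temps = step_temperature(temps, powers)
    print("steady-state layer temperatures (C):", [round(t, 1) for t in temps])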

    Training & acceleration of deep reinforcement learning agents

    National Technical University of Athens -- Master's thesis. Interdisciplinary-Interdepartmental Postgraduate Studies Program (D.P.M.S.) "Data Science and Machine Learning"