1,383 research outputs found

    FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels

    Get PDF
    Using FPGAs as hardware accelerators that communicate with a central CPU is becoming a common practice in the embedded design world but there is no standard methodology and toolset to facilitate this path yet. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphical Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach

    Embedded electronic systems driven by run-time reconfigurable hardware

    Get PDF
    Abstract This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen Esta tesis doctoral abarca el diseño de sistemas electrĂłnicos embebidos basados en tecnologĂ­a hardware dinĂĄmicamente reconfigurable –disponible a travĂ©s de dispositivos lĂłgicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguraciĂłn que proporcione a la FPGA la capacidad de reconfiguraciĂłn dinĂĄmica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicaciĂłn particionada en tareas multiplexadas en tiempo y en espacio, optimizando asĂ­ su implementaciĂłn fĂ­sica –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estĂĄtico (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalĂșa el flujo de diseño de dicha tecnologĂ­a a travĂ©s del prototipado de varias aplicaciones de ingenierĂ­a (sistemas de control, coprocesadores aritmĂ©ticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotaciĂłn en la industria.Resum Aquesta tesi doctoral estĂ  orientada al disseny de sistemes electrĂČnics empotrats basats en tecnologia hardware dinĂ micament reconfigurable –disponible mitjançant dispositius lĂČgics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguraciĂł que proporcioni a la FPGA la capacitat de reconfiguraciĂł dinĂ mica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicaciĂł particionada en tasques multiplexades en temps i en espai, optimizant aixĂ­ la seva implementaciĂł fĂ­sica –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potĂšncia dissipada– comparada amb altres alternatives basades en hardware estĂ tic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalĂșa el fluxe de disseny d’aquesta tecnologia a travĂ©s del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmĂštics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotaciĂł a la indĂșstria

    Application-aware optimization of Artificial Intelligence for deployment on resource constrained devices

    Get PDF
    Artificial intelligence (AI) is changing people's everyday life. AI techniques such as Deep Neural Networks (DNN) rely on heavy computational models, which are in principle designed to be executed on powerful HW platforms, such as desktop or server environments. However, the increasing need to apply such solutions in people's everyday life has encouraged the research for methods to allow their deployment on embedded, portable and stand-alone devices, such as mobile phones, which exhibit relatively low memory and computational resources. Such methods targets both the development of lightweight AI algorithms and their acceleration through dedicated HW. This thesis focuses on the development of lightweight AI solutions, with attention to deep neural networks, to facilitate their deployment on resource constrained devices. Focusing on the computer vision field, we show how putting together the self learning ability of deep neural networks with application-specific knowledge, in the form of feature engineering, it is possible to dramatically reduce the total memory and computational burden, thus allowing the deployment on edge devices. The proposed approach aims to be complementary to already existing application-independent network compression solutions. In this work three main DNN optimization goals have been considered: increasing speed and accuracy, allowing training at the edge, and allowing execution on a microcontroller. For each of these we deployed the resulting algorithm to the target embedded device and measured its performance

    The CONCERTO methodology for model-based development of avionics SW

    Get PDF
    20th International Conference on Reliable Software Technologies - Ada-Europe 2015 (Ada-Europe 2015), 22 to 26, Jun, 2015, Madrid, Spain.The development of high-integrity real-time systems, including their certification, is a demanding endeavour in terms of time, skills and effort involved. This is particularly true in application domains such as the avionics, where composable design is to be had to allow subdividing monolithic systems into components of smaller complexity, to be outsourced to developers subcontracted down the supply chain. Moreover, the increasing demand for computational power and the consequent interest in multicore HW architectures complicates system deployment. For these reasons, appropriate methodologies and tools need to be devised to help the industrial stakeholders master the overall system design complexity, while keeping manufacturing costs affordable. In this paper we present some elements of the CONCERTO platform, a toolset to support the end-to-end system development process from system modelling to analysis and validation, prior to code generation and deployment. The approach taken by CONCERTO is demonstrated for an illustrative avionics setup, however it is general enough to be applied to a number of industrial domains including the space, telecom and automotive. We finally reason about the benefits to an industrial user by comparing to similar initiatives in the research landscape
    • 

    corecore