720 research outputs found

    Experiences Teaching an FPGA-based Embedded Systems Class

    Get PDF
    I describe a two-year-old embedded systems design course I teach at Columbia University. In it, the students learn low-level C programming and VHDL coding to design and implement a project of their own choosing. The students implement their projects using Xilinx FPGAs and tools running on Linux workstations. The main challenges the students face are understanding and complying with complex and often poorly-documented interfaces and protocols, personal time management, and teamwork. While all real-world challenges, this class is often the first time the students encounter them, which makes the class quite challenging, but very practical. In this paper, I describe the structure of the class, the configuration of our teaching laboratory, some of the more successful projects, and give suggestions to instructors wishing to implement the class elsewhere

    Integrating Mode Automata Control Models in SoC Co-Design for Dynamically Reconfigurable FPGAs

    Get PDF
    International audienceThe number of integrated transistors that can be contained on a chip are increasing at an exponential rate, along with rise in targeted sophisticated applications. Thus the design of Systems-on-Chip (SoC) is becoming more and more complex. Hence there is a critical need to find new seamless methodologies and tools to handle the SoC co-design aspects. This paper presents a novel approach for expressing system adaptivity and reconfigurability in Gaspard, a SoC co-design framework, with special focus on partially dynamically reconfigurable FPGAs. The framework is compliant with UML MARTE profile proposed by Object Management Group, for modeling and analysis of realtime embedded systems. The overall objective is to carry out system modeling at a high abstraction level expressed in UML; and afterwards, transform these high level models into detailed enriched lower level models in order to automatically generate the necessary code for final FPGA synthesi

    Real-time Simulation of Dynamic Vehicle Models using a High-performance Reconfigurable Platform

    Get PDF
    A purely software-based approach for Real-Time Simulation (RTS) may have difficulties in meeting real-time constraints for complex physical model simulations. In this paper, we present a methodology for the design and im-plementationofRTS algorithms,basedontheuseof Field-ProgrammableGateArray(FPGA) technologytoimprove the response time of these models. Our methodology utilizes traditional hardware/software co-design approaches to generate a heterogeneous architecture for an FPGA-based simulator. The hardware design was optimized such that it efficiently utilizes the parallel nature of FPGAs and pipelines the independent operations. Further enhancement is obtained through the use of custom accelerators for common non-linear functions. Since the systems we examined had relatively low response time requirements, our approach greatly simplifies the software components by porting the computationally complexregionsto hardware.We illustratethe partitioningofa hardware-based simulator design across dual FPGAs, initiateRTS usinga system input froma Hardware-in-the-Loop (HIL) framework, and use these simulation results from our FPGA-based platform to perform response analysis. The total simulation time, which includes the time required to receive the system input over a socket (without HIL), software initialization, hardware computation, and transferof simulation results backovera socket, showsa speedup of 2Ă— as compared to a simi-lar setup with no hardware acceleration. The correctness of the simulation output from the hardware has also been validated with the simulated results from the software-only design

    A Survey and Evaluation of FPGA High-Level Synthesis Tools

    Get PDF
    High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today's system complexity. HLS allows designers to work at a higher-level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing field-programmable gate array circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as overview the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources

    Broadcast with mask on a Massively Parallel Processing on a Chip

    Get PDF
    workshop drnoc2012The delay of instructions broadcast has a significant impact on the performance of Single Instruction Multiple Data (SIMD) architecture. This is especially true for massively parallel processing Systems-on-Chip (mppSoC), where the processing stage and that of setting up the communication mechanism need several clock periods. Subnetting is the strategy used to partition a single physical network into more than one smaller logical sub-networks (subnets). This technique better controls the broadcast instructions domain and the data traffic between network nodes. Furthermore, it allows to separate synchronous communications from asynchronous processing which maintains reliable communications and rapid processing through parallel processors. This paper describes the design of a communication model called broadcast with mask. This model is dedicated to mppSoC architecture with a huge number of processor elements because it maintains performances even when the number of processors increases. Simulation results and an FPGA implementation validate our approach

    Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

    Get PDF
    The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

    Maximizing resource utilization by slicing of superscalar architecture

    Full text link
    Superscalar architectural techniques increase instruction throughput from one instruction per cycle to more than one instruction per cycle. Modern processors make use of several processing resources to achieve this kind of throughput. Control units perform various functions to minimize stalls and to ensure a continuous feed of instructions to execution units. It is vital to ensure that instructions ready for execution do not encounter a bottleneck in the execution stage; This thesis work proposes a dynamic scheme to increase efficiency of execution stage by a methodology called block slicing. Implementing this concept in a wide, superscalar pipelined architecture introduces minimal additional hardware and delay in the pipeline. The hardware required for the implementation of the proposed scheme is designed and assessed in terms of cost and delay. Performance measures of speed-up, throughput and efficiency have been evaluated for the resulting pipeline and analyzed

    Minimum area, low cost fpga implementation of aes

    Get PDF
    The Rijndael cipher, designed by Joan Daemen and Vincent Rijmen and recently selected as the official Advanced Encryption Standard (AES) is well suited for hardware use. This implementation can be carried out through several trade-offs between area and speed. This paper presents an 8-bit FPGA implementation of the 128-bit block and 128 bit-key AES cipher. Selected FPGA Family is Altera Flex 10K. The cipher operates at 25 MHz and consumes 470 clock cycles for algorithm encryption, resulting in a throughput of 6.8 Mbps. The design target was optimisation of area and cost.Eje: IV - Workshop de procesamiento distribuido y paraleloRed de Universidades con Carreras en Informática (RedUNCI

    6502 emulator on FPGA

    Get PDF
    6502 microprocessor was once used in almost all of the microcomputer in the 80s, including the Apple II lines of computer, the Commodore PET, the Commodore 64, the Atari 8-bit series and even on the Nintendo Entertainment System (NES) video game console. The objective of this project is to emulate the once famous 6502 microprocessor onto a FPGA chip. The FPGA-based 6502 microprocessor had to emulate the functionality of a real 6502 microprocessor. Accurate pinouts emulation is desired but not a must. The 6502 assembly language is easy to learn and building a computer based on this microprocessor requires very few parts, thus making this project a great experiential learning process. The scope of this project requires the student to have an in-depth understanding on computer system architecture, especially on 6502 architecture; V erilog to understand existing 6502 source code from Bird Computer and also FPGA development process (synthesis tools) to transfer the Verilog code to the FPGA chip. Thus far, the resources and information on 6502 microprocessor looks promising. The student earlier scope was to come up with the 6502 code in Verilog HDL, but as there is available code from Bird Computer (State Machine coded) so the student had chanced his objectives to understand the existing code and implement it on FPGA only. But as along the way, problems occur on hardware implementation, focus had been switched again to simulate the existing code or ALU or simple processor to build up student understanding and for documentation for future project expansion. To test the functionality of the 6502 system, the student will either find existing application or come up with simple program to run using the FPGA-based 6502 system
    • …
    corecore