23,934 research outputs found

    An integrated environment for HW/SW co-design based on a CAL specification and HW/SW code generators

    Get PDF
    The possibility of specifying both SW and HW components using the same language is a very attractive design approach. However, despite the efforts spent for implementing such approach using common programming languages such as C and C++, it has not yet shown to be viable and efficient for complex design. The main reason is the difficulty of expressing architectural properties at the level of a common description, difficulty that finally results into the failure of synthesizing efficient HW and efficient parallel SW. This is not the case for dataflow programming that is well- suited to the description of complex real-world computational systems that may contain appreciable amounts of concurrency. A CAL [1, 2] dataflow program is a collection of actors connected by lossless first-in/first-out (FIFO) communication channels, through which they exchange packets of data called tokens. Each actor executes in a sequence of steps, during which it can (1) consume input tokens, (2) produce output tokens, (3) modify its own local state. Actors strictly encapsulate their state, i.e. they cannot directly use or modify the state of other actors. Consequently, executing actors concurrently does not cause race conditions on any state variables, which makes dataflow an excellent (and scalable) parallel programming model. This in turn is key to the generation of efficient HW and SW implementations. This demonstration presents an integrated environment that embeds the synthesis tool that translates a CAL-based dataflow specification into a heterogeneous implementation, composed by HDL and C codes. The demonstration focuses on the capability of the co-design environment to automatically build an executable heterogeneous system implementation running on a platform composed by a processor and a FPGA platform just by annotating the CAL specification. The possibility of direct synthesis from a high level specification is a crucial issue for enabling efficient re-design cycles that include rapid prototyping and validation of performances of the final implementation. The design approach enabled by such integrated environment is particularly suited for development of complex processing systems such as video codecs. As a case study, the demonstration provides the analysis and validation of different SW and HW partitioning of a MPEG-4 Simple Profile decoder

    FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels

    Get PDF
    Using FPGAs as hardware accelerators that communicate with a central CPU is becoming a common practice in the embedded design world but there is no standard methodology and toolset to facilitate this path yet. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphical Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach

    Design of multimedia processor based on metric computation

    Get PDF
    Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

    Efficient Implementation on Low-Cost SoC-FPGAs of TLSv1.2 Protocol with ECC_AES Support for Secure IoT Coordinators

    Get PDF
    Security management for IoT applications is a critical research field, especially when taking into account the performance variation over the very different IoT devices. In this paper, we present high-performance client/server coordinators on low-cost SoC-FPGA devices for secure IoT data collection. Security is ensured by using the Transport Layer Security (TLS) protocol based on the TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256 cipher suite. The hardware architecture of the proposed coordinators is based on SW/HW co-design, implementing within the hardware accelerator core Elliptic Curve Scalar Multiplication (ECSM), which is the core operation of Elliptic Curve Cryptosystems (ECC). Meanwhile, the control of the overall TLS scheme is performed in software by an ARM Cortex-A9 microprocessor. In fact, the implementation of the ECC accelerator core around an ARM microprocessor allows not only the improvement of ECSM execution but also the performance enhancement of the overall cryptosystem. The integration of the ARM processor enables to exploit the possibility of embedded Linux features for high system flexibility. As a result, the proposed ECC accelerator requires limited area, with only 3395 LUTs on the Zynq device used to perform high-speed, 233-bit ECSMs in 413 µs, with a 50 MHz clock. Moreover, the generation of a 384-bit TLS handshake secret key between client and server coordinators requires 67.5 ms on a low cost Zynq 7Z007S device

    Crypto Embedded System for Electronic Document

    Get PDF
    In this paper, a development of low-cost RSA-based Crypto Embedded System targeted for electronic document security is presented. The RSA algorithm is implemented in a re-configurable hardware, in this case Field Programmable Gate Array (FPGA). The 32-bit soft cores of AlteraÂ’s Nios RISC processor is used as the basic building blocks of the proposed complete embedded solutions. AlteraÂ’s SOPC Builder is used to facilitate the development of crypto embedded system, particularly in hardware/software integration stage. The use of Cryptographic Application Programming Interface (CAPI) to bridge the application and the hardware, and the associated communication layer in the embedded system is also discussed. The result obtained shows that the crypto embedded system provides a suitable compromise between the constraints of speed, space and required security level based on the specific demands of targeted applications
    corecore