304 research outputs found

    BioThreads: a novel VLIW-based chip multiprocessor for accelerating biomedical image processing applications

    Get PDF
    We discuss BioThreads, a novel, configurable, extensible system-on-chip multiprocessor and its use in accelerating biomedical signal processing applications such as imaging photoplethysmography (IPPG). BioThreads is derived from the LE1 open-source VLIW chip multiprocessor and efficiently handles instruction, data and thread-level parallelism. In addition, it supports a novel mechanism for the dynamic creation, and allocation of software threads to uncommitted processor cores by implementing key POSIX Threads primitives directly in hardware, as custom instructions. In this study, the BioThreads core is used to accelerate the calculation of the oxygen saturation map of living tissue in an experimental setup consisting of a high speed image acquisition system, connected to an FPGA board and to a host system. Results demonstrate near-linear acceleration of the core kernels of the target blood perfusion assessment with increasing number of hardware threads. The BioThreads processor was implemented on both standard-cell and FPGA technologies; in the first case and for an issue width of two, full real-time performance is achieved with 4 cores whereas on a mid-range Xilinx Virtex6 device this is achieved with 10 dual-issue cores. An 8-core LE1 VLIW FPGA prototype of the system achieved 240 times faster execution time than the scalar Microblaze processor demonstrating the scalability of the proposed solution to a state-of-the-art FPGA vendor provided soft CPU core

    Uncertainty Aware Mapping of Embedded Systems for Reliability, Performance, and Energy

    Get PDF
    Due to technology downscaling, embedded systems have increased in complexity and heterogeneity. The increasingly large process, voltage, and temperature variations negatively affect the design and optimization process of these systems. These factors contribute to increased uncertainties that in turn undermine the accuracy and effectiveness of traditional design approaches. In this thesis, we formulate the problem of uncertainty aware mapping for multicore embedded system platforms as a multi-objective optimization problem. We present a solution to this problem that integrates uncertainty models as a new design methodology constructed with Monte Carlo and evolutionary algorithms. The solution is uncertainty aware because it is able to model uncertainties in design parameters and to identify robust design points that limit the influence of these uncertainties onto the objective functions. The proposed design methodology is implemented as a tool that can generate the robust Pareto frontier in the objective space formed by reliability, performance, and energy consumption

    A Hybrid Instruction Prefetching Mechanism for Ultra Low-Power Multicore Clusters

    Get PDF
    The instruction memory hierarchy plays a critical role in performance and energy efficiency of ultralow-power (ULP) processors for the Internet-of-Things (IoT) end-nodes. This is mainly due to the extremely tight power envelope and area budgets, which imply small instruction-caches (I-Cache) operating at very low supply voltages (near-threshold). The challenge is aggravated by the fact that multiple processors, fetching in parallel, require plenty of bandwidth from the I-Caches. In this letter, we propose a low-cost and energy efficient hybrid instruction-prefetching mechanism to be integrated with a ULP multicore cluster. We study its performance for a wide range of IoT applications, from cryptography to computer vision, and show that it can effectively improve the hit-rate of almost all of them to above 95% (average performance improvement of over 2 \times ). In addition, we designed our prefetcher and integrated it in a 4-cores cluster in 28 nm fully-depleted silicon-on-insulator (FDSOI) technology. We show that system's power consumption increases only by about 11% and silicon area by less than 1%. Altogether, a total energy reduction of 1.9x is achieved, thanks to more than 2x performance improvement, enabling a significantly longer battery life

    Hardware Acceleration Using Functional Languages

    Get PDF
    Cílem této práce je prozkoumat možnosti využití funkcionálního paradigmatu pro hardwarovou akceleraci, konkrétně pro datově paralelní úlohy. Úroveň abstrakce tradičních jazyků pro popis hardwaru, jako VHDL a Verilog, přestáví stačit. Pro popis na algoritmické či behaviorální úrovni se rozmáhají jazyky původně navržené pro vývoj softwaru a modelování, jako C/C++, SystemC nebo MATLAB. Funkcionální jazyky se s těmi imperativními nemůžou měřit v rozšířenosti a oblíbenosti mezi programátory, přesto je předčí v mnoha vlastnostech, např. ve verifikovatelnosti, schopnosti zachytit inherentní paralelismus a v kompaktnosti kódu. Pro akceleraci datově paralelních výpočtů se často používají jednotky FPGA, grafické karty (GPU) a vícejádrové procesory. Praktická část této práce rozšiřuje existující knihovnu Accelerate pro počítání na grafických kartách o výstup do VHDL. Accelerate je možno chápat jako doménově specifický jazyk vestavěný do Haskellu s backendem pro prostředí NVIDIA CUDA. Rozšíření pro vysokoúrovňovou syntézu obvodů ve VHDL představené v této práci používá stejný jazyk a frontend.The aim of this thesis is to research how the functional paradigm can be used for hardware acceleration with an emphasis on data-parallel tasks. The level of abstraction of the traditional hardware description languages, such as VHDL or Verilog, is becoming to low. High-level languages from the domains of software development and modeling, such as C/C++, SystemC or MATLAB, are experiencing a boom for hardware description on the algorithmic or behavioral level. Functional Languages are not so commonly used, but they outperform imperative languages in verification, the ability to capture inherent paralellism and the compactness of code. Data-parallel task are often accelerated on FPGAs, GPUs and multicore processors. In this thesis, we use a library for general-purpose GPU programs called Accelerate and extend it to produce VHDL. Accelerate is a domain-specific language embedded into Haskell with a backend for the NVIDIA CUDA platform. We use the language and its frontend, and create a new backend for high-level synthesis of circuits in VHDL.

    Extending ASSERT for HW/SW Co-design

    Get PDF
    Embedded systems are commonly designed by specifying and developing hardware and software systems separately. On the contrary, the hardware/software (HW/SW) co-development exploits the trade-offs between hardware and software in a system through their concurrent design. HW/SW Codevelopment techniques take advantage of the flexibility of system design to create architectures that can meet stringent performance requirements with a shorter design cycle. This paper presents the work done within the scope of ESA HWSWCO (Hardware-Software Co-design) study. The main objective of this study has been to address the HW/SW co-design phase to integrate this engineering task as part of the ASSERT process (refer to [1]) and compatible with the existing ASSERT approach, process and tool, Advances in the automation of the design of HW and SW and the adoption of the Model Driven Architecture (MDA) [9] paradigm make possible the definition of a proper integration substrate and enables the continuous interaction of the HW and SW design paths

    Task-Level Data Model for Hardware Synthesis Based on Concurrent Collections

    Get PDF

    Electronic System-Level Synthesis Methodologies

    Full text link

    System level synthesis of dataflow programs: HEVC decoder case study

    Get PDF
    International audienceWhile dealing with increasing complexity of signal processing algorithms, the primary motivation for the development of High-Level Synthesis (HLS) tools for the automatic generation of Register Transfer Level (RTL) description from high-level description language is the reduction of time-to-market. However, most existing HLS tools operate at the component level, thus the entire system is not taken into consideration. We provide an original technique that raises the level of abstraction to the system level in order to obtain RTL description from a dataflow description. First, we design image processing algorithms using an actor oriented language under the Reconfigurable Video Coding (RVC) standard. Once the design is achieved, we use a dataflow compilation infrastructure called Open RVC-CAL Compiler (Orcc) to generate a C-based code. Afterward, a Xilinx HLS tool called Vivado is used for an automatic generation of synthesizable hardware implementation. In this paper, we show that a simulated hardware code generation of High Efficiency Video Coding (HEVC) under the RVC specifications is rapidly obtained with promising preliminary results
    corecore