19 research outputs found

    A cross-platform OpenVX library for FPGA accelerators

    FPGAs are an excellent platform for implementing computer vision applications, since these applications tend to offer a high level of parallelism with many data-independent operations. However, the freedom in the FPGA solution design space is also a problem, because each solution must be individually designed, verified, and tuned. The emergence of High-Level Synthesis (HLS) helps solve this problem and has allowed open programming standards such as OpenVX to be implemented for computer vision applications on FPGAs, for example in the HiFlipVX library, which was developed exclusively for Xilinx devices. Although HiFlipVX lets designers develop solutions efficiently on Xilinx devices, it offers no way to port and run their code on FPGAs from other manufacturers. This work extends the HiFlipVX capabilities in two significant ways: supporting Intel FPGA devices and enabling execution on discrete FPGA accelerators. To provide both without affecting user-facing code, the new implementation combines two HLS programming models: C++, using Intel's system of tasks, and OpenCL, which provides the CPU interoperability. Compared with pure OpenCL implementations, this work reduces kernel dispatch resources, saving up to 24% of ALUT resources for each kernel in a graph. Compared with state-of-the-art frameworks, it improves performance by 2.6x and energy consumption by 1.6x on average for a set of representative applications.

    FPGA-based runtime adaptive multiprocessor approach for embedded high performance computing applications

    Embedded high performance computing applications, such as image processing in surveillance systems, are very compute intensive due to the complexity of their algorithms. In addition to handling this compute-intensive data processing, such systems must minimize power consumption in order to remain lightweight and usable in mobile operation. One way to achieve these goals is to exploit hardware parallelism for acceleration on reconfigurable hardware, such as Field Programmable Gate Arrays (FPGAs). Because parallelism increases performance, the clock speed can be reduced, which lowers power consumption in comparison to traditional processor-based approaches. Programming these devices with standardized tools or languages such as C remains a challenging task. C-to-FPGA tools exist that ease the programming of these systems, but they do not handle communication with the environment, e.g. camera interfaces or PCI interfaces. This part still has to be designed by hand in time-consuming work. The aforementioned tools also still place some restrictions on the input language. The novel approach in the presented work is to combine processors in a multiprocessor architecture on an FPGA for high performance computing applications. This solution combines the flexibility of FPGAs with the high-level programming paradigms of multiprocessor systems and can be seen as a meet-in-the-middle solution. This holistic approach is called RAMPSoC (Runtime Adaptive MPSoC) and combines a novel hardware architecture, consisting of heterogeneous processing elements connected over a novel heterogeneous Network-on-Chip, with a user-guided design methodology and a new runtime resource management system.

    Star-wheels network-on-chip featuring a self-adaptive mixed topology and a synergy of a circuit- and a packet-switching communication protocol

    Multiprocessor System-on-Chip is a promising realization alternative for the next generation of computing architectures, providing the data processing performance required in high performance computing applications. Numerous scientists from industry and academic institutions investigate and develop novel processing elements and accelerators, as can be seen in real devices like IBM's Cell or nVIDIA's Tesla GPU. Nevertheless, the on-chip communication between these multiple processing elements has to be optimized and tailored to the actual requirements of the data to be processed. Network-on-Chip (NoC), bus-based, or even heterogeneous on-chip communication often suffers from inflexibility due to its fixed physical realization. This paper presents a novel approach for a NoC that exploits circuit- and packet-switched communication as well as a runtime-adaptive and heterogeneous topology. An application scenario from image processing, which uses the implemented NoC on an FPGA, provides results such as performance data and hardware costs.

    A design methodology for application partitioning and architecture development of reconfigurable multiprocessor systems-on-chip

    To this day, the efficient partitioning and mapping of applications for multiprocessor systems remains a challenging task. Deploying reconfigurable hardware in this domain helps to meet application requirements more efficiently, because the hardware can be adapted at design time and at runtime, which is not possible in the traditional multiprocessor domain. To exploit this new degree of freedom in multiprocessor system-on-chip (MPSoC) technology, a novel design methodology is needed that hides the complexity of the hardware architecture and its realization alternatives from the developer. This paper shows one approach to such a design methodology, covering the development of the hardware architecture as well as application partitioning and mapping. A novel multistep approach based on hierarchical clustering is used to partition the software application and to configure a runtime adaptive multiprocessor system. Furthermore, each application module is then partitioned in a Hardware-Software Codesign process in order to achieve maximum performance on the local processors and therefore for the MPSoC as a whole.
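
    The clustering step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes that communication volume between tasks is the similarity measure, and the function `cluster_tasks`, the task names, and the weights are all hypothetical.

```python
from itertools import combinations


def cluster_tasks(comm, num_clusters):
    """Agglomerative (bottom-up hierarchical) clustering of tasks.

    comm maps a pair of task names to an assumed communication volume.
    The two clusters exchanging the most data are merged repeatedly
    until only num_clusters clusters remain.
    """
    tasks = set()
    for a, b in comm:
        tasks.update((a, b))
    # Start with one singleton cluster per task.
    clusters = [frozenset([t]) for t in sorted(tasks)]

    def traffic(c1, c2):
        # Total communication volume crossing the two clusters.
        return sum(w for (a, b), w in comm.items()
                   if (a in c1 and b in c2) or (a in c2 and b in c1))

    while len(clusters) > num_clusters:
        c1, c2 = max(combinations(clusters, 2), key=lambda p: traffic(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Hypothetical image-processing pipeline with made-up traffic weights.
comm = {
    ("capture", "filter"): 90,
    ("filter", "edges"): 80,
    ("edges", "track"): 10,
    ("track", "display"): 70,
}
partition = cluster_tasks(comm, 2)
```

    In this sketch each resulting cluster would be mapped to one processor, so that heavily communicating tasks stay local and only the light `edges` to `track` link crosses processors.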

    Fast dynamic and partial reconfiguration data path with low hardware overhead on Xilinx FPGAs

    Dynamic and partial reconfiguration of Xilinx FPGAs is a well-known technique in runtime adaptive system design. With this technique, parts of a configuration can be substituted while other parts stay operative without any disturbance. The advantage is that spatial and temporal partitioning can be exploited to increase performance and to reduce power consumption through the re-use of chip area. This paper shows a novel methodology for including the configuration access port in the data path of a processor core in order to adapt the internal architecture and to re-use this access port as a data sink and source. The chip area used by the hardware drivers for the internal configuration access port (ICAP) has to be as small as possible in comparison to the application functionality. Therefore, a hardware design with a small footprint but adequate data throughput is necessary. This paper presents a fast data path for dynamic and partial reconfiguration data that has a small footprint on the hardware resources.

    Virtualized on-chip distributed computing for heterogeneous reconfigurable multi-core systems

    Efficiently managing the parallel execution of various application tasks on a heterogeneous multi-core system that combines processors and accelerators is difficult due to the complex system architecture. Managing reconfigurable multi-core systems is even more complicated, because they exploit dynamic and partial reconfiguration in order to, e.g., increase the number of processing elements to fulfill the performance demands of the application. This paper presents a special virtualization layer consisting of one central server and several distributed computing clients, which virtualizes the complex and adaptive heterogeneous multi-core architecture and autonomously manages the distribution of the parallel computation tasks onto the different processing elements.
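
    The central-server and distributed-client arrangement described in this abstract can be sketched roughly as follows. The abstract gives no scheduling policy, so the "least busy compatible client" rule, the `Server` and `Client` classes, and all task names and costs are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """One processing element as seen by the server (hypothetical model)."""
    name: str
    kinds: set                      # task kinds this element can execute
    busy_until: float = 0.0         # accumulated assumed workload
    assigned: list = field(default_factory=list)


class Server:
    """Central server hiding the heterogeneous elements behind dispatch()."""

    def __init__(self, clients):
        self.clients = clients

    def dispatch(self, task, kind, cost):
        # Only elements that support this task kind are candidates.
        candidates = [c for c in self.clients if kind in c.kinds]
        if not candidates:
            raise ValueError(f"no processing element supports {kind!r}")
        # Assumed policy: pick the compatible element with the least work.
        target = min(candidates, key=lambda c: c.busy_until)
        target.busy_until += cost
        target.assigned.append(task)
        return target.name


server = Server([
    Client("cpu0", {"generic"}),
    Client("cpu1", {"generic"}),
    Client("acc0", {"generic", "fir"}),
])
jobs = [("t0", "generic", 4.0), ("t1", "fir", 2.0),
        ("t2", "generic", 1.0), ("t3", "generic", 1.0)]
placement = [server.dispatch(t, k, c) for t, k, c in jobs]
```

    The application only calls `dispatch` and never names a processing element, which is the sense in which such a layer virtualizes the architecture. Adding an element through partial reconfiguration would amount to appending another `Client` to the server's list.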