Search CORE

1,068 research outputs found

Design of multimedia processor based on metric computation

Authors: Balasa, Berekovic, Jean Luc Philippe, Jean Philippe Diguet, Mohamed Abid, Nader Ben Amor, Suzuki, Wuytack, Yannick Le Moullec
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005

Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of these media applications place heavy demands on processor performance, while also requiring low cost, low power, and short design time. To meet these challenges, a fast and efficient strategy is to upgrade a low-cost general-purpose processor core. This approach is based on customizing a general RISC processor core according to the requirements of the target multimedia application. Thus, if the extra cost is justified, the general-purpose processor (GPP) core can be augmented with instruction-level coprocessors, coarse-grain dedicated hardware, ad hoc memories, or additional GPP cores. In this way, the final design is tailored to the application requirements. The proposed approach has three main steps: first, the target application is analyzed using efficient metrics; second, an appropriate architecture template is selected according to the results and recommendations of the first step; third, the architecture is generated. The approach was tested on various image and video algorithms, demonstrating its feasibility.
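The abstract does not define its metrics, so the Python below is only a sketch of the first two steps under stated assumptions: the operation classes, the ratio-based "orientation" metrics, the 0.5 threshold, and the mapping from the dominant orientation to a GPP extension are all hypothetical, not the paper's method.

```python
from collections import Counter

# Hypothetical operation classes; the paper's actual metrics are not given in the abstract.
PROCESSING_OPS = {"add", "sub", "mul", "mac", "shift"}
MEMORY_OPS = {"load", "store"}
CONTROL_OPS = {"branch", "compare", "jump"}


def orientation_metrics(trace):
    """Step 1 (sketch): fraction of processing, memory and control operations in a trace."""
    counts = Counter()
    for op in trace:
        if op in PROCESSING_OPS:
            counts["processing"] += 1
        elif op in MEMORY_OPS:
            counts["memory"] += 1
        elif op in CONTROL_OPS:
            counts["control"] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero on an empty trace
    return {k: counts[k] / total for k in ("processing", "memory", "control")}


def select_template(metrics, threshold=0.5):
    """Step 2 (sketch): map the dominant orientation to one of the GPP extensions
    named in the abstract. The rules and the threshold are hypothetical."""
    if metrics["processing"] >= threshold:
        return "GPP + instruction-level coprocessor"
    if metrics["memory"] >= threshold:
        return "GPP + ad hoc memories"
    if metrics["control"] >= threshold:
        return "GPP only"
    return "GPP + coarse-grain dedicated hardware"
```

For example, a trace dominated by multiply-accumulate operations would select the coprocessor template. Step 3, architecture generation, is not modeled here.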

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

VBN

Heterogeneous parallel virtual machine: A portable program representation and compiler for performance and energy optimizations on heterogeneous parallel systems

Author: Kotsifakou Maria
Publication date: 01/08/2020

Programming heterogeneous parallel systems, such as the SoCs (systems-on-chip) in mobile and edge devices, is extremely difficult: the diverse parallel hardware they contain exposes vastly different instruction sets, parallelism models, and memory systems. Moreover, a wide range of hardware and software approximation techniques are available for applications targeting heterogeneous SoCs, further exacerbating the programmability challenges. In this thesis, we alleviate the programmability challenges of such systems using flexible compiler intermediate representations, in order to benefit from the performance and superior energy efficiency of heterogeneous systems. First, we develop the Heterogeneous Parallel Virtual Machine (HPVM), a parallel program representation for heterogeneous systems designed to enable functional and performance portability across popular parallel hardware. HPVM is based on a hierarchical dataflow graph with side effects. HPVM supports three important capabilities for programming heterogeneous systems: a compiler intermediate representation (IR), a virtual instruction set architecture (ISA), and a basis for runtime scheduling. We use the HPVM representation to implement an HPVM prototype, defining the HPVM IR as an extension of the Low Level Virtual Machine (LLVM) IR. Our results show performance comparable to optimized OpenCL kernels for the target hardware, all from a single HPVM representation, using translators from the HPVM virtual ISA to native code, IR optimizations that operate directly on the HPVM representation, and flexible runtime scheduling schemes. We extend HPVM to ApproxHPVM, introducing hardware-independent approximation metrics in the IR so that accuracy information is maintained at the IR level and application-level end-to-end quality metrics can be mapped to system-level "knobs".
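To make the idea of a hierarchical dataflow graph concrete, here is a minimal Python sketch. The `DFNode` class, its `target` hint, and the `lower` function are hypothetical illustrations, not HPVM's actual LLVM-based API: internal nodes hold a child graph, leaf nodes hold computations, and lowering walks the hierarchy to produce leaves in a dependence-respecting order. It uses `graphlib`, which requires Python 3.9 or later.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter


@dataclass
class DFNode:
    """Hypothetical dataflow-graph node. A leaf holds a computation; an internal node
    holds child nodes plus edges (producer, consumer) between those children."""
    name: str
    target: str = "cpu"  # hypothetical hardware hint, used only on leaves here
    children: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children


def lower(node):
    """Flatten the hierarchy into (leaf name, target) pairs in a dependence-respecting order."""
    if node.is_leaf():
        return [(node.name, node.target)]
    by_name = {child.name: child for child in node.children}
    sorter = TopologicalSorter({name: set() for name in by_name})
    for producer, consumer in node.edges:
        sorter.add(consumer, producer)  # the consumer runs after its producer
    plan = []
    for name in sorter.static_order():
        plan.extend(lower(by_name[name]))  # recurse into internal nodes
    return plan
```

This sketch only produces a sequential order; a real backend would also exploit parallelism between independent sibling nodes and translate each leaf for its target.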
The approximation metrics quantify the acceptable accuracy loss for individual computations. Application programmers need only specify high-level, end-to-end quality metrics instead of detailed parameters for individual approximation methods. The ApproxHPVM system then automatically tunes the accuracy requirements of individual computations and maps them to approximate hardware when possible. ApproxHPVM achieves significant performance and energy improvements on popular deep learning benchmarks. Finally, we extend ApproxHPVM to ApproxTuner, a compiler and runtime system for approximation that adds a wide range of hardware and software approximation techniques. ApproxTuner uses a three-step tuning strategy that combines development-time, install-time, and dynamic tuning. This strategy ensures software portability, even though the performance of approximations is highly hardware-dependent, and enables efficient dynamic approximation tuning despite the expensive offline steps. ApproxTuner achieves significant performance and energy improvements across 7 deep neural networks and 3 image processing benchmarks, while ensuring that high-level end-to-end quality specifications are satisfied during adaptive approximation tuning.
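The idea of turning one end-to-end quality budget into per-computation knob settings can be sketched as a greedy search. Everything here is an assumption for illustration: the `Knob` type, the speedup and loss numbers, and especially the premise that per-operation accuracy losses add up linearly. The actual ApproxHPVM and ApproxTuner tuners are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Knob:
    """One hypothetical approximation setting for a single operation."""
    name: str
    speedup: float        # assumed speedup of the operation relative to exact execution
    accuracy_loss: float  # assumed accuracy loss (percentage points) it contributes


EXACT = Knob("exact", 1.0, 0.0)


def tune(ops, budget):
    """Greedily pick one knob per operation so that the summed accuracy loss stays
    within `budget`. `ops` maps an operation name to its candidate knobs.

    Assumes losses add linearly across operations, a simplification for this sketch."""
    choice = {op: EXACT for op in ops}
    spent = 0.0
    while True:
        best = None  # (score, op, knob, extra_loss)
        for op, knobs in ops.items():
            current = choice[op]
            for knob in knobs:
                gain = knob.speedup - current.speedup
                extra = knob.accuracy_loss - current.accuracy_loss
                if gain <= 0 or spent + extra > budget:
                    continue  # no speedup, or would exceed the quality budget
                score = gain / max(extra, 1e-9)  # speedup gained per unit of loss
                if best is None or score > best[0]:
                    best = (score, op, knob, extra)
        if best is None:
            return choice, spent
        _, op, knob, extra = best
        choice[op] = knob
        spent += extra
```

Each step strictly raises one operation's speedup, so the loop terminates. The greedy rule favors the largest speedup per unit of accuracy spent; an exhaustive or autotuner-based search could find better combinations at a higher search cost.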

Illinois Digital Environment for Access to Learning and Scholarship Repository

Applications in Electronics Pervading Industry, Environment and Society

Publication venue: 'MDPI AG'
Publication date: 11/01/2022

This book features the manuscripts accepted for the Special Issue “Applications in Electronics Pervading Industry, Environment and Society—Sensing Systems and Pervasive Intelligence” of the MDPI journal Sensors. Most of the papers are selected from the best papers of the 2019 edition of the “Applications in Electronics Pervading Industry, Environment and Society” (APPLEPIES) Conference, held in November 2019. All these papers have been significantly extended with novel experimental results. The papers give an overview of trends in research and development on the pervasive application of electronics in industry, the environment, and society. They focus on cyber-physical systems (CPS), with research proposals for new sensor acquisition and ADC (analog-to-digital converter) methods, high-speed communication systems, cybersecurity, big data management, and data processing, including emerging machine learning techniques. Physical implementation aspects are also discussed, as well as the trade-offs between functional performance and hardware/system costs.

Directory of Open Access Books (DOAB)

Exploring New Computing Paradigms for Data-Intensive Applications

Author: Santoro Giulia
Publication venue: Politecnico di Torino

The abstract is provided in the attachment.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Neuromorphic deep convolutional neural network learning systems for FPGA in real time

Author: Tapiador Morales Ricardo
Publication date: 13/12/2019

Deep learning algorithms have become one of the best approaches to pattern recognition in several fields, including computer vision, speech recognition, natural language processing, and audio recognition. In computer vision, convolutional neural networks stand out because of their relatively simple supervised training and their efficiency in extracting features from a scene. Several convolutional neural network accelerators can now run these networks in real time. However, the number of operations and the power consumption of these implementations can be reduced by using a different processing paradigm, such as neuromorphic engineering. Neuromorphic engineering studies biological neural processing in order to design analog, digital, or mixed-signal systems that solve problems by drawing inspiration from how the human brain performs complex tasks, replicating the behavior and properties of biological neurons. It seeks to explain how the brain can learn and perform complex tasks with high efficiency under the paradigm of spike-based computation. This thesis explores both frame-based and spike-based processing paradigms for developing hardware architectures for visual pattern recognition based on convolutional neural networks. It first presents two FPGA implementations of frame-based convolutional neural network accelerators, built with OpenCL and SoC technologies. It then presents a novel neuromorphic convolution processor for the spike-based paradigm, which implements the leaky integrate-and-fire neuron model. This processor reads data row by row, which allows it to compute multiple layers on the same chip. Finally, it proposes a novel FPGA implementation of the Hierarchy of Time Surfaces algorithm and a new memory model for spike-based systems.
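The leaky integrate-and-fire behavior of a spike-based convolution processor can be illustrated with an event-driven convolution in Python. This is a software sketch, not the thesis's FPGA design: the `SpikingConvLayer` name, the exponential leak with time constant `tau`, the reset-to-zero after firing, and the threshold value are all assumptions.

```python
import math


class SpikingConvLayer:
    """Event-driven 2D convolution with leaky integrate-and-fire (LIF) output neurons.

    Hypothetical software sketch: one output neuron per pixel, exponential leak with
    time constant `tau`, reset to zero after a spike."""

    def __init__(self, width, height, kernel, threshold=1.0, tau=10.0):
        self.width, self.height = width, height
        self.kernel = kernel              # odd-sized square list of lists of weights
        self.radius = len(kernel) // 2
        self.threshold = threshold
        self.tau = tau
        self.potential = [[0.0] * width for _ in range(height)]
        self.last_update = [[0.0] * width for _ in range(height)]

    def on_spike(self, x, y, t):
        """Process one input event at pixel (x, y), time t; return output spikes (x, y, t)."""
        out = []
        r = self.radius
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                nx, ny = x + dx, y + dy
                if not (0 <= nx < self.width and 0 <= ny < self.height):
                    continue  # neighbour falls outside the output map
                # Leak: decay the membrane potential for the time since its last update.
                elapsed = t - self.last_update[ny][nx]
                self.potential[ny][nx] *= math.exp(-elapsed / self.tau)
                self.last_update[ny][nx] = t
                # Integrate: stamping the kernel around the event implements a convolution.
                self.potential[ny][nx] += self.kernel[dy + r][dx + r]
                # Fire: emit a spike and reset when the threshold is reached.
                if self.potential[ny][nx] >= self.threshold:
                    out.append((nx, ny, t))
                    self.potential[ny][nx] = 0.0
        return out
```

Because work is done only when an input spike arrives, computation scales with event activity rather than frame size, which is the efficiency argument behind spike-based processing.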

idUS. Depósito de Investigación Universidad de Sevilla

Image Processing Using FPGAs

Author: Bailey Donald
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

This book presents a selection of papers representing current research on using field-programmable gate arrays (FPGAs) to realise image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state of the art in image processing using FPGAs.
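Image filters on FPGAs are commonly built around line buffers, so that a window filter sees each pixel once, in raster order, without storing the whole frame. The Python below is a behavioral sketch of that idea for a 3x3 box (sum) filter; the function name and the border handling (border pixels are skipped) are assumptions, and it is not taken from any paper in the book.

```python
def stream_box3x3(pixels, width):
    """Streaming 3x3 box filter using two line buffers (behavioral sketch).

    `pixels` arrives in raster order, one value per 'clock'. Only the two previous
    rows and a 3-column window are stored. Yields (row, col, window_sum) for the
    window centred one row and one column behind the current input pixel."""
    line1 = [0] * width  # pixels of row r-1
    line2 = [0] * width  # pixels of row r-2
    window = []          # up to three most recent columns of the 3x3 window
    for i, p in enumerate(pixels):
        r, c = divmod(i, width)
        if c == 0:
            window = []  # start a fresh window at each new row
        column = (line2[c], line1[c], p)
        line2[c], line1[c] = line1[c], p  # shift this column of the line buffers
        window.append(column)
        if len(window) > 3:
            window.pop(0)
        if r >= 2 and c >= 2:
            yield r - 1, c - 1, sum(sum(col) for col in window)
```

In hardware, the two Python lists would typically map to on-chip RAM blocks and the window to registers, giving one output per clock once the pipeline is full.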

Directory of Open Access Books (DOAB)


Implementation of the OPU Instruction Set Architecture on the Microsemi Polarfire 300 Field-Programmable Gate Array

Author: Delhez Louis Jean Eric
Publication venue: eScholarship, University of California
Publication date: 01/01/2020

Deep learning is a fast-growing field with numerous promising applications, but it demands large computing power for both training and inference. To meet this demand, numerous hardware accelerators have been designed. However, these platforms are currently developed independently of each other and, as a result, lack compatibility. In particular, the interface between hardware accelerators and software needs to be standardized. UCLA's OPU is an ISA that aims to solve this issue. Unlike general-purpose ISAs, OPU is designed to express the computations involved in deep learning models, which allows simple compilation and efficient cores. Prior to this work, only two fully featured cores implementing the OPU ISA had been designed, both targeting Xilinx SRAM-based FPGAs. However, flash-based FPGAs can offer several advantages thanks to their different technology: they are more secure, more reliable, and can yield lower power consumption. Because all three characteristics are potentially highly valuable for deep learning accelerators, especially those embedded in edge devices, this work develops a new OPU core and maps it to a flash-based FPGA. More specifically, it evaluates the potential of the MPF300 FPGA as a platform for the OPU ISA. This is the first OPU core implemented on an FPGA not manufactured by Xilinx, and also the first OPU core capable of operating on floating-point numbers, which simplifies the compilation of models. As such, this work diversifies the catalog of available OPU cores, which increases the relevance of the ISA. While prior work found that, on Xilinx FPGAs, 8-bit floating-point arithmetic is more area-efficient than 8-bit integer arithmetic, this work finds the opposite for Microsemi FPGAs.
Consequently, the best way to compute large floating-point dot products on the MPF300 is to convert the operands to wider integers on the device, then complete the computation using integer arithmetic. In contrast to Xilinx FPGAs, 5-bit mantissas are preferred here over 4-bit mantissas. Additionally, because the MPF300 has a lower ratio of LUTs to DSPs, its relative resource utilization is significantly higher than in the existing implementations. The new OPU core is on average 1.7 times more energy-efficient than the existing similarly sized implementation of the OPU ISA, and on average 2 times faster than the Nvidia Jetson Nano platform while consuming the same amount of power. These results further support the relevance of the OPU ISA and demonstrate that flash-based FPGAs are also a viable option for deep learning acceleration; their scarcity in the relevant literature is therefore not justified. Nevertheless, analysis of the core shows that the layout of modern FPGAs is generally suboptimal for machine learning acceleration. In particular, the placement of the device's hard resources tends to cause congestion, which reduces performance. This suggests the need for specialized FPGAs for this task.
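The finding that dot products on the MPF300 are best computed by converting floating-point operands to wider integers can be illustrated in Python. The abstract does not specify the OPU's floating-point format, so the 8-bit layout here (1 sign bit, 2 exponent bits, 5 mantissa bits, bias 1, no implicit leading one, no special values) is a hypothetical choice. Because each operand becomes a signed `m << e`, the whole dot product is exact integer arithmetic with a single scale factor applied at the end.

```python
from fractions import Fraction

# Hypothetical 8-bit format: 1 sign bit, EXP_BITS exponent bits, MAN_BITS mantissa bits.
# No implicit leading one, no subnormals, no inf/NaN, to keep the sketch short.
EXP_BITS = 2
MAN_BITS = 5
BIAS = 1


def decode(code):
    """Split an 8-bit code into (sign, exponent, mantissa) fields."""
    sign = (code >> (EXP_BITS + MAN_BITS)) & 1
    exp = (code >> MAN_BITS) & ((1 << EXP_BITS) - 1)
    man = code & ((1 << MAN_BITS) - 1)
    return sign, exp, man


def to_fraction(code):
    """Exact real value: (-1)^s * m * 2^(e - BIAS)."""
    s, e, m = decode(code)
    return (-1) ** s * Fraction(m) * Fraction(2) ** (e - BIAS)


def to_wide_int(code):
    """Integer form (-1)^s * (m << e); the real value is this times 2^-BIAS."""
    s, e, m = decode(code)
    value = m << e
    return -value if s else value


def dot_via_integers(a, b):
    """Convert operands to wide integers, then accumulate with integer-only arithmetic."""
    acc = 0
    for x, y in zip(a, b):
        acc += to_wide_int(x) * to_wide_int(y)
    # Every product carries the same scale 2^(-2*BIAS), so apply it once at the end.
    return Fraction(acc, 1 << (2 * BIAS))


def dot_reference(a, b):
    """Reference result computed directly on exact real values."""
    return sum((to_fraction(x) * to_fraction(y) for x, y in zip(a, b)), Fraction(0))
```

With this format, each converted operand fits in 9 signed bits and each product in 17 signed bits, so a hardware accumulator would need roughly 17 + ceil(log2 n) bits for n terms.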

eScholarship - University of California

Advanced photonic and electronic systems WILGA 2018

Author: Romaniuk Ryszard S.
Publication venue: Electronics and Telecommunications Committee
Publication date: 01/01/2017

The WILGA annual symposium on advanced photonic and electronic systems has been organized by young scientists for young scientists for two decades. It traditionally gathers around 400 young researchers and their tutors. Ph.D. students and graduates present their recent achievements during well-attended oral sessions. Wilga is a very good digest of Ph.D. work carried out at technical universities in electronics, photonics, and information sciences throughout Poland and some neighboring countries. Wilga is under the publishing patronage of the SEP technical journal Elektronika, IJET, and Proceedings of SPIE; the latter, a worldwide publication series, publishes more than 200 papers from Wilga annually. Wilga 2018 was the 42nd (XLII) edition of the meeting. The following topical tracks were distinguished: photonics, electronics, information technologies, and system research. This article is a digest of selected works presented at the Wilga 2018 symposium. WILGA 2017 works were published in Proc. SPIE vol. 10445, and WILGA 2018 works in Proc. SPIE vol. 10808.

Biblioteka Nauki - repozytorium artykułów

International Journal of Electronics and Telecommunications (Warsaw University of Technology)