232 research outputs found

    Accelerating statistical texture analysis with an FPGA-DSP hybrid architecture

    Get PDF
    Nowadays, most image processing systems are implemented using either MMX-optimized software libraries or, when time requirements are limited, expensive high performance DSP-based boards. In this paper we present a texture analysis co-processor concept that permits the efficient hardware implementation of statistical feature extraction, and hardware-software codesign to achieve high-performance low-cost solutions. We propose a hybrid architecture based on FPGA chips, for massive data processing, and digital signal processor (DSP) for floating-point computations. In our preliminary trials with test images, we achieved sufficient performance improvements to handle a wide range of real-time applications

    Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

    Get PDF
    The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

    Journal of Real-Time Image Processing manuscript No. (will be inserted by the editor) Evaluation of real-time LBP computing in multiple architectures

    Get PDF
    Abstract Local Binary Pattern (LBP) is a texture operator that is used in several different computer vision applications requiring, in many cases, real-time operation in multiple computing platforms. The irruption of new video standards has increased the typical resolutions and frame rates, which need considerable computational performance. Since LBP is essentially a pixel operator that scales with image size, typical straightforward implementations are usually insufficient to meet these requirements. To identify the solutions that maximize the performance of the real-time LBP extraction, we compare a series different implementations in terms of computational performance and energy efficiency while analyzing the different optimizations that can be made to reach real-time performance on multiple platforms and their different available computing resources. Our contribution addresses the extensive survey of LBP implementations in different platforms that can be found in the literature. To provide for a more complete evaluation, we have implemented the LBP algorithms in several platforms such as Graphics Processing Units, mobile processors and a hybrid programming model image coprocessor. We have extended the evaluation of some of the solutions that can be found in previous work. In addition, we publish the source code of our implementations

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Get PDF
    Machine Learning (ML) a subset of Artificial Intelligence (AI) is driving the industrial and technological revolution of the present and future. We envision a world with smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that at one time we thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing challenge. A challenge that is controlled by the computational complexity of ML algorithms, the performance/availability of hardware platforms, and the application\u2019s budget (power constraint, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle the aforementioned challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom Hardware Accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted for the validation of the proposed exact and approximate embedded machine learning accelerators. The dissertation starts with the introduction of the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the available hardware platforms which could be used for implementation. Afterward, a survey on the existing approximate computing techniques and hardware accelerators design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs on machine learning algorithms is provided. Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN). The three accelerators offer a real-time classification with monumental reductions in the hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain a high classification accuracy with a loss of at most 5%

    Image Processing Using FPGAs

    Get PDF
    This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs

    Image and Video Coding Techniques for Ultra-low Latency

    Get PDF
    The next generation of wireless networks fosters the adoption of latency-critical applications such as XR, connected industry, or autonomous driving. This survey gathers implementation aspects of different image and video coding schemes and discusses their tradeoffs. Standardized video coding technologies such as HEVC or VVC provide a high compression ratio, but their enormous complexity sets the scene for alternative approaches like still image, mezzanine, or texture compression in scenarios with tight resource or latency constraints. Regardless of the coding scheme, we found inter-device memory transfers and the lack of sub-frame coding as limitations of current full-system and software-programmable implementations.publishedVersionPeer reviewe

    A Scalable Framework for Monte Carlo Simulation Using FPGA-based Hardware Accelerators with Application to SPECT Imaging A SCALABLE FRAMEWORK FOR MONTE CARLO SIMULATION USING FPGA-BASED HARDWARE ACCELERATORS WITH APPLICATION TO SPECT IMAGING TITLE: A Scal

    Get PDF
    Abstract As the number of transistors that are integrated onto a silicon die continues to increase, the compute power is becoming a commodity. This has enabled a whole host of new applications that rely on high-throughput computations. Recently, the need for faster and cost-effective applications in form-factor constrained environments has driven an interest in on-chip acceleration of algorithms based on Monte Carlo simula- Processor. Futhermore, we have created a framework for further increasing parallelism by scaling our architecture across multiple compute devices and by extending our original design to a multi-FPGA system nearly linear increase in acceleration with logic resources was achieved. iv Acknowledgements One could hardly put into words the contributions made to this work by the many wonderful people who surround me on a daily basis. I count myself blessed to have family, friends and colleagues that support and encourage me and to recognize each individually would be impossible. Nonetheless, there are some people without whose explicit mention this thesis would be incomplete

    Hardware Accelerated Molecular Docking: A Survey

    Get PDF
    corecore