80 research outputs found

    A Many-Core Overlay for High-Performance Embedded Computing on FPGAs

    Get PDF
    In this work, we propose a configurable many-core overlay for high-performance embedded computing. The size of internal memory, supported operations and number of ports can be configured independently for each core of the overlay. The overlay was evaluated with matrix multiplication, LU decomposition and Fast-Fourier Transform (FFT) on a ZYNQ-7020 FPGA platform. The results show that using a system-level many-core overlay avoids complex hardware design and still provides good performance results.Comment: Presented at First International Workshop on FPGAs for Software Programmers (FSP 2014) (arXiv:1408.4423

    Flexible Baseband Modulator Architecture for Multi-Waveform 5G Communications

    Get PDF
    The fifth-generation (5G) revolution represents more than a mere performance enhancement of previous generations: it will deeply transform the way humans and/or machines interact, enabling a heterogeneous expansion in the number of use cases and services. Crucial to the realization of this revolution is the design of hardware components characterized by high degrees of flexibility, versatility and resource/power efficiency. This chapter proposes a field-programmable gate array (FPGA)-oriented baseband processing architecture suitable for fast-changing communication environments such as 4G/5G waveform coexistence, noncontiguous carrier aggregation (CA) or centralized cloud radio access network (C-RAN) processing. The proposed architecture supports three 5G waveform candidates and is shown to be upgradable, resource-efficient and cost-effective. Through hardware virtualization, enabled by dynamic partial reconfiguration (DPR), the design space exploration of our architecture exceeds the hardware resources available on the Zynq xc7z020 device. Moreover, dynamic frequency scaling (DFS) enables the runtime adjustment of processing throughput and power reductions by up to 88%. The combined resource overhead for DPR and DFS is very low, and the reconfiguration latency stays two orders of magnitude below the control plane latency requirements proposed for 5G communications

    Onboard processing of synthetic aperture radar backprojection algorithm in FPGA

    Get PDF
    Synthetic aperture radar is a microwave technique to extracting image information of the target. Electromagnetic waves that are reflected from the target are acquired by the aircraft or satellite receivers and sent to a ground station to be processed by applying computational demanding algorithms. Radar data streams are acquired by an aircraft or satellite and sent to a ground station to be processed in order to extract images from the data since these processing algorithms are computationally demanding. However, novel applications require real-time processing for real-time analysis and decisions and so onboard processing is necessary. Running computationally demanding algorithms on onboard embedded systems with limited energy and computational capacity is a challenge. This article proposes a configurable hardware core for the execution of the backprojection algorithm with high performance and energy efficiency. The original backprojection algorithm is restructured to expose computational parallelism and then optimized by replacing floating-point with fixed-point arithmetic. The backprojection core was integrated into a system-onchip architecture and implemented in a field-programmable gate array. The proposed solution runs the optimized backprojection algorithm over images of sizes 512 x 512 and 1024 x 1024 in 0.14 s (0.41 J) and 1.11 s (3.24 J), respectively. The architecture is 2.6x faster and consumes 13x less energy than an embedded Jetson TX2 GPU. The solution is scalable and, therefore, a tradeoff exists between performance and utilization of resources.info:eu-repo/semantics/publishedVersio

    Efficient reconfigurable architectures for 3D medical image compression

    Get PDF
    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US) have generated a massive amount of volumetric data. These have provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important since both efficient storage and transmission of data through high-bandwidth digital communication lines are of crucial importance. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures exible to allow for quick upgradeability with real-time applications. Moreover, in order to obtain efficient solutions for large medical volumes data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs) has been proposed as viable system building block in the construction of high-performance systems at an economical price. Consequently, FPGAs seem an ideal candidate to harness and exploit their inherent advantages such as massive parallelism capabilities, multimillion gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for 3-D Haar wavelet transform (HWT) have been proposed based on transpose-based computation and partial reconfiguration suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced. Comparative study for both non-partial and partial reconfiguration implementation has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption as well as maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT)with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and block random access memory (BRAM)- based method. An analysis with various medical imaging modalities has been carried out. Results obtained for image de-noising implementation using FRAT exhibits promising results in reducing Gaussian white noise in medical images. In terms of hardware implementation, promising trade-offs on maximum frequency, throughput and area are also achieved. Furthermore, a novel hardware implementation of 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for transform blocks reveal that 3-D IT demonstrates better computational complexity than the 3-D DWT, whilst the 3-D DWT with LS exhibits a lossless compression that is significantly useful for medical image compression. Additionally, an architecture of CAVLC that is capable of compressing high-definition (HD) images in real-time without any buffer between the quantiser and the entropy coder is proposed. Through a judicious parallelisation, promising results have been obtained with limited resources. In summary, this research is tackling the issues of massive 3-D medical volumes data that requires compression as well as hardware implementation to accelerate the slowest operations in the system. Results obtained also reveal a significant achievement in terms of the architecture efficiency and applications performance.Ministry of Higher Education Malaysia (MOHE), Universiti Tun Hussein Onn Malaysia (UTHM) and the British Counci

    Energy-efficient and real-time wearable for wellbeing-monitoring IoT system based on SoC-FPGA

    Get PDF
    Wearable devices used for personal monitoring applications have been improved over the last decades. However, these devices are limited in terms of size, processing capability and power consumption. This paper proposes an efficient hardware/software embedded system for monitoring bio-signals in real time, including a heart rate calculator using PPG and an emotion classifier from EEG. The system is suitable for outpatient clinic applications requiring data transfers to external medical staff. The proposed solution contributes with an effective alternative to the traditional approach of processing bio-signals offline by proposing a SoC-FPGA based system that is able to fully process the signals locally at the node. Two sub-systems were developed targeting a Zynq 7010 device and integrating custom hardware IP cores that accelerate the processing of the most complex tasks. The PPG sub-system implements an autocorrelation peak detection algorithm to calculate heart rate values. The EEG sub-system consists of a KNN emotion classifier of preprocessed EEG features. This work overcomes the processing limitations of microcontrollers and general-purpose units, presenting a scalable and autonomous wearable solution with high processing capability and real-time response.info:eu-repo/semantics/publishedVersio

    GPU Parallelization of HEVC In-Loop Filters

    Get PDF
    In the High Efficiency Video Coding (HEVC) standard, multiple decoding modules have been designed to take advantage of parallel processing. In particular, the HEVC in-loop filters (i.e., the deblocking filter and sample adaptive offset) were conceived to be exploited by parallel architectures. However, the type of the offered parallelism mostly suits the capabilities of multi-core CPUs, thus making a real challenge to efficiently exploit massively parallel architectures such as Graphic Processing Units (GPUs), mainly due to the existing data dependencies between the HEVC decoding procedures. In accordance, this paper presents a novel strategy to increase the amount of parallelism and the resulting performance of the HEVC in-loop filters on GPU devices. For this purpose, the proposed algorithm performs the HEVC filtering at frame-level and employs intrinsic GPU vector instructions. When compared to the state-of-the-art HEVC in-loop filter implementations, the proposed approach also reduces the amount of required memory transfers, thus further boosting the performance. Experimental results show that the proposed GPU in-loop filters deliver a significant improvement in decoding performance. For example, average frame rates of 76 frames per second (FPS) and 125 FPS for Ultra HD 4K are achieved on an embedded NVIDIA GPU for All Intra and Random Access configurations, respectively

    Smart embedded system for skin cancer classification

    Get PDF
    The very good results achieved with recent algorithms for image classification based on deep learning have enabled new applications in many domains. The medical field is one that can greatly benefit from these algorithms in order to help the medical professional elaborate on his/her diagnostic. In particular, portable devices for medical image classification are useful in scenarios where a full analysis system is not an option or is difficult to obtain. Algorithms based on deep learning models are computationally demanding; therefore, it is difficult to run them in low-cost devices with a low energy consumption and high efficiency. In this paper, a low-cost system is proposed to classify skin cancer images. Two approaches were followed to achieve a fast and accurate system. At the algorithmic level, a cascade inference technique was considered, where two models were used for inference. At the architectural level, the deep learning processing unit from Vitis-AI was considered in order to design very efficient accelerators in FPGA. The dual model was trained and implemented for skin cancer detection in a ZYNQ UltraScale+ MPSoC ZCU104 evaluation kit with a ZU7EV device. The core was integrated in a full system-on-chip solution and tested with the HAM10000 dataset. It achieves a performance of 13.5 FPS with an accuracy of 87%, with only 33k LUTs, 80 DSPs, 70 BRAMs and 1 URAM.info:eu-repo/semantics/publishedVersio

    Field Programmable Gate Arrays (FPGAs) II

    Get PDF
    This Edited Volume Field Programmable Gate Arrays (FPGAs) II is a collection of reviewed and relevant research chapters, offering a comprehensive overview of recent developments in the field of Computer and Information Science. The book comprises single chapters authored by various researchers and edited by an expert active in the Computer and Information Science research area. All chapters are complete in itself but united under a common research study topic. This publication aims at providing a thorough overview of the latest research efforts by international authors on Computer and Information Science, and open new possible research paths for further novel developments

    Smart Sensor Data Acquisition in trains

    Get PDF
    Whether for work or leisure, we see a large number of people traveling by train every day. In order to ensure the comfort and safety of passengers, it must be checked whether the composition is working normally. For this purpose, a constant monitoring of a train must be done, followed by a diagnosis of the com-position, prediction of failures and production of alarms in the event of any anomaly. To perform monitoring on a train, it is necessary to collect data from sensors distributed along its carriages and send them to a software system that performs the diagnosis of the composition in a fast and efficient way. The description of the activities necessary for monitoring of a train imme-diately refers to topics such as distributed systems, since the intended system will have to integrate several sensors distributed along the train, or Smart Systems, since each sensor must have the capacity to not only acquire data, but also trans-mit it, preferably, wirelessly. However, there are some obstacles to the implementation of such a system. Firstly, the existence of sources of distortions and noise in the medium interferes both in the acquisition and transmission of data and secondly the fact that the sensors distributed along the train are not prepared to be connected directly to a software system. This dissertation seeks to find a solution for the problems described by im-plementing a data acquisition system that is distributed and takes advantage of the current technologies of low-cost sensor nodes as well as web technologies for sensor networks
    corecore