13 research outputs found

    Low Complexity Interpolation Filters for Motion Estimation and Application to the H.264 Encoders

    Get PDF
    Techniques for image super-resolution play an important role in a plethora of applications, which include video compression and motion estimation. The detection of the fractional displacements among frames facilitates the removal of temporal redundancy and improves the video quality by 2-4 dB PSNR. However, the increased complexity of the Fractional Motion Estimation (FME) process adds a significant computational load to the encoder and sets constraints to real-time designs. Researchers have performed timing analysis for the motion estimation process and they reported that FME accounts for almost half of the entire motion estimation period, which in turn accounts for 60-90% of the total encoding time depending on the design configuration

    Programmable Motion Estimation Architecture

    No full text
    Abstract-This paper presents a real-time Motion Estimation architecture with improved hardware cost. The design bases on a parallel memory organization minimizing the resources required to support integer and sub-pixel modes of search, while it sustains the required throughput of pixels to the SAD calculator. A speculative execution technique improves the number of cycles required by the search process. The architecture is programmable including an instruction set for actions common to all the block-matching techniques, while circuits introduced at compile time accommodate individual actions of the most demanding algorithms such as MVFAST and PMVFAST. A FPGA implementation validates HDTV video performance

    LDPC Hardware Acceleration in 5G Open Radio Access Network Platforms

    No full text
    In the last few years, Internet of Things, Cloud computing, Edge computing, and Fog computing have gained a lot of attention in both industry and academia. However, a clear and neat definition of these computing paradigms and their correlation is hard to find in the literature. This makes it difficult for researchers new to this area to get a concrete picture of these paradigms. This work tackles this deficiency, representing a helpful resource for those who will start next. First, we show the evolution of modern computing paradigms and related research interest. Then, we address each paradigm, neatly delineating its key points and its relation with the others. There after, we extensively address Fog computing, remarking its outstanding role as the glue between IoT, Cloud, and Edge computing. In the end, we briefly present open challenges and future research directions for IoT, Cloud, Edge, and Fog computing

    Resources and Power Efficient FPGA Accelerators for Real-Time Image Classification

    No full text
    A plethora of image and video-related applications involve complex processes that impose the need for hardware accelerators to achieve real-time performance. Among these, notable applications include the Machine Learning (ML) tasks using Convolutional Neural Networks (CNNs) that detect objects in image frames. Aiming at contributing to the CNN accelerator solutions, the current paper focuses on the design of Field-Programmable Gate Arrays (FPGAs) for CNNs of limited feature space to improve performance, power consumption and resource utilization. The proposed design approach targets the designs that can utilize the logic and memory resources of a single FPGA device and benefit mainly the edge, mobile and on-board satellite (OBC) computing; especially their image-processing- related applications. This work exploits the proposed approach to develop an FPGA accelerator for vessel detection on a Xilinx Virtex 7 XC7VX485T FPGA device (Advanced Micro Devices, Inc, Santa Clara, CA, USA). The resulting architecture operates on RGB images of size 80×80 or sliding windows; it is trained for the “Ships in Satellite Imagery” and by achieving frequency 270 MHz, completing the inference in 0.687 ms and consuming 5 watts, it validates the approach

    A Co-Design Approach For Rapid Prototyping Of Image Processing On SoC FPGAs

    No full text
    Achieving real-time performance in image processing with embedded devices poses a very challenging task due to the computationally and memory intensive nature of the algorithms. The FPGA platforms provide very attractive solutions in such applications, because they support highly parallel processing with low power consumption. In this paper we present an approach to increase productivity when developing real-time image processing algorithms on SoC FPGA devices. Our approach is centered around the fast communication of the HW and SW components and the use of an open-source operating system hosted on the existing embedded processor. Based on this approach we decrease time-to-market while at the same time we avoid hindering the real-time operation of the system. To demonstrate the capabilities of the proposed system, as a proof of concept, we use the well known Harris detection algorithm and the Xilinx Zynq XC7Z020 FPGA device. We present an in-depth performance analysis regarding the resource utilization of the FPGA, the operation frequency, the communication overhead and the power consumption

    Deep Learning-Based Image Regression for Short-Term Solar Irradiance Forecasting on the Edge

    No full text
    Photovoltaic (PV) power production is characterized by high variability due to short-term meteorological effects such as cloud movements. These effects have a significant impact on the incident solar irradiance in PV parks. In order to control PV park performance, researchers have focused on Computer Vision and Deep Learning approaches to perform short-term irradiance forecasting using sky images. Motivated by the task of improving PV park control, the current work introduces the Image Regression Module, which produces irradiance values from sky images using image processing methods and Convolutional Neural Networks (CNNs). With the objective of enhancing the performance of CNN models on the task of irradiance estimation and forecasting, we propose an image processing method based on sun localization. Our findings show that the proposed method can consistently improve the accuracy of irradiance values produced by all the CNN models of our study, reducing the Root Mean Square Error by up to 10.44 W/m2 for the MobileNetV2 model. These findings indicate that future applications which utilize CNNs for irradiance forecasting should identify the position of the sun in the image in order to produce more accurate irradiance values. Moreover, the integration of the proposed models on an edge-oriented Field-Programmable Gate Array (FPGA) towards a smart PV park for the real-time control of PV production emphasizes their advantages

    Design and Comparison of FFT VLSI Architectures for SoC Telecom Applications with Different Flexibility, Speed and Complexity Trade-Offs

    No full text
    The design of Fast Fourier Transform (FFT) integrated architectures for System-on-Chip (SoC) telecom applications is addressed in this paper. After reviewing the FFT processing requirements of wireless and wired Orthogonal Frequency Division Multiplexing (OFDM) standards, including the emerging Multiple Input Multiple Output (MIMO) and OFDM Access (OFDMA) schemes, three FFT architectures are proposed: a fully parallel, a pipelined cascade and an in-place variable-size architecture, which offer different trade-offs among flexibility, processing speed and complexity. Silicon implementation results and comparisons with the state-of-the-art prove that each macrocell outperforms the known works for a target application. The fully parallel is optimized for throughput requirements up to several GSamples/s enabling Ultra-wideband (UWB) communications by using all channels foreseen in the standard. The pipelined cascade macrocell minimizes complexity for large size FFTs sustaining throughput up to 100 MSamples/s. The in-place variable-size FFT macrocell stands for its flexibility by allowing run-time reconfigurability required in OFDMA schemes while attaining the required throughput to support MIMO communications. The three architectures are also compared with common case-studies and target technology

    Neural Network-Based Solar Irradiance Forecast for Edge Computing Devices

    No full text
    Aiming at effectively improving photovoltaic (PV) park operation and the stability of the electricity grid, the current paper addresses the design and development of a novel system achieving the short-term irradiance forecasting for the PV park area, which is the key factor for controlling the variations in the PV power production. First, it introduces the Xception long short-term memory (XceptionLSTM) cell tailored for recurrent neural networks (RNN). Second, it presents the novel irradiance forecasting model that consists of a sequence-to-sequence image regression NNs in the form of a spatio-temporal encoder–decoder including Xception layers in the spatial encoder, the novel XceptionLSTM in the temporal encoder and decoder and a multilayer perceptron in the spatial decoder. The proposed model achieves a forecast skill of 16.57% for a horizon of 5 min when compared to the persistence model. Moreover, the proposed model is designed for execution on edge computing devices and the real-time application of the inference on the Raspberry Pi 4 Model B 8 GB and the Raspberry Pi Zero 2W validates the results

    Scheduler Accelerator for TDMA Data Centers

    No full text
    Today's Data Centers networks depend on optical switching to overcome the scalability limitations of traditional architectures. All optical networks most often use slotted Time Division Multiple Access (TDMA) operation; their buffers are located at the optical network edges and their organization relies on effective scheduling of the TDMA frames to achieve efficient sharing of the network resources and a collision-free network operation. Scheduling decisions have to be taken in real time, a process that becomes computationally demanding as the network size increases. Accelerators provide a solution and the present paper proposes a scheduler accelerator to accommodate a data center network divided into points of delivery (pods) of racks and exploiting hybrid electro-optical top-of-rack (ToR) switches that access an all-optical inter-rack network. The scheduler accelerator is a parallel scalable architecture with application specific processing engines. Case studies of 2, 4, 8, 16 processors configuration are presented for the processing of all the transfer TDMA time slot requests for the cases of 512 and 1024 ToR network nodes. The architecture is realized on a Xilinx VC707 board to validate the results
    corecore