185 research outputs found

    Low energy HEVC and VVC video compression hardware

    Get PDF
    Video compression standards compress a digital video by reducing and removing redundancy in the digital video using computationally complex algorithms. As spatial and temporal resolutions of videos increase, compression efficiencies of video compression algorithms are also increasing. However, increased compression efficiency comes with increased computational complexity. Therefore, it is necessary to reduce computational complexities of video compression algorithms without reducing their visual quality in order to reduce area and energy consumption of their hardware implementations. In this thesis, we propose a novel technique for reducing amount of computations performed by HEVC intra prediction algorithm. We designed low energy, reconfigurable HEVC intra prediction hardware using the proposed technique. We also designed a low energy FPGA implementation of HEVC intra prediction algorithm using the proposed technique and DSP blocks. We propose a reconfigurable VVC intra prediction hardware architecture. We also propose an efficient VVC intra prediction hardware architecture using DSP blocks. We designed low energy VVC fractional interpolation hardware. We propose a novel approximate absolute difference technique. We designed low energy approximate absolute difference hardware using the proposed technique. We propose a novel approximate constant multiplication technique. We designed approximate constant multiplication hardware using the proposed technique. We quantified computation reductions achieved by the proposed techniques and video quality loss caused by the proposed approximation techniques. The proposed approximate absolute difference technique and approximate constant multiplication technique cause very small PSNR loss. The other proposed techniques cause no PSNR loss. We implemented the proposed hardware architectures in Verilog HDL. We mapped the Verilog RTL codes to Xilinx Virtex 6 or Xilinx Virtex 7 FPGAs and estimated their power consumptions using Xilinx XPower Analyzer tool. The proposed techniques significantly reduced power and energy consumptions of these FPGA implementation

    Flexible Baseband Modulator Architecture for Multi-Waveform 5G Communications

    Get PDF
    The fifth-generation (5G) revolution represents more than a mere performance enhancement of previous generations: it will deeply transform the way humans and/or machines interact, enabling a heterogeneous expansion in the number of use cases and services. Crucial to the realization of this revolution is the design of hardware components characterized by high degrees of flexibility, versatility and resource/power efficiency. This chapter proposes a field-programmable gate array (FPGA)-oriented baseband processing architecture suitable for fast-changing communication environments such as 4G/5G waveform coexistence, noncontiguous carrier aggregation (CA) or centralized cloud radio access network (C-RAN) processing. The proposed architecture supports three 5G waveform candidates and is shown to be upgradable, resource-efficient and cost-effective. Through hardware virtualization, enabled by dynamic partial reconfiguration (DPR), the design space exploration of our architecture exceeds the hardware resources available on the Zynq xc7z020 device. Moreover, dynamic frequency scaling (DFS) enables the runtime adjustment of processing throughput and power reductions by up to 88%. The combined resource overhead for DPR and DFS is very low, and the reconfiguration latency stays two orders of magnitude below the control plane latency requirements proposed for 5G communications

    Dynamically reconfigurable asynchronous processor

    Get PDF
    The main design requirements for today's mobile applications are: · high throughput performance. · high energy efficiency. · high programmability. Until now, the choice of platform has often been limited to Application-Specific Integrated Circuits (ASICs), due to their best-of-breed performance and power consumption. The economies of scale possible with these high-volume markets have traditionally been able to hide the high Non-Recurring Engineering (NRE) costs required for designing and fabricating new ASICs. However, with the NREs and design time escalating with each generation of mobile applications, this practice may be reaching its limit. Designers today are looking at programmable solutions, so that they can respond more rapidly to changes in the market and spread costs over several generations of mobile applications. However, there have been few feasible alternatives to ASICs: Digital Signals Processors (DSPs) and microprocessors cannot meet the throughput requirements, whereas Field-Programmable Gate Arrays (FPGAs) require too much area and power. Coarse-grained dynamically reconfigurable architectures offer better solutions for high throughput applications, when power and area considerations are taken into account. One promising example is the Reconfigurable Instruction Cell Array (RICA). RICA consists of an array of cells with an interconnect that can be dynamically reconfigured on every cycle. This allows quite complex datapaths to be rendered onto the fabric and executed in a single configuration - making these architectures particularly suitable to stream processing. Furthermore, RICA can be programmed from C, making it a good fit with existing design methodologies. However the RICA architecture has a drawback: poor scalability in terms of area and power. As the core gets bigger, the number of sequential elements in the array must be increased significantly to maintain the ability to achieve high throughputs through pipelining. As a result, a larger clock tree is required to synchronise the increased number of sequential elements. The clock tree therefore takes up a larger percentage of the area and power consumption of the core. This thesis presents a novel Dynamically Reconfigurable Asynchronous Processor (DRAP), aimed at high-throughput mobile applications. DRAP is based on the RICA architecture, but uses asynchronous design techniques - methods of designing digital systems without clocks. The absence of a global clock signal makes DRAP more scalable in terms of power and area overhead than its synchronous counterpart. The DRAP architecture maintains most of the benefits of custom asynchronous design, whilst also providing programmability via conventional high-level languages. Results show that the DRAP processor delivers considerably lower power consumption when compared to a market-leading Very Long Instruction Word (VLIW) processor and a low-power ARM processor. For example, DRAP resulted in a reduction in power consumption of 20 times compared to the ARM7 processor, and 29 times compared to the TIC64x VLIW, when running the same benchmark capped to the same throughput and for the same process technology (0.13μm). When compared to an equivalent RICA design, DRAP was up to 22% larger than RICA but resulted in a power reduction of up to 1.9 times. It was also capable of achieving up to 2.8 times higher throughputs than RICA for the same benchmarks

    An efficient FPGA implementation of versatile video coding intra prediction

    Get PDF
    Versatile Video Coding (VVC) is a new international video compression standard offering much better compression efficiency than previous video compression standards at the expense of much higher computational complexity. In this paper, an efficient FPGA implementation of VVC intra prediction for angular prediction modes of 4x4, 8x8, 16x16 and 32x32 prediction unit sizes is proposed. In the proposed FPGA implementation, four constant multiplications used in one intra angular prediction equation are implemented using two DSP blocks and two adders in FPGA. The proposed FPGA implementation of VVC intra prediction, in the worst case, can process 34 full HD (1920x1080) frames per second

    High performance HEVC and FVC video compression hardware designs

    Get PDF
    High Efficiency Video Coding (HEVC) is the current state-of-the-art video compression standard developed by Joint collaborative team on video coding (JCT-VC). HEVC has 50% better compression efficiency than H.264 which is the previous video compression standard. HEVC achieves this video compression efficiency by significantly increasing the computational complexity. Therefore, in this thesis, we proposed a low complexity HEVC sub-pixel motion estimation (SPME) technique for SPME in HEVC encoder. We designed and implemented a high performance HEVC SPME hardware implementing the proposed technique. We also designed and implemented an HEVC fractional interpolation hardware using memory based constant multiplication technique for both HEVC encoder and decoder. Future Video Coding (FVC) is a new international video compression standard which is currently being developed by JCT-VC. FVC offers much better compression efficiency than the state-of-the-art HEVC video compression standard at the expense of much higher computational complexity. In this thesis, we designed and implemented three different high performance FVC 2D transform hardware. The proposed hardware is verified to work correctly on an FPGA board

    A power and time efficient radio architecture for LDACS1 air-to-ground communication

    Get PDF
    L-band Digital Aeronautical Communication System (LDACS) is an emerging standard that aims at enhancing air traffic management by transitioning the traditional analog aeronautical communication systems to the superior and highly efficient digital domain. The standard places stringent requirements on the communication channels to allow them to coexist with critical L-band systems, requiring complex processing and filters in baseband. Approaches based on cognitive radio are also proposed since this allows tremendous increase in communication capacity and spectral efficiency. This requires high computational capability in airborne vehicles that can perform the complex filtering and masking, along with tasks associated with cognitive radio systems like spectrum sensing and baseband adaptation, while consuming very less power. This paper proposes a radio architecture based on new generation FPGAs that offers advanced capabilities like partial reconfiguration. The proposed architecture allows non-concurrent baseband modules to be dynamically loaded only when they are required, resulting in improved energy efficiency, without sacrificing performance. We evaluate the case of non-concurrent spectrum sensing logic and transmission filters on our cognitive radio platform based on Xilinx Zynq, and show that our approach results in 28.3% reduction in DSP utilisation leading to lower energy consumption at run-time

    Mapping adaptive particle filters to heterogeneous reconfigurable systems

    Get PDF
    This article presents an approach for mapping real-time applications based on particle filters (PFs) to heterogeneous reconfigurable systems, which typically consist of multiple FPGAs and CPUs. A method is proposed to adapt the number of particles dynamically and to utilise runtime reconfigurability of FPGAs for reduced power and energy consumption. A data compression scheme is employed to reduce communication overhead between FPGAs and CPUs. A mobile robot localisation and tracking application is developed to illustrate our approach. Experimental results show that the proposed adaptive PF can reduce up to 99% of computation time. Using runtime reconfiguration, we achieve a 25% to 34% reduction in idle power. A 1U system with four FPGAs is up to 169 times faster than a single-core CPU and 41 times faster than a 1U CPU server with 12 cores. It is also estimated to be 3 times faster than a system with four GPUs

    Design and application of reconfigurable circuits and systems

    No full text
    Open Acces

    On the Feasibility and Limitations of Just-in-Time Instruction Set Extension for FPGA-Based Reconfigurable Processors

    Get PDF
    Reconfigurable instruction set processors provide the possibility of tailor the instruction set of a CPU to a particular application. While this customization process could be performed during runtime in order to adapt the CPU to the currently executed workload, this use case has been hardly investigated. In this paper, we study the feasibility of moving the customization process to runtime and evaluate the relation of the expected speedups and the associated overheads. To this end, we present a tool flow that is tailored to the requirements of this just-in-time ASIP specialization scenario. We evaluate our methods by targeting our previously introduced Woolcano reconfigurable ASIP architecture for a set of applications from the SPEC2006, SPEC2000, MiBench, and SciMark2 benchmark suites. Our results show that just-in-time ASIP specialization is promising for embedded computing applications, where average speedups of 5x can be achieved by spending 50 minutes for custom instruction identification and hardware generation. These overheads will be compensated if the applications execute for more than 2 hours. For the scientific computing benchmarks, the achievable speedup is only 1.2x, which requires significant execution times in the order of days to amortize the overheads
    corecore