7,599 research outputs found
A High performance and low cost hardware arcitecture for H.264 transform and quantization algorithms
In this paper, we present a high performance and low cost hardware architecture for real-time implementation of forward transform and quantization and inverse transform and quantization algorithms used in H.264 / MPEG4 Part 10 video coding standard. The hard-ware architecture is based on a reconfigurable datapath with only one multiplier. This hardware is designed to be used as part of a complete low power H.264 video coding system for portable appli-cations. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 81 MHz in a Xilinx Virtex II FPGA and it is verified to work at 210 MHz in a 0.18´ ASIC implementation. The FPGA and ASIC implementations can code 27 and 70 VGA frames (640x480) per second respectively
Recommended from our members
Timing models for high-level synthesis
In this paper, we describe a timing model for clock estimation during high-level synthesis. In order to obtain realistic timing estimates, the proposed model considers all delay elements, including datapath, control and wire delays, and several technology factors, such as layout architecture, technology mapping, buffers insertion and loading effects. The experimental results show that this model can provide much better estimates than previous models. This model is well suited for automatic and interactive synthesis as well as feedback-driven synthesis where performance matrices must be rapidly and incrementally calculated
An Empirical Model of Packet Processing Delay of the Open vSwitch
Network virtualization offers flexibility by decoupling virtual network from
the underlying physical network. Software-Defined Network (SDN) could utilize
the virtual network. For example, in Software-Defined Networks, the entire
network can be run on commodity hardware and operating systems that use virtual
elements. However, this could present new challenges of data plane performance.
In this paper, we present an empirical model of the packet processing delay of
a widely used OpenFlow virtual switch, the Open vSwitch. In the empirical
model, we analyze the effect of varying Random Access Memory (RAM) and network
parameters on the performance of the Open vSwitch. Our empirical model captures
the non-network processing delays, which could be used in enhancing the network
modeling and simulation
Recommended from our members
Silicon compilation
Silicon compilation is a term used for many different purposes. In this paper we define silicon compilation as a mapping from some higher level description into layout. We define the basic issues in structural and behavioral silicon compilation and some possible solutions to those issues. Finally, we define the concept of an intelligent silicon compiler in which the compiler evaluates the quality of the generated design and attempts to improve it if it is not satisfactory
A High permormance hardware architecture for an sad reuse based hierarchical motion estimation algorithm for H.264 video coding
In this paper, we present a high performance and low cost hardware architecture for real-time implementation of an SAD reuse based hierarchical motion estimation algorithm for H.264 / MPEG4 Part 10 video coding. This hardware is designed to be used as part of a complete H.264 video coding system for portable applications. The proposed architecture is implemented in Verilog HDL. The Verilog RTL code is verified to work at 68 MHz in a Xilinx Virtex II FPGA. The FPGA implementation can process 27 VGA frames (640x480) or 82 CIF frames (352x288) per second
On the hardware reduction of z-datapath of vectoring CORDIC
In this article we present a novel design of a hardware optimal vectoring CORDIC processor. We present a mathematical theory to show that using bipolar binary notation it is possible to eliminate all the arithmetic computations required along the z-datapath. Using this technique it is possible to achieve three and 1.5 times reduction in the number of registers and adder respectively compared to conventional CORDIC. Following this, a 16-bit vectoring CORDIC is designed for the application in Synchronizer for IEEE 802.11a standard. The total area and dynamic power consumption of the processor is 0.14 mm2 and 700?W respectively when synthesized in 0.18?m CMOS library which shows its effectiveness as a low-area low-power processor
- …