
    An FPGA Implementation of HW/SW Codesign Architecture for H.263 Video Coding

    Chapter 12: http://www.intechopen.com/download/pdf/pdfs_id/1574

    Real-time scalable video coding for surveillance applications on embedded architectures


    Side information exploitation, quality control and low complexity implementation for distributed video coding

    Distributed video coding (DVC) is a video coding methodology that shifts the highly complex motion search from the encoder to the decoder. Such a coder has a great advantage in encoding speed while still achieving rate-distortion (RD) performance similar to conventional coding solutions. Applications include wireless video sensor networks, mobile video cameras and wireless video surveillance. Although much progress has been made in DVC over the past ten years, there is still an RD performance gap between conventional video coding solutions and DVC, and the latest DVC developments remain far from standardization and practical use. The key open problems lie in areas such as accurate and efficient side information (SI) generation and refinement, quality control between Wyner-Ziv (WZ) frames and key frames, correlation noise modelling, and decoder complexity. In this context, this thesis proposes solutions to improve state-of-the-art side information refinement schemes, to enable consistent quality control over decoded frames during the coding process, and to implement a highly efficient DVC codec. The thesis investigates the impact of reference frames on side information generation and reveals that reference frames have the potential to be better side information than the extensively used interpolated frames. Based on this investigation, we also propose a motion range prediction (MRP) method that exploits reference frames and precisely guides the statistical motion learning process. Extensive simulation results show that choosing reference frames as SI performs competitively with, and sometimes better than, interpolated frames; furthermore, the proposed MRP method significantly reduces the decoding complexity without degrading RD performance. To minimize block artifacts and achieve consistent improvements in both subjective and objective SI quality, we propose a novel side information synthesis framework working at pixel granularity: the SI is synthesized at pixel level to minimize block artifacts, and the correlation noise model is adapted to the new SI. Furthermore, we have fully implemented a state-of-the-art DVC decoder with the proposed framework using serial and parallel processing technologies, in order to identify bottlenecks and areas where decoding complexity, another major challenge for practical DVC deployments, can be further reduced. The performance is evaluated on the latest transform-domain DVC codec and compared with different standard codecs; extensive experimental results show substantial and consistent rate-distortion gains over standard video codecs and significant speedup over the serial implementation. To bring state-of-the-art DVC one step closer to practical use, we address the problem of distortion variation introduced by typical rate control algorithms, especially in a variable bit rate environment. Simulation results show that the proposed quality control algorithm is capable of meeting a user-defined target distortion and maintains a rather small variation for sequences with slow motion, while performing similarly to fixed quantization for fast-motion sequences at the cost of some RD performance. Finally, we propose the first implementation of a distributed video encoder on a Texas Instruments TMS320DM6437 digital signal processor. The WZ encoder is efficiently implemented using rate-adaptive low-density parity-check accumulate (LDPCA) codes, exploiting the hardware features and optimization techniques to improve the overall performance. Implementation results show that the WZ encoder encodes one QCIF frame in 134M instruction cycles on a TMS320DM6437 DSP running at 700MHz, making the optimized encoder 29 times faster than the non-optimized implementation. We also implemented a highly efficient DVC decoder, in both serial and parallel versions, on a PC-HPC (high performance cluster) architecture, where the encoder runs on a general-purpose PC and the decoder runs on a multicore HPC. The experimental results show that the parallelized decoder achieves about a 10x speedup over the serial implementation under various bit rates and GOP sizes, together with significant RD gains with respect to the state-of-the-art DISCOVER codec.
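    As a quick sanity check of the cycle figures quoted above, the short sketch below (not part of the thesis; it only restates the reported numbers) converts 134M instruction cycles per QCIF frame at 700MHz into per-frame latency and frame rate.

        /* Illustrative cycle-budget check using the figures reported in the
         * abstract: 134M instruction cycles per QCIF frame at 700 MHz. */
        #include <stdio.h>

        int main(void)
        {
            const double cycles_per_frame = 134e6;  /* WZ encoder, one QCIF frame */
            const double clock_hz         = 700e6;  /* TMS320DM6437 clock rate    */

            double s_per_frame = cycles_per_frame / clock_hz;  /* ~0.19 s  */
            double fps         = 1.0 / s_per_frame;            /* ~5.2 fps */

            printf("%.3f s per frame, %.1f frames per second\n", s_per_frame, fps);
            return 0;
        }

    Under these figures the optimized encoder handles roughly five QCIF frames per second, which puts the quoted 29x speedup over the non-optimized implementation into perspective.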

    Exploring Processor and Memory Architectures for Multimedia

    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges, ranging from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. Projecting where we are heading, these issues become ever more challenging with increased mobility as well as advancements in multimedia content, such as the introduction of stereoscopic 3D and augmented reality. The increased performance needed for handling multimedia comes not only from an ongoing step-up in resolution, from QVGA (320x240) to Full HD (1920x1080), a 27x increase in less than half a decade, but also from codec evolution (MPEG-2 to H.264 AVC), which adds to the computational load. To meet these performance challenges there have been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves a negative net result in terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations of mobile systems, an overall approach to how these systems are best utilized is needed; the right trade-off between performance and power is crucial. On top of these constraints, the flexibility of the system needs to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal of this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements of a mobile system. The second is to propose an automated methodology for optimally mapping multimedia data and instructions onto a heterogeneous multilevel memory subsystem; the proposed methodology uses constraint programming to solve a multidimensional optimization problem. Results from this work indicate that today's most advanced mobile processor technology, together with a multi-level heterogeneous on-chip memory subsystem, can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis, lower total power consumption can be achieved while performance for multimedia applications is improved, through enhanced memory management that reduces external accesses and improves reuse of memory objects. The automatic method shows high accuracy, up to 90%, in predicting multimedia memory accesses for a given architecture.
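    The mapping problem mentioned above, deciding which data objects reside in on-chip memory, can be illustrated with a toy example. The thesis solves a richer multidimensional version with constraint programming; the sketch below is only a greedy approximation with invented object names, sizes and access counts.

        /* Toy illustration of the memory-mapping problem: choose which data
         * objects go into a small on-chip memory so that as many accesses as
         * possible stay on-chip.  Object names, sizes, access counts and the
         * SRAM budget are hypothetical; a constraint-programming formulation
         * would model instructions, multiple memory levels and power as well. */
        #include <stdio.h>

        typedef struct {
            const char *name;
            int size_kb;      /* footprint of the object               */
            int accesses;     /* profiled access count (hypothetical)  */
            int on_chip;      /* placement decision                    */
        } MemObject;

        int main(void)
        {
            MemObject obj[] = {
                { "luma_ref_frame",   256, 900000, 0 },
                { "mc_line_buffer",    32, 700000, 0 },
                { "entropy_tables",    16, 300000, 0 },
                { "bitstream_buffer", 128, 100000, 0 },
            };
            const int n = sizeof(obj) / sizeof(obj[0]);
            int budget_kb = 192;        /* assumed on-chip SRAM budget */
            long on_chip_accesses = 0, total_accesses = 0;

            /* Greedy: repeatedly place the unplaced object with the best
             * accesses-per-KB density that still fits the remaining budget. */
            for (;;) {
                int best = -1;
                double best_density = -1.0;
                for (int i = 0; i < n; i++) {
                    if (obj[i].on_chip || obj[i].size_kb > budget_kb) continue;
                    double d = (double)obj[i].accesses / obj[i].size_kb;
                    if (d > best_density) { best_density = d; best = i; }
                }
                if (best < 0) break;
                obj[best].on_chip = 1;
                budget_kb -= obj[best].size_kb;
            }

            for (int i = 0; i < n; i++) {
                total_accesses += obj[i].accesses;
                if (obj[i].on_chip) on_chip_accesses += obj[i].accesses;
                printf("%-16s %s\n", obj[i].name, obj[i].on_chip ? "on-chip" : "external");
            }
            printf("on-chip access share: %.0f%%\n",
                   100.0 * on_chip_accesses / total_accesses);
            return 0;
        }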

    VLSI architecture design approaches for real-time video processing

    This paper discusses programmable and dedicated approaches for real-time video processing applications. Various VLSI architectures, including design examples of both approaches, are reviewed. Finally, several practical designs for real-time video processing applications are discussed to provide useful guidelines to VLSI designers for further real-time video processing design work.

    An energy-aware system-on-chip architecture for intra prediction in HEVC standard

    High resolutions such as 4K and 8K are becoming more widely used in video applications, and they are well supported by the new HEVC standard. Embedded solutions, such as dedicated Systems-on-Chip (SoC) that accelerate video processing on a single chip rather than software-only solutions, are therefore desirable. This paper proposes a novel parallel and highly efficient hardware accelerator for the intra prediction block. The accelerator achieves high-speed processing thanks to pipelined processing units and a parallel architecture, and the complexity of memory access is reduced by the proposed design with only a small increase in power consumption. The implementation targets the 28 nm 7 Series FPGA resources of the Zynq-7000; results show that the proposed architecture occupies 16520 LUTs, reaches a maximum frequency of 143.65 MHz, and can sustain a 3840×2160 sequence at 90 frames per second.
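    The throughput claim can be checked with simple arithmetic; the sketch below (an illustration, not from the paper) relates the reported frame rate and maximum frequency to the number of pixels that must be processed per clock cycle.

        /* Back-of-the-envelope throughput check using the figures reported
         * in the abstract: 3840x2160 at 90 fps with a 143.65 MHz clock. */
        #include <stdio.h>

        int main(void)
        {
            const double width = 3840.0, height = 2160.0, fps = 90.0;
            const double f_max_hz = 143.65e6;

            double pixels_per_second = width * height * fps;         /* ~746.5 Mpixel/s */
            double pixels_per_cycle  = pixels_per_second / f_max_hz; /* ~5.2            */

            printf("%.1f Mpixel/s -> %.1f pixels per clock cycle\n",
                   pixels_per_second / 1e6, pixels_per_cycle);
            return 0;
        }

    Sustaining roughly five pixels per clock cycle is consistent with the pipelined, parallel datapath the paper describes.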

    Implementation and Optimization of a Real-Time H.264 Baseline Encoder on the TMS320DM642 DSP

    Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2007. Digital video coding has become indispensable in many applications such as digital surveillance systems, video conferencing, mobile applications and video broadcasting. The H.264/MPEG-4 Part 10 international video compression standard was developed to improve coding efficiency compared to previous standards; however, this improvement comes with an increase in coding complexity. In this thesis, an H.264 baseline profile encoder is implemented on a Texas Instruments TMS320DM642 digital signal processor. The real-time implementation of the H.264/AVC encoder on the DM642 DSP core offers most of the standard H.264/AVC baseline profile coding tools, except for the error resiliency tools and quarter-pel motion compensation. Instead of quarter-pel motion compensation, integer- and half-pixel motion estimation and compensation are implemented for all luminance and chrominance components. The target platform, the DM642 DSP core, is designed as a high-performance digital media processor with a two-level memory/cache hierarchy and a VLIW architecture. The subject of the thesis is the realization and optimization of the H.264 baseline encoder system on this platform; the optimization phases, covering algorithmic, architectural and memory strategies, are described in detail. The H.264/AVC encoder system is verified both on the development workstation and on the DM642 EVM (Evaluation Module) hardware platform. Briefly, an uncompressed YUV input video sequence at CIF resolution is compressed into the H.264 Annex-B file format and the video output is displayed on screen. Additionally, the encoder output is verified against the H.264 reference software and compliance is proven.
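    The abstract above does not detail the motion estimation kernel; as a rough illustration of the integer-pel stage it mentions, the following C sketch performs a SAD-based full search over a small window. The block size, search range and function interfaces are assumptions for illustration, not taken from the thesis, and a real DM642 implementation would rely on packed-data intrinsics and early termination.

        /* Minimal full-search, integer-pel motion estimation sketch (SAD cost).
         * Block size, search range and interfaces are illustrative assumptions. */
        #include <limits.h>
        #include <stdlib.h>

        #define BLK   16    /* macroblock size          */
        #define RANGE 16    /* +/- search range, pixels */

        static int sad16x16(const unsigned char *cur, const unsigned char *ref, int stride)
        {
            int sad = 0;
            for (int y = 0; y < BLK; y++)
                for (int x = 0; x < BLK; x++)
                    sad += abs(cur[y * stride + x] - ref[y * stride + x]);
            return sad;
        }

        /* Search the reference frame around block position (bx, by); returns the
         * best SAD and the motion vector through *mvx, *mvy. */
        static int motion_search(const unsigned char *cur, const unsigned char *ref,
                                 int stride, int width, int height,
                                 int bx, int by, int *mvx, int *mvy)
        {
            int best = INT_MAX;
            for (int dy = -RANGE; dy <= RANGE; dy++) {
                for (int dx = -RANGE; dx <= RANGE; dx++) {
                    int rx = bx + dx, ry = by + dy;
                    if (rx < 0 || ry < 0 || rx + BLK > width || ry + BLK > height)
                        continue;   /* keep the candidate block inside the frame */
                    int sad = sad16x16(cur + by * stride + bx,
                                       ref + ry * stride + rx, stride);
                    if (sad < best) { best = sad; *mvx = dx; *mvy = dy; }
                }
            }
            return best;
        }

    Half-pel refinement would then interpolate the reference around the best integer vector, for example with the standard H.264 six-tap filter for luma samples.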

    Homogeneous and heterogeneous MPSoC architectures with network-on-chip connectivity for low-power and real-time multimedia signal processing

    Two multiprocessor system-on-chip (MPSoC) architectures are proposed and compared in the paper with reference to audio and video processing applications. One architecture exploits a homogeneous topology: it consists of 8 identical tiles, each made of a 32-bit RISC core enhanced by a 64-bit DSP coprocessor with local memory. The other MPSoC architecture exploits a heterogeneous-tile topology with on-chip distributed memory resources; each tile acts as an application-specific processor supporting a different class of algorithms. In both architectures, the tiles are interconnected by a network-on-chip (NoC) infrastructure, through network interfaces and routers, which allows the tiles to operate in parallel. The functional performance and implementation complexity of the NoC-based MPSoC architectures are assessed through synthesis results in submicron CMOS technology. Among the large set of supported algorithms, two case studies are considered: the real-time implementation of an H.264/MPEG-4 AVC video codec and of a low-distortion digital audio amplifier. The heterogeneous architecture ensures higher power efficiency and a smaller area occupation and is more suited to low-power multimedia processing, such as in mobile devices. The homogeneous scheme allows higher flexibility and easier system scalability and is more suited to general-purpose DSP tasks in mains-powered devices.
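    The abstract does not specify the NoC topology or routing scheme; purely as an illustration of how a router in such an infrastructure forwards traffic between tiles, the sketch below shows a dimension-order (XY) routing decision for an assumed 2D mesh.

        /* Illustrative dimension-order (XY) routing for an assumed 2D-mesh NoC:
         * route along X until the column matches, then along Y. */
        typedef enum { PORT_LOCAL, PORT_EAST, PORT_WEST, PORT_NORTH, PORT_SOUTH } Port;

        static Port xy_route(int cur_x, int cur_y, int dst_x, int dst_y)
        {
            if (dst_x > cur_x) return PORT_EAST;
            if (dst_x < cur_x) return PORT_WEST;
            if (dst_y > cur_y) return PORT_NORTH;
            if (dst_y < cur_y) return PORT_SOUTH;
            return PORT_LOCAL;  /* packet has reached its destination tile */
        }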

    Design of a High-Speed Architecture for Stabilization of Video Captured Under Non-Uniform Lighting Conditions

    Video captured under shaky conditions contains unwanted vibrations. A robust algorithm that stabilizes the video by compensating for vibrations arising from the physical setting of the camera is presented in this dissertation, and a high-performance hardware architecture on Field Programmable Gate Array (FPGA) technology is developed for the implementation of the stabilization system. Stabilization of video sequences captured under non-uniform lighting conditions begins with a nonlinear enhancement process, which improves the visibility of the scene captured by physical sensing devices with limited dynamic range. This physical limitation causes the saturated region of the image to shadow out the rest of the scene, so it is desirable to recover a more uniform scene that eliminates the shadows to a certain extent. Stabilization of video requires the estimation of global motion parameters: by obtaining reliable background motion, the video can be spatially transformed to the reference sequence, thereby eliminating the unintended motion of the camera. A reflectance-illuminance model for video enhancement is used in this research to improve the visibility and quality of the scene, and with fast color space conversion the computational complexity is kept to a minimum. The basic video stabilization model is formulated and configured for hardware implementation. Such a model involves the evaluation of reliable features for tracking, motion estimation, and an affine transformation to map the display coordinates of the stabilized sequence. The multiplications, divisions and exponentiations are replaced by simple arithmetic and logic operations using improved log-domain computations in the hardware modules. On Xilinx's Virtex-II 2V8000-5 FPGA platform, the prototype system consumes 59% of the logic slices, 30% of the flip-flops, 34% of the lookup tables, 35% of the embedded RAMs and two ZBT frame buffers. The system is capable of rendering 180.9 million pixels per second (mpps) and consumes approximately 30.6 watts of power at 1.5 volts; with a 1024×1024 frame, this throughput is equivalent to 172 frames per second (fps). Future work will optimize the performance-resource trade-off to meet the specific needs of applications and extend the model to the extraction and tracking of moving objects, since the model inherently encapsulates the attributes of spatial distortion and motion prediction needed to reduce complexity. With these parameters to narrow down the processing range, it is possible to achieve a minimum of 20 fps on desktop computers with Intel Core 2 Duo or Quad Core CPUs and 2GB of DDR2 memory without dedicated hardware.
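    The log-domain substitution mentioned above rests on standard identities: a product becomes a sum of logarithms, a quotient a difference, and an exponentiation a scaling. The sketch below is a floating-point illustration only (the hardware modules would use fixed-point log/antilog lookup tables); note also that the quoted 172 fps figure is simply 180.9 Mpixel/s divided by the 1024×1024 frame size.

        /* Log-domain arithmetic sketch: multiplications, divisions and
         * exponentiations become additions, subtractions and scalings of
         * logarithms (valid for positive operands). */
        #include <math.h>

        static double log_mul(double a, double b) { return exp(log(a) + log(b)); }  /* a * b */
        static double log_div(double a, double b) { return exp(log(a) - log(b)); }  /* a / b */
        static double log_pow(double a, double p) { return exp(p * log(a)); }       /* a ^ p */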

    Image Processing Using FPGAs

    This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) to realise image processing algorithms. The papers are reprints of those selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state of the art in image processing using FPGAs.