Search CORE

131 research outputs found

Architecture design of a scalable adaptive deblocking filter for H.264/AVC

Author: Ernst Eric
Publication venue: RIT Scholar Works
Publication date: 01/07/2007
Field of study

Due to significant bit-rate savings and improved perceptual quality, H.264/AVC, the latest video compression standard from the Joint Video Team, is receiving widespread adoption. Greater coding efficiency relative to previous standards is a result of additional techniques and features. One important change is the inclusion of an in-loop deblocking filter for removal of blocking artifacts. Since the filter can easily account for one-third of the computational complexity of a decoder, its addition was a source of debate during the development of the H.264/AVC standard. Ample research on architecture design of the deblocking filter has been carried out, generally targeted toward high performance profiles. To the best of our knowledge no other research investigated designs that can be scaled from low-power extended profiles up to high performance profiles. This work investigated the design of a scalable architecture for the deblocking filter. Four different designs were implemented. The relative performance of the designs were then compared against each other and existing research through simulation. All designs were targeted towards a Xilinx Virtex 5 field programmable gate array (FPGA)

RIT Scholar Works

Methodology and optimizing of multiple frame format buffering within FPGA H.264/AVC decoder with FRExt.

Author: Stotts Timothy Aaron
Publication venue: RIT Scholar Works
Publication date: 01/08/2007
Field of study

Digital representation of video data is an inherently resource demanding problem that continues to necessitate the development and refinement of coding methods. The H.264/AVC standard, along with its recent Fidelity Range Extensions amendment (FRExt), is quickly being adopted as the standard codec for broadcast and distribution of high definition video. The FRExt amendment, while not necessarily affecting the overall decoder architecture, presents an added complexity of providing efficient memory management for buffering intermediate frames of various pixel color samplings and depths. This thesis evaluated the role of designing the frame buffer of a hardware video decoder, with integrated support for the H.264/AVC codec plus FRExt. With focus on organizing external memory data access, the frame buffer was designed to provide intermediate data storage for the decoder, while using an efficient store and load scheme that takes into consideration each frame pixel format of the video data. VHDL was used to model the frame buffer. Exploitation of reconfigurability and post-synthesis FPGA simulations were used to evaluate behavior, scalability and power consumption, while providing an analysis of approaches to adding FRExt to the memory management. Real-time buffer performance was achieved for two common frame formats at 1080 HD resolution; and an innovative pipeline design provides dynamic switching of formats between video sequences. As an additional consequence of verifying the model, a preexisting Baseline H.264/AVC decoder testbench was augmented to support testing of multiple frame formats

RIT Scholar Works

Multi-frame reconstruction using super-resolution, inpainting, segmentation and codecs

Author: Khorasani Ghassab Vahid
Publication venue
Publication date: 03/06/2021
Field of study

In this thesis, different aspects of video and light field reconstruction are considered such as super-resolution, inpainting, segmentation and codecs. For this purpose, each of these strategies are analyzed based on a specific goal and a specific database. Accordingly, databases which are relevant to film industry, sport videos, light fields and hyperspectral videos are used for the sake of improvement. This thesis is constructed around six related manuscripts, in which several approaches are proposed for multi-frame reconstruction. Initially, a novel multi-frame reconstruction strategy is proposed for lightfield super-resolution in which graph-based regularization is applied along with edge preserving filtering for improving the spatio-angular quality of lightfield. Second, a novel video reconstruction is proposed which is built based on compressive sensing (CS), Gaussian mixture models (GMM) and sparse 3D transform-domain block matching. The motivation of the proposed technique is the improvement in visual quality performance of the video frames and decreasing the reconstruction error in comparison with the former video reconstruction methods. In the next approach, student-t mixture models and edge preserving filtering are applied for the purpose of video super-resolution. Student-t mixture model has a heavy tail which makes it robust and suitable as a video frame patch prior and rich in terms of log likelihood for information retrieval. In another approach, a hyperspectral video database is considered, and a Bayesian dictionary learning process is used for hyperspectral video super-resolution. To that end, Beta process is used in Bayesian dictionary learning and a sparse coding is generated regarding the hyperspectral video super-resolution. The spatial super-resolution is followed by a spectral video restoration strategy, and the whole process leveraged two different dictionary learnings, in which the first one is trained for spatial super-resolution and the second one is trained for the spectral restoration. Furthermore, in another approach, a novel framework is proposed for replacing advertisement contents in soccer videos in an automatic way by using deep learning strategies. For this purpose, a UNET architecture is applied (an image segmentation convolutional neural network technique) for content segmentation and detection. Subsequently, after reconstructing the segmented content in the video frames (considering the apparent loss in detection), the unwanted content is replaced by new one using a homography mapping procedure. In addition, in another research work, a novel video compression framework is presented using autoencoder networks that encode and decode videos by using less chroma information than luma information. For this purpose, instead of converting Y'CbCr 4:2:2/4:2:0 videos to and from RGB 4:4:4, the video is kept in Y'CbCr 4:2:2/4:2:0 and merged the luma and chroma channels after the luma is downsampled to match the chroma size. An inverse function is performed for the decoder. The performance of these models is evaluated by using CPSNR, MS-SSIM, and VMAF metrics. The experiments reveal that, as compared to video compression involving conversion to and from RGB 4:4:4, the proposed method increases the video quality by about 5.5% for Y'CbCr 4:2:2 and 8.3% for Y'CbCr 4:2:0 while reducing the amount of computation by nearly 37% for Y'CbCr 4:2:2 and 40% for Y'CbCr 4:2:0. The thread that ties these approaches together is reconstruction of the video and light field frames based on different aspects of problems such as having loss of information, blur in the frames, existing noise after reconstruction, existing unpleasant content, excessive size of information and high computational overhead. In three of the proposed approaches, we have used Plug-and-Play ADMM model for the first time regarding reconstruction of videos and light fields in order to address both information retrieval in the frames and tackling noise/blur at the same time. In two of the proposed models, we applied sparse dictionary learning to reduce the data dimension and demonstrate them as an efficient linear combination of basis frame patches. Two of the proposed approaches are developed in collaboration with industry, in which deep learning frameworks are used to handle large set of features and to learn high-level features from the data

Concordia University Research Repository

Algorithms and Hardware Co-Design of HEVC Intra Encoders

Author: Zhang Yuanzhi
Publication venue: OpenSIUC
Publication date: 01/12/2019
Field of study

Digital video is becoming extremely important nowadays and its importance has greatly increased in the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is becoming stronger. However, the most prevalent video compression standard H.264/AVC released in 2003 is inefficient when it comes to UHD videos. The increasing desire for superior compression efficiency to H.264/AVC leads to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers a double compression ratio at the same level of video quality or substantial improvement of video quality at the same video bitrate. Yet, HE-VC/H.265 possesses superior compression efficiency, its complexity is several times more than H.264/AVC, impeding its high throughput implementation. Currently, most of the researchers have focused merely on algorithm level adaptations of HEVC/H.265 standard to reduce computational intensity without considering the hardware feasibility. What’s more, the exploration of efficient hardware architecture design is not exhaustive. Only a few research works have been conducted to explore efficient hardware architectures of HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders. We also explore the deep learning approach in mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations, including mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming. Fast CU cost estimation is applied to reduce the complexity in rate-distortion (RD) calculation of each CU. Group-based CABAC rate estimation is proposed to parallelize syntax elements processing to greatly improve rate estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PE) and each PE performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, rate-distortion estimation independently. PU blocks with different PU sizes will be processed by the different prediction engines (PE) simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of the deep learning approach, we also propose a fully connected layer based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes of luma prediction blocks. All angular prediction modes are classified into 7 prediction groups. Each group contains 3-5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm is designed to determine the prediction direction of the current block, then a small scale FCLNN is exploited to refine the mode prediction

OpenSIUC

Evaluation of the color image and video processing chain and visual quality management for consumer systems

Author: Sarkar Abhijit
Publication venue: RIT Scholar Works
Publication date: 01/05/2008
Field of study

With the advent of novel digital display technologies, color processing is increasingly becoming a key aspect in consumer video applications. Today’s state-of-the-art displays require sophisticated color and image reproduction techniques in order to achieve larger screen size, higher luminance and higher resolution than ever before. However, from color science perspective, there are clearly opportunities for improvement in the color reproduction capabilities of various emerging and conventional display technologies. This research seeks to identify potential areas for improvement in color processing in a video processing chain. As part of this research, various processes involved in a typical video processing chain in consumer video applications were reviewed. Several published color and contrast enhancement algorithms were evaluated, and a novel algorithm was developed to enhance color and contrast in images and videos in an effective and coordinated manner. Further, a psychophysical technique was developed and implemented for performing visual evaluation of color image and consumer video quality. Based on the performance analysis and visual experiments involving various algorithms, guidelines were proposed for the development of an effective color and contrast enhancement method for images and video applications. It is hoped that the knowledge gained from this research will help build a better understanding of color processing and color quality management methods in consumer video

RIT Scholar Works

Robust error detection methods for H.264/AVC videos

Author: Rodríguez Rodríguez Eva
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2008
Field of study

The 3rd generation of mobile systems is mainly focused on enabling multimedia services such as video streaming, video call and conferencing. In order to achieve this, the Universal Mobile Telecommunications System (UMTS), is the standard that has been developed by the 3rd Generation Partnership ect (3GPP) in Europe, including the baseline profile of H.264/AVC in the specification. With the union of both technologies a great improvement on video transmission over mobile networks, and even modification of the user habits towards the use of the mobile phone is expected. Nevertheless, video transmission has always been related to wired networks and unfortunately the migration to wireless networks is not as easy as it seems. In real time applications the delay is a critical constraint. Usually, transmission protocols without delivery warranties, like the User Network Protocol (UDP) for IP based networks, are used. This works under the assumption that in real time applications dropped packets are preferable to delayed packets. Moreover, in UMTS the network needs to be treated in a different way, thus the wireless channel is a prone error channel due to its high time variance. Typically, when transmitting video, the receiver checks whether the information packet is corrupted (by means of a checksum) or if its temporal mark exceeds the specified delay. This approach is suboptimal, due to the fact that perhaps the video information is not damaged and could still be used. Instead, residual redundancy on the video stream can be used to locate the errors in the corrupted packet, increasing the granularity of the typical upper-layer checksum error detection. Based on this, the amount of information previous to the error detection can be decoded as usually. The aim of this thesis is to combine some of the more effective methods concretely, Syntax check, Watermarking and Checksum schemes have been reformulated, combined and simulated

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Object Enhancement, Noise Reduction, Conversion and Collection of Spatiotemporal Image Data

Author: Izazguirre Michael, Jr.
Publication venue: ScholarWorks @ UTRGV
Publication date: 01/05/2019
Field of study

In this report, a variety of cellular dynamics are enhanced and analyzed utilizing various algorithms and filter for contrast enhancement. This report will also illustrate the underlying complexities of processing compressed data received from certain type of sensors, their default applications, various methods in converting compressed data to compatible universal uncompressed formats allowed in scientific applications, various methods of image and video capture, guidelines in ethical image manipulation, various methods of frame extraction, and analyzing/processing video images. These methods and processes purposely utilize freeware and public domain software to lower the cost of reproducibility for all

Scholarworks@UTRGV Univ. of Texas RioGrande Valley

Improvement of Decision on Coding Unit Split Mode and Intra-Picture Prediction by Machine Learning

Author: Jiang Wenchan
Publication venue: DigitalCommons@Kennesaw State University
Publication date: 16/08/2018
Field of study

High efficiency Video Coding (HEVC) has been deemed as the newest video coding standard of the ITU-T Video Coding Experts Group and the ISO/IEC Moving Picture Experts Group. The reference software (i.e., HM) have included the implementations of the guidelines in appliance with the new standard. The software includes both encoder and decoder functionality. Machine learning (ML) works with data and processes it to discover patterns that can be later used to analyze new trends. ML can play a key role in a wide range of critical applications, such as data mining, natural language processing, image recognition, and expert systems. In this research project, in compliance with H.265 standard, we are focused on improvement of the performance of encode/decode by optimizing the partition of prediction block in coding unit with the help of supervised machine learning. We used Keras library as the main tool to implement the experiments. Key parameters were tuned for the model in our convolution neuron network. The coding tree unit mode decision time produced in the model was compared with that produced in HM software, and it was proved to have improved significantly. The intra-picture prediction mode decision was also investigated with modified model and yielded satisfactory results

DigitalCommons@Kennesaw State University

Hardware Implementation of a High Speed Deblocking Filter for the H.264 Video Codec

Author: Dickey Brian
Publication venue: 'University of Waterloo'
Publication date: 01/01/2012
Field of study

H.264/MPEG-4 part 10 or Advanced Video Coding (AVC) is a standard for video compression. MPEG-4 is currently one of the most widely used formats for recording, compression and distribution of high definition video. One feature of the AVC codec is the inclusion of an in-loop deblocking filter. The goal of the deblocking filter is to remove blocking artifacts that exist at macroblock boundaries. However, due to the complexity of the deblocking algorithm, the filter can easily account for one-third of the computational complexity of a decoder. In this thesis, a modification to the deblocking algorithm given in the AVC standard is presented. This modification allows the algorithm to finish the filtering of a macroblock to finish twenty clock cycles faster than previous single filter designs. This thesis also presents a hardware architecture of the H.264 deblocking filter to be used in the H.264 decoder. The developed architecture allows the filtering of videos streams using 4:2:2 chroma subsampling and 10-bit pixel precision in real-time. The filter was described in VHDL and synthesized for a Spartan-6 FPGA device. Timing analysis showed that is was capable of filtering a macroblock using 4:2:0 chroma subsampling in 124 clock cycles and 4:2:2 chroma subsampling streams in 162 clock cycles. The filter can also provide real-time deblocking of HDTV video (1920x1080) of up to 988 frames per second

University of Waterloo's Institutional Repository