
    RR-DnCNN v2.0: Enhanced Restoration-Reconstruction Deep Neural Network for Down-Sampling Based Video Coding

    Integrating deep learning techniques into the video coding framework yields significant improvements over standard compression techniques, especially when super-resolution (up-sampling) is applied as post-processing in down-sampling based video coding. However, besides the up-sampling degradation, the various artifacts introduced by compression make the super-resolution problem harder to solve. The straightforward solution is to apply artifact removal techniques before super-resolution; however, some helpful features may be removed along with the artifacts, degrading super-resolution performance. To address this problem, we propose an end-to-end restoration-reconstruction deep neural network (RR-DnCNN) using a degradation-aware technique, which entirely resolves the degradation from compression and sub-sampling. Besides, we show that the compression degradation produced by the Random Access configuration is rich enough to cover other degradation types, such as Low Delay P and All Intra, for training. Since the straightforward RR-DnCNN, with many layers chained in sequence, has poor learning capability and suffers from the vanishing-gradient problem, we redesign the network architecture to let reconstruction leverage the features captured during restoration through up-sampling skip connections. We call this novel architecture the restoration-reconstruction u-shaped deep neural network (RR-DnCNN v2.0). As a result, our RR-DnCNN v2.0 outperforms previous works and attains a 17.02% BD-rate reduction on UHD resolution for All Intra, anchored by the standard H.265/HEVC. The source code is available at https://minhmanho.github.io/rrdncnn/.
    Comment: Published in TIP (Open Access). Check our source code at https://minhmanho.github.io/rrdncnn

    A Framework for Super-Resolution of Scalable Video via Sparse Reconstruction of Residual Frames

    This paper introduces a framework for super-resolution of scalable video based on compressive sensing and sparse representation of residual frames, targeting reconnaissance and surveillance applications. We exploit efficient compressive sampling and sparse reconstruction algorithms to super-resolve the video sequence at different compression rates. The sparsity of the information in residual frames is the key point in devising our framework. Moreover, we define a controlling factor, the compressibility threshold, to control the complexity-performance trade-off. Numerical experiments confirm the efficiency of the proposed framework in terms of compression rate as well as the quality of the reconstructed video sequence measured by PSNR. The framework achieves a more efficient compression rate and higher video quality than other state-of-the-art algorithms when performance-complexity trade-offs are considered.
    Comment: IEEE Military Communications Conference, MILCOM, 201

    A Group Variational Transformation Neural Network for Fractional Interpolation of Video Coding

    Motion compensation is an important technique in video coding for removing the temporal redundancy between coded video frames. In motion compensation, fractional interpolation is used to obtain more reference blocks at the sub-pixel level. Existing video coding standards commonly use fixed interpolation filters for fractional interpolation, which are not efficient enough to handle diverse video signals well. In this paper, we design a group variational transformation convolutional neural network (GVTCNN) to improve the fractional interpolation performance of the luma component in motion compensation. GVTCNN infers samples at different sub-pixel positions from the input integer-position sample. It first extracts a shared feature map from the integer-position sample, from which various sub-pixel position samples are inferred. A group variational transformation technique then transforms a group of copies of the shared feature map into samples at different sub-pixel positions. Experimental results demonstrate the interpolation efficiency of our GVTCNN: compared with the interpolation method of High Efficiency Video Coding, our method achieves 1.9% bit saving on average and up to 5.6% bit saving under the low-delay P configuration.
    Comment: DCC 201
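As background to the fixed interpolation filters this abstract contrasts against, here is a minimal one-dimensional sketch of fractional interpolation with HEVC's 8-tap half-pel luma filter (coefficients {-1, 4, -11, 40, 40, -11, 4, -1}/64); the padding mode and helper names are illustrative, not from the paper:

```python
import numpy as np

# HEVC's fixed 8-tap luma filter for the half-pel position (taps sum to 64).
HALF_PEL = np.array([-1, 4, -11, 40, 40, -11, 4, -1]) / 64.0

def interp_half_pel_1d(row):
    """Interpolate the half-pel sample between each pair of integer pixels.

    The half-pel sample between pixels A0 and A1 uses the window
    [A-3, A-2, A-1, A0, A1, A2, A3, A4]; edges are padded by replication.
    """
    padded = np.pad(np.asarray(row, dtype=np.float64), (3, 4), mode="edge")
    return np.array([np.dot(HALF_PEL, padded[i:i + 8]) for i in range(len(row))])

row = np.array([10, 10, 10, 50, 50, 50], dtype=np.float64)
half = interp_half_pel_1d(row)  # one half-pel sample per integer position
```

In a flat region the filter reproduces the constant value exactly, since the taps sum to one; a learned interpolator such as GVTCNN replaces this fixed kernel with content-adaptive inference.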

    Robust Emotion Recognition from Low Quality and Low Bit Rate Video: A Deep Learning Approach

    Emotion recognition from facial expressions is tremendously useful, especially when coupled with smart devices and wireless multimedia applications. However, inadequate network bandwidth often limits the spatial resolution of the transmitted video, which heavily degrades recognition reliability. We develop a novel framework to achieve robust emotion recognition from low bit rate video. While video frames are downsampled at the encoder side, the decoder is embedded with a deep network model for joint super-resolution (SR) and recognition. Notably, we propose a novel max-mix training strategy, leading to a single "One-for-All" model that is remarkably robust over a vast range of downsampling factors. This makes our framework well adapted to the varied bandwidths of real transmission scenarios, without hampering scalability or efficiency. The proposed framework is evaluated on the AVEC 2016 benchmark and demonstrates significantly improved stand-alone recognition performance, as well as rate-distortion (R-D) performance, compared with either recognizing directly from LR frames or applying SR and recognition separately.
    Comment: Accepted by the Seventh International Conference on Affective Computing and Intelligent Interaction (ACII 2017)

    A novel super resolution reconstruction of low resolution images progressively using DCT and zonal filter based denoising

    Due to factors such as processing power limitations and channel capacity, images are often down-sampled and transmitted at low bit rates, resulting in low-resolution compressed images. High-resolution images can be reconstructed from several blurred, noisy, and down-sampled low-resolution images using a computational process known as super-resolution reconstruction: the process of combining multiple aliased low-quality images to produce a high-resolution, high-quality image. We consider the problem of progressively recovering a high-resolution image from a sequence of low-resolution compressed images. In this paper, we propose a novel DCT-based progressive image display algorithm, with emphasis on the encoding and decoding process. At the encoder, we consider a set of low-resolution images corrupted by additive white Gaussian noise and motion blur. The low-resolution images are compressed using an 8x8 block DCT, and noise is filtered using our proposed novel zonal filter. Multi-frame fusion is performed to obtain a single noise-free image. At the decoder, the image is reconstructed progressively by transmitting the coarser image first, followed by the detail image. Finally, a super-resolution image is reconstructed by applying our proposed novel adaptive interpolation technique. We have performed both objective and subjective analysis of the reconstructed image; the resultant image has a better super-resolution factor and higher ISNR and PSNR. A comparative study with Iterative Back Projection (IBP), Projection onto Convex Sets (POCS), Papoulis-Gerchberg, and FFT-based super-resolution reconstruction shows that our method outperforms these previous contributions.
    Comment: 20 pages, 11 figures
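The general idea of zonal filtering in the 8x8 DCT domain (keep a low-frequency zone of coefficients, zero the rest) can be sketched as follows; the triangular zone shape and threshold are illustrative assumptions, not the paper's specific filter design:

```python
import numpy as np

N = 8
# Orthonormal DCT-II basis matrix: dct2(B) = C @ B @ C.T, idct2(F) = C.T @ F @ C.
k, n = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
C[0, :] = np.sqrt(1.0 / N)

def zonal_filter(block, zone=4):
    """Keep only the low-frequency DCT coefficients with u + v < zone."""
    coeffs = C @ block @ C.T
    u, v = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    coeffs[u + v >= zone] = 0.0          # discard high frequencies (mostly noise)
    return C.T @ coeffs @ C              # inverse DCT back to the pixel domain

rng = np.random.default_rng(0)
smooth = np.outer(np.linspace(0, 255, N), np.ones(N))   # smooth ramp block
noisy = smooth + rng.normal(0, 10, (N, N))              # additive white Gaussian noise
denoised = zonal_filter(noisy)
```

Because white noise spreads its energy evenly over all 64 coefficients while smooth content concentrates in the low-frequency zone, zeroing the high-frequency zone removes most of the noise at little cost to the signal.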

    Kernel based low-rank sparse model for single image super-resolution

    Self-similarity learning has been recognized in recent years as a promising method for single image super-resolution (SR) to produce a high-resolution (HR) image. The performance of learning-based SR reconstruction, however, depends highly on the learned representation coefficients. Due to the degradation of the input image, conventional sparse coding is prone to produce unfaithful representation coefficients. To this end, we propose a novel kernel-based low-rank sparse model with self-similarity learning for single image SR, which incorporates a nonlocal-similarity prior to enforce that similar patches have similar representation weights. We perform a gradual magnification scheme, using self-examples extracted from the degraded input image and its up-scaled versions. To exploit nonlocal similarity, we concatenate the vectorized input patch and its nonlocal neighbors at different locations into a data matrix consisting of similar components. We then map this nonlocal data matrix into a high-dimensional feature space via a kernel method to capture its nonlinear structure. Under the assumption that the sparse coefficients for the nonlocal data in the kernel space should be low-rank, we impose a low-rank constraint on the sparse coding to share similarities among representation coefficients and remove outliers, so that stable weights for SR reconstruction can be obtained. Experimental results demonstrate the advantage of our proposed method in both visual quality and reconstruction error.
    Comment: 27 pages. Keywords: low-rank, sparse representation, kernel method, self-similarity learning, super-resolution

    Sparse Coding Approach for Multi-Frame Image Super Resolution

    An image super-resolution method from multiple observations of low-resolution images is proposed. The method is based on sub-pixel-accuracy block matching for estimating the relative displacements of the observed images, and on sparse signal representation for estimating the corresponding high-resolution image. Relative displacements of small patches of the observed low-resolution images are accurately estimated by a computationally efficient block matching method. Since the estimated displacements are also regarded as the warping component of the image degradation process, the matching results are directly utilized to generate a low-resolution dictionary for sparse image representation. The matching scores of the block matching are used to select a subset of low-resolution patches for reconstructing a high-resolution patch; that is, an adaptive selection of informative low-resolution images is realized. When there is only one low-resolution image, the proposed method works as a single-frame super-resolution method. Experiments on various real-world datasets show that the proposed method performs comparably or superior to conventional single- and multi-frame super-resolution methods.
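Several abstracts above rely on sparse signal representation over a patch dictionary. A standard greedy solver for such problems is orthogonal matching pursuit; the minimal sketch below (random dictionary, synthetic 2-sparse signal, all names illustrative) shows the mechanism, not any specific paper's solver:

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with k columns (atoms) of D."""
    residual, support, coef = np.asarray(y, dtype=float), [], np.array([])
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Re-fit coefficients on all selected atoms by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(1)
D = rng.normal(size=(16, 40))
D /= np.linalg.norm(D, axis=0)            # unit-norm dictionary atoms
x_true = np.zeros(40)
x_true[[3, 17]] = [1.5, -2.0]             # 2-sparse ground-truth code
y = D @ x_true
x_hat = omp(D, y, k=2)                    # sparse code recovered from y alone
```

In the multi-frame setting described above, the columns of `D` would instead be warped low-resolution patches selected by block-matching scores.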

    Improving Low Bit-Rate Video Coding using Spatio-Temporal Down-Scaling

    Good-quality video coding at low bit-rates is important for transmission over narrow-bandwidth channels and for storage with limited memory capacity. In this work, we extend a previous analysis of image compression at low bit-rates to video signals, examining how down-scaling in the spatial and temporal dimensions improves compression. We show, both theoretically and experimentally, that at low bit-rates we benefit from applying spatio-temporal scaling. The proposed method applies down-scaling before compression and a corresponding up-scaling afterwards, leaving the codec itself unmodified. We propose analytic models for low bit-rate compression and for the spatio-temporal scaling operations. Specifically, we use theoretic models of motion-compensated prediction of available and absent frames, as in coding and frame-rate up-conversion (FRUC) applications, respectively. The proposed models are designed for multi-resolution analysis. In addition, we formulate a bit-allocation procedure and propose a method for estimating good down-scaling factors for a given video based on its second-order statistics and the given bit-budget. We validate our model with experimental results of H.264 compression.
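The down-scale/code/up-scale wrapper that leaves the codec untouched can be sketched in a few lines; here a toy uniform quantizer stands in for the real codec, and the averaging/replication filters are simplifying assumptions rather than the paper's scaling operators:

```python
import numpy as np

def down2(img):
    """Spatial down-scale by 2 via 2x2 block averaging."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def up2(img):
    """Spatial up-scale by 2 via nearest-neighbour replication."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def quantize(img, step):
    """Toy stand-in for the unmodified codec: a coarser step models a lower bit-rate."""
    return np.round(img / step) * step

def psnr(ref, test):
    mse = np.mean((ref - test) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

rng = np.random.default_rng(0)
frame = rng.uniform(0, 255, (16, 16))
# Direct coding vs. down-scale -> code -> up-scale at the same coarse step.
direct = quantize(frame, step=64)
scaled = up2(quantize(down2(frame), step=64))
```

The paper's contribution is deciding, from the bit-budget and the video's second-order statistics, when the scaled path wins; this sketch only shows the plumbing, since the outcome depends on the content and rate.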

    Stereo on a budget

    We propose an algorithm for recovering depth using less than two images. Instead of having both cameras send their entire images to the host computer, the left camera sends its image to the host while the right camera sends only a fraction ε of its image. The key aspect is that the cameras send this information without communicating with each other at all; hence, the required communication bandwidth is significantly reduced. While standard image compression techniques can reduce the communication bandwidth, they require additional computational resources at the encoder (camera). We aim at designing a lightweight encoder that only touches a fraction of the pixels; the burden of decoding is placed on the decoder (host). We show that it is enough for the encoder to transmit a sparse set of pixels. Using only 1+ε images, with ε as little as 2% of the image, the decoder can compute a depth map whose accuracy is comparable to traditional stereo matching algorithms that require both images as input. Using the depth map and the left image, the right image can be synthesized. No computations are required at the encoder, and the decoder's runtime is linear in the image size.
    Comment: update flowchart in Fig.
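The "1+ε" encoder side is simple to sketch: the right camera transmits only a random ε-fraction of its pixel values with no per-pixel processing. The sampling pattern below is a hypothetical choice for illustration (the paper's actual pixel-selection scheme is not reproduced here), and the hard part, stereo decoding from sparse samples, is omitted:

```python
import numpy as np

def encode_sparse(img, eps, seed=0):
    """Lightweight 'encoder': transmit only a fraction eps of the pixel values.

    The camera does no processing beyond reading the chosen locations; a shared
    seed lets the host regenerate the same sampling pattern without signaling.
    """
    rng = np.random.default_rng(seed)
    n = img.size
    idx = rng.choice(n, size=int(eps * n), replace=False)  # flat pixel indices
    return idx, img.ravel()[idx]

right = np.arange(10000, dtype=np.float64).reshape(100, 100)  # stand-in right image
idx, vals = encode_sparse(right, eps=0.02)   # 2% of the pixels are transmitted
```

With ε = 2%, a 100x100 image costs only 200 transmitted values; the host-side decoder must then combine these samples with the full left image to estimate depth.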

    Deep Reference Generation with Multi-Domain Hierarchical Constraints for Inter Prediction

    Inter prediction is an important module in video coding for temporal redundancy removal, where similar reference blocks are searched from previously coded frames and employed to predict the block to be coded. Although traditional video codecs can estimate and compensate for block-level motions, their inter prediction performance is still heavily affected by the remaining inconsistent pixel-wise displacement caused by irregular rotation and deformation. In this paper, we address this problem by proposing a deep frame interpolation network that generates additional reference frames in coding scenarios. First, we summarize the adaptive convolutions previously used for frame interpolation and propose a factorized kernel convolutional network that improves modeling capacity while keeping a compact form. Second, to better train this network, multi-domain hierarchical constraints are introduced to regularize its training. In the spatial domain, we use a gradually down-sampled and up-sampled auto-encoder to generate the factorized kernels for frame interpolation at different scales. In the quality domain, considering the inconsistent quality of the input frames, the factorized kernel convolution is modulated with quality-related features to learn to exploit more information from high-quality frames. In the frequency domain, a sum of absolute transformed difference (SATD) loss, which performs a frequency transformation, is utilized to facilitate network optimization from the viewpoint of coding performance. With this well-designed frame interpolation network regularized by multi-domain hierarchical constraints, our method surpasses HEVC with an average 6.1% BD-rate saving, and up to an 11.0% BD-rate saving for the luma component, under the random access configuration.
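Several results in this listing are reported as BD-rate savings. The standard Bjontegaard delta-rate metric fits log-rate as a cubic polynomial in PSNR for each codec and integrates the gap over the overlapping quality range; a minimal sketch with hypothetical RD points (constructed so codec B needs 10% fewer bits at every quality level):

```python
import numpy as np

def bd_rate(rate_a, psnr_a, rate_b, psnr_b):
    """Bjontegaard delta-rate: average % bit-rate change of codec B relative to
    codec A over the overlapping PSNR range (negative = B saves bits)."""
    la, lb = np.log(rate_a), np.log(rate_b)
    pa = np.polyfit(psnr_a, la, 3)        # log-rate as a cubic in PSNR
    pb = np.polyfit(psnr_b, lb, 3)
    lo = max(min(psnr_a), min(psnr_b))    # overlapping quality interval
    hi = min(max(psnr_a), max(psnr_b))
    ia = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    ib = np.polyval(np.polyint(pb), hi) - np.polyval(np.polyint(pb), lo)
    return (np.exp((ib - ia) / (hi - lo)) - 1) * 100

# Hypothetical RD points: codec B spends 10% fewer bits at identical PSNR.
r_a = np.array([1000.0, 2000.0, 4000.0, 8000.0])   # kbps
p = np.array([32.0, 35.0, 38.0, 41.0])             # dB
r_b = 0.9 * r_a
saving = bd_rate(r_a, p, r_b, p)                   # ≈ -10.0 (% bit-rate change)
```

Figures such as "6.1% BD-rate saving" in the abstract above correspond to a return value of about -6.1 from this kind of computation.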