789 research outputs found

    Learned Quality Enhancement via Multi-Frame Priors for HEVC Compliant Low-Delay Applications

    Networked video applications, e.g., video conferencing, often suffer from poor visual quality due to unexpected network fluctuation and limited bandwidth. In this paper, we develop a Quality Enhancement Network (QENet) that reduces video compression artifacts by leveraging spatial priors generated by multi-scale convolutions and temporal priors generated by warped temporal predictions applied in a recurrent fashion. We integrate QENet as a standalone post-processing subsystem into a High Efficiency Video Coding (HEVC) compliant decoder. Experimental results show that QENet achieves state-of-the-art performance compared with the default in-loop filters in HEVC and with other deep-learning-based methods, with noticeable objective gains in Peak Signal-to-Noise Ratio (PSNR) and visible subjective gains.
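The abstract above reports its objective gains in PSNR. As a reminder of what that metric measures, here is a minimal sketch, assuming 8-bit frames stored as nested lists of pixel values. The function name `psnr` and its parameters are illustrative and are not taken from the paper.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Return the PSNR in dB between two equally sized 2-D frames.

    Frames are lists of rows of numeric pixel values. max_value is the
    peak pixel value (255 for 8-bit content, an assumption here).
    """
    if len(reference) != len(distorted):
        raise ValueError("frames must have the same number of rows")
    squared_error = 0.0
    count = 0
    for ref_row, dist_row in zip(reference, distorted):
        if len(ref_row) != len(dist_row):
            raise ValueError("rows must have the same length")
        for r, d in zip(ref_row, dist_row):
            squared_error += (r - d) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the decoded frame is closer to the reference, so an enhancement method is typically judged by the PSNR difference before and after post-processing.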

    Deep Multi-modality Soft-decoding of Very Low Bit-rate Face Videos

    We propose a novel deep multi-modality neural network for restoring very low bit-rate videos of talking heads. Such video content is very common in social media, teleconferencing, distance education, tele-medicine, etc., and often needs to be transmitted with limited bandwidth. The proposed CNN method exploits the correlations among three modalities (video, audio, and the emotional state of the speaker) to remove the video compression artifacts caused by spatial downsampling and quantization. The deep learning approach turns out to be ideally suited for the video restoration task, as the complex non-linear cross-modality correlations are very difficult to model analytically and explicitly. The new method is a video post-processor that can significantly boost the perceptual quality of aggressively compressed talking-head videos, while being fully compatible with all existing video compression standards.
    Comment: Accepted by Proceedings of the 28th ACM International Conference on Multimedia (ACM MM), 202

    A Diffusion Model Based Quality Enhancement Method for HEVC Compressed Video

    Video post-processing methods can improve the quality of compressed videos at the decoder side. Most existing methods need to train a separate model for each quantization parameter used to compress the video. However, in most cases, the quantization parameters of the decoded video are unknown, which limits how well existing methods can improve video quality. To tackle this problem, this work proposes a diffusion-model-based post-processing method for compressed videos. The proposed method first estimates the feature vectors of the compressed video and then uses the estimated feature vectors as prior information for the quality enhancement model, which adaptively enhances compressed video across different quantization parameters. Experimental results show that the quality enhancement results of our proposed method on mixed datasets are superior to those of existing methods.
    Comment: 10 pages, conferenc

    Prioritizing Content of Interest in Multimedia Data Compression

    Image and video compression techniques make data transmission and storage in digital multimedia systems more efficient and feasible given the system's limited storage and bandwidth. Many generic image and video compression techniques such as JPEG and H.264/AVC have been standardized and are now widely adopted. Despite their great success, we observe that these standard compression techniques are not the best solution for data compression in special types of multimedia systems such as microscopy videos and low-power wireless broadcast systems. In these application-specific systems, where the content of interest in the multimedia data is known and well-defined, we should rethink the design of the data compression pipeline. We hypothesize that by identifying and prioritizing the content of interest in multimedia data, new compression methods can be invented that are far more effective than standard techniques. In this dissertation, a set of new data compression methods based on the idea of prioritizing the content of interest is proposed for three different kinds of multimedia systems. I will show that the key to designing efficient compression techniques in these three cases is to prioritize the content of interest in the data. The definition of the content of interest depends on the application. First, I show that for microscopy videos, the content of interest is defined as the spatial regions in the video frame whose pixels contain more than just noise. Keeping the data in those regions at high quality and discarding other information yields a novel microscopy video compression technique. Second, I show that for a Bluetooth Low Energy beacon based system, practical multimedia data storage and transmission is possible by prioritizing the content of interest. I designed custom image compression techniques that preserve edges in a binary image, or foreground regions of a color image of indoor or outdoor objects. Last, I present a new indoor Bluetooth Low Energy beacon based augmented reality system that integrates a 3D moving object compression method that prioritizes the content of interest.
    Doctor of Philosophy

    Scene Matters: Model-based Deep Video Compression

    Video compression has always been a popular research area, and many traditional and deep video compression methods have been proposed. These methods typically rely on signal prediction theory to enhance compression performance by designing highly efficient intra and inter prediction strategies and compressing video frames one by one. In this paper, we propose a novel model-based video compression (MVC) framework that regards scenes as the fundamental units of video sequences. Our proposed MVC directly models the intensity variation of the entire video sequence in one scene, seeking non-redundant representations instead of reducing redundancy through spatio-temporal predictions. To achieve this, we employ implicit neural representation as our basic modeling architecture. To improve the efficiency of video modeling, we first propose context-related spatial positional embedding and frequency domain supervision for spatial context enhancement. To capture temporal correlation, we design a scene flow constraint mechanism and a temporal contrastive loss. Extensive experimental results demonstrate that our method achieves up to a 20% bitrate reduction compared to the latest video coding standard, H.266, and is more efficient in decoding than existing video coding strategies.

    Exploiting Temporal Image Information in Minimally Invasive Surgery

    Minimally invasive procedures rely on medical imaging instead of the surgeon's direct vision. While preoperative images can be used for surgical planning and navigation, real-time intraoperative imaging is needed once the surgeon arrives at the target site. However, acquiring and interpreting these images can be challenging, and much of the rich temporal information present in them is not visible. The goal of this thesis is to improve image guidance for minimally invasive surgery in two main areas: first, by showing how high-quality ultrasound video can be obtained by integrating an ultrasound transducer directly into delivery devices for beating-heart valve surgery; second, by extracting hidden temporal information through video processing methods to help the surgeon localize important anatomical structures. Prototypes of delivery tools with integrated ultrasound imaging were developed for both transcatheter aortic valve implantation and mitral valve repair. These tools provided an on-site view that shows the tool-tissue interactions during valve repair. Additionally, augmented reality environments were used to add anatomical context that aids in navigation and in interpreting the on-site video. Other procedures can be improved by extracting hidden temporal information from the intraoperative video. In ultrasound-guided epidural injections, dural pulsation provides a cue for finding a clear trajectory to the epidural space. By processing the video using extended Kalman filtering, subtle pulsations were automatically detected and visualized in real time. A statistical framework for analyzing periodicity was developed based on dynamic linear modelling. In addition to detecting dural pulsation in lumbar spine ultrasound, this approach was used to image tissue perfusion in natural video and to generate ventilation maps from free-breathing magnetic resonance imaging. A second statistical method, based on spectral analysis of pixel intensity values, allowed blood flow to be detected directly from high-frequency B-mode ultrasound video. Finally, pulsatile cues in endoscopic video were enhanced through Eulerian video magnification to help localize critical vasculature. This approach shows particular promise in identifying the basilar artery in endoscopic third ventriculostomy and the prostatic artery in nerve-sparing prostatectomy. A real-time implementation was developed that processes full-resolution stereoscopic video on the da Vinci Surgical System.
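The abstract above mentions spectral analysis of pixel intensity values as a way to detect periodic signals such as pulsation. The thesis's actual method is not described here, so the following is only a generic sketch of the idea: take one pixel's intensity over time, compute its discrete Fourier transform, and report the strongest non-DC frequency. The function name `dominant_frequency`, the frame rate, and the synthetic signal are all assumptions made for illustration.

```python
import cmath
import math


def dominant_frequency(samples, frame_rate):
    """Return (frequency_hz, power) of the strongest non-DC component.

    samples is one pixel's intensity over consecutive frames and
    frame_rate is in frames per second. Uses a direct O(n^2) DFT,
    which is adequate for a short illustrative series.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the DC offset
    best_freq, best_power = 0.0, 0.0
    for k in range(1, n // 2 + 1):  # positive frequencies up to Nyquist
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2 / n
        if power > best_power:
            best_freq, best_power = k * frame_rate / n, power
    return best_freq, best_power


# Synthetic example: a 1.2 Hz pulsation sampled at 30 frames per second
# for 5 seconds, riding on a constant brightness of 100.
signal = [100 + 2 * math.sin(2 * math.pi * 1.2 * t / 30) for t in range(150)]
freq, power = dominant_frequency(signal, 30)
```

In practice, a pixel would be flagged as pulsatile only if the peak power clearly exceeds the noise floor and the frequency falls in a physiologically plausible band; both thresholds would need to be chosen for the imaging modality.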

    Video quality for video analysis

    Ph.D. (Doctor of Philosophy)