282 research outputs found

    GRACE: Loss-Resilient Real-Time Video through Neural Codecs

    In real-time video communication, retransmitting lost packets over high-latency networks is not viable due to strict latency requirements. To counter packet losses without retransmission, two primary strategies are employed -- encoder-based forward error correction (FEC) and decoder-based error concealment. The former encodes data with redundancy before transmission, yet determining the optimal redundancy level in advance proves challenging. The latter reconstructs video from partially received frames, but dividing a frame into independently coded partitions inherently compromises compression efficiency, and the lost information cannot be effectively recovered by the decoder without adapting the encoder. We present a loss-resilient real-time video system called GRACE, which preserves the user's quality of experience (QoE) across a wide range of packet losses through a new neural video codec. Central to GRACE's enhanced loss resilience is its joint training of the neural encoder and decoder under a spectrum of simulated packet losses. In lossless scenarios, GRACE achieves video quality on par with conventional codecs (e.g., H.265). As the loss rate escalates, GRACE exhibits a more graceful, less pronounced decline in quality, consistently outperforming other loss-resilient schemes. Through extensive evaluation on various videos and real network traces, we demonstrate that GRACE reduces undecodable frames by 95% and stall duration by 90% compared with FEC, while markedly boosting video quality over error concealment methods. In a user study with 240 crowdsourced participants and 960 subjective ratings, GRACE registers a 38% higher mean opinion score (MOS) than other baselines

    RAWIW: RAW Image Watermarking Robust to ISP Pipeline

    Invisible image watermarking is essential for image copyright protection. Compared to RGB images, RAW format images use a higher dynamic range to capture the radiometric characteristics of the camera sensor, providing greater flexibility in post-processing and retouching. Similar to the master recording in the music industry, RAW images are considered the original format for distribution and image production, thus requiring copyright protection. Existing watermarking methods typically target RGB images, leaving a gap for RAW images. To address this issue, we propose the first deep learning-based RAW Image Watermarking (RAWIW) framework for copyright protection. Unlike RGB image watermarking, our method achieves cross-domain copyright protection. We directly embed copyright information into RAW images, which can later be extracted from the corresponding RGB images generated by different post-processing methods. To achieve end-to-end training of the framework, we integrate a neural network that simulates the ISP pipeline to handle the RAW-to-RGB conversion process. To further validate the generalization of our framework to traditional ISP pipelines and its robustness to transmission distortion, we adopt a distortion network. This network simulates various types of noise introduced during the traditional ISP pipeline and transmission. Furthermore, we employ a three-stage training strategy to strike a balance between robustness and concealment of watermarking. Our extensive experiments demonstrate that RAWIW successfully achieves cross-domain copyright protection for RAW images while maintaining their visual quality and robustness to ISP pipeline distortions

    Enhancing the quality of video streaming over unreliable wireless networks

    University of Technology Sydney. Faculty of Engineering and Information Technology. Real-time video transmission over unreliable wireless networks remains a serious challenge due to limited bandwidth and the sensitive nature of the video bitstreams generated by today’s complex video encoders, e.g., High Efficiency Video Coding (HEVC/H.265). These compressed video bitstreams suffer from packet drops when transmitted over unreliable wireless networks. The effect of packet drops on the received video quality can be minimised in two ways: 1) increasing Quality of Service (QoS) by adopting efficient routing schemes between source and destination, and 2) maintaining video quality at the receiver’s side by applying smart, real-time Error Concealment (EC) techniques. QoS refers to the capability of a transmission network to provide better service to selected network traffic. It is a generic term and can be applied to any data transmission network. The term video quality refers to the perceived degradation of a video compared to the original. In this dissertation, we explore these two approaches and propose a comprehensive solution for real-time video transmission over unreliable networks, with the following contributions. 1. An efficient, lightweight and real-time EC algorithm is proposed to conceal missing/lost video frames in H.265-encoded HD videos. The EC algorithm is based on a threshold-based distributed Motion Estimation (ME) scheme and uses only two video frames to estimate the missing one, thus eliminating the need for a large buffer and for processing a bundle of video frames. 2. Scalable video coding produces multiple interrelated bitstreams of a single video at different bitrates. For scalable bitstreams, we propose a lightweight and real-time EC algorithm to cover up the effects of missing/lost video frames. Due to the complicated nature of scalable video bitstreams, our proposed EC algorithm uses three previously processed video frames along with their master video frames to perform threshold-based distributed ME and estimate the missing video frames in the enhancement layer. 3. We propose a feedback-based on-demand multipath routing scheme over a multi-hop Wireless Multimedia Sensor Network (WMSN) to ensure QoS. The feedback helps in deciding the optimum path between sources and destinations and reduces the Packet Loss Ratio (PLR) during transmission. On-demand connection helps save the available network resources, while multipath routing helps maintain the connection between sources and destinations. The proposed research makes notable contributions to designing a QoS-supported HD video streaming paradigm to deliver HD videos over unreliable networks and to maintain the received video quality on resource-constrained mobile terminals

    On the Application of Dictionary Learning to Image Compression

    Signal models are a cornerstone of contemporary signal and image-processing methodology. In this chapter, a particular signal modelling method, called synthesis sparse representation, is studied; it has proven effective for many signals, such as natural images, and has been used successfully in a wide range of applications. In this kind of signal modelling, the signal is represented with respect to a dictionary. The choice of dictionary plays an important role in the success of the entire model. One main approach to dictionary design is based on a machine learning methodology, which provides a simple and expressive structure for designing adaptable and efficient dictionaries. This chapter focuses on a direct application of sparse representation, namely image compression. Two image codecs based on adaptive sparse representation over a trained dictionary are introduced. Experimental results show that the presented methods outperform existing image coding standards such as JPEG and JPEG2000
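
    To make the idea of representing a signal with respect to a dictionary concrete, here is a minimal sketch of sparse coding by matching pursuit. It is not the chapter's codec: the dictionary below is hand-picked rather than trained, and the names `matching_pursuit`, `atoms`, and `signal` are assumptions made only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def matching_pursuit(signal, atoms, n_steps):
    """Greedily approximate `signal` as a sparse combination of unit-norm atoms.

    Returns (coeffs, residual), where coeffs[i] is the weight of atoms[i].
    """
    residual = list(signal)
    coeffs = [0.0] * len(atoms)
    for _ in range(n_steps):
        # Pick the atom most correlated with what is still unexplained.
        corr = [dot(residual, atom) for atom in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        coeffs[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coeffs, residual


# Hypothetical overcomplete dictionary for 4-sample signals (assumption):
# the four standard basis vectors plus two normalized patterns.
atoms = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    normalize([1.0, 1.0, 1.0, 1.0]),
    normalize([1.0, -1.0, 1.0, -1.0]),
]

signal = [3.5, 1.5, 1.5, 1.5]
coeffs, residual = matching_pursuit(signal, atoms, n_steps=2)
```

    In a compression setting, only the few nonzero entries of `coeffs` (and their indices) would need to be stored; a trained dictionary would aim to make such short descriptions accurate for the signals of interest.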

    Deliverable D2.2 of the PERSEE project: Texture Analysis/Synthesis

    Deliverable D2.2 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170). Specifically, it corresponds to deliverable D2.2 of the project. Its title: Texture Analysis/Synthesis

    Meta-Transfer Learning Driven Tensor-Shot Detector for the Autonomous Localization and Recognition of Concealed Baggage Threats

    Screening baggage against potential threats has become one of the prime aviation security concerns all over the world, where manual detection of prohibited items is a time-consuming and hectic process. Many researchers have developed autonomous systems to recognize baggage threats using security X-ray scans. However, all of these frameworks struggle to screen cluttered and concealed contraband items. Furthermore, to the best of our knowledge, no framework possesses the capacity to recognize baggage threats across multiple scanner specifications without an explicit retraining process. To overcome this, we present a novel meta-transfer learning-driven tensor-shot detector that decomposes the candidate scan into dual-energy tensors and employs a meta-one-shot classification backbone to recognize and localize the cluttered baggage threats. In addition, the proposed detection framework generalizes well to multiple scanner specifications due to its capacity to generate object proposals from the unified tensor maps rather than diversified raw scans. We have rigorously evaluated the proposed tensor-shot detector on the publicly available SIXray and GDXray datasets (containing a cumulative total of 1,067,381 grayscale and colored baggage X-ray scans). On the SIXray dataset, the proposed framework achieved a mean average precision (mAP) of 0.6457, and on the GDXray dataset, it achieved a precision and F1 score of 0.9441 and 0.9598, respectively. Furthermore, it outperforms state-of-the-art frameworks by 8.03% in terms of mAP, 1.49% in terms of precision, and 0.573% in terms of F1 on the SIXray and GDXray datasets, respectively

    Robust density modelling using the student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show that experiments on two well-known datasets (Weizmann, MuHAVi) yielded a remarkable improvement in classification accuracy. © 2011 IEEE
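
    The robustness claim can be illustrated with a small sketch that fits a location parameter to one-dimensional data containing a single outlier. This is not the paper's HMM with t-mixture observations: it uses one univariate component with a fixed scale, and the function names, the degrees of freedom `nu=3`, and the toy data are all assumptions chosen for illustration.

```python
import math


def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a univariate Gaussian."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


def student_t_logpdf(x, mu=0.0, sigma=1.0, nu=3.0):
    """Log-density of a univariate location-scale Student's t."""
    z = (x - mu) / sigma
    return (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
        - math.log(sigma)
        - (nu + 1) / 2 * math.log1p(z * z / nu)
    )


def gaussian_location(data):
    """Maximum-likelihood Gaussian mean: the plain average."""
    return sum(data) / len(data)


def student_t_location(data, sigma=1.0, nu=3.0, iters=50):
    """EM-style estimate of the t location with the scale held fixed.

    Each point is reweighted by (nu + 1) / (nu + z^2), so points far from
    the current estimate contribute very little to the update.
    """
    mu = sorted(data)[len(data) // 2]  # start near the middle of the data
    for _ in range(iters):
        weights = [(nu + 1) / (nu + ((x - mu) / sigma) ** 2) for x in data]
        mu = sum(w * x for w, x in zip(weights, data)) / sum(weights)
    return mu


# Toy feature values (assumption): five inliers near 0 and one gross outlier.
data = [0.1, -0.2, 0.05, 0.3, -0.15, 25.0]

mean_gauss = gaussian_location(data)   # pulled strongly toward the outlier
mean_t = student_t_location(data)      # stays close to the inliers
```

    The heavy tails show up in the log-densities as well: at the outlier value, `student_t_logpdf(25.0)` is far larger than `gaussian_logpdf(25.0)`, so a single abnormal observation penalises the t model much less.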

    Signal processing for high-definition television

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Mathematics, 1995. Includes bibliographical references (p. 60-62). By Peter Monta. Ph.D.