
    GRACE: Loss-Resilient Real-Time Video through Neural Codecs

    In real-time video communication, retransmitting lost packets over high-latency networks is not viable due to strict latency requirements. To counter packet losses without retransmission, two primary strategies are employed: encoder-based forward error correction (FEC) and decoder-based error concealment. The former encodes data with redundancy before transmission, yet determining the optimal redundancy level in advance proves challenging. The latter reconstructs video from partially received frames, but dividing a frame into independently coded partitions inherently compromises compression efficiency, and the lost information cannot be effectively recovered by the decoder without adapting the encoder. We present a loss-resilient real-time video system called GRACE, which preserves the user's quality of experience (QoE) across a wide range of packet losses through a new neural video codec. Central to GRACE's enhanced loss resilience is its joint training of the neural encoder and decoder under a spectrum of simulated packet losses. In lossless scenarios, GRACE achieves video quality on par with conventional codecs (e.g., H.265). As the loss rate escalates, GRACE exhibits a more graceful, less pronounced decline in quality, consistently outperforming other loss-resilient schemes. Through extensive evaluation on various videos and real network traces, we demonstrate that GRACE reduces undecodable frames by 95% and stall duration by 90% compared with FEC, while markedly boosting video quality over error concealment methods. In a user study with 240 crowdsourced participants and 960 subjective ratings, GRACE registers a 38% higher mean opinion score (MOS) than other baselines.
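    The abstract's central idea, jointly training the encoder and decoder under simulated packet losses, can be illustrated with a minimal PyTorch sketch. Everything below is an illustrative assumption rather than the paper's actual codec: the toy architecture, treating each latent channel as one "packet", zeroing dropped packets as the erasure model, the sampled loss-rate range, and the plain MSE objective.

```python
# Minimal sketch: jointly train a neural encoder/decoder while randomly
# dropping chunks of the latent, so the decoder learns to reconstruct
# frames from partially received data (hypothetical toy model, not GRACE).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Codec(nn.Module):
    def __init__(self, latent_channels=32):
        super().__init__()
        # Toy frame encoder: 3xHxW frame -> latent grid.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, latent_channels, 4, stride=2, padding=1),
        )
        # Toy decoder: reconstructs the frame from a (possibly masked) latent.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1),
        )

    def forward(self, frame, loss_rate):
        z = self.encoder(frame)
        # Simulate packet loss: treat each latent channel as one packet and
        # drop it with probability `loss_rate` (assumed zeroing erasure model).
        keep = (torch.rand(z.size(0), z.size(1), 1, 1, device=z.device) > loss_rate).float()
        return self.decoder(z * keep)

model = Codec()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(1000):
    frames = torch.rand(8, 3, 64, 64)       # stand-in for real training frames
    loss_rate = torch.rand(1).item() * 0.8  # sample a loss rate from a wide range
    recon = model(frames, loss_rate)
    loss = F.mse_loss(recon, frames)        # the paper targets perceptual quality;
                                            # MSE keeps this sketch self-contained
    opt.zero_grad()
    loss.backward()
    opt.step()
```

    Because the decoder sees masked latents at many loss rates throughout training, its reconstructions degrade smoothly as more packets are dropped, which is the graceful quality decline the abstract describes.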

    Perceptual Optimizations for Video Capture, Processing, and Storage Systems

    Thesis (Ph.D.), University of Washington, 2020.
    Visual media is the dominant form of content used in modern computing systems. Advances in machine learning, virtual reality, and display form factors drive demand for richer visual experiences, putting pressure on systems to efficiently use compute and storage infrastructure. At the same time, the rapid pace of performance and energy efficiency gains computer architects depended on to meet growing application requirements has slowed. Designing computer systems to meet the requirements of modern video-based applications requires specialization in compute design, using hardware-software codesign techniques to closely optimize computer system performance for specific visual computing workloads. This thesis uses perceptual information to optimize the design of video capture, processing, and storage systems. I describe system optimizations using three classes of perceptual cues: structure (e.g., color, depth); semantics (e.g., faces, objects); and saliency (e.g., human visual saliency, neural network feature saliency). This thesis demonstrates how perceptual information can be used in hardware accelerator designs on ASICs and FPGAs, and in cloud video storage infrastructure.