
    Metaverse: A Young Gamer's Perspective

    When developing technologies for the Metaverse, it is important to understand the needs and requirements of end users. Relatively little is known about the perspectives of the youngest audience, children aged ten and under, on the use of the Metaverse. This paper explores the Metaverse from the perspective of a young gamer. It examines their understanding of the Metaverse in relation to the physical world and other technologies they may be familiar with, looks at some of their expectations of the Metaverse, and relates these to specific multimedia signal processing (MMSP) research challenges. The perspectives presented in the paper may be useful for planning more detailed subjective experiments involving young gamers, as well as for informing research on MMSP technologies targeted at these users.
    Comment: 6 pages, 5 figures, IEEE MMSP 202

    LCCM-VC: Learned Conditional Coding Modes for Video Compression

    End-to-end learning-based video compression has made steady progress over the last several years. However, unlike learning-based image coding, which has already surpassed its handcrafted counterparts, learning-based video coding still has some way to go. In this paper, we present learned conditional coding modes for video coding (LCCM-VC), a video coding model that achieves state-of-the-art results among learning-based video coding methods. Our model utilizes conditional coding engines from the recent conditional augmented normalizing flows (CANF) pipeline and introduces additional coding modes to improve compression performance. The compression efficiency is especially good in the high-quality/high-bitrate range, which is important for broadcast and video-on-demand streaming applications. The implementation of LCCM-VC is available at https://github.com/hadihdz/lccm_vc
    Comment: 5 pages, 3 figures, IEEE ICASSP 202
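
    As a rough illustration of the conditional-coding idea that LCCM-VC builds on (a minimal sketch only; layer sizes and module names are assumptions, not the authors' architecture), the code below contrasts residual coding with conditional coding, where the prediction is fed to the encoder and decoder as side information rather than being subtracted:

    ```python
    # Illustrative sketch: residual coding vs. conditional coding.
    # All layer sizes and names are assumptions, not LCCM-VC's design.
    import torch
    import torch.nn as nn

    class ResidualCoder(nn.Module):
        """Classic approach: encode the difference between the frame and its prediction."""
        def __init__(self, ch=64):
            super().__init__()
            self.enc = nn.Conv2d(3, ch, 3, padding=1)
            self.dec = nn.Conv2d(ch, 3, 3, padding=1)

        def forward(self, x, x_pred):
            latent = self.enc(x - x_pred)                 # code only the residual
            return x_pred + self.dec(latent)              # add the prediction back

    class ConditionalCoder(nn.Module):
        """Conditional approach: the prediction is given to the encoder and decoder
        as side information, and the network learns how best to exploit it."""
        def __init__(self, ch=64):
            super().__init__()
            self.enc = nn.Conv2d(3 + 3, ch, 3, padding=1)   # frame stacked with condition
            self.dec = nn.Conv2d(ch + 3, 3, 3, padding=1)   # latent stacked with condition

        def forward(self, x, x_pred):
            latent = self.enc(torch.cat([x, x_pred], dim=1))
            return self.dec(torch.cat([latent, x_pred], dim=1))

    x, x_pred = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    print(ConditionalCoder()(x, x_pred).shape)            # torch.Size([1, 3, 64, 64])
    ```

    The point of the conditional form is that the codec is not forced to treat the prediction as something to subtract; it can use it in whatever way minimizes the rate-distortion cost.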

    Tensor Completion Methods for Collaborative Intelligence

    In the race to bring Artificial Intelligence (AI) to the edge, collaborative intelligence has emerged as a promising way to lighten the computation load on edge devices that run applications based on Deep Neural Networks (DNNs). Typically, a deep model is split at a given layer into edge and cloud sub-models. The deep feature tensor produced by the edge sub-model is transmitted to the cloud, where the remaining, computationally intensive workload is performed by the cloud sub-model. The communication channel between the edge and the cloud is imperfect, which results in missing data in the deep feature tensor received at the cloud side, an issue that has mostly been ignored in the existing literature on the topic. In this paper, we study four methods for recovering missing data in the deep feature tensor. Three of the studied methods are existing, generic tensor completion methods, adapted here to recover deep feature tensor data, while the fourth method is newly developed specifically for deep feature tensor completion. Simulation studies show that the new method is 3–18 times faster than the other three methods, which is an important consideration in collaborative intelligence. For VGG16’s sparse tensors, all methods produce statistically equivalent classification results across all loss levels tested. For ResNet34’s non-sparse tensors, the new method offers statistically better classification accuracy (by 0.25%–6.30%) than the other methods at matched execution speeds, and the second-best accuracy among the four methods when they are allowed to run until convergence.
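
    For context, here is a minimal sketch of the edge/cloud split described above, using torchvision's VGG16 and a naive per-channel mean fill as a stand-in for tensor completion. The split point, loss rate, and fill rule are assumptions for illustration only, not the paper's methods:

    ```python
    # Sketch of collaborative intelligence with a lossy feature channel.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    model = models.vgg16(weights=None).eval()      # untrained weights: the sketch only checks shapes
    edge = model.features[:16]                     # runs on the edge device (split point is an assumption)
    cloud = nn.Sequential(model.features[16:], model.avgpool,
                          nn.Flatten(), model.classifier)   # runs in the cloud

    x = torch.rand(1, 3, 224, 224)
    with torch.no_grad():
        feat = edge(x)                             # deep feature tensor sent over the channel

        # Simulate an imperfect channel: drop 20% of the feature values at random.
        mask = torch.rand_like(feat) > 0.2
        received = feat * mask

        # Naive completion stand-in: fill missing entries with the per-channel mean
        # of the received values (the paper studies far better completion methods).
        counts = mask.sum(dim=(2, 3), keepdim=True).clamp(min=1)
        channel_mean = received.sum(dim=(2, 3), keepdim=True) / counts
        completed = torch.where(mask, received, channel_mean.expand_as(received))

        logits = cloud(completed)
    print(logits.shape)                            # torch.Size([1, 1000])
    ```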

    Learned Scalable Video Coding For Humans and Machines

    Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking, and counting would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that facilitates efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce the first end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer supports input reconstruction for human viewing. The proposed system is built on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer. We will provide the implementation of the proposed system at www.github.com upon completion of the review process.
    Comment: 14 pages, 16 figures
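
    A toy sketch of the two-layer idea (all module names and sizes are illustrative assumptions, not the proposed codec): a base latent feeds a machine-vision head, and an enhancement latent, coded conditionally on the base, drives reconstruction for human viewing:

    ```python
    # Sketch of a scalable human/machine codec structure (assumed, simplified).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ScalableCodecSketch(nn.Module):
        """Base latent -> machine-vision head; (base + enhancement) latents -> frame."""
        def __init__(self, ch=64, num_classes=10):
            super().__init__()
            self.base_enc = nn.Conv2d(3, ch, 4, stride=4)             # base-layer latent
            self.task_head = nn.Sequential(nn.AdaptiveAvgPool2d(1),   # machine task on base only
                                           nn.Flatten(), nn.Linear(ch, num_classes))
            self.enh_enc = nn.Conv2d(3 + ch, ch, 4, stride=4)         # conditioned on the base
            self.dec = nn.ConvTranspose2d(2 * ch, 3, 4, stride=4)     # human-viewable frame

        def forward(self, x):
            base = self.base_enc(x)
            task_out = self.task_head(base)                           # e.g. detection/classification scores
            cond = F.interpolate(base, size=x.shape[-2:])             # condition at input resolution
            enh = self.enh_enc(torch.cat([x, cond], dim=1))
            recon = self.dec(torch.cat([base, enh], dim=1))
            return task_out, recon

    task_out, recon = ScalableCodecSketch()(torch.rand(1, 3, 64, 64))
    print(task_out.shape, recon.shape)   # torch.Size([1, 10]) torch.Size([1, 3, 64, 64])
    ```

    The design intent captured here is that a machine-only client decodes just the base latent, while a client that also needs the pictures decodes both layers, with the enhancement layer never repeating information already carried by the base.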

    Scalable Video Coding for Humans and Machines

    Video content is watched not only by humans, but increasingly also by machines. For example, machine learning models analyze surveillance video for security and traffic monitoring, search through YouTube videos for inappropriate content, and so on. In this paper, we propose a scalable video coding framework that supports machine vision (specifically, object detection) through its base layer bitstream and human vision via its enhancement layer bitstream. The proposed framework includes components from both conventional and Deep Neural Network (DNN)-based video coding. The results show that, on object detection, the proposed framework achieves 13–19% bit savings compared to state-of-the-art video codecs, while remaining competitive in terms of MS-SSIM on the human vision task.
    Comment: 6 pages, 5 figures, IEEE MMSP 202
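
    For reference, the human-vision side of such frameworks is typically scored with MS-SSIM between the source and the reconstructed frame. A minimal example using the third-party pytorch-msssim package (an assumption about tooling; the paper does not name a library):

    ```python
    # MS-SSIM quality check for a reconstructed frame.
    # Requires: pip install pytorch-msssim
    import torch
    from pytorch_msssim import ms_ssim

    original = torch.rand(1, 3, 256, 256)                                  # stand-in for a source frame
    reconstructed = (original + 0.02 * torch.randn_like(original)).clamp(0, 1)  # stand-in for decoder output

    score = ms_ssim(original, reconstructed, data_range=1.0)
    print(f"MS-SSIM: {score.item():.4f}")                                  # closer to 1.0 is better
    ```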