
    Streaming and User Behaviour in Omnidirectional Videos

    Omnidirectional videos (ODVs) have gone beyond the passive paradigm of traditional video, offering higher degrees of immersion and interaction. The revolutionary novelty of this technology is the possibility for users to interact with the surrounding environment and to feel a sense of engagement and presence in a virtual space. Users are clearly the main driving force of immersive applications, and consequently the services need to be properly tailored to them. In this context, this chapter highlights the new role of users in ODV streaming applications, and thus the need to understand their behaviour while navigating within ODVs. A comprehensive overview of the research efforts aimed at advancing ODV streaming systems is also presented. In particular, the state-of-the-art solutions examined in this chapter are distinguished into system-centric and user-centric streaming approaches: the former is a fairly straightforward extension of well-established solutions for the 2D video pipeline, while the latter benefits from understanding users' behaviour and enables more personalised ODV streaming.

    Tile Classification Based Viewport Prediction with Multi-modal Fusion Transformer

    Viewport prediction is a crucial aspect of tile-based 360° video streaming systems. However, existing trajectory-based methods lack robustness and oversimplify the construction and fusion of information across different modality inputs, leading to error accumulation. In this paper, we propose a tile classification based viewport prediction method with a Multi-modal Fusion Transformer, namely MFTR. Specifically, MFTR utilizes transformer-based networks to extract the long-range dependencies within each modality, then mines intra- and inter-modality relations to capture the combined impact of user historical inputs and video contents on future viewport selection. In addition, MFTR categorizes future tiles into two classes, user-interested or not, and selects the future viewport as the region that contains the most user-interested tiles. Compared with predicting head trajectories, choosing the future viewport based on the tiles' binary classification results exhibits better robustness and interpretability. To evaluate the proposed MFTR, we conduct extensive experiments on two widely used datasets, PVS-HM and Xu-Gaze. MFTR shows superior performance over state-of-the-art methods in terms of average prediction accuracy and overlap ratio, and also offers competitive computational efficiency.

    Comment: This paper is accepted by ACM-MM 202
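    To make the tile-classification idea concrete, here is a minimal sketch, not MFTR's actual architecture (its fusion transformer is described above): given per-tile interest probabilities from some classifier, binarize them and slide a viewport-sized window over the tile grid, picking the region that contains the most user-interested tiles. The grid size, viewport size, and threshold are illustrative assumptions.

```python
import numpy as np

def select_viewport(tile_probs: np.ndarray, vp_rows: int, vp_cols: int,
                    threshold: float = 0.5) -> tuple[int, int]:
    """Pick the viewport position covering the most 'interested' tiles.

    tile_probs: (H, W) per-tile probabilities that the user is interested,
    e.g. the output of a classification head. The viewport spans
    vp_rows x vp_cols tiles. Returns the top-left tile index of the chosen
    viewport. (Horizontal wrap-around of the 360° frame is ignored here
    for brevity.)
    """
    interested = (tile_probs >= threshold).astype(int)  # binary tile labels
    h, w = interested.shape
    best, best_pos = -1, (0, 0)
    for r in range(h - vp_rows + 1):
        for c in range(w - vp_cols + 1):
            covered = interested[r:r + vp_rows, c:c + vp_cols].sum()
            if covered > best:
                best, best_pos = covered, (r, c)
    return best_pos

# Example: a 6x12 tile grid with a 3x4-tile viewport
probs = np.random.rand(6, 12)
print(select_viewport(probs, vp_rows=3, vp_cols=4))
```

    Selecting a region of classified tiles rather than regressing a head trajectory keeps the output interpretable: each viewport decision can be traced back to which tiles were labelled as interesting.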

    Network and Content Intelligence for 360 Degree Video Streaming Optimization

    In recent years, 360° videos, a.k.a. spherical frames, have become popular among users, creating an immersive streaming experience. Along with advances in smartphone and Head Mounted Device (HMD) technology, many content providers have begun to host and stream 360° videos in both on-demand and live streaming modes. Many applications have therefore arisen that leverage these immersive videos, especially to give viewers an impression of presence in a digital environment. For example, with 360° videos it is now possible to connect people in a remote meeting in an interactive way, which substantially increases the productivity of the meeting. Likewise, interactive learning materials built from 360° videos help deliver learning outcomes to students effectively.

    However, streaming 360° videos is not an easy task, for several reasons. First, 360° video frames are 4–6 times larger than normal video frames at the same perceived quality, so delivering these videos demands higher network bandwidth. Second, processing such large frames requires more computational resources at the end devices, particularly for end-user devices with limited resources; this affects not only the delivery of 360° videos but also other applications running on shared resources. Third, these videos must be streamed with very low latency due to their interactive nature. Failing to satisfy these requirements results in poor Quality of Experience (QoE) for the user: insufficient bandwidth incurs frequent rebuffering and poor video quality, inadequate computational capacity causes faster battery draining and unnecessary heating of the device, causing discomfort to the user, and unnecessary streaming delay makes motion or cybersickness prevalent. These circumstances hinder providing immersive streaming experiences to the communities that need them most, especially those without sufficient network resources.

    To address the above challenges, we believe that enhancements to the three main components in the video streaming pipeline (server, network, and client) are essential. Starting from the network, it is beneficial for network providers to identify 360° video flows as early as possible and to understand their behaviour in the network, so that sufficient resources can be allocated for this video delivery without compromising the quality of other services. Content servers, at one end of this streaming pipeline, require efficient 360° video frame processing mechanisms to support adaptive video streaming mechanisms such as ABR (Adaptive Bit Rate) based streaming and viewport (VP) aware streaming, a paradigm unique to 360° videos that transmits only the parts of the larger video frame that fall within the user-visible region. At the other end, the client can be combined with edge-assisted streaming to deliver 360° video content with reduced latency and higher quality.

    Following the above optimization strategies, in this thesis we first propose a mechanism named 360NorVic to extract 360° video flows from encrypted video traffic and analyze their traffic characteristics. We propose Machine Learning (ML) models to classify 360° and normal videos under different scenarios such as offline, near real-time, VP-aware streaming, and Mobile Network Operator (MNO) level streaming; a minimal sketch of such flow classification is given below.
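    As a hedged illustration of this classification step (the thesis defines 360NorVic's actual features and models; the feature names, stand-in data, and RandomForest choice below are assumptions made for the sketch), a classifier over flow-level metadata from encrypted traffic might look like this:

```python
# Hypothetical sketch of 360NorVic-style flow classification. Encrypted
# traffic still exposes metadata such as packet sizes and inter-arrival
# times, which can be summarized per flow and fed to a classifier.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Stand-in flow-level features per video session, e.g.
# [mean_pkt_size, std_pkt_size, mean_iat, std_iat, burst_rate, dl_ul_ratio].
# Real features would be measured from captured traces, not sampled.
n_flows = 1000
X = rng.normal(size=(n_flows, 6))
y = rng.integers(0, 2, size=n_flows)   # 1 = 360° video flow, 0 = normal video flow

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```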
    Having extracted 360° video traffic traces at both packet and flow level with high accuracy, we analyze the differences between 360° and normal video patterns in the encrypted traffic domain, which is beneficial for effective resource optimization to enhance 360° video delivery.

    Second, we present a WGAN (Wasserstein Generative Adversarial Network) based data generation mechanism (namely VideoTrain++) to synthesize encrypted network video traffic from minimal input data. Leveraging the synthetic data, we show improved performance in 360° video traffic analysis, especially in the ML-based classification of 360NorVic.

    Third, we propose an effective 360° video frame partitioning mechanism (namely VASTile) at the server side to support VP-aware 360° video streaming with dynamic tiles (or variable tiles) of different sizes and locations on the frame. VASTile takes a visual attention map on the video frames as input and applies a computational geometric approach to generate a non-overlapping tile configuration that covers the video frames adaptively to the visual attention. We present VASTile as a scalable approach to video frame processing at the servers and a method to reduce bandwidth consumption in network data transmission.

    Finally, by applying VASTile to the individual user VP at the client side and utilizing the cache storage of Multi-Access Edge Computing (MEC) servers, we propose OpCASH, a mechanism to personalize 360° video streaming with dynamic tiles with edge assistance. OpCASH uses an ILP-based solution to select cached variable tiles from MEC servers that may not be identical to the VP tiles requested by the user but still cover the same VP region; in this way it maximizes cache utilization and reduces the number of requests to the content servers over the congested core network (a simplified sketch of this cached-tile selection is given below). With this approach, we demonstrate gains in latency, bandwidth savings, and video quality in personalized 360° video streaming.
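    To illustrate the flavour of such cached-tile selection, below is a toy ILP in Python with PuLP: choose cached variable tiles that cover the requested VP region, falling back to the origin server only for uncovered cells. The cost weights, tile layout, and cell granularity are illustrative assumptions, not the thesis's actual OpCASH formulation.

```python
# Toy OpCASH-style cached-tile selection as an ILP, using PuLP.
import pulp

vp_cells = {(r, c) for r in range(2) for c in range(3)}   # requested VP, as unit cells
cached_tiles = {                                          # cached variable tiles -> cells covered
    "t1": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "t2": {(0, 2), (1, 2)},
    "t3": {(0, 1), (0, 2)},
}
EDGE_COST, ORIGIN_COST = 1.0, 5.0   # fetching from the MEC cache is far cheaper than origin

prob = pulp.LpProblem("cached_tile_cover", pulp.LpMinimize)
x = {t: pulp.LpVariable(f"x_{t}", cat="Binary") for t in cached_tiles}            # use cached tile t
u = {c: pulp.LpVariable(f"u_{c[0]}_{c[1]}", cat="Binary") for c in vp_cells}      # fetch cell c from origin

# Objective: total delivery cost of chosen cached tiles plus origin fetches
prob += EDGE_COST * pulp.lpSum(x.values()) + ORIGIN_COST * pulp.lpSum(u.values())

# Every VP cell must be covered by a chosen cached tile or fetched from origin
for cell in vp_cells:
    prob += pulp.lpSum(x[t] for t, cov in cached_tiles.items() if cell in cov) + u[cell] >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print("cached tiles used:", [t for t in cached_tiles if x[t].value() > 0.5])
print("cells from origin:", [c for c in vp_cells if u[c].value() > 0.5])
```

    Because origin fetches are weighted much more heavily than edge hits, the solver prefers any combination of cached tiles that covers the viewport, mirroring the goal of maximizing cache utilization and sparing the congested core network.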

    Machine Learning for Multimedia Communications

    Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive training, impressive efficiency and accuracy improvements have been achieved all along the transmission pipeline. For example, the high model capacity of learning-based architectures enables us to model image and video behavior accurately enough that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategies, and even user perception modeling have widely benefited from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even though only a subpart of it is optimized. In this paper, we review the recent major advances proposed across the transmission chain, and we discuss their potential impact and the research challenges they raise.