Machine Learning for Multimedia Communications
Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. Thanks to intensive training, impressive efficiency and accuracy improvements have been achieved all along the transmission pipeline. For example, the high model capacity of learning-based architectures enables us to accurately model image and video behavior, so that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategies and even user perception modeling have widely benefited from recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even when only a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed all across the transmission chain, and we discuss their potential impact and the research challenges that they raise.
Immersive interconnected virtual and augmented reality: a 5G and IoT perspective
Despite remarkable advances, current augmented and virtual reality (AR/VR) applications are largely individual and local experiences. Interconnected AR/VR, where participants can virtually interact across vast distances, remains a distant dream. The main barrier between current technology and such applications is the stringent end-to-end latency requirement, which should not exceed 20 ms in order to avoid motion sickness and other discomforts. Bringing AR/VR to the next level to enable immersive interconnected AR/VR will require significant advances towards 5G ultra-reliable low-latency communication (URLLC) and a Tactile Internet of Things (IoT). In this article, we articulate the technical challenges of enabling a future AR/VR end-to-end architecture that combines 5G URLLC and Tactile IoT technology to support this next generation of interconnected AR/VR applications. Through the use of IoT sensors and actuators, AR/VR applications will be aware of the environmental and user context, supporting human-centric adaptations of the application logic and lifelike interactions with the virtual environment. We present potential use cases and the required technological building blocks. For each of them, we delve into the current state of the art and the challenges that need to be addressed before the dream of remote AR/VR interaction can become reality.
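The 20 ms end-to-end requirement is a budget shared by every stage of the pipeline. A minimal sketch of that arithmetic follows; the stage names and millisecond values are hypothetical placeholders, not figures from the article.

```python
# Minimal sketch: check whether a chain of per-stage latencies fits an
# end-to-end budget such as the 20 ms AR/VR requirement.
# Stage names and values below are hypothetical, for illustration only.


def latency_headroom(stage_ms: dict[str, float], budget_ms: float) -> float:
    """Return budget minus total latency (a negative value means the budget is exceeded)."""
    total = sum(stage_ms.values())
    return budget_ms - total


def worst_stage(stage_ms: dict[str, float]) -> str:
    """Return the name of the stage contributing the most latency."""
    return max(stage_ms, key=stage_ms.get)


if __name__ == "__main__":
    # Hypothetical motion-to-photon breakdown in ms; not measured values.
    stages = {
        "sensing": 2.0,
        "uplink": 1.0,
        "edge_rendering": 8.0,
        "downlink": 1.0,
        "display": 5.0,
    }
    print(latency_headroom(stages, budget_ms=20.0))  # 3.0
    print(worst_stage(stages))  # edge_rendering
```

Any stage that grows (for example, a congested radio link) directly eats into the remaining headroom, which is why URLLC is framed as a prerequisite rather than an optimization.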
Streaming and User Behaviour in Omnidirectional Videos
Omnidirectional videos (ODVs) have gone beyond the passive paradigm of traditional video,
offering higher degrees of immersion and interaction. The revolutionary novelty of this technology is the possibility for users to interact with the surrounding environment and to feel a
sense of engagement and presence in a virtual space. Users are clearly the main driving force of
immersive applications, and consequently services need to be properly tailored to them.
In this context, this chapter highlights the importance of the new role of users in ODV streaming applications, and thus the need to understand their behaviour while navigating within
ODVs. A comprehensive overview of the research efforts aimed at advancing ODV streaming
systems is also presented. In particular, the state-of-the-art solutions examined in this
chapter are classified into system-centric and user-centric streaming approaches: the
former is a fairly straightforward extension of well-established solutions for
the 2D video pipeline, while the latter exploits an understanding of users’ behaviour
to enable more personalised ODV streaming.
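User-centric ODV streaming typically relies on knowing which part of the sphere the user is looking at, so that only the corresponding tiles are fetched in high quality. The sketch below assumes an equirectangular frame split into a uniform tile grid and approximates the viewport as a yaw/pitch rectangle (ignoring distortion near the poles); the function names are hypothetical and this is not a method taken from the chapter.

```python
# Minimal sketch: select the equirectangular tiles overlapped by a viewport.
# Assumptions (hypothetical): uniform cols x rows grid, yaw in [0, 360),
# pitch in [-90, 90], viewport approximated as a yaw/pitch rectangle.


def yaw_intervals(center_yaw: float, fov_h: float) -> list[tuple[float, float]]:
    """Split the horizontal field of view into non-wrapping intervals in [0, 360)."""
    if fov_h >= 360.0:
        return [(0.0, 360.0)]
    start = (center_yaw - fov_h / 2) % 360.0
    end = start + fov_h
    if end <= 360.0:
        return [(start, end)]
    # The viewport crosses the 0/360 seam, so it covers two intervals.
    return [(start, 360.0), (0.0, end - 360.0)]


def visible_tiles(center_yaw: float, center_pitch: float, fov_h: float,
                  fov_v: float, cols: int, rows: int) -> set[tuple[int, int]]:
    """Return (row, col) indices of tiles that overlap the viewport."""
    tile_w = 360.0 / cols
    tile_h = 180.0 / rows
    p_lo = max(-90.0, center_pitch - fov_v / 2)
    p_hi = min(90.0, center_pitch + fov_v / 2)
    spans = yaw_intervals(center_yaw, fov_h)
    tiles: set[tuple[int, int]] = set()
    for r in range(rows):
        r_lo = -90.0 + r * tile_h
        r_hi = r_lo + tile_h
        if not (p_lo < r_hi and p_hi > r_lo):
            continue  # this tile row is entirely above or below the viewport
        for c in range(cols):
            c_lo = c * tile_w
            c_hi = c_lo + tile_w
            if any(lo < c_hi and hi > c_lo for lo, hi in spans):
                tiles.add((r, c))
    return tiles


if __name__ == "__main__":
    # 8 x 4 grid of 45-degree tiles; a 90 x 90 degree viewport looking at yaw 0.
    print(sorted(visible_tiles(0.0, 0.0, 90.0, 90.0, cols=8, rows=4)))
    # [(1, 0), (1, 7), (2, 0), (2, 7)]
```

In a user-centric system, the viewport centre would come from a prediction of the user's future head orientation rather than the current one, which is where behaviour modelling enters.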
Generative AI-enabled Mobile Tactical Multimedia Networks: Distribution, Generation, and Perception
Mobile multimedia networks (MMNs) demonstrate great potential in delivering
low-latency and high-quality entertainment and tactical applications, such as
short-video sharing, online conferencing, and battlefield surveillance. For
instance, in tactical surveillance of battlefields, scalability and
sustainability are indispensable for maintaining large-scale military
multimedia applications in MMNs. Therefore, many data-driven networking
solutions are leveraged to optimize streaming strategies based on real-time
traffic analysis and resource monitoring. In addition, generative AI (GAI) can
not only increase the efficiency of existing data-driven solutions through data
augmentation but also enable new capabilities for MMNs, including
AI-generated content (AIGC) and AI-aided perception. In this article, we
propose a framework for GAI-enabled MMNs that leverages the capabilities of GAI
in data and content synthesis to distribute high-quality and immersive
interactive content in wireless networks. Specifically, we outline the
framework of GAI-enabled MMNs and then introduce its three main features,
namely distribution, generation, and perception. Furthermore, we propose a
second-score auction mechanism for allocating network resources by jointly
considering GAI model values and other metrics. The experimental results show that
the proposed auction mechanism can effectively increase social welfare by
allocating the resources and models that yield the highest user satisfaction.
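In a second-score auction, bids are ranked by a score that can combine price with non-monetary attributes, and the winner is charged so that its score drops to the runner-up's score. The sketch below assumes a hypothetical linear scoring rule, `price + weight * model_value`, for a single resource; the article's actual scoring function and metrics are not reproduced here.

```python
# Minimal sketch of a single-item second-score auction.
# Assumption (hypothetical): score = price + weight * model_value.
from dataclasses import dataclass


@dataclass
class Bid:
    bidder: str
    price: float        # monetary bid offered for the network resource
    model_value: float  # hypothetical quality score of the bidder's GAI model


def score(bid: Bid, weight: float) -> float:
    """Linear scoring rule combining price and GAI model value."""
    return bid.price + weight * bid.model_value


def second_score_auction(bids: list[Bid], weight: float) -> tuple[str, float]:
    """Return (winner, payment).

    The highest-scoring bid wins; the payment is chosen so that the winner's
    score, recomputed with that payment as its price, equals the second-highest score.
    """
    if len(bids) < 2:
        raise ValueError("a second-score auction needs at least two bids")
    ranked = sorted(bids, key=lambda b: score(b, weight), reverse=True)
    winner, runner_up = ranked[0], ranked[1]
    payment = score(runner_up, weight) - weight * winner.model_value
    return winner.bidder, payment


if __name__ == "__main__":
    bids = [Bid("A", 5.0, 2.0), Bid("B", 6.0, 0.5), Bid("C", 3.0, 1.0)]
    print(second_score_auction(bids, weight=2.0))  # ('A', 3.0)
```

With `weight=2.0`, bidder A wins despite offering a lower price than B because its model value raises its score, and it pays only what is needed to match B's score. Setting `weight=0.0` reduces the mechanism to an ordinary second-price auction.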
QoE management of multimedia streaming services in future networks: a tutorial and survey
Neural Video Recovery for Cloud Gaming
Cloud gaming is a multi-billion-dollar industry. A client in cloud gaming
sends the player's movements to the game server on the Internet, which renders and
transmits the resulting video back. In order to provide a good gaming
experience, a latency below 80 ms is required. This means that video rendering,
encoding, transmission, decoding, and display have to finish within that time
frame, which is especially challenging to achieve due to server overload,
network congestion, and losses. In this paper, we propose a new method for
recovering lost or corrupted video frames in cloud gaming. Unlike traditional
video frame recovery, our approach uses game states to significantly enhance
recovery accuracy and utilizes partially decoded frames to recover lost
portions. We develop a holistic system that consists of (i) efficiently
extracting game states, (ii) modifying an H.264 video decoder to generate a mask
indicating which portions of video frames need recovery, and (iii) designing a
novel neural network to recover either complete or partial video frames. Our
approach is extensively evaluated using iPhone 12 and laptop implementations,
and we demonstrate the utility of game states in game video recovery and
the effectiveness of our overall design.
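The recovery mask in step (ii) marks which pixels the decoder could not reconstruct. A minimal sketch of the idea follows, assuming the decoder reports lost slices as hypothetical `(first_mb, num_mbs)` pairs and that slices cover consecutive 16x16 macroblocks in raster-scan order (no flexible macroblock ordering); this illustrates the concept and is not the paper's decoder modification.

```python
# Minimal sketch: turn lost H.264 slices into a per-pixel recovery mask.
# Assumptions (hypothetical): lost slices are reported as (first_mb, num_mbs),
# slices are contiguous in raster-scan order (no FMO), 16x16 luma macroblocks.

MB_SIZE = 16  # H.264 macroblocks cover 16x16 luma pixels


def lost_macroblocks(lost_slices: list[tuple[int, int]]) -> set[int]:
    """Expand (first_mb, num_mbs) slice descriptors into macroblock indices."""
    lost: set[int] = set()
    for first_mb, num_mbs in lost_slices:
        lost.update(range(first_mb, first_mb + num_mbs))
    return lost


def recovery_mask(width: int, height: int, lost_mbs: set[int]) -> list[list[bool]]:
    """Return a height x width mask that is True where pixels need recovery."""
    mbs_per_row = (width + MB_SIZE - 1) // MB_SIZE  # round up for partial MBs
    mask = [[False] * width for _ in range(height)]
    for idx in lost_mbs:
        mb_x, mb_y = idx % mbs_per_row, idx // mbs_per_row
        for y in range(mb_y * MB_SIZE, min((mb_y + 1) * MB_SIZE, height)):
            for x in range(mb_x * MB_SIZE, min((mb_x + 1) * MB_SIZE, width)):
                mask[y][x] = True
    return mask


if __name__ == "__main__":
    # A 48x32 frame has 3 x 2 macroblocks; losing a slice of 2 MBs starting
    # at MB 4 corrupts the bottom-middle and bottom-right macroblocks.
    mask = recovery_mask(48, 32, lost_macroblocks([(4, 2)]))
    print(sum(v for row in mask for v in row))  # 512
```

A mask like this lets a recovery network treat correctly decoded pixels as known context and only synthesize the masked regions, which matches the abstract's distinction between complete and partial frame recovery.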
Delivery of 360° videos in edge caching assisted wireless cellular networks
In recent years, 360° videos have become increasingly popular on commercial social platforms and are a vital part of emerging Virtual Reality (VR) applications. However, the delivery of 360° videos requires significant bandwidth resources, which makes streaming such data over mobile networks challenging. The bandwidth required for delivering 360° videos can be reduced by exploiting the fact that users are interested in viewing only a part of the video scene, the requested viewport. As different users may request different viewports, some parts of the 360° scenes may be more popular than others. 360° video delivery over mobile networks can be facilitated by caching popular content at edge servers and delivering it from there to the users. However, existing edge caching schemes do not exploit the full potential of the unequal popularity of different parts of a video, which renders them inefficient for caching 360° videos. Motivated by this, in this thesis, we investigate how advanced 360° video coding tools, i.e., encoding into multiple quality layers and tiles, can be utilized to build more efficient wireless edge caching schemes for 360° videos. This encoding makes it possible to cache in high quality only the popular parts of the 360° videos. To understand how edge caching schemes can benefit from 360° video coding, we compare the caching of 360° videos encoded into multiple quality layers and tiles with layer-agnostic and tile-agnostic schemes. To cope with the fact that the content popularity distribution may be unknown, we use machine learning techniques for both Video on Demand (VoD) and live streaming scenarios. Our findings show that taking the aforementioned 360° video characteristics into account leads to better performance, both in terms of the quality of the video delivered to the users and in terms of backhaul link usage.
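Layered tile coding lets a cache store each (tile, layer) pair independently, with the constraint that an enhancement layer is only useful if the lower layers of the same tile are also available. As a simple illustration, the sketch below uses a greedy demand-per-megabyte heuristic with known, hypothetical demand values; the thesis itself uses machine learning to cope with unknown popularity, which this sketch does not model.

```python
# Minimal sketch: greedy layer- and tile-aware edge caching.
# Assumptions (hypothetical): demand per (tile, layer) is known, sizes are
# positive, layers of each tile are numbered 0, 1, 2, ... without gaps,
# and layer k can only be cached after layers 0..k-1 of the same tile.
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    tile: int
    layer: int       # 0 = base layer, higher values = enhancement layers
    size_mb: float   # storage cost of this unit
    demand: float    # expected requests needing this layer (hypothetical input)


def greedy_layered_cache(units: list[Unit], capacity_mb: float) -> list[Unit]:
    """Repeatedly cache the next admissible layer with the best demand per MB."""
    by_tile: dict[int, list[Unit]] = {}
    for u in units:
        by_tile.setdefault(u.tile, []).append(u)
    for layers in by_tile.values():
        layers.sort(key=lambda u: u.layer)
    next_idx = {t: 0 for t in by_tile}  # lowest uncached layer per tile
    cached: list[Unit] = []
    free = capacity_mb
    while True:
        # Only the next layer of each tile is admissible (layer dependency).
        candidates = [
            by_tile[t][i] for t, i in next_idx.items()
            if i < len(by_tile[t]) and by_tile[t][i].size_mb <= free
        ]
        if not candidates:
            break
        best = max(candidates, key=lambda u: u.demand / u.size_mb)
        cached.append(best)
        free -= best.size_mb
        next_idx[best.tile] += 1
    return cached


if __name__ == "__main__":
    units = [
        Unit(0, 0, 2.0, 10.0), Unit(0, 1, 4.0, 6.0),  # popular tile
        Unit(1, 0, 2.0, 4.0), Unit(1, 1, 4.0, 3.0),   # less popular tile
    ]
    print([(u.tile, u.layer) for u in greedy_layered_cache(units, 8.0)])
    # [(0, 0), (1, 0), (0, 1)]
```

The popular tile receives its enhancement layer while the less popular tile is cached only at base quality, which is the kind of unequal treatment that layer-agnostic and tile-agnostic schemes cannot express.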
- …