756 research outputs found
Real-Time Neural Video Recovery and Enhancement on Mobile Devices
As mobile devices become increasingly popular for video streaming, it's
crucial to optimize the streaming experience for these devices. Although deep
learning-based video enhancement techniques are gaining attention, most of them
cannot support real-time enhancement on mobile devices. Additionally, many of
these techniques are focused solely on super-resolution and cannot handle
partial or complete loss or corruption of video frames, which is common on the
Internet and wireless networks.
To overcome these challenges, we present a novel approach in this paper. Our
approach consists of (i) a novel video frame recovery scheme, (ii) a new
super-resolution algorithm, and (iii) a receiver enhancement-aware video bit
rate adaptation algorithm. We have implemented our approach on an iPhone 12,
and it can support 30 frames per second (FPS). We have evaluated our approach
in various networks such as WiFi, 3G, 4G, and 5G networks. Our evaluation shows
that our approach enables real-time enhancement and results in a significant
increase in video QoE (Quality of Experience) of 24% to 82% in our video
streaming system.
Machine Learning for Multimedia Communications
Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, some impressive efficiency/accuracy improvements have been made all over the transmission pipeline. For example, the high model capacity of the learning-based architectures enables us to accurately model the image and video behavior such that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategy or even user perception modeling have widely benefited from the recent learning-oriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even though a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed all across the transmission chain, and we discuss their potential impact and the research challenges that they raise.
Delivery of 360° videos in edge caching assisted wireless cellular networks
In recent years, 360° videos have become increasingly popular on commercial social platforms, and are a vital part of emerging Virtual Reality (VR) applications. However, the delivery of 360° videos requires significant bandwidth resources, which makes streaming of such data on mobile networks challenging. The bandwidth required for delivering 360° videos can be reduced by exploiting the fact that users are interested in viewing only a part of the video scene, the requested viewport. As different users may request different viewports, some parts of the 360° scenes may be more popular than others. 360° video delivery on mobile networks can be facilitated by caching popular content at edge servers, and delivering it from there to the users. However, existing edge caching schemes do not take full advantage of the unequal popularity of different parts of a video, which renders them inefficient for caching 360° videos. Motivated by this, in this thesis we investigate how advanced 360° video coding tools, i.e., encoding into multiple quality layers and tiles, can be utilized to build more efficient wireless edge caching schemes for 360° videos. This encoding allows only the popular parts of the 360° videos to be cached in high quality. To understand how edge caching schemes can benefit from 360° video coding, we compare the caching of 360° videos encoded into multiple quality layers and tiles with layer-agnostic and tile-agnostic schemes. To cope with the fact that the content popularity distribution may be unknown, we use machine learning techniques for both Video on Demand (VoD) and live streaming scenarios. Our findings show that taking the aforementioned 360° video characteristics into account improves performance in terms of both the quality of the video delivered to the users and the usage of the backhaul links.
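The idea of caching only the popular tiles in high quality can be illustrated with a small sketch. This is not the scheme from the thesis: it is a simple greedy popularity-per-megabyte heuristic, and the `TileLayer` fields, the `fill_cache` function, and the rule that an enhancement layer needs the layer below it are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TileLayer:
    """One quality layer of one tile (hypothetical record layout)."""
    video: str
    tile: int
    layer: int         # 0 = base layer, higher = enhancement layer
    size_mb: float
    popularity: float  # assumed estimate of request probability


def fill_cache(items, capacity_mb):
    """Greedily cache tile layers by popularity per MB.

    An enhancement layer is cached only if the layer directly
    below it (same video, same tile) is already in the cache.
    """
    cached = set()
    used = 0.0
    remaining = sorted(
        items, key=lambda it: it.popularity / it.size_mb, reverse=True
    )
    progress = True
    while progress:  # repeat until a full pass adds nothing
        progress = False
        for it in list(remaining):
            below_ok = (
                it.layer == 0
                or (it.video, it.tile, it.layer - 1) in cached
            )
            if below_ok and used + it.size_mb <= capacity_mb:
                cached.add((it.video, it.tile, it.layer))
                used += it.size_mb
                remaining.remove(it)
                progress = True
    return cached, used


# Two tiles, two layers each; tile 0 is in most viewports.
catalog = [
    TileLayer("v", 0, 0, 2.0, 0.9),
    TileLayer("v", 0, 1, 2.0, 0.8),
    TileLayer("v", 1, 0, 2.0, 0.3),
    TileLayer("v", 1, 1, 2.0, 0.1),
]
cached, used = fill_cache(catalog, capacity_mb=6.0)
```

With a 6 MB budget, the popular tile is stored in both layers while the less popular tile keeps only its base layer, which is the behaviour a tile-agnostic or layer-agnostic cache cannot express.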
Scalable Multiuser Immersive Communications with Multi-numerology and Mini-slot
This paper studies multiuser immersive communications networks in which
different user equipment may demand various extended reality (XR) services. In
such heterogeneous networks, time-frequency resource allocation needs to be
more adaptive since XR services are usually multi-modal and latency-sensitive.
To this end, we develop a scalable time-frequency resource allocation method
based on multi-numerology and mini-slot. To appropriately determine the
discrete parameters of multi-numerology and mini-slot for multiuser immersive
communications, the proposed method first presents a novel flexible
time-frequency resource block configuration, and then leverages deep
reinforcement learning to maximize the total quality-of-experience (QoE) under
different users' QoE constraints. The results confirm the efficiency and
scalability of the proposed time-frequency resource allocation method.
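For readers unfamiliar with the discrete parameters involved, the sketch below assumes the standard 5G NR definitions, which the abstract does not spell out: numerology index μ gives a subcarrier spacing of 15 × 2^μ kHz and a slot of 1/2^μ ms with 14 OFDM symbols (normal cyclic prefix), and a mini-slot occupies only some of those symbols. The function names are hypothetical, and the mini-slot duration is an approximation that treats all symbols as equally long.

```python
SYMBOLS_PER_SLOT = 14  # normal cyclic prefix (assumed)


def subcarrier_spacing_khz(mu: int) -> int:
    """Subcarrier spacing for numerology index mu: 15 * 2^mu kHz."""
    return 15 * 2 ** mu


def slot_duration_ms(mu: int) -> float:
    """Slot duration for numerology index mu: 1 / 2^mu ms."""
    return 1.0 / 2 ** mu


def mini_slot_duration_ms(mu: int, n_symbols: int) -> float:
    """Approximate duration of a mini-slot of n_symbols OFDM symbols.

    Uses the average symbol duration, ignoring the slightly longer
    cyclic prefix on some symbols.
    """
    if not 1 <= n_symbols <= SYMBOLS_PER_SLOT:
        raise ValueError("n_symbols must be between 1 and 14")
    return slot_duration_ms(mu) * n_symbols / SYMBOLS_PER_SLOT


# Enumerate a few (numerology, mini-slot length) choices.
for mu in (0, 1, 2):
    for n in (2, 7, 14):
        print(mu, subcarrier_spacing_khz(mu), n,
              round(mini_slot_duration_ms(mu, n), 4))
```

Each (μ, mini-slot length) pair trades frequency width against transmission time, which is why the choice matters for latency-sensitive XR traffic.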
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided, including Convolutional Neural Networks (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the 'creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human-centric, where it is designed to augment, rather than
replace, human creativity.