Bridge the Gap Between VQA and Human Behavior on Omnidirectional Video: A Large-Scale Dataset and a Deep Learning Model
Omnidirectional video enables spherical stimuli with a full 360° × 180° viewing range. Meanwhile, only the viewport region of omnidirectional
video can be seen by the observer through head movement (HM), and an even
smaller region within the viewport can be clearly perceived through eye
movement (EM). Thus, the subjective quality of omnidirectional video may be
correlated with the HM and EM of human behavior. To bridge the gap between
subjective quality and human behavior, this paper proposes a large-scale visual
quality assessment (VQA) dataset of omnidirectional video, called VQA-OV, which
collects 60 reference sequences and 540 impaired sequences. Our VQA-OV dataset
provides not only the subjective quality scores of sequences but also the HM
and EM data of subjects. By mining our dataset, we find that the subjective
quality of omnidirectional video is indeed related to HM and EM. Hence, we
develop a deep learning model, which embeds HM and EM, for objective VQA on
omnidirectional video. Experimental results show that our model significantly
improves the state-of-the-art performance of VQA on omnidirectional video.
Comment: Accepted by ACM MM 2018
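The abstract does not detail the network itself, but the core idea — pooling local quality scores with head-movement (HM) and eye-movement (EM) statistics, so that only regions a viewer is likely to perceive drive the overall score — can be sketched in a few lines. All names and shapes below are hypothetical illustrations, not the paper's architecture:

```python
import numpy as np

def hm_em_weighted_score(patch_scores: np.ndarray,
                         hm_map: np.ndarray,
                         em_map: np.ndarray) -> float:
    """Pool per-patch quality with a viewing-probability map.

    patch_scores : (H, W) objective quality per spatial patch
    hm_map       : (H, W) head-movement heat map (viewport likelihood)
    em_map       : (H, W) eye-movement heat map (fixation likelihood)
    """
    attention = hm_map * em_map        # joint chance a patch is actually perceived
    attention /= attention.sum()       # normalize into a weighting distribution
    return float((patch_scores * attention).sum())

# toy demo with random maps
rng = np.random.default_rng(0)
print(hm_em_weighted_score(rng.random((8, 16)), rng.random((8, 16)), rng.random((8, 16))))
```

The weighting step captures the abstract's finding: patches outside the viewport (low HM weight) or away from fixations (low EM weight) should contribute little to the predicted subjective score.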
Steered mixture-of-experts for light field images and video: representation and coding
Research in light field (LF) processing has grown considerably over the last decade. This is largely driven by the desire to achieve the same level of immersion and navigational freedom for camera-captured scenes as is currently available for CGI content. Standardization organizations such as MPEG and JPEG continue to follow conventional coding paradigms in which viewpoints are discretely represented on 2-D regular grids. These grids are then further decorrelated through hybrid DPCM/transform techniques. However, such 2-D regular grids are less suited for high-dimensional data such as LFs. We propose a novel coding framework for higher-dimensional image modalities, called Steered Mixture-of-Experts (SMoE). Coherent areas in the higher-dimensional space are represented by single higher-dimensional entities, called kernels. These kernels hold spatially localized information about light rays arriving at any angle in a certain region. The global model thus consists of a set of kernels that define a continuous approximation of the underlying plenoptic function. We introduce the theory of SMoE and illustrate its application to 2-D images, 4-D LF images, and 5-D LF video. We also propose an efficient coding strategy to convert the model parameters into a bitstream. Even without provisions for high-frequency information, the proposed method performs comparably to the state of the art at low-to-mid-range bitrates with respect to subjective visual quality of 4-D LF images. For 5-D LF video, we observe superior decorrelation and coding performance, with coding gains of a factor of 4x in bitrate at the same quality. At least equally important, our method inherently offers functionality for LF rendering that is lacking in other state-of-the-art techniques: (1) full zero-delay random access, (2) lightweight pixel-parallel view reconstruction, and (3) intrinsic view interpolation and super-resolution.
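As a rough illustration of how such a model is evaluated, the sketch below gates a set of linear experts with Gaussian kernel responsibilities; the parameter layout is an assumption for exposition, not the authors' implementation:

```python
import numpy as np

def smoe_reconstruct(x, centers, inv_covs, slopes, offsets):
    """Evaluate a mixture-of-experts model at query coordinate x.

    x        : (D,) coordinate (D=2 for an image, D=4 for an LF, D=5 for LF video)
    centers  : (K, D) kernel centers
    inv_covs : (K, D, D) inverse covariances of the kernels
    slopes   : (K, D) linear (steering) coefficients of each expert
    offsets  : (K,) expert offsets
    """
    d = x - centers                                        # (K, D) offsets from centers
    # Gaussian responsibilities act as soft gates between experts
    logits = -0.5 * np.einsum('kd,kde,ke->k', d, inv_covs, d)
    gates = np.exp(logits - logits.max())
    gates /= gates.sum()
    experts = np.einsum('kd,d->k', slopes, x) + offsets    # each expert is a linear model
    return float(gates @ experts)                          # gated blend: continuous approximation

# toy 2-D example with two kernels
centers = np.array([[0.0, 0.0], [4.0, 4.0]])
inv_covs = np.stack([np.eye(2), np.eye(2)])
slopes = np.array([[0.1, 0.0], [0.0, 0.2]])
offsets = np.array([0.2, 0.8])
print(smoe_reconstruct(np.array([1.0, 1.0]), centers, inv_covs, slopes, offsets))
```

Because the model is a continuous function of the coordinate, view interpolation and super-resolution fall out of the representation for free, which matches the rendering functionality claimed in the abstract.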
Context-Aware Adaptive Prefetching for DASH Streaming over 5G Networks
The increasing consumption of video streams and the demand for higher-quality
content drive the evolution of telecommunication networks and the development
of new network accelerators to boost media delivery while optimizing network
usage. Multi-access Edge Computing (MEC) enables the possibility to enforce
media delivery by deploying caching instances at the network edge, close to the
Radio Access Network (RAN). Thus, the content can be prefetched and served from
the MEC host, reducing network traffic and increasing the Quality of Service
(QoS) and the Quality of Experience (QoE). This paper proposes a novel
mechanism to prefetch Dynamic Adaptive Streaming over HTTP (DASH) streams at
the MEC, employing a Machine Learning (ML) classification model to select the
media segments to prefetch. The model is trained with media session metrics to
improve the forecasts with application layer information. The proposal is
tested with Mobile Network Operators' (MNOs) 5G MEC and RAN and compared with
other strategies by assessing cache and player performance metrics.
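A minimal sketch of the classification step, assuming an illustrative feature set (throughput, buffer level, last requested bitrate, RTT) and scikit-learn as a stand-in for whatever model the authors actually trained:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-session features: [throughput_mbps, buffer_s, last_bitrate_mbps, rtt_ms]
X_train = np.array([
    [25.0, 12.0, 8.0, 20.0],
    [ 4.0,  3.0, 2.5, 60.0],
    [12.0,  8.0, 5.0, 35.0],
])
# Label: index of the DASH representation the player requested next (toy data)
y_train = np.array([2, 0, 1])

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

def segment_to_prefetch(session_metrics):
    """Return the representation index the MEC cache should fetch ahead of time."""
    return int(clf.predict(np.asarray(session_metrics).reshape(1, -1))[0])

print(segment_to_prefetch([10.0, 6.0, 4.0, 40.0]))
```

The point of training on media-session metrics, as the abstract notes, is that application-layer state (buffer, bitrate history) lets the edge cache anticipate the player's next request rather than reacting to it.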
Adaptive delivery of immersive 3D multi-view video over the Internet
The increase in Internet bandwidth and the developments in 3D video technology have paved the way for the delivery of 3D Multi-View Video (MVV) over the Internet. However, large amounts of data and dynamic network conditions result in frequent network congestion, which may prevent video packets from being delivered on time. As a consequence, the 3D video experience may well be degraded unless content-aware precautionary mechanisms and adaptation methods are deployed. In this work, a novel adaptive MVV streaming method is introduced that addresses future-generation immersive 3D MVV experiences on multi-view displays. When network congestion makes adaptation necessary, a rate-distortion-optimal set of views, pre-determined by the server, is truncated from the delivered MVV streams. To maintain a high Quality of Experience (QoE) during frequent network congestion, the proposed method involves the calculation of low-overhead additional metadata that is delivered to the client. The proposed adaptive 3D MVV streaming solution is tested using the MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH) standard. Extensive objective and subjective evaluations show that the proposed method provides significant quality enhancement under adverse network conditions.
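The truncation step can be illustrated as a greedy selection over the server's rate-distortion priority metadata. The names and the greedy rule below are assumptions for illustration, not the paper's exact algorithm:

```python
def views_to_keep(view_bitrates, priority_order, available_bandwidth):
    """Truncate multi-view streams during congestion.

    view_bitrates       : dict view_id -> bitrate (Mbps)
    priority_order      : view ids sorted by server-side rate-distortion importance
                          (the low-overhead metadata delivered to the client)
    available_bandwidth : current throughput estimate (Mbps)
    """
    kept, budget = [], available_bandwidth
    for vid in priority_order:              # greedily keep the most RD-important views
        if view_bitrates[vid] <= budget:
            kept.append(vid)
            budget -= view_bitrates[vid]
    return kept                             # the remaining views are truncated

print(views_to_keep({0: 4.0, 1: 4.0, 2: 3.0, 3: 3.0}, [1, 2, 0, 3], 8.5))
```

Because the priority order is computed once at the server and shipped as compact metadata, the client can react to congestion immediately without re-running any rate-distortion optimization.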
Simulation Framework for Evaluating Video Delivery Services over Vehicular Networks
Vehicular Ad-hoc Networks contribute to Intelligent
Transportation Systems by providing a set of services related
to traffic, mobility, safe driving, and infotainment applications.
One of the most challenging applications is video delivery, since
it has to deal with several hurdles typically found in wireless
communications, like high node mobility, bandwidth limitations
and high loss rates. In this work, we propose an integrated
simulation framework that will provide a multilayer view of
a particular video delivery session with a set of simulation
results at physical (i.e., collisions), MAC (i.e., packet delay),
application (i.e., % of lost frames), and user levels (i.e., perceptual
video quality). With this tool, we can analyze the performance
of video streaming over vehicular networks with a high level
of detail, giving us the keys to better understand and, consequently,
improve video delivery services.
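For instance, the application-level metric (% of lost frames) can be derived from the lower-layer packet trace roughly as follows; this is a hypothetical sketch, not the framework's actual API:

```python
def frame_loss_ratio(sent_frames, received_packets):
    """Application-layer metric from the multilayer trace: % of video frames lost.

    sent_frames      : dict frame_id -> set of packet ids carrying that frame
    received_packets : set of packet ids that survived the MAC/PHY layers
    """
    # a frame is lost if any of its packets was dropped (no error concealment assumed)
    lost = sum(1 for pkts in sent_frames.values() if not pkts <= received_packets)
    return 100.0 * lost / len(sent_frames)

sent = {0: {0, 1}, 1: {2}, 2: {3, 4}}
print(frame_loss_ratio(sent, received_packets={0, 1, 2, 4}))  # frame 2 lost a packet -> 33.3%
```

Chaining such per-layer views (collisions at PHY, delay at MAC, frame loss at the application, perceptual quality at the user level) is what gives the framework its multilayer picture of a delivery session.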
Optimized Data Representation for Interactive Multiview Navigation
In contrast to traditional media streaming services, where the same media
content is delivered to all users, interactive multiview navigation
applications enable users to choose their own viewpoints and freely navigate in
a 3-D scene. The interactivity brings new challenges in addition to the
classical rate-distortion trade-off, which considers only the compression
performance and viewing quality. On the one hand, interactivity necessitates
sufficient viewpoints for richer navigation; on the other hand, it must keep
bandwidth and delay costs low for smooth navigation during view
transitions. In this paper, we formally describe the novel trade-offs posed by
navigation interactivity alongside the classical rate-distortion criterion. Based on
an original formulation, we look for the optimal design of the data
representation by introducing novel rate and distortion models and practical
solving algorithms. Experiments show that the proposed data representation
method outperforms the baseline solution by providing lower resource
consumption and higher visual quality in all navigation configurations,
confirming the potential of the proposed data representation in
practical interactive navigation systems.
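To make the trade-off concrete, here is a brute-force sketch in which the distortion of a navigated viewpoint is approximated by its distance to the nearest stored view; the exhaustive search stands in for the paper's practical solving algorithms, and all names are illustrative:

```python
from itertools import combinations

def choose_anchor_views(candidates, rate, budget, demand):
    """Pick stored views minimizing a navigation-distortion proxy under a rate budget.

    candidates : camera positions that can be stored
    rate       : dict position -> coding rate of that view
    budget     : total rate budget
    demand     : positions users may navigate to; synthesis quality is assumed to
                 degrade with distance from the nearest stored (anchor) view
    """
    best, best_cost = None, float('inf')
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if sum(rate[p] for p in subset) > budget:
                continue  # violates the rate constraint
            cost = sum(min(abs(q - p) for p in subset) for q in demand)
            if cost < best_cost:
                best, best_cost = subset, cost
    return best

print(choose_anchor_views([0, 2, 4, 6, 8], {p: 1.0 for p in [0, 2, 4, 6, 8]}, 3.0, demand=range(9)))
```

The example makes the tension visible: storing fewer views saves rate but increases the distance (and thus the synthesis distortion and transition cost) for viewpoints users may request.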
A parallel H.264/SVC encoder for high definition video conferencing
In this paper, we present a video encoder specially developed and configured for high-definition (HD) video conferencing. This video encoder brings together three requirements: H.264/Scalable Video Coding (SVC), parallel encoding on multicore platforms, and parallel-friendly rate control. The first guarantees a minimum quality of service to every end-user receiver over Internet Protocol networks. The second achieves real-time execution by combining slice-level parallelism for the main encoding loop with block-level parallelism for the upsampling and interpolation filtering processes. The third ensures proper HD video content delivery under given bit-rate and end-to-end delay constraints. The experimental results prove that the proposed H.264/SVC video encoder is able to operate in real time over a wide range of target bit rates, at the expense of reasonable rate-distortion losses due to the frame partitioning into slices.
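Slice-level parallelism of the main encoding loop can be sketched with a thread pool, since slices are coded independently within a frame; `encode_slice` below is a dummy stand-in for the real per-slice loop, and the whole snippet is an illustration rather than the authors' encoder:

```python
from concurrent.futures import ThreadPoolExecutor

def encode_slice(slice_rows, qp):
    """Stand-in for the per-slice encoding loop (mode decision, entropy coding, ...)."""
    return bytes(len(slice_rows))  # dummy bitstream; a real encoder would go here

def encode_frame_parallel(frame_rows, n_slices, qp, pool):
    """Split the frame into independent slices, encode each on its own worker,
    then concatenate the slice bitstreams into the frame's access unit."""
    step = len(frame_rows) // n_slices
    slices = [frame_rows[i * step:(i + 1) * step] for i in range(n_slices)]
    return b''.join(pool.map(encode_slice, slices, [qp] * n_slices))

with ThreadPoolExecutor(max_workers=4) as pool:
    bitstream = encode_frame_parallel(list(range(1080)), n_slices=4, qp=30, pool=pool)
    print(len(bitstream))
```

The abstract's caveat also shows up here: cutting the frame into slices is exactly what enables this parallelism, but each slice boundary breaks prediction dependencies, which is where the reasonable rate-distortion losses come from.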
Engineering a Live UHD Program from the International Space Station
The first-ever live downlink of Ultra-High Definition (UHD) video from the International Space Station (ISS) was the highlight of a Super Session at the National Association of Broadcasters (NAB) Show in April 2017. Ultra-High Definition, also referred to as 4K, is four times the resolution of full HD (1080p) video. The UHD video downlink from the ISS all the way to the Las Vegas Convention Center required considerable planning, pushed the limits of conventional video distribution from a spacecraft, and was the first use of High Efficiency Video Coding (HEVC) from a spacecraft. The live event at NAB will serve as a pathfinder for more routine downlinks of UHD, as well as for the use of HEVC in conventional HD downlinks to save bandwidth. A similar demonstration was conducted in 2006 with the Discovery Channel to demonstrate the ability to stream HDTV from the ISS. This paper describes the overall workflow and routing of the UHD video, how audio was synchronized even though the video and audio were received many seconds apart, and how the demonstration paves the way not only for more efficient video distribution from the ISS but also for more complex video distribution from deep space. The paper also describes how a live event was staged when the UHD video coming from the ISS had a latency of more than 10 seconds. In addition, the paper touches on the unique collaboration between the inherently governmental aspects of the ISS, the commercial partners Amazon and Elemental, and the National Association of Broadcasters.
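The synchronization problem described here (audio and video arriving many seconds apart over different paths) reduces, in principle, to buffering both essences on presentation timestamps and releasing them together once enough playout margin has accumulated. The toy sketch below illustrates only that principle; the actual broadcast chain certainly differed:

```python
import heapq

class AVAligner:
    """Hold audio and video in a PTS-ordered buffer and release them in sync."""

    def __init__(self, delay):
        self.delay = delay      # playout margin, chosen above the worst-case path skew
        self.buf = []           # min-heap keyed on presentation timestamp (PTS)

    def push(self, pts, kind, payload):
        heapq.heappush(self.buf, (pts, kind, payload))

    def pop_ready(self, now):
        """Emit everything whose PTS is at least `delay` seconds in the past."""
        out = []
        while self.buf and self.buf[0][0] <= now - self.delay:
            out.append(heapq.heappop(self.buf))
        return out

a = AVAligner(delay=12.0)       # hypothetical margin exceeding the 10+ s downlink latency
a.push(100.0, 'video', b'...')
a.push(100.0, 'audio', b'...')
print(a.pop_ready(now=113.0))   # both essences released together, in PTS order
```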