Rate-Accuracy Trade-Off In Video Classification With Deep Convolutional Neural Networks
Advanced video classification systems decode video frames to derive the
necessary texture and motion representations for ingestion and analysis by
spatio-temporal deep convolutional neural networks (CNNs). However, when
considering visual Internet-of-Things applications, surveillance systems and
semantic crawlers of large video repositories, the video capture and the
CNN-based semantic analysis parts do not tend to be co-located. This
necessitates the transport of compressed video over networks and incurs
significant overhead in bandwidth and energy consumption, thereby
undermining the deployment potential of such systems. In this paper, we
investigate the trade-off between the encoding bitrate and the achievable
accuracy of CNN-based video classification models that directly ingest
AVC/H.264 and HEVC encoded videos. Instead of retaining entire compressed video
bitstreams and applying complex optical flow calculations prior to CNN
processing, we only retain motion vector and select texture information at
significantly-reduced bitrates and apply no additional processing prior to CNN
ingestion. Based on three CNN architectures and two action recognition
datasets, we achieve bitrate savings of 11%-94% with marginal effect on
classification accuracy. A model-based selection between multiple CNNs
increases these savings further, to the point where, if up to 7% loss of
accuracy can be tolerated, video classification can take place with as little
as 3 kbps for the transport of the required compressed video information to the
system implementing the CNN models.
Deep Video Precoding
Several groups worldwide are currently investigating how deep learning may advance the state-of-the-art in image and video coding. An open question is how to make deep neural networks work in conjunction with existing (and upcoming) video codecs, such as MPEG H.264/AVC, H.265/HEVC, VVC, Google VP9 and AOMedia AV1, AV2, as well as existing container and transport formats, without imposing any changes at the client side. Such compatibility is a crucial aspect when it comes to practical deployment, especially when considering the fact that the video content industry and hardware manufacturers are expected to remain committed to supporting these standards for the foreseeable future. We propose to use deep neural networks as precoders for current and future video codecs and adaptive video streaming systems. In our current design, the core precoding component comprises a cascaded structure of downscaling neural networks that operates during video encoding, prior to transmission. This is coupled with a precoding mode selection algorithm for each independently-decodable stream segment, which adjusts the downscaling factor according to scene characteristics, the utilized encoder, and the desired bitrate and encoding configuration. Our framework is compatible with all current and future codec and transport standards, as our deep precoding network structure is trained in conjunction with linear upscaling filters (e.g., the bilinear filter), which are supported by all web video players. Extensive evaluation on FHD (1080p) and UHD (2160p) content and with widely-used H.264/AVC, H.265/HEVC and VP9 encoders, as well as a preliminary evaluation with the current test model of VVC (v.6.2rc1), shows that coupling such standards with the proposed deep video precoding allows for 8% to 52% rate reduction under encoding configurations and bitrates suitable for video-on-demand adaptive streaming systems. 
The use of precoding can also lead to encoding complexity reduction, which is essential for cost-effective cloud deployment of complex encoders like H.265/HEVC, VP9 and VVC, especially when considering the prominence of high-resolution adaptive video streaming.
Video QoS/QoE over IEEE802.11n/ac: A Contemporary Survey
The demand for video applications over wireless networks has increased tremendously, and IEEE 802.11 standards have provided better support for video transmission. However, providing Quality of Service (QoS) and Quality of Experience (QoE) for video over WLAN is still a challenge due to the error sensitivity of compressed video and dynamic channels. This thesis presents a contemporary survey of video QoS/QoE over WLAN issues and solutions. The objective of the study is to provide an overview of the issues by conducting a background study on video codecs and their features and characteristics, followed by a study of QoS and QoE support in IEEE 802.11 standards. Since IEEE 802.11n is the current standard that is most widely deployed worldwide and IEEE 802.11ac is the upcoming standard, this survey investigates the most recent video QoS/QoE solutions based on these two standards. The solutions are divided into two broad categories: academic solutions and vendor solutions. Academic solutions are mostly based on three main layers, namely Application, Media Access Control (MAC) and Physical (PHY), and are further divided into two major categories: single-layer solutions and cross-layer solutions. Single-layer solutions focus on a single layer to enhance video transmission performance over WLAN. Cross-layer solutions involve two or more layers to provide a single QoS solution for video over WLAN. This thesis also presents and technically analyzes QoS solutions by three popular vendors. This thesis concludes that single-layer solutions are not directly related to video QoS/QoE, and that cross-layer solutions perform better than single-layer solutions but are much more complicated and not easy to implement. Most vendors rely on their network infrastructure to provide QoS for multimedia applications. They have their own techniques and mechanisms, but the concept of providing QoS/QoE for video is almost the same because they use the same standards and rely on Wi-Fi Multimedia (WMM) to provide QoS.
Optimized Data Representation for Interactive Multiview Navigation
In contrast to traditional media streaming services where the same media
content is delivered to different users, interactive multiview navigation
applications enable users to choose their own viewpoints and freely navigate in
a 3-D scene. The interactivity brings new challenges in addition to the
classical rate-distortion trade-off, which considers only the compression
performance and viewing quality. On the one hand, interactivity necessitates
sufficient viewpoints for richer navigation; on the other hand, it requires
low bandwidth and delay costs for smooth navigation during view
transitions. In this paper, we formally describe the novel trade-offs posed by
the navigation interactivity and classical rate-distortion criterion. Based on
an original formulation, we look for the optimal design of the data
representation by introducing novel rate and distortion models and practical
solving algorithms. Experiments show that the proposed data representation
method outperforms the baseline solution by providing lower resource
consumption and higher visual quality in all navigation configurations, which
confirms the potential of the proposed data representation in
practical interactive navigation systems.
Extended Signaling Methods for Reduced Video Decoder Power Consumption Using Green Metadata
In this paper, we discuss one aspect of the latest MPEG standard edition on
energy-efficient media consumption, also known as Green Metadata (ISO/IEC
23001-11), which is the interactive signaling for remote decoder-power
reduction for peer-to-peer video conferencing. In this scenario, the receiver
of a video, e.g., a battery-driven portable device, can send a dedicated
request to the sender which asks for a video bitstream representation that is
less complex to decode and process. Consequently, the receiver saves energy and
extends operating times. We provide an overview of recent studies from the
literature dealing with energy-saving aspects, which motivate the extension of
the legacy Green Metadata standard. Furthermore, we explain the newly
introduced syntax elements and verify their effectiveness by performing
dedicated experiments. We show that the integration of these syntax elements
can lead to dynamic energy savings of up to 90% for software video decoding and
80% for hardware video decoding, respectively. Comment: 5 pages, 2 figures
Analysis and Comparison of Modern Video Compression Standards for Random-access Light-field Compression
Light-field (LF) 3D displays are anticipated to be the next-generation 3D displays, providing smooth motion parallax, a wide field of view (FOV), and a higher depth range than current autostereoscopic displays. Projection-based multi-view LF 3D displays bring the desired new functionalities through a set of projection engines that create the light sources from which the continuous light field is formed. Such displays require a high number of perspective views as input to fully exploit the visualization capabilities and viewing angle provided by LF technology. Delivering, processing and de/compressing this number of views poses major technical challenges. However, when processing light fields in a distributed system, access patterns in ray space are quite regular: some processing nodes do not need all views, and the necessary views are used only partially. This trait could be exploited by partial decoding of pictures to help provide less complex, and thus real-time, operation.
However, none of the recent video coding standards (e.g., Advanced Video Coding (AVC)/H.264 and High Efficiency Video Coding (HEVC)/H.265) provides partial decoding of video pictures. Such a feature can be achieved by dividing video pictures into partitions that can be processed independently, at the cost of lower compression efficiency. Examples of such partitioning features introduced by the modern video coding standards include slices and tiles, which enable random access into the video bitstreams with a specific granularity. In addition, some extra requirements have to be imposed on the standard partitioning tools in order for them to be applicable in the context of partial decoding. This leads to so-called self-contained partitions, that is, isolated or independently decodable regions in the video pictures.
This work studies the problem of creating self-contained partitions in the conventional AVC/H.264 and HEVC/H.265 standards, and HEVC 3D extensions including multi-view (i.e., MV-HEVC) and 3D (i.e., 3D-HEVC) extensions using slices and tiles, respectively. The requirements that need to be fulfilled in order to build self-contained partitions are described, and an encoder-side solution is proposed. Further, the work examines how slicing/tiling can be used to facilitate random access into the video bitstreams, how the number of slices/tiles affects the compression ratio considering different prediction structures, and how much effect partial decoding has on decoding time.
Overall, the experimental results indicate that the finer the partitioning, the higher the compression loss. The use of self-contained partitions makes the decoding operation very efficient and less complex.
Análise do HEVC escalável : desempenho e controlo de débito
Master's in Electronics and Telecommunications Engineering. This dissertation provides a study of the High Efficiency Video Coding standard (HEVC) and its scalable extension, SHVC. SHVC provides better performance when encoding several layers simultaneously than an HEVC encoder used in a simulcast configuration. Both reference encoders, for the base layer and for the enhancement layer, use the same rate control model, the R-λ model, which was optimized for HEVC. No optimization of bitrate allocation among layers has so far been proposed for the scalable HEVC (SHVC) test model (SHM 8). We derived a new R-λ model for the enhancement layer and for the spatial scalability case, which led to a BD-rate gain of 1.81% and a BD-PSNR gain of 0.025 in relation to the rate-distortion model of the SHVC SHM. Nevertheless, we also show in this dissertation that the proposed R-λ model should be used neither in the base layer of SHVC nor, consequently, in HEVC.
Streaming and User Behaviour in Omnidirectional Videos
Omnidirectional videos (ODVs) have gone beyond the passive paradigm of traditional video,
offering higher degrees of immersion and interaction. The revolutionary novelty of this technology is the possibility for users to interact with the surrounding environment, and to feel a
sense of engagement and presence in a virtual space. Users are clearly the main driving force of
immersive applications and consequently the services need to be properly tailored to them.
In this context, this chapter highlights the importance of the new role of users in ODV streaming applications, and thus the need for understanding their behaviour while navigating within
ODVs. A comprehensive overview of the research efforts aimed at advancing ODV streaming
systems is also presented. In particular, the state-of-the-art solutions under examination in this
chapter are distinguished in terms of system-centric and user-centric streaming approaches: the
former approach comes from a quite straightforward extension of well-established solutions for
the 2D video pipeline, while the latter benefits from understanding users’ behaviour
and enables more personalised ODV streaming.