
    Visual Saliency Estimation Via HEVC Bitstream Analysis

    Since information technology developed dramatically from the 1950s, digital images and video have become ubiquitous. In the last decade, image and video processing have grown increasingly popular in biomedical, industrial, artistic and other fields, and considerable progress has been made in the display, storage and transmission of visual information such as images and video. An attendant problem is that video processing tasks in the time domain have become particularly arduous. Building on the study of existing compressed-domain video saliency detection models, a new saliency estimation model for video based on High Efficiency Video Coding (HEVC) is presented. First, the relevant features are extracted from the HEVC-encoded bitstream. A naive Bayesian model is then trained and tested on these features, using the original YUV videos and the ground truth. The intra-frame saliency map is obtained after training and testing the intra features, and the inter-frame saliency map is obtained by combining the intra-frame saliency with motion vectors. The ROC score of the proposed intra mode is 0.9561. Other classification methods, such as the support vector machine (SVM), k-nearest neighbors (KNN) and the decision tree, are used to compare the experimental outcomes. The effect of varying the compression ratio on the estimated saliency has also been analysed.
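    As a hedged illustration of the kind of pipeline this abstract describes, the sketch below trains a Gaussian naive Bayes classifier on per-coding-unit bitstream features and reshapes the predicted probabilities into a frame-level saliency map. The feature choice (bits used, intra mode, partition depth), the array shapes and the random data are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch: Gaussian naive Bayes over per-CU bitstream features -> saliency map.
# Feature columns and shapes are assumptions for illustration only.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Illustrative training data: one row per coding unit.
# Columns: [bits_used, intra_mode_index, partition_depth]
X_train = rng.random((1000, 3))
y_train = (rng.random(1000) > 0.5).astype(int)   # 1 = salient CU, 0 = non-salient

model = GaussianNB().fit(X_train, y_train)

# Predict a per-CU saliency probability for one frame (here a 30x40 grid of CUs)
# and reshape the probabilities into a frame-level saliency map.
X_frame = rng.random((30 * 40, 3))
saliency_map = model.predict_proba(X_frame)[:, 1].reshape(30, 40)
print(saliency_map.shape)   # (30, 40)
```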

    Prediction of Visual Behaviour in Immersive Contents

    In the world of broadcasting and streaming, multi-view video provides the ability to present multiple perspectives of the same video sequence, thereby giving the viewer a sense of immersion in the real-world scene. It can be compared to VR and 360° video; still, there are significant differences, notably in the way images are acquired: instead of placing the user at the centre and presenting the scene around the user in a 360° circle, it uses multiple cameras placed in a 360° circle around the real-world scene of interest, capturing all of the possible perspectives of that scene. Additionally, in contrast to VR, it uses natural video sequences and displays. One issue which plagues content streaming of all kinds is the bandwidth requirement which, particularly in VR and multi-view applications, translates into an increase in the required data transmission rate. A possible solution to lower the required bandwidth would be to limit the number of views to be streamed fully, focusing on those surrounding the area at which the user is looking. This is proposed by SmoothMV, a multi-view system that uses a non-intrusive head-tracking approach to enhance navigation and the Quality of Experience (QoE) of the viewer. This system relies on a novel "Hot&Cold" matrix concept to translate head positioning data into viewing angle selections. The main goal of this dissertation is the transformation and storage of the data acquired using SmoothMV into datasets. These will be used as training data for a proposed neural network, fully integrated within SmoothMV, with the purpose of predicting the users' points of interest on the screen during the playback of multi-view content. The goal behind this effort is to predict the user's likely viewing interests in the near future and to optimize bandwidth usage through buffering of adjacent views which could possibly be requested by the user. After the development of this dataset is concluded, work in this dissertation focuses on the formulation of a solution to present generated heatmaps of the most viewed areas per video, previously captured using SmoothMV.
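    A minimal sketch of the view-selection idea described above: map a head-tracking yaw angle to the nearest of N cameras on a 360° circle, then buffer the adjacent views the user is most likely to request next. The function names and the choice of one buffered neighbour per side are assumptions, not details of SmoothMV or its Hot&Cold matrix.

```python
# Hedged sketch: map head yaw to the nearest of `num_views` cameras on a 360-degree
# circle and buffer its immediate neighbours. Not SmoothMV's actual algorithm.

def select_view(yaw_deg, num_views):
    """Index of the camera whose angle is closest to the head yaw."""
    step = 360.0 / num_views
    return round((yaw_deg % 360.0) / step) % num_views

def views_to_buffer(yaw_deg, num_views, radius=1):
    """Selected view plus `radius` neighbours on each side, to hide view switches."""
    centre = select_view(yaw_deg, num_views)
    return [(centre + offset) % num_views for offset in range(-radius, radius + 1)]

print(views_to_buffer(95.0, num_views=12))   # [2, 3, 4]
```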

    CTU Depth Decision Algorithms for HEVC: A Survey

    High Efficiency Video Coding (HEVC) surpasses its predecessors in encoding efficiency by introducing new coding tools at the cost of increased encoding time-complexity. The Coding Tree Unit (CTU) is the main building block used in HEVC. In the HEVC standard, frames are divided into CTUs with a predetermined size of up to 64x64 pixels. Each CTU is then divided recursively into a number of equally sized square areas, known as Coding Units (CUs). Although this diversity of frame partitioning increases encoding efficiency, it also causes an increase in time complexity due to the increased number of ways to find the optimal partitioning. To address this complexity, numerous algorithms have been proposed to eliminate unnecessary searches during CTU partitioning by exploiting the correlation in the video. In this paper, existing CTU depth decision algorithms for HEVC are surveyed. These algorithms are categorized into two groups, namely statistics and machine learning approaches. Statistics approaches are further subdivided into neighboring and inherent approaches. Neighboring approaches exploit the similarity between adjacent CTUs to limit the depth range of the current CTU, while inherent approaches use only the information available within the current CTU. Machine learning approaches try to extract and exploit similarities implicitly. Traditional methods like support vector machines or random forests use manually selected features, while recently proposed deep learning methods extract features during training. Finally, this paper discusses extending these methods to more recent video coding formats such as Versatile Video Coding (VVC) and AOMedia Video 1 (AV1).
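    The sketch below illustrates a generic "neighboring" depth-decision heuristic of the kind surveyed here: the depth search range of the current CTU is bounded by the depths chosen for its left, upper and co-located CTUs. The ±1 margin and the full-search fallback are illustrative assumptions, not values from any specific paper.

```python
# Hedged sketch: bound the CTU depth search range using neighbouring CTU depths.
# The +/-1 margin and the full-search fallback are illustrative assumptions.

def predicted_depth_range(left_depth, above_depth, colocated_depth,
                          min_depth=0, max_depth=3):
    """Return (low, high) bounds on the depths to evaluate for the current CTU."""
    neighbours = [d for d in (left_depth, above_depth, colocated_depth) if d is not None]
    if not neighbours:                       # no context available: full search
        return min_depth, max_depth
    low = max(min_depth, min(neighbours) - 1)
    high = min(max_depth, max(neighbours) + 1)
    return low, high

print(predicted_depth_range(0, 0, 0))           # (0, 1): depths 2 and 3 are skipped
print(predicted_depth_range(None, None, None))  # (0, 3): no neighbours, full search
```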

    Adaptive delivery of immersive 3D multi-view video over the Internet

    The increase in Internet bandwidth and the developments in 3D video technology have paved the way for the delivery of 3D Multi-View Video (MVV) over the Internet. However, large amounts of data and dynamic network conditions result in frequent network congestion, which may prevent video packets from being delivered on time. As a consequence, the 3D video experience may well be degraded unless content-aware precautionary mechanisms and adaptation methods are deployed. In this work, a novel adaptive MVV streaming method is introduced which addresses future-generation immersive 3D MVV experiences on multi-view displays. When the user experiences network congestion that makes adaptation necessary, the rate-distortion-optimal set of views, pre-determined by the server, is truncated from the delivered MVV streams. In order to maintain a high Quality of Experience (QoE) during frequent network congestion, the proposed method involves the calculation of low-overhead additional metadata that is delivered to the client. The proposed adaptive 3D MVV streaming solution is tested using the MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH) standard. Extensive objective and subjective evaluations are presented, showing that the proposed method provides significant quality enhancement under adverse network conditions.
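    As a rough illustration of the adaptation step described above, the sketch below drops views in a server-signalled order until the remaining streams fit the measured throughput. The metadata format, bitrates and drop order are invented for illustration and are not taken from the paper.

```python
# Hedged sketch: drop views in a server-signalled order until the remaining
# streams fit the available throughput. Numbers and format are illustrative.

def adapt_views(view_bitrates_kbps, drop_order, available_kbps):
    """Return the set of view ids kept after truncation."""
    keep = set(view_bitrates_kbps)
    total = sum(view_bitrates_kbps.values())
    for view_id in drop_order:               # least important views first
        if total <= available_kbps:
            break
        total -= view_bitrates_kbps[view_id]
        keep.discard(view_id)
    return keep

bitrates = {0: 800, 1: 800, 2: 800, 3: 800, 4: 800}   # kbps per view
drop_order = [4, 0, 3, 1]                             # e.g. from server metadata
print(adapt_views(bitrates, drop_order, available_kbps=2500))   # {1, 2, 3}
```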

    Towards one video encoder per individual: guided High Efficiency Video Coding


    End to end Multi-Objective Optimisation of H.264 and HEVC Codecs

    All multimedia devices now incorporate video CODECs that comply with international video coding standards such as H.264 / MPEG4-AVC and the new High Efficiency Video Coding standard (HEVC), otherwise known as H.265. Although the standard CODECs have been designed to include algorithms with optimal efficiency, a large number of coding parameters can be used to fine-tune their operation within known constraints such as available computational power, bandwidth and consumer QoS requirements. With so many parameters involved, determining which of them play a significant role in providing optimal quality of service within given constraints is a challenge that needs to be met. How to select the values of the significant parameters so that the CODEC performs optimally under the given constraints is a further important question to be answered. This thesis proposes a framework that uses machine learning algorithms to model the performance of a video CODEC based on the significant coding parameters. Means of modelling both the encoder and decoder performance are proposed. We define objective functions that can be used to model the performance-related properties of a CODEC, i.e., video quality, bit rate and CPU time. We show that these objective functions can be practically utilised in video encoder/decoder designs, in particular in their performance optimisation within given operational and practical constraints. A multi-objective optimisation framework based on genetic algorithms is thus proposed to optimise the performance of a video codec. The framework is designed to jointly minimise the CPU time and bit rate and to maximise the quality of the compressed video stream. The thesis presents the use of this framework in the performance modelling and multi-objective optimisation of the most widely used video coding standard in practice at present, H.264, and the latest video coding standard, H.265/HEVC. When a communication network is used to transmit video, performance-related parameters of the communication channel will impact the end-to-end performance of the video CODEC. Network delays and packet loss will impact the quality of the video that is received at the decoder via the communication channel; i.e., even if a video CODEC is optimally configured, network conditions will make the experience sub-optimal. Given the above, the thesis proposes the design, integration and testing of a novel approach to simulating a wired network and the use of the UDP protocol for the transmission of video data. This network is subsequently used to simulate the impact of packet loss and network delays on optimally coded video, based on the framework previously proposed for the modelling and optimisation of video CODECs. The quality of received video under different levels of packet loss and network delay is simulated, and conclusions are drawn about the impact on transmitted videos based on their content and features.
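    The sketch below illustrates the multi-objective view of the problem: candidate encoder configurations are scored on (CPU time, bit rate, negated quality), all to be minimised, and the non-dominated (Pareto-optimal) configurations are retained. The thesis uses a genetic algorithm to explore this space; here a plain random sample plus a dominance filter stands in for that search, and evaluate() is a toy stub in place of running a real H.264/HEVC encoder.

```python
# Hedged sketch: Pareto filtering of candidate (QP, preset) encoder configurations
# on three objectives, all minimised. evaluate() is a toy stand-in for real encodes.
import random

def evaluate(cfg):
    """Pretend to encode with this config; return (cpu_time, bitrate, -quality)."""
    qp, preset = cfg
    cpu_time = 10.0 / (preset + 1)           # faster presets cost less CPU (toy model)
    bitrate = 5000.0 / (qp + 1)              # higher QP -> fewer bits (toy model)
    quality = 100.0 - qp - 2.0 * preset      # higher QP / faster preset -> lower quality
    return (cpu_time, bitrate, -quality)

def dominates(a, b):
    """True if vector a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

random.seed(0)
candidates = [(random.randint(18, 42), random.randint(0, 8)) for _ in range(200)]
scored = [(cfg, evaluate(cfg)) for cfg in candidates]
pareto = [(cfg, obj) for cfg, obj in scored
          if not any(dominates(other, obj) for _, other in scored)]
print(f"{len(pareto)} Pareto-optimal configurations out of {len(candidates)}")
```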

    Low-complexity scalable and multiview video coding


    Video compression algorithms for HEVC and beyond

    Due to the increasing number of new services and devices that allow the creation, distribution and consumption of video content, the amount of video information being transmitted all over the world is constantly growing. Video compression technology is essential to cope with the ever increasing volume of digital video data being distributed in today's networks, as more efficient video compression techniques allow support for higher volumes of video data under the same memory/bandwidth constraints. This is especially relevant with the introduction of new and more immersive video formats associated with significantly higher amounts of data. In this thesis, novel techniques for improving the efficiency of current and future video coding technologies are investigated. Several aspects that influence the way conventional video coding methods work are considered. In particular, the properties and limitations of the Human Visual System are exploited to tune the performance of video encoders towards better subjective quality. Additionally, it is shown how the visibility of specific types of visual artefacts can be prevented during the video encoding process, in order to avoid subjective quality degradations in the compressed content. Techniques for higher video compression efficiency are also explored, targeting to improve the compression capabilities of state-of-the-art video coding standards. Finally, the application of video coding technologies to practical use-cases is considered. Accurate estimation models are devised to control the encoding time and bit rate associated with compressed video signals, in order to meet specific encoding time and transmission time restrictions.
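    As an illustration of the kind of rate estimation model mentioned above, the sketch below fits a simple log-linear relationship between QP and bit rate from a few trial encodes and inverts it to pick a QP for a target rate. The log-linear form and the sample points are assumptions made for this example, not the models devised in the thesis.

```python
# Hedged sketch: fit ln(rate) = a*QP + b from trial encodes, then invert to pick
# the QP expected to hit a target bit rate. Form and numbers are assumptions.
import numpy as np

qp_samples = np.array([22, 27, 32, 37])
rate_samples_kbps = np.array([5200, 2600, 1300, 650])   # measured in trial encodes

a, b = np.polyfit(qp_samples, np.log(rate_samples_kbps), deg=1)

def qp_for_target_rate(target_kbps):
    """QP predicted by the fitted model to produce roughly the target rate."""
    return (np.log(target_kbps) - b) / a

print(round(qp_for_target_rate(2000)))   # e.g. QP ~29 for ~2000 kbps
```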

    The significance of image compression in plant phenotyping applications

    We are currently witnessing an increasingly higher throughput in image-based plant phenotyping experiments. The majority of imaging data are collected using complex automated procedures and are then post-processed to extract phenotyping-related information. In this article, we show that the image compression used in such procedures may compromise phenotyping results, and this needs to be taken into account. We use three illuminating proof-of-concept experiments that demonstrate that compression (especially in the most common lossy JPEG form) affects measurements of plant traits and that the errors introduced can be high. We also systematically explore how compression affects measurement fidelity, quantified as effects on image quality, as well as errors in extracted plant visual traits. To do so, we evaluate a variety of image-based phenotyping scenarios, including size and colour of shoots, and leaf and root growth. To show that even visual impressions can be used to assess compression effects, we use root system images as examples. Overall, we find that compression has a considerable effect on several types of analyses (whether visual or quantitative) and that proper care is necessary to ensure that this choice does not affect biological findings. In order to avoid or at least minimise the introduced measurement errors, for each scenario we derive recommendations and provide guidelines on how to identify suitable compression options in practice. We also find that certain compression choices can offer beneficial returns in terms of reducing the amount of data storage without compromising phenotyping results. This may enable even higher-throughput experiments in the future.
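    A minimal sketch of the kind of proof-of-concept comparison described above: measure a simple visual trait (projected shoot area via a green-pixel count) on a lossless source image and on JPEG re-compressions at decreasing quality, and report the relative measurement error. The file path, the trait definition and the green-dominance threshold are illustrative assumptions.

```python
# Hedged sketch: compare a simple trait (green-pixel "shoot area") measured on a
# lossless source image against JPEG re-compressions at decreasing quality.
import io
import numpy as np
from PIL import Image

def green_pixel_area(img):
    """Count pixels where green clearly dominates red and blue (a crude shoot mask)."""
    rgb = np.asarray(img.convert("RGB"), dtype=np.int16)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return int(np.count_nonzero((g > r + 10) & (g > b + 10)))

original = Image.open("plant.png")               # hypothetical lossless source image
reference_area = green_pixel_area(original)

for quality in (95, 75, 50, 25):
    buf = io.BytesIO()
    original.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    area = green_pixel_area(Image.open(buf))
    error_pct = 100.0 * (area - reference_area) / reference_area
    print(f"JPEG q={quality}: shoot-area error {error_pct:+.2f}%")
```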