5,187 research outputs found

    Fast encoding for personalized views extracted from beyond high definition content

    Broadcast providers are looking for new opportunities to improve user experience and interaction with their content. Their main goal is to attract and hold viewer attention in order to build a large and stable audience. This could be achieved with a second-screen application that lets users select their own viewpoint in an extremely high resolution video and so direct their own first screen. By allowing users to create their own personalized video stream, they become involved in the content creation itself. However, encoding a personalized view for each user is computationally complex. This paper describes a machine learning approach to speed up the encoding of each personal view. Simulation results of zoom, pan and tilt scenarios show bit rate increases between 2% and 9% for complexity reductions between 69% and 79% compared to full encoding.
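The pan, tilt and zoom scenarios above amount to extracting a moving crop from the beyond-HD source before encoding it as the user's personal view. A minimal sketch of that viewpoint-to-crop mapping, with invented conventions (normalized pan/tilt in [0, 1], zoom as a crop-scale factor, an 8K source) rather than anything taken from the paper:

```python
def viewport_crop(src_w, src_h, pan, tilt, zoom=1.0, out_w=1920, out_h=1080):
    """Map a normalized pan/tilt position and a zoom factor to a pixel crop
    of the high-resolution source frame; zoom > 1 widens the crop (zoom out)."""
    crop_w = min(src_w, int(out_w * zoom))
    crop_h = min(src_h, int(out_h * zoom))
    x = int(pan * (src_w - crop_w))   # pan=0 -> left edge, pan=1 -> right edge
    y = int(tilt * (src_h - crop_h))  # tilt=0 -> top edge, tilt=1 -> bottom edge
    return x, y, crop_w, crop_h

# Centre view of a 7680x4320 panorama at native zoom.
x, y, w, h = viewport_crop(7680, 4320, pan=0.5, tilt=0.5)
```

Each user's crop would then be fed to the encoder, which is exactly the per-user encoding cost the paper's machine learning approach tries to reduce.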

    Machine Learning based Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders

    The next-generation Versatile Video Coding (VVC) standard introduces a new Multi-Type Tree (MTT) block partitioning structure that supports Binary-Tree (BT) and Ternary-Tree (TT) splits in both vertical and horizontal directions. This new approach leads to five possible splits at each block depth and thereby improves the coding efficiency of VVC over that of the preceding High Efficiency Video Coding (HEVC) standard, which only supports Quad-Tree (QT) partitioning with a single split per block depth. However, MTT has also brought a considerable increase in encoder computational complexity. In this paper, a two-stage learning-based technique is proposed to tackle the complexity overhead of MTT in VVC intra encoders. In our scheme, the input block is first processed by a Convolutional Neural Network (CNN) to predict its spatial features through a vector of probabilities describing the partition at each 4x4 edge. Subsequently, a Decision Tree (DT) model leverages this vector of spatial features to predict the most likely splits for each block. Finally, based on this prediction, only the N most likely splits are processed by the Rate-Distortion (RD) search of the encoder. In order to train our CNN and DT models on a wide range of image contents, we also propose a public VVC frame partitioning dataset based on an existing image dataset encoded with the VVC reference software encoder. Our proposal, relying on the top-3 configuration, reaches a 46.6% complexity reduction for a negligible bitrate increase of 0.86%. A top-2 configuration enables a higher complexity reduction of 69.8% for a 2.57% bitrate loss. These results demonstrate a better trade-off between VTM intra coding efficiency and complexity reduction compared to state-of-the-art solutions.
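The final pruning step described above can be sketched as follows: only the N split modes the model ranks highest enter the expensive RD evaluation. The split labels, predicted probabilities and RD costs below are invented placeholders, not values from the paper:

```python
# The five VVC split modes plus the no-split option at one block depth.
SPLITS = ["no_split", "qt", "bt_h", "bt_v", "tt_h", "tt_v"]

def top_n_candidates(probs, n):
    """Return the n split modes with the highest predicted probability."""
    ranked = sorted(zip(SPLITS, probs), key=lambda kv: kv[1], reverse=True)
    return [mode for mode, _ in ranked[:n]]

def rd_search(candidates, rd_cost):
    """Run the (mocked) rate-distortion evaluation on the shortlist only."""
    return min(candidates, key=lambda mode: rd_cost[mode])

# Illustrative model output and RD costs for one block (not real VVC data).
probs = [0.05, 0.10, 0.45, 0.25, 0.10, 0.05]
rd_cost = {"no_split": 9.1, "qt": 8.7, "bt_h": 7.2,
           "bt_v": 7.9, "tt_h": 8.3, "tt_v": 9.5}

best = rd_search(top_n_candidates(probs, 3), rd_cost)
```

Smaller N means fewer RD evaluations (lower complexity) at the risk of missing the truly optimal split, which is the top-3 vs. top-2 trade-off the abstract reports.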

    End to end Multi-Objective Optimisation of H.264 and HEVC Codecs

    All multimedia devices now incorporate video CODECs that comply with international video coding standards such as H.264/MPEG4-AVC and the new High Efficiency Video Coding standard (HEVC), otherwise known as H.265. Although the standard CODECs have been designed to include algorithms with optimal efficiency, a large number of coding parameters can be used to fine-tune their operation within known constraints, e.g. available computational power, bandwidth and consumer QoS requirements. With so many parameters involved, determining which of them play a significant role in providing optimal quality of service within given constraints is a challenge in itself. How to select the values of those significant parameters so that the CODEC performs optimally under the given constraints is a further important question. This thesis proposes a framework that uses machine learning algorithms to model the performance of a video CODEC based on the significant coding parameters. Means of modelling both Encoder and Decoder performance are proposed. We define objective functions that can be used to model the performance-related properties of a CODEC, i.e. video quality, bit-rate and CPU time. We show that these objective functions can be practically utilised in video Encoder/Decoder designs, in particular in their performance optimisation within given operational and practical constraints. A multi-objective optimisation framework based on Genetic Algorithms is thus proposed to optimise the performance of a video codec. The framework is designed to jointly minimise CPU time and bit-rate and to maximise the quality of the compressed video stream. The thesis presents the use of this framework in the performance modelling and multi-objective optimisation of the most widely used video coding standard at present, H.264, and the latest video coding standard, H.265/HEVC.
    When a communication network is used to transmit video, performance-related parameters of the communication channel will impact the end-to-end performance of the video CODEC. Network delays and packet loss will degrade the quality of the video received at the decoder, i.e. even if a video CODEC is optimally configured, network conditions can make the experience sub-optimal. Given the above, the thesis proposes the design, integration and testing of a novel approach to simulating a wired network and the use of the UDP protocol for the transmission of video data. This network is subsequently used to simulate the impact of packet loss and network delays on optimally coded video, based on the framework previously proposed for the modelling and optimisation of video CODECs. The quality of received video under different levels of packet loss and network delay is simulated, and conclusions are drawn about the impact on transmitted video based on its content and features.
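The comparison at the heart of such a multi-objective framework is Pareto dominance: one encoder configuration dominates another if it is no worse on every objective and strictly better on at least one. A minimal sketch with CPU time, bit-rate and negated quality (all minimised); the four configurations and their objective values are invented for illustration:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Keep only the non-dominated objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# (CPU time in s, bit-rate in kbps, -PSNR in dB) for four mock configurations.
configs = [(10.0, 900, -38.0),   # balanced
           (12.0, 850, -38.5),   # slower but smaller and better quality
           (9.0, 950, -37.0),    # fastest, at a quality/bit-rate cost
           (11.0, 950, -37.5)]   # worse than the first on every objective
front = pareto_front(configs)
```

A genetic algorithm such as the one the thesis proposes evolves a population toward this front; non-dominated sorting of candidates is the selection step.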

    CTU Depth Decision Algorithms for HEVC: A Survey

    High-Efficiency Video Coding (HEVC) surpasses its predecessors in encoding efficiency by introducing new coding tools at the cost of increased encoding time-complexity. The Coding Tree Unit (CTU) is the main building block used in HEVC. In the HEVC standard, frames are divided into CTUs with a predetermined size of up to 64x64 pixels. Each CTU is then divided recursively into a number of equally sized square areas, known as Coding Units (CUs). Although this diversity of frame partitioning increases encoding efficiency, it also increases the time complexity due to the larger number of ways to find the optimal partitioning. To address this complexity, numerous algorithms have been proposed to eliminate unnecessary searches during CTU partitioning by exploiting correlation in the video. In this paper, existing CTU depth decision algorithms for HEVC are surveyed. These algorithms are categorized into two groups, namely statistics approaches and machine learning approaches. Statistics approaches are further subdivided into neighboring and inherent approaches. Neighboring approaches exploit the similarity between adjacent CTUs to limit the depth range of the current CTU, while inherent approaches use only the information available within the current CTU. Machine learning approaches try to extract and exploit similarities implicitly. Traditional methods such as support vector machines or random forests use manually selected features, while recently proposed deep learning methods extract features during training. Finally, this paper discusses extending these methods to more recent video coding formats such as Versatile Video Coding (VVC) and AOMedia Video 1 (AV1).
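A minimal sketch of the neighboring-approach idea described above: the depth search for the current CTU is clamped to a window around the depths its neighbours chose. The one-level margin and the mock neighbour depths are invented for illustration, not taken from any surveyed algorithm:

```python
def limited_depth_range(neighbour_depths, max_depth=3):
    """Restrict the CU depth search for the current CTU to one level around
    the depth range already chosen by its (e.g. left, upper, co-located)
    neighbours, clamped to the legal HEVC range [0, max_depth]."""
    lo = max(0, min(neighbour_depths) - 1)
    hi = min(max_depth, max(neighbour_depths) + 1)
    return lo, hi

# All three neighbour CTUs chose depth 1 (mock values), so only depths
# 0-2 are searched and the depth-3 subtree is skipped entirely.
lo, hi = limited_depth_range([1, 1, 1])
```

Skipping even one depth level removes a large share of candidate partitions, which is where the surveyed complexity savings come from.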

    Perceptual grouping by proximity and orientation bias: experimental and modelling investigations

    Grouping by proximity is the principle of perceptual organization by which the elements of a visual scene that are closer in space tend to be perceived as a coherent ensemble. Research into this topic makes substantial use of the class of stimuli known as dot lattices. The Pure Distance Law (Kubovy et al., 1998) predicts that the probability of grouping by proximity in these stimuli depends only on the relative inter-dot distance between competing organizations. Despite much effort to explain how grouping by proximity is shaped by the basic organization of visual stimuli, its neural mechanisms are still under debate. Moreover, previous studies reported that grouping in dot lattices also occurs according to an orientation bias, by which these stimuli are perceived along a preferred (vertical) orientation, regardless of what is predicted by the Pure Distance Law. The aim of this thesis is to shed light on the functional and neural mechanisms characterizing grouping by proximity in dot lattices, as well as the trade-off between proximity- and orientation-based grouping. Study 1 investigates the role of high-level visual working memory (VWM) in promoting the shift between grouping by proximity and the orientation bias. Both the quantity (load) and the quality (content) of the information stored in VWM shape online grouping of dot lattices. Study 2 presents a neural network model simulating the dynamics between low- and high-level processing stages during dot lattice perception. The degree of synchrony between the units of the low-level module plays a key role in accounting for grouping by proximity. Overall, our results show that high-level (Study 1) and low-level (Study 2) operations contribute in parallel to the emergence of grouping by proximity, as well as to its interplay with orientation-based grouping.
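The scale-invariance claim of the Pure Distance Law can be illustrated with a toy choice rule between two competing organizations: the exponential form and the slope parameter s below are illustrative assumptions, not the fitted model from the literature, but they reproduce the stated property that only the ratio of inter-dot distances matters:

```python
import math

def grouping_prob(d_a, d_b, s=2.0):
    """Toy probability that organization a (inter-dot distance d_a) is
    perceived over organization b (distance d_b); depends only on d_a / d_b."""
    attraction = math.exp(-s * (d_a / d_b - 1.0))  # equals 1 when distances match
    return attraction / (attraction + 1.0)

# Equal distances put grouping at chance; scaling both distances by the
# same factor leaves the probability unchanged (the "pure distance" claim).
p_equal = grouping_prob(1.0, 1.0)
```

An orientation bias of the kind the thesis studies would break exactly this invariance, e.g. by adding an orientation-dependent term to the attraction.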

    Towards one video encoder per individual: guided High Efficiency Video Coding


    Opportunities and challenges in new survey data collection methods using apps and images.

    Surveys are well established as an effective way of collecting social science data. However, they may lack the detail, or not measure the concepts, necessary to answer a wide array of social science questions. Supplementing survey data with data from other sources offers opportunities to overcome this. The use of mobile technologies offers many such new opportunities for data collection. New types of data might be collected, or it may be possible to collect existing data types in new and innovative ways. Alongside these new opportunities come new challenges; these can be unique to mobile data collection, or existing data collection challenges altered by the use of mobile devices. The data used come from a study that employs an app for mobile devices to collect data about household spending, the Understanding Society Spending Study One. Participants were asked to report their spending by submitting a photo of a receipt, entering information about a purchase manually, or reporting that they had not spent anything that day. Each substantive chapter offers a piece of research exploring a different challenge posed by this particular research context. Chapter one explores the challenge presented by respondent burden in the context of mobile data collection. Chapter two considers the challenge of device effects. Chapter three examines the challenge of coding large volumes of organic data. The thesis concludes by reflecting on how the lessons learnt throughout might inform survey practice moving forward. Whilst this research focuses on one particular application, it is hoped that it serves as a microcosm contributing to the discussion of the wider opportunities and challenges faced by survey research as a field.

    Multimedia

    The ubiquitous and effortless digital data capture and processing capabilities offered by the majority of today's devices have led to an unprecedented penetration of multimedia content into our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content require constant re-evaluation and adaptation of multimedia methodologies, in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics that this book deals with include: multimedia management in peer-to-peer structures & wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content, and novel multimedia applications.