4 research outputs found

    Subjective Quality Assessment of the Impact of Buffer Size in Fine-Grain Parallel Video Encoding

    Fine-grain parallelism is essential for real-time video encoding performance. It usually implies setting a fixed buffer size for each encoded block, and the choice of this parameter is critical for both performance and hardware cost. In this paper we analyze the impact of buffer size on subjective image quality and its relation to other encoding parameters. We explore the consequences for visual quality of minimizing the buffer size to the point of discarding the quantized coefficients of the highest frequencies. Finally, we propose guidelines for choosing the buffer size, which has proven to depend heavily, in addition to other parameters, on the type of sequence being encoded. These guidelines are useful for the design of efficient real-time encoders, both in hardware and in software.
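
    The effect under study is easy to emulate: if the per-block buffer holds only the first N coefficients of a zig-zag scan, everything beyond N, i.e. the highest spatial frequencies, is silently dropped. The Python sketch below illustrates this on a quantized 8x8 block; it is not the authors' encoder, and the zig-zag model, block size and buffer size are illustrative assumptions.

```python
import numpy as np

def zigzag_indices(n=8):
    """Standard zig-zag scan order: index 0 is the DC coefficient and
    later indices correspond to progressively higher frequencies."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def truncate_to_buffer(quant_block, buffer_size):
    """Keep only the first `buffer_size` zig-zag coefficients of a
    quantized block; the remaining (highest-frequency) coefficients are
    dropped, emulating a per-block buffer that is too small."""
    out = np.zeros_like(quant_block)
    for i, (r, c) in enumerate(zigzag_indices(quant_block.shape[0])):
        if i >= buffer_size:
            break  # buffer full: the highest frequencies are discarded
        out[r, c] = quant_block[r, c]
    return out

# Example: with a 16-entry buffer, three quarters of an 8x8 block is lost.
rng = np.random.default_rng(0)
block = rng.integers(-8, 8, size=(8, 8))
print(truncate_to_buffer(block, buffer_size=16))
```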

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before such applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis tackles these issues by first constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design. To tackle the increased computational cost associated with object tracking and object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.
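
    The appeal of binary motion estimation is that the matching cost reduces to an XOR followed by a population count, operations that are cheap in hardware and whose redundancy (long runs of identical words in the binary data) can be skipped to save energy. The Python sketch below illustrates the matching criterion with a plain full search; it is not the thesis architecture, and the block size, search range and test data are assumptions.

```python
import numpy as np

def binary_sad(block, candidate):
    """Matching cost between two binary blocks: XOR marks the differing
    pixels and a population count sums them; in hardware this is one
    cheap gate-level operation per word."""
    return int(np.count_nonzero(np.logical_xor(block, candidate)))

def binary_block_match(cur, ref, bx, by, bsize=16, search=8):
    """Full-search motion estimation for one binary block. Hardware can
    skip all-zero XOR words entirely; that redundancy-skipping is the
    energy lever a binary architecture can exploit."""
    block = cur[by:by + bsize, bx:bx + bsize]
    best_cost, best_mv = bsize * bsize + 1, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate falls outside the reference frame
            cost = binary_sad(block, ref[y:y + bsize, x:x + bsize])
            if cost < best_cost:
                best_cost, best_mv = cost, (dx, dy)
    return best_mv, best_cost

# Example: recover a known shift between two binary alpha planes.
rng = np.random.default_rng(1)
ref = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))      # scene moves down/right
print(binary_block_match(cur, ref, bx=16, by=16))  # -> ((-3, -2), 0)
```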

    Methods for Light Field Display Profiling and Scalable Super-Multiview Video Coding

    Light field 3D displays reproduce the light field of real or synthetic scenes, as observed by multiple viewers, without the necessity of wearing 3D glasses. Reproducing light fields is a technically challenging task in terms of optical setup, content creation, and distributed rendering, among others; however, the impressive visual quality of hologram-like scenes, in full color, at real-time frame rates, and over a very wide field of view justifies the complexity involved. Seeing objects popping far out from the screen plane without glasses impresses even those viewers who have experienced other 3D displays before.

    Content for these displays can either be synthetic or real. The creation of synthetic (rendered) content is relatively well understood and used in practice. Depending on the technique used, rendering has its own complexities, quite similar to those of rendering techniques for 2D displays. While rendering can be used in many use-cases, the holy grail of all 3D display technologies is to become the future 3DTV, ending up in every living room and showing realistic 3D content without glasses. Capturing, transmitting, and rendering live scenes as light fields is extremely challenging, yet it is necessary if we are to experience light field 3D television showing real people and natural scenes, or realistic 3D video conferencing with real eye contact.

    In order to provide the required realism, light field displays aim to provide a wide field of view (up to 180°) while reproducing up to ~80 MPixels today; building gigapixel light field displays is realistic within the next few years. Likewise, capturing live light fields involves many synchronized cameras that cover the same wide field of view as the display and provide the same high pixel count. Light field capture and content creation therefore have to be well optimized with respect to the targeted display technologies. Two major challenges in this process are addressed in this dissertation.

    The first challenge is how to characterize the display in terms of its capability to create light fields, that is, how to profile the display in question. In clearer terms, this boils down to finding the equivalent spatial resolution, which is similar to the screen resolution of 2D displays, and the angular resolution, which describes the smallest angle whose color the display can control individually. The light field is formalized as a 4D approximation of the plenoptic function in terms of geometrical optics, through spatially-localized and angularly-directed light rays in the so-called ray space. Plenoptic Sampling Theory provides the conditions required to sample and reconstruct light fields. Subsequently, light field displays can be characterized in the Fourier domain by the effective display bandwidth they support. In the thesis, a methodology for display-specific light field analysis is proposed. It regards the display as a signal processing channel and analyses it as such in the spectral domain. As a result, one is able to derive the display throughput (i.e., the display bandwidth) and, subsequently, the optimal camera configuration to efficiently capture and filter light fields before displaying them.

    While the geometrical topology of the optical light sources in projection-based light field displays can be used to theoretically derive the display bandwidth and its spatial and angular resolution, in many cases this topology is not available to the user. Furthermore, many implementation details cause the display to deviate from its theoretical model. In such cases, profiling light field displays in terms of spatial and angular resolution has to be done by measurement. Measurement methods in which the display shows specific test patterns that are then captured by a single static or moving camera are proposed in the thesis. Determining the effective spatial and angular resolution of a light field display is then based on an automated frequency-domain analysis of the captured images as they are reproduced by the display. The analysis reveals the empirical limits of the display in terms of pass-band in both the spatial and the angular dimension. Furthermore, the spatial resolution measurements are validated by subjective tests confirming that the results are in line with the smallest features human observers can perceive on the same display. The resolution values obtained can be used to design the optimal capture setup for the display in question.

    The second challenge is related to the massive number of captured views and pixels that have to be transmitted to the display. Fitting into the available bandwidth clearly requires effective and efficient compression techniques, as an uncompressed representation of such super-multiview video could easily consume ~20 gigabits per second with today's displays. Due to the high number of light rays to be captured, transmitted and rendered, distributed systems are necessary for both capturing and rendering the light field. During the first attempts to implement real-time light field capturing, transmission and rendering using a brute-force approach, limitations became apparent. Still, because dense multi-camera light field capturing with light ray interpolation achieves the best possible image quality, this approach was chosen as the basis for further work despite the massive bandwidth it requires. Decompressing all camera images in all rendering nodes, however, is prohibitively time consuming and does not scale. After analyzing the light field interpolation process and the data-access patterns typical of a distributed light field rendering system, an approach to reduce the amount of data required in the rendering nodes is proposed. This approach requires only rectangular parts (typically vertical bars in the case of a Horizontal Parallax Only light field display) of the captured images to be available in the rendering nodes, which can be exploited to reduce the time spent on decompressing video streams. However, partial decoding is not readily supported by common image/video codecs. In the thesis, approaches to achieving partial decoding are proposed for H.264, HEVC, JPEG and JPEG2000, and the results are compared.

    The results of the thesis on display profiling facilitate the design of optimal camera setups for capturing scenes to be reproduced on 3D light field displays. The developed super-multiview content encoding also facilitates light field rendering in real time. This makes live light field transmission and real-time teleconferencing possible in a scalable way, using any number of cameras, and at the spatial and angular resolution the display actually needs for a compelling visual experience.
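
    As a rough illustration of the measurement principle, the sketch below estimates a display's spatial pass-band from photographs of sinusoidal test patterns: the modulation surviving at each test frequency is read off the Fourier spectrum of the captured image, and the resolution limit is taken as the highest frequency still above a cutoff. This is a much-simplified stand-in for the automated analysis proposed in the thesis; the function names, the row averaging, the integer cycles-per-width frequency grid and the 0.1 cutoff are all assumptions.

```python
import numpy as np

def measured_modulation(captured, freq_cycles):
    """Amplitude, relative to mean luminance, at `freq_cycles` cycles per
    image width, measured from a photograph of a sinusoidal test pattern
    shown on the display (`captured` is a 2D grayscale array)."""
    profile = captured.mean(axis=0)                  # average the rows out
    spectrum = np.abs(np.fft.rfft(profile - profile.mean()))
    amplitude = spectrum[freq_cycles] / (len(profile) / 2)
    return amplitude / max(captured.mean(), 1e-9)

def estimate_passband(captures, threshold=0.1):
    """`captures` maps test frequency -> photographed image. The effective
    spatial resolution is taken as the highest frequency whose measured
    modulation stays above `threshold` (an assumed cutoff)."""
    passed = [f for f, img in sorted(captures.items())
              if measured_modulation(img, f) >= threshold]
    return max(passed) if passed else 0
```

    In the angular dimension, the same analysis would be fed by captures taken from a moving camera rather than by sweeping the pattern frequency, in line with the static/moving camera setups described above.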

    Novel methods of image compression for 3D reconstruction

    Data compression techniques are widely used in the transmission and storage of 2D images, video and 3D data structures. The thesis addresses two aspects of data compression, 2D images and 3D structures, by focusing on the problem of compressing structured light images for 3D reconstruction. It is useful, then, to describe the research by separating the compression of 2D images from the compression of 3D data. Concerning image compression, there are many types of techniques, among the most popular being JPEG and JPEG2000. The thesis addresses different types of discrete transforms (DWT, DCT and DST), combined in particular ways and followed by the Matrix Minimization algorithm, which achieves high compression ratios by converting groups of data into a single value. This is an essential step towards compression ratios reaching 99%. It is demonstrated that the approach is superior to both JPEG and JPEG2000 for compressing the 2D images used in 3D reconstruction. The approach has also been tested on compressing natural or generic 2D images, mainly through DCT followed by Matrix Minimization and arithmetic coding. Results show that the method is superior to JPEG in terms of compression ratio and image quality, and equivalent to JPEG2000 in terms of image quality. Concerning the compression of 3D data structures, the Matrix Minimization algorithm is used to compress geometry and connectivity, represented by a list of vertices and a list of triangulated faces. It is demonstrated that the method compresses vertices very efficiently compared with other 3D formats. Here the Matrix Minimization algorithm converts each vertex (X, Y and Z) into a single value, without any prior discrete transform (as used for 2D images) and without any coding algorithm. Concerning connectivity, the triangulated face data are also compressed with the Matrix Minimization algorithm followed by arithmetic coding, yielding a stream of compressed data. Results show compression ratios close to 95%, far superior to those of other 3D techniques. The compression methods presented in this thesis perform per-file compression: the compression keys depend on the data being compressed, so each file generates its own set of compression keys and its own set of unique data. This feature enables applications in the security domain for the safe transmission and storage of data, since the generated keys together with the set of unique data act as an encryption key for the file; without this information the file cannot be decompressed.
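
    The central device of this abstract, collapsing a group of values into a single value using data-dependent keys, recoverable only with the keys plus the set of unique data, can be sketched as follows. This is a loose illustration of that idea and not the Matrix Minimization algorithm as specified in the thesis; the key generation, the weighted sum and the brute-force recovery over the unique-value alphabet are all assumptions made for the example.

```python
import itertools
import random

def make_keys(seed=1):
    """Hypothetical data-dependent keys; in the thesis the keys are
    generated from the file being compressed."""
    rng = random.Random(seed)
    return [rng.uniform(1.0, 10.0) for _ in range(3)]

def minimize_triplet(triplet, keys):
    # A group of three values collapses into one weighted sum.
    return sum(k * v for k, v in zip(keys, triplet))

def recover_triplet(value, keys, alphabet, eps=1e-9):
    """Brute-force recovery: try combinations drawn from the unique
    values of the original data until the weighted sum matches. Real
    keys must make these sums collision-free."""
    for cand in itertools.product(alphabet, repeat=3):
        if abs(minimize_triplet(cand, keys) - value) < eps:
            return cand
    return None

# Example: six values become two coded values plus a tiny alphabet.
keys = make_keys()
data = [12, 5, 12, 7, 5, 7]
alphabet = sorted(set(data))          # the "unique data" stored with the file
coded = [minimize_triplet(data[i:i + 3], keys) for i in range(0, len(data), 3)]
print([recover_triplet(v, keys, alphabet) for v in coded])
# -> [(12, 5, 12), (7, 5, 7)]
```

    The recovery search is what ties compression to security here: without both the keys and the unique-value alphabet, the coded values cannot be inverted, which is why the abstract describes them as acting as an encryption key for the file.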