    Shape representation and coding of visual objects in multimedia applications: An overview

    Emerging multimedia applications have created the need for new functionalities in digital communications. Whereas existing compression standards only deal with the audio-visual scene at a frame level, it is now necessary to handle individual objects separately, thus allowing scalable transmission as well as interactive scene recomposition by the receiver. The future MPEG-4 standard aims at providing compression tools addressing these functionalities. Unlike existing frame-based standards, the corresponding coding schemes need to encode shape information explicitly. This paper reviews existing solutions to the problem of shape representation and coding. Region and contour coding techniques are presented and their performance is discussed, considering coding efficiency and rate-distortion control capability, as well as flexibility to application requirements such as progressive transmission, low-delay coding, and error robustness.

    Coding of details in very low bit-rate video systems

    This paper presents the importance of including small image features at the initial levels of a progressive second-generation video coding scheme. It is shown that a number of meaningful small features, called details, should be coded even at very low bit-rates in order to match their perceptual significance to the human visual system. We propose a method for extracting, perceptually selecting, and coding visual details in a video sequence using morphological techniques. Its application in the framework of a multiresolution segmentation-based coding algorithm yields better results than pure segmentation techniques at higher compression ratios, provided the selection step meets some main subjective requirements. Details are extracted and coded separately from the region structure and included in the reconstructed images at a later stage. The bet of considering the local background of a given detail for its perceptual selection breaks the concept of

    Prediction error image coding using a modified stochastic vector quantization scheme

    The objective of this paper is to provide an efficient yet simple method to encode the prediction error image of video sequences, based on a stochastic vector quantization (SVQ) approach that has been modified to cope with the intrinsically decorrelated nature of the prediction error image of video signals. In the SVQ scheme, the codewords are generated by stochastic techniques instead of being derived from a training set representative of the expected input image, as is usual in VQ. The performance of the scheme is shown for the particular case of segmentation-based video coding, although the technique can also be applied to motion-compensated hybrid coding schemes.
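
    The idea of a codebook generated stochastically rather than trained can be illustrated with a minimal sketch. It is not the scheme of the paper: the Laplacian source model, the seed-based codebook, and the names make_stochastic_codebook, encode, and decode are all assumptions chosen for illustration.

```python
import random


def make_stochastic_codebook(num_codewords, dim, scale, seed=0):
    """Draw codewords from a zero-mean Laplacian source instead of a training set.

    The Laplacian model and the seed parameter are assumptions of this sketch.
    """
    rng = random.Random(seed)
    codebook = []
    for _ in range(num_codewords):
        # The difference of two i.i.d. exponentials is Laplacian distributed.
        codeword = [
            rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            for _ in range(dim)
        ]
        codebook.append(codeword)
    return codebook


def encode(block, codebook):
    """Return the index of the codeword closest to block in squared error."""
    best_index, best_error = 0, float("inf")
    for index, codeword in enumerate(codebook):
        error = sum((b - c) ** 2 for b, c in zip(block, codeword))
        if error < best_error:
            best_index, best_error = index, error
    return best_index


def decode(index, codebook):
    """Return a copy of the codeword stored at index."""
    return list(codebook[index])
```

    In this sketch the codebook depends only on its parameters and the seed, so an encoder and a decoder that agree on them can build identical codebooks independently.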

    Segmentation based coding of depth Information for 3D video

    Increased interest in 3D artifacts, together with the need to transmit, broadcast, and store all the information that represents the 3D view, has made 3D video a hot topic in recent years. Knowing that adding the depth information to the views considerably increases the encoding bitrate, we decided to look for a new approach to encode and decode the depth information for 3D video. In this project, different approaches to encoding and decoding the depth information are tried, and a new method is implemented whose results are compared with those of the best previously developed method in terms of both bitrate and quality (PSNR).
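
    The quality measure named above, PSNR, can be computed as in the following minimal sketch. It assumes 8-bit samples (peak value 255) and images stored as lists of rows; the function name psnr is hypothetical and not taken from the project.

```python
import math


def psnr(reference, decoded, peak=255):
    """PSNR in dB between two equally sized images given as lists of rows."""
    squared_error = 0
    count = 0
    for ref_row, dec_row in zip(reference, decoded):
        for r, d in zip(ref_row, dec_row):
            squared_error += (r - d) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images: no distortion
    return 10 * math.log10(peak ** 2 / mse)
```

    A higher PSNR means the decoded depth map is closer to the original, so two methods are usually compared by their PSNR at the same bitrate.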

    High-performance compression of visual information - A tutorial review - Part I : Still Pictures

    Digital images have become an important source of information in the modern world of communication systems. In their raw form, digital images require a tremendous amount of memory. Many research efforts have been devoted to the problem of image compression in the last two decades. Two different compression categories must be distinguished: lossless and lossy. Lossless compression is achieved if no distortion is introduced in the coded image. Applications requiring this type of compression include medical imaging and satellite photography. For applications such as video telephony or multimedia applications, some loss of information is usually tolerated in exchange for a high compression ratio. In this two-part paper, the major building blocks of image coding schemes are overviewed. Part I covers still image coding, and Part II covers motion picture sequences. In this first part, still image coding schemes have been classified into predictive, block transform, and multiresolution approaches. Predictive methods are suited to lossless and low-compression applications. Transform-based coding schemes achieve higher compression ratios for lossy compression but suffer from blocking artifacts at high compression ratios. Multiresolution approaches are suited for lossy as well as for lossless compression. At lossy high compression ratios, the typical artifact visible in the reconstructed images is the ringing effect. New applications in a multimedia environment drove the need for new functionalities of the image coding schemes. For that purpose, second-generation coding techniques segment the image into semantically meaningful parts. Therefore, parts of these methods have been adapted to work for arbitrarily shaped regions. In order to add another functionality, such as progressive transmission of the information, specific quantization algorithms must be defined. A final step in the compression scheme is achieved by the codeword assignment. 
Finally, coding results are presented which compare state-of-the-art techniques for lossy and lossless compression. The different artifacts of each technique are highlighted and discussed. Also, the possibility of progressive transmission is illustrated.
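
    Why predictive methods suit lossless coding can be seen in a minimal sketch of previous-sample prediction (DPCM) on one image row. The predictor and the names dpcm_encode and dpcm_decode are assumptions for illustration, not a scheme taken from the paper.

```python
def dpcm_encode(row):
    """Replace each sample with its difference from the previous sample."""
    residuals = []
    previous = 0  # the first sample is predicted from zero
    for sample in row:
        residuals.append(sample - previous)
        previous = sample
    return residuals


def dpcm_decode(residuals):
    """Rebuild the samples exactly by accumulating the residuals."""
    samples = []
    previous = 0
    for residual in residuals:
        previous += residual
        samples.append(previous)
    return samples
```

    On smooth image rows the residuals cluster around zero, which makes them cheaper to entropy-code than the raw samples, while decoding reproduces the input exactly.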

    Codage de cartes de profondeur par deformation de courbes elastiques

    In multiple-view video plus depth, depth maps can be represented by means of grayscale images, and the corresponding temporal sequence can be thought of as a standard grayscale video sequence. However, depth maps have different properties from natural images: they present large areas of smooth surfaces separated by sharp edges. Arguably the most important information lies in object contours; as a consequence, an interesting approach consists in performing a lossless coding of the contour map, possibly followed by a lossy coding of per-object depth values. In this context, we propose a new technique for the lossless coding of object contours, based on the elastic deformation of curves. A continuous evolution of elastic deformations between two reference contour curves can be modelled, and an elastically deformed version of the reference contours can be sent to the decoder with an extremely small coding cost and used as side information to improve the lossless coding of the actual contour. After the main discontinuities have been captured by the contour description, the depth field inside each region is rather smooth. We proposed and tested two different techniques for the coding of the depth field inside each region. The first technique performs the shape-adaptive wavelet transform followed by the shape-adaptive version of SPIHT. The second technique performs a prediction of the depth field from its subsampled version and the set of coded contours. It is generally recognized that a high-quality view rendering at the receiver side is possible only by preserving the contour information, since distortions on edges during the encoding step would cause a noticeable degradation of the synthesized view and of the 3D perception. 
We investigated this claim by conducting a subjective quality assessment test to compare an object-based technique and a hybrid block-based technique for the coding of depth maps.

    Prioritizing Content of Interest in Multimedia Data Compression

    Image and video compression techniques make data transmission and storage in digital multimedia systems more efficient and feasible within a system's limited storage and bandwidth. Many generic image and video compression techniques such as JPEG and H.264/AVC have been standardized and are now widely adopted. Despite their great success, we observe that these standard compression techniques are not the best solution for data compression in special types of multimedia systems such as microscopy videos and low-power wireless broadcast systems. In these application-specific systems, where the content of interest in the multimedia data is known and well-defined, we should rethink the design of the data compression pipeline. We hypothesize that by identifying and prioritizing the content of interest in multimedia data, new compression methods can be invented that are far more effective than standard techniques. In this dissertation, a set of new data compression methods based on the idea of prioritizing the content of interest is proposed for three different kinds of multimedia systems. I will show that the key to designing efficient compression techniques in these three cases is to prioritize the content of interest in the data. The definition of the content of interest of multimedia data depends on the application. First, I show that for microscopy videos, the content of interest is defined as the spatial regions in the video frame whose pixels do not contain only noise. Keeping the data in those regions at high quality and discarding other information yields a novel microscopy video compression technique. Second, I show that for a Bluetooth low energy beacon based system, practical multimedia data storage and transmission is possible by prioritizing content of interest. I designed custom image compression techniques that preserve edges in a binary image, or foreground regions of a color image of indoor or outdoor objects. 
Last, I present a new indoor Bluetooth low energy beacon based augmented reality system that integrates a 3D moving object compression method that prioritizes the content of interest.

    Object-based video representations: shape compression and object segmentation

    Object-based video representations are considered to be useful for easing the process of multimedia content production and enhancing user interactivity in multimedia productions. Object-based video presents several new technical challenges, however. Firstly, as with conventional video representations, compression of the video data is a requirement. For object-based representations, it is necessary to compress the shape of each video object as it moves in time. This amounts to the compression of moving binary images. This is achieved by the use of a technique called context-based arithmetic encoding. The technique is applied to rectangular pixel blocks and as such is consistent with the standard tools of video compression. The block-based application also facilitates the exploitation of temporal redundancy in the sequence of binary shapes. For the first time, context-based arithmetic encoding is used in conjunction with motion compensation to provide inter-frame compression. The method, described in this thesis, has been thoroughly tested throughout the MPEG-4 core experiment process and, due to favourable results, has been adopted as part of the MPEG-4 video standard. The second challenge lies in the acquisition of the video objects. Under normal conditions, a video sequence is captured as a sequence of frames and there is no inherent information about what objects are in the sequence, not to mention information relating to the shape of each object. Some means for segmenting semantic objects from general video sequences is required. For this purpose, several image analysis tools may be of help and, in particular, it is believed that video object tracking algorithms will be important. A new tracking algorithm is developed based on piecewise polynomial motion representations and statistical estimation tools, e.g. the expectation-maximisation method and the minimum description length principle.
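
    The modelling half of context-based arithmetic encoding can be sketched as follows: each binary pixel is assigned a context built from already-coded neighbours, and an adaptive probability per context drives the coder. The four-pixel template used here (W, NW, N, NE) is an assumption for illustration and is not the template of the MPEG-4 standard; the arithmetic coder itself is omitted and replaced by an ideal code-length estimate. The names context_of and estimated_bits are hypothetical.

```python
import math


def context_of(mask, y, x):
    """Pack four already-coded neighbours (W, NW, N, NE) into a 4-bit context."""
    def pixel(r, c):
        if 0 <= r < len(mask) and 0 <= c < len(mask[0]):
            return mask[r][c]
        return 0  # pixels outside the image are assumed to be background

    return (
        (pixel(y, x - 1) << 3)
        | (pixel(y - 1, x - 1) << 2)
        | (pixel(y - 1, x) << 1)
        | pixel(y - 1, x + 1)
    )


def estimated_bits(mask):
    """Ideal code length in bits of a binary mask under an adaptive context model."""
    counts = [[1, 1] for _ in range(16)]  # smoothed counts of 0s and 1s per context
    bits = 0.0
    for y in range(len(mask)):
        for x in range(len(mask[0])):
            ctx = context_of(mask, y, x)
            symbol = mask[y][x]
            probability = counts[ctx][symbol] / (counts[ctx][0] + counts[ctx][1])
            bits -= math.log2(probability)  # cost an ideal arithmetic coder would pay
            counts[ctx][symbol] += 1  # adapt the model after coding the pixel
    return bits
```

    Because neighbouring shape pixels are highly correlated, most contexts predict the next pixel with high probability, so the estimated cost falls far below one bit per pixel on typical masks.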

    Maximum Energy Subsampling: A General Scheme For Multi-resolution Image Representation And Analysis

    Image descriptors play an important role in image representation and analysis. Multi-resolution image descriptors can effectively characterize complex images and extract their hidden information. Wavelet descriptors have been widely used in multi-resolution image analysis. However, making the wavelet transform shift and rotation invariant produces redundancy and requires complex matching processes. Other multi-resolution descriptors usually depend on additional theory or information, such as a filtering function or prior domain knowledge, which not only increases the computational complexity but also introduces errors. We propose a novel multi-resolution scheme that is capable of transforming any kind of image descriptor into its multi-resolution structure with high computational accuracy and efficiency. Our multi-resolution scheme is based on sub-sampling an image into an odd-even image tree. By applying image descriptors to the odd-even image tree, we obtain the corresponding multi-resolution image descriptors. Multi-resolution analysis is based on downsampling expansion with maximum energy extraction followed by upsampling reconstruction. Since the maximum energy is usually retained in the lowest-frequency coefficients, we perform maximum energy extraction by keeping the lowest coefficients from each resolution level. Our multi-resolution scheme can analyze images recursively and effectively without introducing artifacts or changes to the original images, produce multi-resolution representations, obtain higher-resolution images using only information from lower resolutions, compress data, filter noise, extract effective image features, and be implemented with parallel processing.
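
    One plausible reading of the odd-even image tree is sketched below: an image is split into four sub-images according to the parity of the row and column indices, and each sub-image is split again. The abstract does not specify the construction, so this layout and the names odd_even_split and build_tree are assumptions.

```python
def odd_even_split(image):
    """Split an image (list of rows) into four sub-images by row and column parity."""
    return {
        "even_even": [row[0::2] for row in image[0::2]],
        "even_odd": [row[1::2] for row in image[0::2]],
        "odd_even": [row[0::2] for row in image[1::2]],
        "odd_odd": [row[1::2] for row in image[1::2]],
    }


def build_tree(image, levels):
    """Recursively split each sub-image, producing a nested dictionary."""
    node = {"image": image, "children": {}}
    if levels > 0 and len(image) >= 2 and len(image[0]) >= 2:
        for name, sub_image in odd_even_split(image).items():
            node["children"][name] = build_tree(sub_image, levels - 1)
    return node
```

    Every pixel of the parent appears in exactly one child, so the split itself discards no information; a descriptor can then be evaluated at each node of the tree.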

    Spline-based medial axis transform representation of binary images

    Medial axes are well-known descriptors used for representing, manipulating, and compressing binary images. In this paper, we present a full pipeline for computing a stable and accurate piecewise B-spline representation of Medial Axis Transforms (MATs) of binary images. A comprehensive evaluation on a benchmark shows that our method, called Spline-based Medial Axis Transform (SMAT), achieves very high compression ratios while keeping quality high. Compared with the regular MAT representation, the SMAT yields a much higher compression ratio at the cost of a slightly lower image quality. We illustrate our approach on a multi-scale SMAT representation, generating super-resolution images, and free-form binary image deformation.
