39 research outputs found

    DPCM-based edge prediction for lossless screen content coding in HEVC

    Screen content sequences are a ubiquitous type of video data in multimedia applications such as video conferencing, remote education, and cloud gaming. These sequences are characterized by a mix of computer-generated graphics, text, and camera-captured material. Such a mix poses several challenges, as the content usually contains multiple strong discontinuities, which are hard to encode using current techniques. Differential pulse code modulation (DPCM)-based intra-prediction has been shown to improve coding efficiency for these sequences. In this paper, we propose sample-based edge and angular prediction (SEAP), a collection of DPCM-based intra-prediction modes to improve lossless coding of screen content. SEAP is aimed at accurately predicting not only regions depicting camera-captured material, but also those depicting strong edges. It incorporates modes that select the best predictor for each pixel individually, based on the characteristics of the causal neighborhood of the target pixel. We incorporate SEAP into HEVC intra-prediction. Evaluation results on various screen content sequences show the advantages of SEAP over other DPCM-based approaches, with bit-rate reductions of up to 19.56% compared to standardized RDPCM. When used in conjunction with the coding tools of the screen content coding extensions, SEAP provides bit-rate reductions of up to 8.63% compared to RDPCM.
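
To make the idea of sample-wise, edge-aware DPCM concrete, the following is a minimal sketch. It is not SEAP: it uses the median edge detector (MED) predictor known from JPEG-LS as a stand-in for a per-pixel predictor chosen from the causal neighborhood. The function names and the choice to treat out-of-block neighbors as 0 are assumptions made only for this illustration.

```python
def med_predict(a, b, c):
    """Median edge detector: a = left, b = above, c = above-left neighbor."""
    if c >= max(a, b):
        return min(a, b)   # edge detected, follow the smaller neighbor
    if c <= min(a, b):
        return max(a, b)   # edge detected, follow the larger neighbor
    return a + b - c       # smooth region, planar prediction


def _neighbors(img, y, x):
    # Assumption of this sketch: neighbors outside the block count as 0.
    a = img[y][x - 1] if x > 0 else 0
    b = img[y - 1][x] if y > 0 else 0
    c = img[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a, b, c


def dpcm_encode(block):
    """Return the residual of each sample against its MED prediction."""
    return [
        [block[y][x] - med_predict(*_neighbors(block, y, x))
         for x in range(len(block[0]))]
        for y in range(len(block))
    ]


def dpcm_decode(residual):
    """Rebuild the block in raster order from already decoded neighbors."""
    h, w = len(residual), len(residual[0])
    rec = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rec[y][x] = residual[y][x] + med_predict(*_neighbors(rec, y, x))
    return rec


# A block with a sharp vertical edge, typical of text or graphics.
block = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [10, 10, 200, 200]]
residual = dpcm_encode(block)
restored = dpcm_decode(residual)
```

Because the decoder only uses samples it has already reconstructed, the round trip is exact, which is the property lossless DPCM coding relies on. From the second row onward the vertical edge is predicted without error, so those residual rows are all zero.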

    Contributions to HEVC Prediction for Medical Image Compression

    Medical imaging technology and applications are continuously evolving, dealing with images of increasing spatial and temporal resolution, which allow easier and more accurate medical diagnosis. However, this increase in resolution demands a growing amount of data to be stored and transmitted. Despite the high coding efficiency achieved by the most recent image and video coding standards in lossy compression, they are not well suited for quality-critical medical image compression, where either near-lossless or lossless coding is required. In this dissertation, two different approaches to improve lossless coding of volumetric medical images, such as Magnetic Resonance and Computed Tomography, were studied and implemented using the latest standard, High Efficiency Video Coding (HEVC). In the first approach, the use of geometric transformations to perform inter-slice prediction was investigated. For the second approach, a pixel-wise prediction technique based on Least-Squares prediction, which exploits inter-slice redundancy, was proposed to extend the current HEVC lossless tools. Experimental results show a bitrate reduction between 45% and 49% when compared with DICOM recommended encoders, and 13.7% when compared with standard HEVC.

    Piecewise mapping in HEVC lossless intra-prediction coding

    The lossless intra-prediction coding modality of the High Efficiency Video Coding (HEVC) standard provides high coding performance while allowing frame-by-frame access to the coded data. This is of interest in many professional applications such as medical imaging, automotive vision, and digital preservation in libraries and archives. Various improvements to lossless intra-prediction coding have been proposed recently, most of them based on sample-wise prediction using Differential Pulse Code Modulation (DPCM). Other recent proposals aim at further reducing the energy of intra-predicted residual blocks. However, the energy reduction achieved is frequently minimal due to the difficulty of correctly predicting the sign and magnitude of residual values. In this paper, we pursue a novel approach to this energy-reduction problem using piecewise mapping (pwm) functions. Specifically, we analyze the range of values in residual blocks and apply a pwm function accordingly to map specific residual values to unique lower values. We encode appropriate parameters associated with the pwm functions at the encoder, so that the corresponding inverse pwm functions at the decoder can map values back to the same residual values. These residual values are then used to reconstruct the original signal. The mapping is therefore reversible and introduces no losses. We evaluate the pwm functions on 4×4 residual blocks computed after DPCM-based prediction for lossless coding of a variety of camera-captured and screen content sequences. Evaluation results show that the pwm functions can attain maximum bit-rate reductions of 5.54% and 28.33% for screen content material compared to DPCM-based and block-wise intra-prediction, respectively. Compared to Intra Block Copy, piecewise mapping can attain maximum bit-rate reductions of 11.48% for camera-captured material.
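
The abstract does not specify the mapping functions, so the sketch below is only a hypothetical illustration of the general principle: a reversible piecewise mapping that moves residual values to unique lower magnitudes, controlled by parameters the decoder also receives. Here the parameters are the start and length of the largest run of unused magnitudes in the block. The names `find_gap`, `pwm_forward`, and `pwm_inverse` are invented for this example and are not taken from the paper.

```python
def find_gap(values):
    """Return (start, length) of the largest run of unused magnitudes.

    Magnitude 0 is always treated as used so that no nonzero value is ever
    mapped to 0, which would lose its sign.
    """
    mags = sorted({0} | {abs(v) for v in values})
    start, length = 0, 0
    for lo, hi in zip(mags, mags[1:]):
        if hi - lo - 1 > length:
            start, length = lo + 1, hi - lo - 1
    return start, length


def pwm_forward(values, start, length):
    """Shift every magnitude above the gap down by the gap length."""
    def f(v):
        if abs(v) < start:
            return v                      # below the gap: unchanged
        return v - length if v > 0 else v + length
    return [f(v) for v in values]


def pwm_inverse(mapped, start, length):
    """Undo pwm_forward using the same two parameters."""
    def g(w):
        if abs(w) < start:
            return w
        return w + length if w > 0 else w - length
    return [g(w) for w in mapped]


# A flattened 4x4 residual block: mostly small values plus a few outliers.
residuals = [0, 1, -1, 40, -42, 41, 0, 1, -40, 2, 0, 0, 1, -1, 43, 0]
start, length = find_gap(residuals)
mapped = pwm_forward(residuals, start, length)
restored = pwm_inverse(mapped, start, length)
```

In this example the unused magnitudes 3 to 39 are closed up, so the largest magnitude drops from 43 to 6 while the inverse mapping restores every residual exactly. Whether the saving outweighs the cost of signalling the two parameters depends on the entropy coder, which this sketch does not model.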

    Graph based transforms for block-based predictive transform coding

    Orthogonal transforms are a key aspect of the encoding and decoding process in many state-of-the-art compression systems. The transform in block-based predictive transform coding (PTC) is essential for improving coding performance, as it decorrelates the signal into transform coefficients. Recently, the Graph-Based Transform (GBT) has been shown to attain promising results for data decorrelation and energy compaction, especially for block-based PTC. However, in order to reconstruct a frame coded with a GBT in block-based PTC, extra information needs to be signalled in the bitstream, which may lead to increased overhead. Additionally, the same graph should be available at the reconstruction stage to compute the inverse GBT of each block. In this thesis, we propose a novel class of GBTs to enhance the performance of the transform. These GBTs adopt several methods to address the issue of making the same graph available at the decoder while reconstructing video frames. Our methods to predict the graph fall into two types: non-learning-based approaches and deep learning (DL) based prediction. For the first type, our method uses reference samples and template-based strategies to reconstruct the same graph. For the second type, we learn the graphs so that the information needed to compute the inverse transform is common knowledge between the compression and reconstruction processes. Finally, we train our model online to avoid dependence on the amount, quality, and relevance of the training data. Our evaluation covers all classes of HEVC test videos, from class A to class F (screen content), which vary in resolution and characteristics. Our experimental results show that the proposed transforms outperform non-trainable transforms such as DCT and DCT/DST, which are commonly employed in current video codecs, in terms of compression and reconstruction quality.

    Scalable light field representation and coding

    This Thesis aims to advance the state-of-the-art in light field representation and coding. In this context, proposals to improve functionalities like light field random access and scalability are also presented. As the light field representation constrains the coding approach to be used, several light field coding techniques to exploit the inherent characteristics of the most popular types of light field representations are proposed and studied, which are normally based on micro-images or sub-aperture-images. To encode micro-images, two solutions are proposed, aiming to exploit the redundancy between neighboring micro-images using a high order prediction model, where the model parameters are either explicitly transmitted or inferred at the decoder, respectively. In both cases, the proposed solutions are able to outperform low order prediction solutions. To encode sub-aperture-images, an HEVC-based solution that exploits their inherent intra and inter redundancies is proposed. In this case, the light field image is encoded as a pseudo video sequence, where the scanning order is signaled, allowing the encoder and decoder to optimize the reference picture lists to improve coding efficiency. A novel hybrid light field representation coding approach is also proposed, by exploiting the combined use of both micro-image and sub-aperture-image representation types, instead of using each representation individually. In order to aid the fast deployment of the light field technology, this Thesis also proposes scalable coding and representation approaches that enable adequate compatibility with legacy displays (e.g., 2D, stereoscopic or multiview) and with future light field displays, while maintaining high coding efficiency. 
Additionally, viewpoint random access, which improves light field navigation and reduces the decoding delay, is also enabled with a flexible trade-off between coding efficiency and viewpoint random access.

    Non-disruptive use of light fields in image and video processing

    In the age of computational imaging, cameras capture not only an image but also data. This additional captured data can best be used for photo-realistic renderings, facilitating numerous post-processing possibilities such as perspective shift, depth scaling, digital refocus, 3D reconstruction, and much more. In computational photography, light field imaging technology captures the complete volumetric information of a scene. This technology has the highest potential to accelerate immersive experiences towards close-to-reality. It has gained significance in both commercial and research domains. However, due to the lack of coding and storage formats and the incompatibility of the tools to process and enable the data, light fields are not exploited to their full potential. This dissertation approaches the integration of light field data into image and video processing. Towards this goal, the representation of light fields using advanced file formats designed for 2D image assemblies, to facilitate asset re-usability and interoperability between applications and devices, is addressed. The novel 5D light field acquisition and the ongoing research on coding frameworks are presented. Multiple techniques for optimised sequencing of light field data are also proposed. As light fields contain the complete 3D information of a scene, large amounts of data are captured, and this data is highly redundant in nature. Hence, by pre-processing the data using the proposed approaches, excellent coding performance can be achieved.

    Inter-prediction methods based on linear embedding for video compression

    This paper considers the problem of temporal prediction for inter-frame coding of video sequences using locally linear embedding (LLE). LLE-based prediction, first considered for intra-frame prediction, computes the predictor as a linear combination of K nearest neighbors (K-NN) searched within one or several reference frames. The paper explores different K-NN search strategies in the context of temporal prediction, leading to several temporal predictor variants. The proposed methods are tested as extra inter-frame prediction modes in an H.264 codec, but the proposed concepts are still valid in HEVC. The results show that significant rate-distortion performance gains are obtained with respect to H.264 (up to 15.31% bit-rate saving).
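
The core LLE step, a predictor formed as a linear combination of K nearest neighbors, can be sketched as follows. This is a generic illustration and not the paper's codec integration: it assumes each candidate in the reference frame is given as a (template, block) pair, uses the sum of squared differences for the K-NN search, and adds a small regularization term to the Gram matrix. All function names and the `reg` parameter are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def lle_weights(target_tpl, neighbor_tpls, reg=1e-3):
    """Weights that best rebuild the target template, constrained to sum to 1."""
    k = len(neighbor_tpls)
    diffs = [[t - n for t, n in zip(target_tpl, nb)] for nb in neighbor_tpls]
    gram = [[sum(a * b for a, b in zip(di, dj)) for dj in diffs] for di in diffs]
    trace = sum(gram[i][i] for i in range(k))
    eps = reg * trace if trace > 0 else reg   # keeps the system well conditioned
    for i in range(k):
        gram[i][i] += eps
    w = solve(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]


def lle_predict(target_tpl, candidates, k):
    """Pick the k candidates with the closest templates and blend their blocks."""
    def ssd(tpl):
        return sum((t - c) ** 2 for t, c in zip(target_tpl, tpl))
    nearest = sorted(candidates, key=lambda cand: ssd(cand[0]))[:k]
    weights = lle_weights(target_tpl, [tpl for tpl, _ in nearest])
    size = len(nearest[0][1])
    return [sum(w * blk[i] for w, (_, blk) in zip(weights, nearest))
            for i in range(size)]


# Each candidate: (template samples, block samples) from a reference frame.
candidates = [([8, 18, 28], [100.0]),
              ([12, 22, 32], [104.0]),
              ([90, 90, 90], [0.0])]
prediction = lle_predict([10, 20, 30], candidates, k=2)
```

The weights are computed from the templates only, so a decoder that can see the same templates can repeat the search and derive the same weights. In the example the two nearest templates lie symmetrically around the target, so the predicted sample comes out close to the midpoint of their blocks.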

    Transformées basées graphes pour la compression de nouvelles modalités d’image

    Due to the wide availability of new camera types capturing extra geometrical information, as well as the emergence of new image modalities such as light fields and omni-directional images, a huge amount of high-dimensional data has to be stored and delivered. The ever-growing streaming and storage requirements of these new image modalities require novel image coding tools that exploit the complex structure of those data. This thesis aims at exploring novel graph-based approaches for adapting traditional image transform coding techniques to the emerging data types, where the sampled information lies on irregular structures. In a first contribution, novel local graph-based transforms are designed for compact light field representations. By leveraging a careful design of local transform supports and a local basis function optimization procedure, significant improvements in terms of energy compaction can be obtained. Nevertheless, the locality of the supports did not permit exploiting long-term dependencies of the signal. This led to a second contribution, where different sampling strategies are investigated. Coupled with novel prediction methods, they led to very prominent results for quasi-lossless compression of light fields. The third part of the thesis focuses on the definition of rate-distortion optimized sub-graphs for the coding of omni-directional content. If we move further and give more degrees of freedom to the graphs we wish to use, we can learn or define a model (a set of weights on the edges) that might not be entirely reliable for transform design.
The last part of the thesis is dedicated to theoretically analyzing the effect of the uncertainty on the efficiency of the graph transforms.