345 research outputs found

    Graph Spectral Image Processing

    Full text link
    Recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph, and apply GSP tools for processing and analysis of the signal in graph spectral domain. In this article, we overview recent graph spectral techniques in GSP specifically for image / video processing. The topics covered include image compression, image restoration, image filtering and image segmentation

    Error resilience and concealment techniques for high-efficiency video coding

    Get PDF
    This thesis investigates the problem of robust coding and error concealment in High Efficiency Video Coding (HEVC). After a review of the current state of the art, a simulation study about error robustness, revealed that the HEVC has weak protection against network losses with significant impact on video quality degradation. Based on this evidence, the first contribution of this work is a new method to reduce the temporal dependencies between motion vectors, by improving the decoded video quality without compromising the compression efficiency. The second contribution of this thesis is a two-stage approach for reducing the mismatch of temporal predictions in case of video streams received with errors or lost data. At the encoding stage, the reference pictures are dynamically distributed based on a constrained Lagrangian rate-distortion optimization to reduce the number of predictions from a single reference. At the streaming stage, a prioritization algorithm, based on spatial dependencies, selects a reduced set of motion vectors to be transmitted, as side information, to reduce mismatched motion predictions at the decoder. The problem of error concealment-aware video coding is also investigated to enhance the overall error robustness. A new approach based on scalable coding and optimally error concealment selection is proposed, where the optimal error concealment modes are found by simulating transmission losses, followed by a saliency-weighted optimisation. Moreover, recovery residual information is encoded using a rate-controlled enhancement layer. Both are transmitted to the decoder to be used in case of data loss. Finally, an adaptive error resilience scheme is proposed to dynamically predict the video stream that achieves the highest decoded quality for a particular loss case. A neural network selects among the various video streams, encoded with different levels of compression efficiency and error protection, based on information from the video signal, the coded stream and the transmission network. Overall, the new robust video coding methods investigated in this thesis yield consistent quality gains in comparison with other existing methods and also the ones implemented in the HEVC reference software. Furthermore, the trade-off between coding efficiency and error robustness is also better in the proposed methods

    A COMPUTATION METHOD/FRAMEWORK FOR HIGH LEVEL VIDEO CONTENT ANALYSIS AND SEGMENTATION USING AFFECTIVE LEVEL INFORMATION

    No full text
    VIDEO segmentation facilitates e±cient video indexing and navigation in large digital video archives. It is an important process in a content-based video indexing and retrieval (CBVIR) system. Many automated solutions performed seg- mentation by utilizing information about the \facts" of the video. These \facts" come in the form of labels that describe the objects which are captured by the cam- era. This type of solutions was able to achieve good and consistent results for some video genres such as news programs and informational presentations. The content format of this type of videos is generally quite standard, and automated solutions were designed to follow these format rules. For example in [1], the presence of news anchor persons was used as a cue to determine the start and end of a meaningful news segment. The same cannot be said for video genres such as movies and feature films. This is because makers of this type of videos utilized different filming techniques to design their videos in order to elicit certain affective response from their targeted audience. Humans usually perform manual video segmentation by trying to relate changes in time and locale to discontinuities in meaning [2]. As a result, viewers usually have doubts about the boundary locations of a meaningful video segment due to their different affective responses. This thesis presents an entirely new view to the problem of high level video segmentation. We developed a novel probabilistic method for affective level video content analysis and segmentation. Our method had two stages. In the first stage, a®ective content labels were assigned to video shots by means of a dynamic bayesian 0. Abstract 3 network (DBN). A novel hierarchical-coupled dynamic bayesian network (HCDBN) topology was proposed for this stage. The topology was based on the pleasure- arousal-dominance (P-A-D) model of a®ect representation [3]. In principle, this model can represent a large number of emotions. In the second stage, the visual, audio and a®ective information of the video was used to compute a statistical feature vector to represent the content of each shot. Affective level video segmentation was achieved by applying spectral clustering to the feature vectors. We evaluated the first stage of our proposal by comparing its emotion detec- tion ability with all the existing works which are related to the field of a®ective video content analysis. To evaluate the second stage, we used the time adaptive clustering (TAC) algorithm as our performance benchmark. The TAC algorithm was the best high level video segmentation method [2]. However, it is a very computationally intensive algorithm. To accelerate its computation speed, we developed a modified TAC (modTAC) algorithm which was designed to be mapped easily onto a field programmable gate array (FPGA) device. Both the TAC and modTAC algorithms were used as performance benchmarks for our proposed method. Since affective video content is a perceptual concept, the segmentation per- formance and human agreement rates were used as our evaluation criteria. To obtain our ground truth data and viewer agreement rates, a pilot panel study which was based on the work of Gross et al. [4] was conducted. Experiment results will show the feasibility of our proposed method. For the first stage of our proposal, our experiment results will show that an average improvement of as high as 38% was achieved over previous works. As for the second stage, an improvement of as high as 37% was achieved over the TAC algorithm

    Approximate nearest neighbors for recognition of foreground and background in images and video

    Get PDF
    Problems in image matching, saliency detection in images, and background detection in video are studied. Algorithms based on approximate nearest-neighbor matching are proposed to solve problems in these related domains. Image patches are quantized into features using a special Walsh-Hadamard transform, and put into a propagation-assisted kd-tree for indexing and search. Image saliency and background-detection algorithms are then derived by looking at patch similarity over time and space

    Learned-based Intra Coding Tools for Video Compression.

    Get PDF
    PhD Theses.The increase in demand for video rendering in 4K and beyond displays, as well as immersive video formats, requires the use of e cient compression techniques. In this thesis novel methods for enhancing the e ciency of current and next generation video codecs are investigated. Several aspects that in uence the way conventional video coding methods work are considered. The methods proposed in this thesis utilise Neural Networks (NNs) trained for regression tasks in order to predict data. In particular, Convolutional Neural Networks (CNNs) are used to predict Rate-Distortion (RD) data for intra-coded frames. Moreover, a novel intra-prediction methods are proposed with the aim of providing new ways to exploit redundancies overlooked by traditional intraprediction tools. Additionally, it is shown how such methods can be simpli ed in order to derive less resource-demanding tools

    Transformées basées graphes pour la compression de nouvelles modalités d’image

    Get PDF
    Due to the large availability of new camera types capturing extra geometrical information, as well as the emergence of new image modalities such as light fields and omni-directional images, a huge amount of high dimensional data has to be stored and delivered. The ever growing streaming and storage requirements of these new image modalities require novel image coding tools that exploit the complex structure of those data. This thesis aims at exploring novel graph based approaches for adapting traditional image transform coding techniques to the emerging data types where the sampled information are lying on irregular structures. In a first contribution, novel local graph based transforms are designed for light field compact representations. By leveraging a careful design of local transform supports and a local basis functions optimization procedure, significant improvements in terms of energy compaction can be obtained. Nevertheless, the locality of the supports did not permit to exploit long term dependencies of the signal. This led to a second contribution where different sampling strategies are investigated. Coupled with novel prediction methods, they led to very prominent results for quasi-lossless compression of light fields. The third part of the thesis focuses on the definition of rate-distortion optimized sub-graphs for the coding of omni-directional content. If we move further and give more degree of freedom to the graphs we wish to use, we can learn or define a model (set of weights on the edges) that might not be entirely reliable for transform design. The last part of the thesis is dedicated to theoretically analyze the effect of the uncertainty on the efficiency of the graph transforms.En raison de la grande disponibilité de nouveaux types de caméras capturant des informations géométriques supplémentaires, ainsi que de l'émergence de nouvelles modalités d'image telles que les champs de lumière et les images omnidirectionnelles, il est nécessaire de stocker et de diffuser une quantité énorme de hautes dimensions. Les exigences croissantes en matière de streaming et de stockage de ces nouvelles modalités d’image nécessitent de nouveaux outils de codage d’images exploitant la structure complexe de ces données. Cette thèse a pour but d'explorer de nouvelles approches basées sur les graphes pour adapter les techniques de codage de transformées d'image aux types de données émergents où les informations échantillonnées reposent sur des structures irrégulières. Dans une première contribution, de nouvelles transformées basées sur des graphes locaux sont conçues pour des représentations compactes des champs de lumière. En tirant parti d’une conception minutieuse des supports de transformées locaux et d’une procédure d’optimisation locale des fonctions de base , il est possible d’améliorer considérablement le compaction d'énergie. Néanmoins, la localisation des supports ne permettait pas d'exploiter les dépendances à long terme du signal. Cela a conduit à une deuxième contribution où différentes stratégies d'échantillonnage sont étudiées. Couplés à de nouvelles méthodes de prédiction, ils ont conduit à des résultats très importants en ce qui concerne la compression quasi sans perte de champs de lumière statiques. La troisième partie de la thèse porte sur la définition de sous-graphes optimisés en distorsion de débit pour le codage de contenu omnidirectionnel. Si nous allons plus loin et donnons plus de liberté aux graphes que nous souhaitons utiliser, nous pouvons apprendre ou définir un modèle (ensemble de poids sur les arêtes) qui pourrait ne pas être entièrement fiable pour la conception de transformées. La dernière partie de la thèse est consacrée à l'analyse théorique de l'effet de l'incertitude sur l'efficacité des transformées basées graphes
    • …
    corecore