154 research outputs found

    Hybrid Video Coding based on Bidimensional Matching Pursuit

    Get PDF
    Hybrid video coding combines together two stages: first, motion estimation and compensation predict each frame from the neighboring frames, then the prediction error is coded, reducing the correlation in the spatial domain. In this work, we focus on the latter stage, presenting a scheme that profits from some of the features introduced by the standard H.264/AVC for motion estimation and replaces the transform in the spatial domain. The prediction error is so coded using the matching pursuit algorithm which decomposes the signal over an appositely designed bidimensional, anisotropic, redundant dictionary. Comparisons are made among the proposed technique, H.264, and a DCT-based coding scheme. Moreover, we introduce fast techniques for atom selection, which exploit the spatial localization of the atoms. An adaptive coding scheme aimed at optimizing the resource allocation is also presented, together with a rate-distortion study for the matching pursuit algorithm. Results show that the proposed scheme outperforms the standard DCT, especially at very low bit rates

    Nonlinear approximation with redundant multi-component dictionaries

    Get PDF
    The problem of efficiently representing and approximating digital data is an open challenge and it is of paramount importance for many applications. This dissertation focuses on the approximation of natural signals as an organized combination of mutually connected elements, preserving and at the same time benefiting from their inherent structure. This is done by decomposing a signal onto a multi-component, redundant collection of functions (dictionary), built by the union of several subdictionaries, each of which is designed to capture a specific behavior of the signal. In this way, instead of representing signals as a superposition of sinusoids or wavelets many alternatives are available. In addition, since dictionaries we are interested in are overcomplete, the decomposition is non-unique. This gives us the possibility of adaptation, choosing among many possible representations the one which best fits our purposes. On the other hand, it also requires more complex approximation techniques whose theoretical decomposition capacity and computational load have to be carefully studied. In general, we aim at representing a signal with few and meaningful components. If we are able to represent a piece of information by using only few elements, it means that such elements can capture its main characteristics, allowing to compact the energy carried by a signal into the smallest number of terms. In such a framework, this work also proposes analysis methods which deal with the goal of considering the a priori information available when decomposing a structured signal. Indeed, a natural signal is not only an array of numbers, but an expression of a physical event about which we usually have a deep knowledge. Therefore, we claim that it is worth exploiting its structure, since it can be advantageous not only in helping the analysis process, but also in making the representation of such information more accessible and meaningful. The study of an adaptive image representation inspired and gave birth to this work. We often refer to images and visual information throughout the course of the dissertation. However, the proposed approximation setting extends to many different kinds of structured data and examples are given involving videos and electrocardiogram signals. An important part of this work is constituted by practical applications: first of all we provide very interesting results for image and video compression. Then, we also face the problem of signal denoising and, finally, promising achievements in the field of source separation are presented

    Ridgelet transform applied to motion compensated images

    Get PDF
    Wavelet transform is a powerful instrument in catching zero-dimensional singularities. Ridgelets are a powerful instrument in catching and representing mono-dimensional singularities in bidimensional space. In this paper we propose a hybrid video coder scheme using ridgelet transform for the first approximation of line-edge singularities in displaced frame difference images. We demonstrate the potential of ridgelets and results show substantial improvements when compared to wavelet only based coder

    Video coding based on fractals and sparse representations

    Get PDF
    Orientador: Hélio PedriniDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: Vídeos são sequências de imagens estáticas representando cenas em movimento. Transmitir e armazenar essas imagens sem nenhum tipo de pré-processamento necessitaria de enormes larguras de banda nos canais de comunicação e uma quantidade massiva de espaço de armazenamento. A fim de reduzir o número de bits necessários para tais dados, foram criados métodos de compressão com perda. Esses métodos geralmente consistem em um codificador e um decodificador, tal que o codificador gera uma sequência de bits que representa uma aproximação razoável do vídeo através de um formato pré-especificado e o decodificador lê essa sequência, convertendo-a novamente em uma série de imagens. A transmissão de vídeos sob restrições extremas de largura de banda tem aplicações importantes como videoconferências e circuitos fechados de televisão. Neste trabalho são abordados dois métodos destinados a essa aplicação, decomposição usando representações esparsas e compressão fractal. A ampla maioria dos codificadores tem como mecanismo principal o uso de transformações inversíveis capazes de representar imagens espacialmente suaves com poucos coeficientes não-nulos. Representações esparsas são uma generalização dessa ideia, em que a transformação tem como base um conjunto cujo número de elementos excede a dimensão do espaço vetorial onde ela opera. A projeção dos dados pode ser feita a partir de uma heurística rápida chamada Matching Pursuit. Uma abordagem combinando essa heurística com um algoritmo para gerar a base sobrecompleta por aprendizado de máquina é apresentada. Codificadores fractais representam uma aproximação da imagem como um sistema de funções iterativas. Para isso, criam e transmitem uma sequência de comandos, chamada colagem, capazes de obter uma representação da imagem na escala original dada a mesma imagem em uma escala reduzida. A colagem é criada de tal forma que, se aplicada a uma imagem inicial qualquer repetidas vezes, reduzindo sua escala antes de toda iteração, converge em uma aproximação da imagem codificada. Métodos simplificados e rápidos para a criação da colagem e uma generalização desses métodos para a compressão de vídeos são apresentados. Ao invés de construir a colagem tentando mapear qualquer bloco da escala reduzida na escala original, apenas um conjunto pequeno de blocos é considerado. O método de compressão proposto para vídeos agrupa um conjunto de quadros consecutivos do vídeo em um fractal volumétrico. A colagem mapeia blocos tridimensionais entre as escalas, considerando uma escala menor tanto no tempo quanto no espaço. Uma adaptação desse método para canais de comunicação cuja largura de banda é instável também é propostaAbstract: A video is a sequence of still images representing scenes in motion. A video is a sequence of extremely similar images separated by abrupt changes in their content. If these images were transmitted and stored without any kind of preprocessing, this would require a massive amount of storage space and communication channels with very high bandwidths. Lossy compression methods were created in order to reduce the number of bits used to represent this kind of data. These methods generally consist in an encoder and a decoder, where the encoder generates a sequence of bits that represents an acceptable approximation of the video using a certain predefined format and the decoder reads this sequence, converting it back into a series of images. Transmitting videos under extremely limited bandwidth has important applications in video conferences or closed-circuit television systems. Two different approaches are explored in this work, decomposition based on sparse representations and fractal coding. Most video coders are based on invertible transforms capable of representing spatially smooth images with few non-zero coeficients. Sparse representations are a generalization of this idea using a transform that has an overcomplete dictionary as a basis. Overcomplete dictionaries are sets with more elements in it than the dimension of the vector space in which the transform operates. The data can be projected into this basis using a fast heuristic called Matching Pursuits. A video encoder combining this fast heuristic with a machine learning algorithm capable of constructing the overcomplete dictionary is proposed. Fractal encoders represent an approximation of the image through an iterated function system. In order to do that, a sequence of instructions, called a collage, is created and transmitted. The collage can construct an approximation of the original image given a smaller scale version of it. It is created in such a way that, when applied to any initial image several times, contracting it before each iteration, it converges into an approximation of the encoded image. Simplier and faster methods for creating a collage and a generalization of these methods to video compression are presented. Instead of constructing a collage by matching any block from the smaller scale to the original one, a small subset of possible matches is considered. The proposed video encoding method creates groups of consecutive frames which are used to construct a volumetric fractal. The collage maps tridimensional blocks between the different scales, using a smaller scale in both space and time. An improved version of this algorithm designed for communication channels with variable bandwidth is presentedMestradoCiência da ComputaçãoMestre em Ciência da Computaçã

    Vector Disparity Sensor with Vergence Control for Active Vision Systems

    Get PDF
    This paper presents an architecture for computing vector disparity for active vision systems as used on robotics applications. The control of the vergence angle of a binocular system allows us to efficiently explore dynamic environments, but requires a generalization of the disparity computation with respect to a static camera setup, where the disparity is strictly 1-D after the image rectification. The interaction between vision and motor control allows us to develop an active sensor that achieves high accuracy of the disparity computation around the fixation point, and fast reaction time for the vergence control. In this contribution, we address the development of a real-time architecture for vector disparity computation using an FPGA device. We implement the disparity unit and the control module for vergence, version, and tilt to determine the fixation point. In addition, two on-chip different alternatives for the vector disparity engines are discussed based on the luminance (gradient-based) and phase information of the binocular images. The multiscale versions of these engines are able to estimate the vector disparity up to 32 fps on VGA resolution images with very good accuracy as shown using benchmark sequences with known ground-truth. The performances in terms of frame-rate, resource utilization, and accuracy of the presented approaches are discussed. On the basis of these results, our study indicates that the gradient-based approach leads to the best trade-off choice for the integration with the active vision system

    A Panorama on Multiscale Geometric Representations, Intertwining Spatial, Directional and Frequency Selectivity

    Full text link
    The richness of natural images makes the quest for optimal representations in image processing and computer vision challenging. The latter observation has not prevented the design of image representations, which trade off between efficiency and complexity, while achieving accurate rendering of smooth regions as well as reproducing faithful contours and textures. The most recent ones, proposed in the past decade, share an hybrid heritage highlighting the multiscale and oriented nature of edges and patterns in images. This paper presents a panorama of the aforementioned literature on decompositions in multiscale, multi-orientation bases or dictionaries. They typically exhibit redundancy to improve sparsity in the transformed domain and sometimes its invariance with respect to simple geometric deformations (translation, rotation). Oriented multiscale dictionaries extend traditional wavelet processing and may offer rotation invariance. Highly redundant dictionaries require specific algorithms to simplify the search for an efficient (sparse) representation. We also discuss the extension of multiscale geometric decompositions to non-Euclidean domains such as the sphere or arbitrary meshed surfaces. The etymology of panorama suggests an overview, based on a choice of partially overlapping "pictures". We hope that this paper will contribute to the appreciation and apprehension of a stream of current research directions in image understanding.Comment: 65 pages, 33 figures, 303 reference

    BEMDEC: An Adaptive and Robust Methodology for Digital Image Feature Extraction

    Get PDF
    The intriguing study of feature extraction, and edge detection in particular, has, as a result of the increased use of imagery, drawn even more attention not just from the field of computer science but also from a variety of scientific fields. However, various challenges surrounding the formulation of feature extraction operator, particularly of edges, which is capable of satisfying the necessary properties of low probability of error (i.e., failure of marking true edges), accuracy, and consistent response to a single edge, continue to persist. Moreover, it should be pointed out that most of the work in the area of feature extraction has been focused on improving many of the existing approaches rather than devising or adopting new ones. In the image processing subfield, where the needs constantly change, we must equally change the way we think. In this digital world where the use of images, for variety of purposes, continues to increase, researchers, if they are serious about addressing the aforementioned limitations, must be able to think outside the box and step away from the usual in order to overcome these challenges. In this dissertation, we propose an adaptive and robust, yet simple, digital image features detection methodology using bidimensional empirical mode decomposition (BEMD), a sifting process that decomposes a signal into its two-dimensional (2D) bidimensional intrinsic mode functions (BIMFs). The method is further extended to detect corners and curves, and as such, dubbed as BEMDEC, indicating its ability to detect edges, corners and curves. In addition to the application of BEMD, a unique combination of a flexible envelope estimation algorithm, stopping criteria and boundary adjustment made the realization of this multi-feature detector possible. Further application of two morphological operators of binarization and thinning adds to the quality of the operator

    Matching pursuit-based shape representation and recognition using scale-space

    Get PDF
    In this paper, we propose an analytical low-level representation of images, obtained by a decomposition process, namely the matching pursuit (MP) algorithm, as a new way of describing objects through a general continuous description using an affine invariant dictionary of basis function (BFs). This description is used to recognize multiple objects in images. In the learning phase, a template object is decomposed, and the extracted subset of BFs, called meta-atom, gives the description of the object. This description is then naturally extended into the linear scale-space using the definition of our BFs, and thus providing a more general representation of the object. We use this enhanced description as a predefined dictionary of the object to conduct an MP-based shape recognition task into the linear scale-space. The introduction of the scale-space approach improves the robustness of our method: we avoid local minima issues encountered when minimizing a nonconvex energy function. We show results for the detection of complex synthetic shapes, as well as real world (aerial and medical) images. © 2007 Wiley Periodicals, Inc. Int J Imaging Syst Technol, 16, 162-180, 200
    corecore