
    Low Bit-rate Color Video Compression using Multiwavelets in Three Dimensions

    In recent years, wavelet-based video compression has become a major focus of research because of the advantages it provides. More recently, a growing body of studies has explored the use of multiple scaling functions and multiple wavelets with desirable properties in various fields, from image de-noising to compression. In terms of data compression, multiple scaling functions and wavelets offer greater flexibility in coefficient quantization at high compression ratios than a comparable single wavelet. The purpose of this research is to investigate the possible improvement of scalable wavelet-based color video compression at low bit-rates by using three-dimensional multiwavelets. The first part of this work included the development of the spatio-temporal decomposition process for multiwavelets and the implementation of an efficient 3-D SPIHT encoder/decoder as a common platform for evaluating the performance of two well-known multiwavelet systems against a comparable single wavelet in low bit-rate color video compression. The second part involved the development of a motion-compensated 3-D compression codec and a modified SPIHT algorithm designed specifically for this codec by incorporating a design advantage of the 2-D SPIHT coder into the 3-D SPIHT coder. In an experiment comparing their performance, the 3-D motion-compensated codec with unmodified 3-D SPIHT achieved gains of 0.3 dB to 4.88 dB over a regular 2-D wavelet-based motion-compensated codec using 2-D SPIHT in the coding of 19 endoscopy sequences at a 1/40 compression ratio. The effectiveness of the modified SPIHT algorithm was verified by a second experiment in which it was used to re-encode the 4 of the 19 sequences with the lowest performance gains, improving them by 0.5 dB to 1.0 dB. The last part of the investigation examined the effect of multiwavelet packets on 3-D video compression, as well as the effects of coding multiwavelet packets based on the frequency order and energy content of individual subbands.
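    As a rough illustration of the spatio-temporal decomposition step described above, the sketch below performs a separable 3-D wavelet decomposition of a video cube with PyWavelets. It assumes a scalar wavelet (Haar) rather than the multiwavelet systems studied in the thesis, and the array sizes are arbitrary.

        # Minimal sketch: separable spatio-temporal (3-D) wavelet decomposition of a
        # video cube; a scalar-wavelet stand-in for the multiwavelet decomposition
        # described in the abstract. Requires numpy and PyWavelets.
        import numpy as np
        import pywt

        # Hypothetical video cube: 16 frames of 64x64 luminance samples.
        video = np.random.rand(16, 64, 64)

        # Two-level 3-D decomposition along (time, height, width).
        coeffs = pywt.wavedecn(video, wavelet="haar", level=2, axes=(0, 1, 2))

        # coeffs[0] is the coarsest approximation subband; the remaining entries are
        # dicts of detail subbands ('aad', 'ada', ..., 'ddd') per level, which a
        # 3-D SPIHT coder would scan in order of bit-plane significance.
        print(coeffs[0].shape, sorted(coeffs[1].keys()))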

    Peak Transform for Efficient Image Representation and Coding

    Digital Object Identifier: 10.1109/TIP.2007.896599
    In this work, we introduce a nonlinear geometric transform, called the peak transform (PT), for efficient image representation and coding. The proposed PT is able to convert high-frequency signals into low-frequency ones, making them much easier to compress. Coupled with wavelet transform and subband decomposition, the PT is able to significantly reduce signal energy in high-frequency subbands and achieve a significant transform coding gain. This has important applications in efficient data representation and compression. To maximize the transform coding gain, we develop a dynamic programming solution for optimum PT design. Based on the PT, we design an image encoder, called the PT encoder, for efficient image compression. Our extensive experimental results demonstrate that, in wavelet-based subband decomposition, the signal energy in high-frequency subbands can be reduced by up to 60% when a PT is applied. The PT image encoder outperforms state-of-the-art JPEG2000 and H.264 (INTRA) encoders by up to 2-3 dB in peak signal-to-noise ratio (PSNR), especially for images with a significant amount of high-frequency components. Our experimental results also show that the proposed PT is able to efficiently capture and preserve high-frequency image features (e.g., edges) and yields significantly improved visual quality.
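    The reported 60% figure refers to the share of signal energy carried by the high-frequency (detail) subbands of a wavelet decomposition. The peak transform itself is not reproduced here, but the quantity it reduces can be measured as in the following sketch (PyWavelets, with an arbitrary wavelet and decomposition depth):

        import numpy as np
        import pywt

        def highfreq_energy_fraction(image, wavelet="db4", level=3):
            """Fraction of total energy in the detail (high-frequency) subbands."""
            coeffs = pywt.wavedec2(image, wavelet=wavelet, level=level)
            approx_energy = np.sum(coeffs[0] ** 2)
            detail_energy = sum(np.sum(band ** 2) for bands in coeffs[1:] for band in bands)
            return detail_energy / (approx_energy + detail_energy)

        # Comparing this fraction for an image before and after a transform such as
        # the PT indicates how much high-frequency energy the transform removes.
        image = np.random.rand(256, 256)
        print(highfreq_energy_fraction(image))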

    A State Table SPIHT Approach for Modified Curvelet-based Medical Image Compression

    Medical imaging plays a significant role in clinical practice. Storing and transferring a large volume of images can be complex and inefficient. This paper presents the development of a new compression technique that combines the fast discrete curvelet transform (FDCvT) with a state-table set partitioning in hierarchical trees (STS) encoding scheme. The curvelet transform is an extension of the wavelet transform that represents data based on scale and position. Initially, the medical image is decomposed using the FDCvT algorithm. The FDCvT algorithm creates symmetrical values for the detail coefficients, and these coefficients are modified to improve the efficiency of the algorithm. The curvelet coefficients are then encoded using STS and differential pulse-code modulation (DPCM). The greatest amount of energy is contained in the coarse coefficients, which are encoded using the DPCM method; the finest and modified detail coefficients are encoded using the STS method. A variety of medical modalities, including computed tomography (CT), positron emission tomography (PET), and magnetic resonance imaging (MRI), are used to verify the performance of the proposed technique. Various quality metrics, including peak signal-to-noise ratio (PSNR), compression ratio (CR), and structural similarity index (SSIM), are used to evaluate the compression results. Additionally, the computation times for the encoding (ET) and decoding (DT) processes are measured. The experimental results showed that the PET image obtained the highest PSNR and CR values. The CT image provides high quality for the reconstructed image, with an SSIM value of 0.96 and the fastest ET of 0.13 seconds, while the MRI image has the shortest DT, at 0.23 seconds.
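    The DPCM stage applied to the coarse coefficients simply transmits each coefficient as a quantized difference from the running reconstruction. A minimal sketch, with the first-order predictor and quantizer step chosen purely for illustration:

        import numpy as np

        def dpcm_encode(coeffs, step=1.0):
            """First-order DPCM: quantize each sample's difference from the running reconstruction."""
            residuals = np.empty(len(coeffs), dtype=int)
            prediction = 0.0
            for i, c in enumerate(coeffs):
                residuals[i] = int(round((c - prediction) / step))
                prediction += residuals[i] * step  # decoder-side reconstruction
            return residuals

        def dpcm_decode(residuals, step=1.0):
            return np.cumsum(residuals * step)

        coarse = np.array([10.2, 10.9, 11.4, 11.1, 12.3])  # toy coarse-band coefficients
        print(dpcm_decode(dpcm_encode(coarse, step=0.5), step=0.5))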

    Implementation of MPEG-4's Subdivision Surfaces Tools

    This work is about the implementation of an MPEG-4 decoder for subdivision surfaces, which are powerful 3D paradigms allowing piecewise smooth surfaces to be represented compactly. The study takes place in the framework of MPEG-4 AFX, the extension of the MPEG-4 standard that includes subdivision surfaces. This document introduces, in some detail, the theory of subdivision surfaces in the two forms present in MPEG-4: plain and detailed/wavelet subdivision surfaces. It concentrates particularly on wavelet subdivision surfaces, which permit progressive 3D mesh compression.
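    The recursive refinement underlying subdivision surfaces can be illustrated with the simplest possible scheme: midpoint (linear) subdivision, which splits every triangle into four. This is not one of the smoothing schemes used in MPEG-4 AFX, only a sketch of the connectivity refinement step they share.

        import numpy as np

        def midpoint_subdivide(vertices, faces):
            """One linear subdivision step: split every triangle into four via edge midpoints."""
            vertices = list(map(tuple, vertices))
            midpoint_index = {}
            def midpoint(a, b):
                key = (min(a, b), max(a, b))
                if key not in midpoint_index:
                    vertices.append(tuple((np.array(vertices[a]) + np.array(vertices[b])) / 2.0))
                    midpoint_index[key] = len(vertices) - 1
                return midpoint_index[key]
            new_faces = []
            for a, b, c in faces:
                ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
                new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
            return np.array(vertices), new_faces

        # Toy base mesh: a single triangle; repeated calls refine it recursively.
        v, f = midpoint_subdivide(np.eye(3), [(0, 1, 2)])
        print(len(v), len(f))  # 6 vertices, 4 faces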

    Hierarchical Modeling of 3D Objects with Subdivision Surfaces

    Subdivision surfaces (SSs) are a powerful paradigm for modeling 3D (three-dimensional) objects that bridges the two traditional approaches to surface approximation, based on polygonal meshes and on patch-based surfaces, each of which has its own problems. Subdivision schemes allow a (piecewise) smooth surface, like those most frequently found in practice, to be defined as the limit of a recursive refinement process applied to a coarse control mesh, which can be described very compactly. Moreover, the recursion inherent in SSs naturally establishes a pyramidal nesting relation between the successively generated meshes / LODs (levels of detail), so SSs lend themselves extraordinarily well to wavelet-based multiresolution analysis (MRA) of surfaces, which has immediate and very interesting practical applications, such as hierarchical coding and editing of 3D models. We begin by describing the links between the three areas on which our work is based (SSs, automatic LOD extraction, and wavelet MRA) to explain how these three pieces of the puzzle of hierarchical modeling of 3D objects with SSs fit together. Wavelet MRA consists of decomposing a function into a coarse version of itself plus a set of hierarchically nested additive refinements called "wavelet coefficients". Classical wavelet theory studies classical nD signals: those defined on parametric domains homeomorphic to R^n or (0,1)^n, such as audio (n=1), images (n=2) or video (n=3). On less trivial topologies, such as 2D manifolds (surfaces in 3D space), wavelet MRA is not as obvious, but it remains possible if approached from the perspective of SSs. It suffices to start from a coarse mesh that approximates the surface at a low LOD, subdivide it recursively and, in doing so, add the wavelet coefficients, which are the 3D details needed to obtain finer and finer approximations to the original surface. We then move on to the practical applications that constitute our main original development and, in particular, present a hierarchical coding technique for 3D models based on SSs, which acts on the aforementioned 3D details: it expresses them in a local normal frame; organizes them according to a face-based hierarchical structure; quantizes them, devoting fewer bits to their tangential components, which carry less energy, and "scalarizes" them; and finally codes them with a technique similar to the SPIHT (Set Partitioning In Hierarchical Trees) of Said and Pearlman. The result is a fully embedded code that is, for mostly smooth surfaces, at least twice as compact as those obtained with previously published progressive 3D mesh coding techniques, in which, moreover, the LODs are not pyramidally nested. Finally, we describe several auxiliary methods that we have developed, improving previous techniques and creating our own, since a complete solution to modeling 3D objects with SSs requires solving two further problems. The first is the extraction of a base mesh (triangular, in our case) from the original surface, usually given as a fine triangular mesh with arbitrary connectivity. The second is the generation of a recursive remeshing of the original/target mesh with subdivision connectivity, by recursively refining the base mesh and thereby computing the 3D details needed to correct the positions predicted by subdivision for the new vertices.
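    One step worth a sketch is the local-frame quantization: each 3D detail is expressed in a frame aligned with the surface normal, and the tangential components, which carry less energy, receive fewer bits. The toy illustration below assumes the normal is already known and uses arbitrary bit allocations and step sizes:

        import numpy as np

        def local_frame(normal):
            """Orthonormal frame (t1, t2, n) built from a non-zero surface normal."""
            n = normal / np.linalg.norm(normal)
            helper = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
            t1 = np.cross(n, helper); t1 /= np.linalg.norm(t1)
            t2 = np.cross(n, t1)
            return t1, t2, n

        def quantize_detail(detail, normal, normal_bits=8, tangent_bits=4, scale=1.0):
            """Quantize a 3D detail in a local frame, spending fewer bits on tangential parts."""
            t1, t2, n = local_frame(normal)
            comps = np.array([detail @ t1, detail @ t2, detail @ n])
            steps = scale / (2 ** np.array([tangent_bits, tangent_bits, normal_bits]))
            return np.round(comps / steps).astype(int), steps

        detail = np.array([0.01, -0.02, 0.15])       # toy wavelet detail vector
        q, steps = quantize_detail(detail, normal=np.array([0.0, 0.0, 1.0]))
        print(q, q * steps)                           # quantized symbols and reconstruction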

    Dimensionality reduction and sparse representations in computer vision

    The proliferation of camera-equipped devices, such as netbooks, smartphones and game stations, has led to a significant increase in the production of visual content. This visual information could be used for understanding the environment and offering a natural interface between users and their surroundings. However, the massive amounts of data and the high computational cost associated with them encumber the transfer of sophisticated vision algorithms to real-life systems, especially ones that exhibit resource limitations such as restrictions in available memory, processing power and bandwidth. One approach to tackling these issues is to generate compact and descriptive representations of image data by exploiting inherent redundancies. We propose the investigation of dimensionality reduction and sparse representations in order to accomplish this task. In dimensionality reduction, the aim is to reduce the dimensions of the space in which image data reside in order to allow resource-constrained systems to handle them and, ideally, provide a more insightful description. This goal is achieved by exploiting the inherent redundancies that many classes of images, such as faces under different illumination conditions and objects from different viewpoints, exhibit. We explore the description of natural images by low-dimensional non-linear models called image manifolds and investigate the performance of computer vision tasks such as recognition and classification using these low-dimensional models. In addition to dimensionality reduction, we study a novel approach to representing images as a sparse linear combination of dictionary examples. We investigate how sparse image representations can be used for a variety of tasks including low-level image modeling and higher-level semantic information extraction. Using tools from dimensionality reduction and sparse representation, we propose the application of these methods in three hierarchical image layers, namely low-level features, mid-level structures and high-level attributes. Low-level features are image descriptors that can be extracted directly from the raw image pixels and include pixel intensities, histograms, and gradients. In the first part of this work, we explore how various techniques in dimensionality reduction, ranging from traditional image compression to the recently proposed Random Projections method, affect the performance of computer vision algorithms such as face detection and face recognition. In addition, we discuss a method that is able to increase the spatial resolution of a single image, without using any training examples, according to the sparse representations framework. In the second part, we explore mid-level structures, including image manifolds and sparse models, produced by abstracting information from low-level features, which offer compact modeling of high-dimensional data. We propose novel techniques for generating more descriptive image representations and investigate their application in face recognition and object tracking. In the third part of this work, we propose the investigation of a novel framework for representing the semantic contents of images. This framework employs high-level semantic attributes that aim to bridge the gap between the visual information of an image and its textual description by utilizing low-level features and mid-level structures. This innovative paradigm offers revolutionary possibilities, including recognizing the category of an object from purely textual information without any explicit visual example.
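    Of the dimensionality reduction techniques mentioned, Random Projections is the simplest to sketch: image feature vectors are multiplied by a random Gaussian matrix, which approximately preserves pairwise distances (Johnson-Lindenstrauss). A minimal illustration with arbitrary dimensions:

        import numpy as np

        def random_projection(X, target_dim, seed=0):
            """Project rows of X (n_samples x n_features) to target_dim with a Gaussian matrix."""
            rng = np.random.default_rng(seed)
            n_features = X.shape[1]
            # Scaling by 1/sqrt(target_dim) approximately preserves pairwise distances.
            R = rng.standard_normal((n_features, target_dim)) / np.sqrt(target_dim)
            return X @ R

        # Toy example: 100 flattened 32x32 grayscale "images" reduced to 64 dimensions.
        X = np.random.rand(100, 32 * 32)
        Y = random_projection(X, target_dim=64)
        print(Y.shape)  # (100, 64)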

    On Development of Some Soft Computing Based Multiuser Detection Techniques for SDMA–OFDM Wireless Communication System

    Space Division Multiple Access (SDMA), as a subclass of Multiple Input Multiple Output (MIMO) systems, achieves high spectral efficiency through bandwidth reuse by multiple users. On the other hand, Orthogonal Frequency Division Multiplexing (OFDM) mitigates the impairments of the propagation channel. The combination of SDMA and OFDM has emerged as one of the most competitive technologies for future wireless communication systems. In the SDMA uplink, multiple users communicate simultaneously with a multiple-antenna Base Station (BS), sharing the same frequency band by exploiting their unique user-specific spatial signatures. Different Multiuser Detection (MUD) schemes have been proposed at the BS receiver to identify users correctly by mitigating the multiuser interference. However, most classical MUDs fail to separate the users' signals in the overload scenario, where the number of users exceeds the number of receiving antennas. On the other hand, due to its exhaustive search mechanism, the optimal Maximum Likelihood (ML) detector is limited by high computational complexity, which increases exponentially with the number of simultaneous users. Hence, cost-function-minimization-based Minimum Error Rate (MER) detectors are preferred, which minimize the probability of error by iteratively updating the receiver's weights using adaptive algorithms such as Steepest Descent (SD) and Conjugate Gradient (CG). The first part of this research proposes Optimization Technique (OT) aided MER detectors to overcome the shortfalls of the CG-based MER detectors. Popular metaheuristic search algorithms such as the Adaptive Genetic Algorithm (AGA), the Adaptive Differential Evolution Algorithm (ADEA) and Invasive Weed Optimization (IWO), which rely on an intelligent search of a large but finite solution space using statistical methods, have been applied to find the optimal weight vectors for the MER MUD. Further, it is observed in an overloaded SDMA-OFDM system that the channel-output phasor constellation often becomes linearly non-separable. As the number of users increases, the receiver weight optimization task becomes more difficult due to the exponentially increased number of dimensions of the weight matrix. As a result, MUD becomes a challenging multidimensional optimization problem, and signal classification requires a nonlinear solution. Considering this, the second part of the research work proposes Artificial Neural Network (ANN) based MUDs on the standard Multilayer Perceptron (MLP) and Radial Basis Function (RBF) frameworks for
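    The exponential cost of optimal ML detection, which motivates the MER and soft-computing detectors above, is easy to see in a sketch: for K users with BPSK symbols the receiver must search 2^K candidate vectors. The toy example below assumes a flat (single-subcarrier) channel and arbitrary dimensions:

        import numpy as np
        from itertools import product

        def ml_detect_bpsk(y, H):
            """Exhaustive-search ML detection for a flat MIMO/SDMA uplink with BPSK users.

            y: received vector (n_rx,), H: channel matrix (n_rx, n_users).
            Complexity is 2**n_users, which is why ML detection scales poorly.
            """
            best, best_metric = None, np.inf
            for symbols in product([-1.0, 1.0], repeat=H.shape[1]):
                s = np.array(symbols)
                metric = np.linalg.norm(y - H @ s) ** 2
                if metric < best_metric:
                    best, best_metric = s, metric
            return best

        # Toy overloaded scenario: 3 users, 2 receive antennas, plus noise.
        rng = np.random.default_rng(0)
        H = rng.standard_normal((2, 3))
        s_true = np.array([1.0, -1.0, 1.0])
        y = H @ s_true + 0.1 * rng.standard_normal(2)
        print(ml_detect_bpsk(y, H))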

    GPU data structures for graphics and vision

    Graphics hardware has in recent years become increasingly programmable, and its programming APIs use the stream processor model to expose massive parallelization to the programmer. Unfortunately, the inherent restrictions of the stream processor model, used by the GPU in order to maintain high performance, often pose a problem when porting CPU algorithms for both video and volume processing to graphics hardware. Serial data dependencies which accelerate CPU processing are counterproductive for the data-parallel GPU. This thesis demonstrates new ways of tackling well-known problems of large-scale video/volume analysis. In some instances, we enable processing on the restricted hardware model by re-introducing algorithms from early computer graphics research. On other occasions, we use newly discovered hierarchical data structures to circumvent the random-access-read/fixed-write restriction that had previously kept sophisticated analysis algorithms from running solely on graphics hardware. For 3D processing, we apply known game graphics concepts such as mip-maps, projective texturing, and dependent texture lookups to show how video/volume processing can benefit algorithmically from being implemented in a graphics API. The novel GPU data structures provide drastically increased processing speed and lift processing-heavy operations to real-time performance levels, paving the way for new and interactive vision/graphics applications.
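    Mip-maps are one of the hierarchical structures referred to above: each level halves the resolution of the previous one, so image-wide reductions such as averages become cheap lookups at coarse levels. A CPU-side sketch of the construction (on the GPU each level would be produced by a shader pass or by built-in mip-map generation):

        import numpy as np

        def build_mipmap_pyramid(image):
            """Build a mip-map style pyramid by repeated 2x2 averaging (power-of-two image)."""
            levels = [image.astype(np.float64)]
            while levels[-1].shape[0] > 1 and levels[-1].shape[1] > 1:
                prev = levels[-1]
                reduced = 0.25 * (prev[0::2, 0::2] + prev[1::2, 0::2] +
                                  prev[0::2, 1::2] + prev[1::2, 1::2])
                levels.append(reduced)
            return levels

        # The coarsest level holds the global average -- the kind of reduction
        # that GPU mip-map generation provides almost for free.
        pyramid = build_mipmap_pyramid(np.random.rand(256, 256))
        print([lvl.shape for lvl in pyramid][-3:], pyramid[-1].item())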

    Directional edge and texture representations for image processing

    An efficient representation for natural images is of fundamental importance in image processing and analysis. The commonly used separable transforms such as wavelets are not best suited for images due to their inability to exploit directional regularities such as edges and oriented textural patterns, while most of the recently proposed directional schemes cannot represent these two types of features in a unified transform. This thesis focuses on the development of directional representations for images which can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Based on a previous MFT-based linear feature model, the work extends the extraction method to the situation in which the image is corrupted by noise. The problem is tackled by the combination of a "signal + noise" frequency model, a refinement stage and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms called the multiscale polar cosine transforms (MPCT) is also proposed in order to represent textures. The MPCT can be regarded as a real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with less complexity is then considered. This is achieved by applying a Gaussian frequency filter, which matches the dispersion of the magnitude spectrum, to the local MFT coefficients. This is particularly effective in denoising natural images, due to its ability to preserve both types of feature. Further improvements can be made by employing the information given by the linear feature extraction process in the filter's configuration. The denoising results compare favourably against other state-of-the-art directional representations.
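    The idea of Gaussian filtering of local spectra can be sketched with a plain block DFT standing in for the MFT; the filter below is an isotropic Gaussian with an arbitrary width, whereas the thesis matches it to the dispersion of the local magnitude spectrum:

        import numpy as np

        def gaussian_filter_local_spectra(image, block=16, sigma=4.0):
            """Toy local frequency-domain filtering: per-block FFT, Gaussian weighting, inverse FFT."""
            h, w = image.shape
            out = np.zeros_like(image, dtype=np.float64)
            # Isotropic Gaussian weight on the centred block spectrum.
            fy = np.fft.fftshift(np.fft.fftfreq(block)) * block
            fx = fy
            weight = np.exp(-(fx[None, :] ** 2 + fy[:, None] ** 2) / (2.0 * sigma ** 2))
            for y in range(0, h - block + 1, block):
                for x in range(0, w - block + 1, block):
                    spectrum = np.fft.fftshift(np.fft.fft2(image[y:y + block, x:x + block]))
                    out[y:y + block, x:x + block] = np.real(
                        np.fft.ifft2(np.fft.ifftshift(spectrum * weight)))
            return out

        print(gaussian_filter_local_spectra(np.random.rand(64, 64)).shape)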