
    An efficient psychovisual threshold technique in image compression

    Nowadays, the psychovisual model plays a critical role in image compression systems. A psychovisual threshold exploits the tolerance of the human visual system by reducing the amount of frequency image signal that must be retained. The sensitivity of the human eye can be explored in qualitative experiments, in which observers describe what they see or judge image quality; however, thresholds obtained from such qualitative experiments depend on the viewing conditions of the human visual system and on repeated viewing sessions. Modern image compression also needs the flexibility to produce output quality levels according to user preferences, and the psychovisual threshold is designed to determine these quality levels: it represents the optimal amount of frequency image signal to retain in image compression. This research proposes a psychovisual threshold derived from a quantitative experiment that can automatically predict an optimal balance between image quality and compression rate. The contribution of each frequency image signal to the image reconstruction serves as the primitive of the psychovisual threshold, and developing such a threshold from these contributions for each frequency order is very challenging. In this research, the psychovisual threshold prescribes both the quantization values and the bit allocation for image compression. It is the basic primitive from which quantization tables are generated, and it allows a developer to design adaptively customized quantization values according to a target image quality. It is also the elementary primitive for generating a set of bit allocations for the frequency image signals: this bit allocation assigns the number of bits to each frequency image signal and refers to the psychovisual threshold directly rather than to a quantization process. This research investigates the basic understanding of the psychovisual threshold in image compression, and the experimental results show significant improvement: the psychovisual threshold, presented as quantization tables, customized quantization tables and a set of bit allocations, improves both the quality of the reconstructed image and the average bit length of the Huffman code. This research shows that the psychovisual threshold is practically the best measure of the optimal frequency image signal in image compression.
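
    As a concrete illustration of how a threshold over frequency orders can drive quantization, the sketch below derives an 8×8 quantization table from a threshold curve and applies it to a DCT block. This is a minimal sketch under stated assumptions: the `psy_threshold` function and its constants are hypothetical placeholders, not the experimentally derived values of this research.

```python
# Minimal sketch: a quantization table derived from a hypothetical
# psychovisual threshold curve, applied to an 8x8 DCT block.
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

def psy_threshold(order, quality=1.0):
    # Hypothetical monotone threshold on frequency order s = u + v:
    # higher orders tolerate coarser quantization. Illustrative only.
    return quality * (1.0 + 1.6 * order)

def quant_table(n=8, quality=1.0):
    q = np.empty((n, n))
    for u in range(n):
        for v in range(n):
            q[u, v] = psy_threshold(u + v, quality)
    return q

block = np.random.randint(0, 256, (8, 8)).astype(float) - 128
Q = quant_table(quality=2.0)
coeffs = dct2(block)
quantized = np.round(coeffs / Q)            # lossy step controlled by Q
reconstructed = idct2(quantized * Q) + 128  # dequantize and invert
```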

    Psychovisual Threshold On Large Tchebichef Moment For Image Compression

    The JPEG standard transforms an 8×8 block of image pixels into the frequency domain. Discontinuities in image intensity between adjacent blocks cause visual artifacts in the reconstructed image due to lost inter-block correlations: blocking artifacts appear as pixel intensity discontinuities along block boundaries. This research proposes a psychovisual threshold on large Tchebichef moments to minimize these blocking artifacts. The psychovisual threshold is practically the best measure of the optimal amount of frequency image signal in image coding, and it is the basic element from which quantization tables are generated in image compression. The psychovisual threshold on large Tchebichef moments gives significant improvements in output image quality: the experimental results show that a smooth psychovisual threshold on the large discrete Tchebichef moment produces high-quality output images that are largely free of visual artifacts.
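
    The 2-D Tchebichef moment transform at the core of this approach can be sketched as below. As a minimal illustration, the orthonormal discrete Tchebichef polynomial basis is built numerically by QR factorisation of a Vandermonde matrix (which agrees with the analytic recurrence up to sign); the paper's psychovisual thresholding of the moments is not reproduced.

```python
# Minimal sketch: forward and inverse 2-D discrete Tchebichef moment
# transform on one image block, with the basis built numerically.
import numpy as np

def tchebichef_basis(N):
    x = np.arange(N, dtype=float)
    V = np.vander(x, N, increasing=True)  # columns: x^0, x^1, ..., x^(N-1)
    Q, _ = np.linalg.qr(V)                # orthonormal polynomial basis
    Q *= np.sign(Q[0, :])                 # a convenient sign convention
    return Q.T                            # rows: t_0(x), ..., t_{N-1}(x)

def tchebichef_moments(block):
    T = tchebichef_basis(block.shape[0])
    return T @ block @ T.T                # forward 2-D moment transform

def inverse_moments(moments):
    T = tchebichef_basis(moments.shape[0])
    return T.T @ moments @ T              # exact inverse (T is orthogonal)

block = np.random.rand(8, 8)
M = tchebichef_moments(block)
assert np.allclose(inverse_moments(M), block)  # lossless without quantization
```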

    VHDL design and simulation for embedded zerotree wavelet quantisation

    This thesis discusses a highly effective still-image compression algorithm: the Embedded Zerotree Wavelet (EZW) coding technique. The technique is simple but achieves remarkable results. The image is wavelet-transformed, symbolically coded and successively quantised, so that compression and transmission/storage savings are achieved by exploiting the zerotree structure. The algorithm was first proposed by Jerome M. Shapiro in 1993; however, to minimise memory usage and speed up the EZW processor, a depth-first search is used to traverse the image rather than the breadth-first search initially discussed in Shapiro's paper (Shapiro, 1993). The project's primary objective is to simulate the EZW algorithm, from a basic building block of an 8 by 8 matrix up to a well-known reference image such as Lenna at 256 by 256, so that the algorithm's performance, for instance its peak signal-to-noise ratio, can be measured. The software environment used for the simulation is a Very High Speed Integrated Circuit Hardware Description Language (VHDL) tool, the PC-based version of Peak VHDL. This leads to the second phase of the project: the secondary objective is to test the algorithm at the hardware level, such as on an FPGA for rapid prototyping, if project time permits.
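
    A rough sketch of the zerotree test and dominant-pass symbols, using the depth-first traversal favoured in this thesis, is given below in Python rather than VHDL. The parent-child mapping is the standard Mallat-layout quadtree with Shapiro's convention for the DC coefficient, simplified for illustration; the thesis's hardware design is not reproduced.

```python
# Minimal sketch: depth-first zerotree significance test (EZW dominant pass).
import numpy as np

def children(r, c, size):
    """Quadtree children of coefficient (r, c) in a size x size map.
    (0, 0) is the DC term and is handled per Shapiro's convention."""
    if r == 0 and c == 0:
        return [(0, 1), (1, 0), (1, 1)]
    if 2 * r >= size or 2 * c >= size:
        return []  # finest level: no descendants
    return [(2*r, 2*c), (2*r, 2*c+1), (2*r+1, 2*c), (2*r+1, 2*c+1)]

def is_zerotree(coeffs, r, c, threshold):
    """Depth-first check that (r, c) and all descendants are insignificant."""
    if abs(coeffs[r, c]) >= threshold:
        return False
    return all(is_zerotree(coeffs, rc, cc, threshold)
               for rc, cc in children(r, c, coeffs.shape[0]))

def dominant_pass_symbol(coeffs, r, c, threshold):
    """Shapiro's four symbols: POS, NEG, zerotree root, isolated zero."""
    v = coeffs[r, c]
    if abs(v) >= threshold:
        return 'POS' if v > 0 else 'NEG'
    return 'ZTR' if is_zerotree(coeffs, r, c, threshold) else 'IZ'

coeffs = np.random.randn(8, 8) * 10        # stand-in wavelet coefficients
print(dominant_pass_symbol(coeffs, 0, 1, threshold=16.0))
```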

    Matrix Transform Imager Architecture for On-Chip Low-Power Image Processing

    Camera-on-a-chip systems have tried to include carefully chosen signal-processing units for better functionality and performance, and to broaden the range of applications they can be used for. Image-processing sensors have become possible due to advances in CMOS active pixel sensors (APS) and neuromorphic focal-plane imagers. Among the advantages of these systems are compact size, high speed and parallelism, low power dissipation, and dense system integration. One can envision using these chips in portable and inexpensive video cameras on hand-held devices such as personal digital assistants (PDAs) or cell phones. In neuromorphic modeling of the retina, it would be desirable to have processing capabilities at the focal plane while retaining the pixel density of typical APS imager designs; unfortunately, these two goals have been largely incompatible. We introduce our MAtrix Transform Imager Architecture (MATIA), which uses analog floating-gate devices to make computational imagers with high pixel densities possible. The core imager performs computations at the pixel plane but still has a fill factor of 46 percent, comparable to the high fill factors of APS imagers. Processing is performed continuously on the image via programmable matrix operations that can operate on the entire image or on blocks within it. The resulting data-flow architecture can directly perform all kinds of block matrix image transforms. Since the imager operates in the subthreshold region and thus has low power consumption, this architecture can be used as a low-power front end for any system that utilizes these computations. Various compression algorithms that use block matrix transforms (e.g. JPEG) can be implemented using this architecture, and since MATIA can be used for gradient computations, cheap image-tracking devices can be implemented as well. Other applications range from stand-alone universal transform imager systems to systems that compute stereoscopic depth.
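
    In floating point, the programmable block matrix operation that MATIA evaluates in analog hardware amounts to Y = A X Bᵀ per image block. The sketch below shows this with the DCT basis for both matrices, which yields the 8×8 block DCT used by JPEG; the matrix size and input image are illustrative assumptions.

```python
# Minimal sketch: block matrix image transform Y = A @ X @ B.T per block.
import numpy as np

def dct_matrix(n):
    """Orthonormal type-II DCT basis as an n x n matrix (rows = frequencies)."""
    k = np.arange(n)
    A = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    A[0, :] *= 1 / np.sqrt(2)
    return A * np.sqrt(2 / n)

def block_transform(image, A, B):
    """Apply Y = A X B^T independently to each n x n block.
    Assumes image dimensions are multiples of n."""
    n = A.shape[0]
    H, W = image.shape
    out = np.empty_like(image, dtype=float)
    for i in range(0, H, n):
        for j in range(0, W, n):
            X = image[i:i+n, j:j+n]
            out[i:i+n, j:j+n] = A @ X @ B.T
    return out

image = np.random.rand(256, 256)
A = dct_matrix(8)
Y = block_transform(image, A, A)  # 8x8 block DCT, as in JPEG
```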

    A motion-based approach for audio-visual automatic speech recognition

    The research work presented in this thesis introduces novel approaches to both visual region-of-interest extraction and visual feature extraction for use in audio-visual automatic speech recognition. In particular, the speaker's movement during speech is used to isolate the mouth region in video sequences, and motion-based features obtained from this region provide new visual features for audio-visual automatic speech recognition. The mouth-region extraction approach proposed in this work is shown to give superior performance compared with existing colour-based lip segmentation methods. The new features are obtained from three separate representations of motion in the region of interest: the difference in luminance between successive images, block-matching-based motion vectors, and optical flow. The new visual features are found to improve visual-only and audio-visual speech recognition performance compared with the commonly used appearance-based methods. In addition, a novel approach is proposed for visual feature extraction from either the discrete cosine transform or the discrete wavelet transform representation of the speaker's mouth region. In this work, the image transform is explored from a new viewpoint of data discrimination, in contrast to the more conventional data-preservation viewpoint. The main finding is that audio-visual automatic speech recognition systems using the new features, extracted from frequency bands selected according to their discriminatory abilities, generally outperform those using features designed for data preservation. To establish the noise robustness of the new features, their performance has been studied in the presence of a range of different types of noise and at various signal-to-noise ratios. In these experiments, the audio-visual automatic speech recognition systems based on the new approaches were found to give superior performance both to audio-visual systems using appearance-based features and to audio-only speech recognition systems.
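
    Two of the three motion representations, luminance difference and block-matching motion vectors, can be sketched in a few lines. The block size, search range and sum-of-absolute-differences cost below are generic assumptions rather than the thesis's exact settings, and optical flow is omitted.

```python
# Minimal sketch: frame differencing and exhaustive block matching (SAD).
import numpy as np

def luminance_difference(prev, curr):
    """Pixel-wise difference between successive grayscale frames."""
    return curr.astype(float) - prev.astype(float)

def block_motion_vector(prev, curr, r, c, n=8, search=4):
    """Best (dy, dx) for the n x n block of `curr` at (r, c), full search."""
    block = curr[r:r+n, c:c+n].astype(float)
    best, best_sad = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rr, cc = r + dy, c + dx
            if rr < 0 or cc < 0 or rr + n > prev.shape[0] or cc + n > prev.shape[1]:
                continue  # candidate block falls outside the previous frame
            sad = np.abs(block - prev[rr:rr+n, cc:cc+n]).sum()
            if sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best

prev = np.random.randint(0, 256, (64, 64))
curr = np.roll(prev, (2, -1), axis=(0, 1))  # synthetic motion of (2, -1)
print(block_motion_vector(prev, curr, 24, 24))
```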

    Perceptual modelling for 2D and 3D

    Deliverable D1.1 of the ANR PERSEE project. This report was produced as part of the ANR PERSEE project (no. ANR-09-BLAN-0170); specifically, it corresponds to deliverable D1.1 of the project.

    Semi-automatic video object segmentation for multimedia applications

    A semi-automatic video object segmentation tool is presented for segmenting both still pictures and image sequences. The approach combines automatic segmentation algorithms with manual user interaction. The still-image segmentation component comprises a conventional spatial segmentation algorithm, the Recursive Shortest Spanning Tree (RSST); a hierarchical segmentation representation method, the Binary Partition Tree (BPT); and user interaction. An initial partition of homogeneous regions is created using RSST. The BPT technique then merges these regions and represents the segmentation hierarchically in a binary tree. Semantic objects are then built manually by selectively clicking on image regions. A video object-tracking component enables image sequence segmentation; this subsystem is based on motion estimation, spatial segmentation, object projection, region classification, and user interaction. The motion between the previous and current frames is estimated, and the previous object is projected onto the current partition. A region classification technique determines which regions in the current partition belong to the projected object. User interaction allows object re-initialisation when the segmentation results become inaccurate. The combination of these components enables offline video sequence segmentation. Results on standard test sequences illustrate the potential of this system for object-based coding and representation of multimedia content.
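
    The flavour of the RSST merging stage can be sketched as a greedy merge of neighbouring regions by colour similarity, as below. This is a simplified illustration: a full RSST recomputes edge costs after each merge, and the BPT bookkeeping (recording each merge as a binary-tree node) is omitted.

```python
# Minimal sketch: greedy RSST-style region merging with a priority queue.
import heapq

def rsst_merge(regions, adjacency, target_count):
    """
    regions:   {region_id (int): (mean_value, pixel_count)}
    adjacency: set of frozensets {a, b} of neighbouring region ids
    Returns a {region_id: surviving_root_id} merge map.
    """
    parent = {r: r for r in regions}
    def find(r):                         # union-find with path compression
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r
    heap = [(abs(regions[a][0] - regions[b][0]), a, b)
            for a, b in (tuple(e) for e in adjacency)]
    heapq.heapify(heap)
    count = len(regions)
    while heap and count > target_count:
        _, a, b = heapq.heappop(heap)    # cheapest (most similar) edge first
        ra, rb = find(a), find(b)
        if ra == rb:
            continue                     # already merged via another edge
        ma, na = regions[ra]
        mb, nb = regions[rb]
        regions[ra] = ((ma * na + mb * nb) / (na + nb), na + nb)
        parent[rb] = ra
        count -= 1
        # A full RSST would re-insert edges with updated costs here.
    return {r: find(r) for r in parent}

regions = {0: (10.0, 50), 1: (12.0, 40), 2: (80.0, 60)}
adjacency = {frozenset({0, 1}), frozenset({1, 2})}
print(rsst_merge(regions, adjacency, target_count=2))  # merges regions 0 and 1
```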

    Scene-Dependency of Spatial Image Quality Metrics

    This thesis is concerned with the measurement of spatial imaging performance and the modelling of spatial image quality in digital capturing systems. Spatial imaging performance and image quality relate to the objective and subjective reproduction of luminance contrast signals by the system, respectively; both are critical to overall perceived image quality. The Modulation Transfer Function (MTF) and Noise Power Spectrum (NPS) describe the signal (contrast) transfer and noise characteristics of a system, respectively, with respect to spatial frequency. Strictly speaking, both are applicable only to linear systems, since they are founded upon linear system theory. Many contemporary capture systems use adaptive image signal processing, such as denoising and sharpening, to optimise output image quality. These non-linear processes change their behaviour according to characteristics of the input signal (i.e. the scene being captured), rendering system performance “scene-dependent” and difficult to measure accurately. The MTF and NPS are traditionally measured from test charts containing predefined signals (e.g. edges, sinusoidal exposures, noise or uniform luminance patches). These signals trigger adaptive processes at uncharacteristic levels because they are unrepresentative of natural scene content; thus, for systems using adaptive processes, the resulting MTFs and NPSs are not representative of performance “in the field” (i.e. when capturing real scenes). Spatial image quality metrics for capturing systems aim to predict the relationship between MTF and NPS measurements and subjective ratings of image quality. They cascade both measures with contrast sensitivity functions that describe human visual sensitivity with respect to spatial frequency. The most recent metrics designed for adaptive systems use MTFs measured with the dead leaves test chart, which is more representative of natural scene content than the abovementioned test charts; this marks a step toward modelling image quality with respect to real scene signals. This thesis presents novel scene-and-process-dependent MTFs (SPD-MTFs) and NPSs (SPD-NPSs), measured from imaged pictorial scene (or dead leaves target) signals to account for system scene-dependency. Further, a number of spatial image quality metrics are revised to account for capture system and visual scene-dependency: their MTF and NPS parameters are replaced with SPD-MTFs and SPD-NPSs, and their standard visual functions are replaced with contextual detection (cCSF) or contextual discrimination (cVPF) functions. In addition, two novel spatial image quality metrics are presented, the log Noise Equivalent Quanta (NEQ) and the Visual log NEQ, which implement SPD-MTFs and SPD-NPSs. The metrics, SPD-MTFs and SPD-NPSs were validated by analysing measurements from simulated image capture pipelines that applied either linear or adaptive image signal processing. The SPD-NPS measures displayed little evidence of measurement error, and the metrics performed most accurately when they used SPD-NPSs measured from images of scenes. The benefit of deriving SPD-MTFs from images of scenes was traded off, however, against measurement bias; most metrics performed most accurately with SPD-MTFs derived from dead leaves signals. Implementing the cCSF or cVPF did not increase metric accuracy. The log NEQ and Visual log NEQ metrics proposed in this thesis were highly competitive, outperforming metrics of the same genre, and were more consistent than the IEEE P1858 Camera Phone Image Quality (CPIQ) metric when their input parameters were modified. The advantages and limitations of all performance measures and metrics are discussed, as well as their practical implementation and relevant applications.
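
    The relationship underlying a noise-equivalent-quanta metric can be sketched as follows. The abstract does not specify the exact weighting, frequency range or normalisation of the thesis's log NEQ and Visual log NEQ, so the forms below, including the illustrative MTF, NPS and CSF inputs, are assumptions.

```python
# Minimal sketch: NEQ(f) compares squared signal transfer against noise
# power at each spatial frequency; a log NEQ score integrates over frequency.
import numpy as np

def neq(mtf, nps, mean_signal):
    """NEQ(f) = (mean_signal * MTF(f))^2 / NPS(f), per frequency sample."""
    return (mean_signal * mtf) ** 2 / nps

def log_neq_metric(mtf, nps, mean_signal, csf=None):
    """Average log NEQ over frequency; an optional CSF weighting gives a
    'visual' variant. Both forms are assumptions, not the thesis's exact
    definitions."""
    values = np.log10(neq(mtf, nps, mean_signal))
    if csf is not None:
        values = values * csf
    return values.mean()

freqs = np.linspace(0.01, 0.5, 64)   # cycles/pixel
mtf = np.exp(-4.0 * freqs)           # illustrative falling MTF
nps = np.full_like(freqs, 1e-4)      # illustrative flat noise power
print(log_neq_metric(mtf, nps, mean_signal=1.0))
```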