13 research outputs found

    Analysis of distance metrics in content-based image retrieval using statistical quantized histogram texture features in the DCT domain

    Effective content-based image retrieval (CBIR) needs efficient extraction of low-level features such as color, texture and shape for indexing, and fast matching of the query image against the indexed images to retrieve similar images. Features can be extracted from images in the pixel and compressed domains. However, most existing images are now stored in compressed formats such as JPEG, which uses the discrete cosine transform (DCT). In this paper we study the issues of efficient feature extraction and effective image matching in the compressed domain. In our method, quantized histogram statistical texture features are extracted from the DCT blocks of the image using the significant energy of the DC and the first three AC coefficients of each block. For effective matching of the query image against the database images, various distance metrics are used to measure similarity over the texture features. The effectiveness of CBIR is analysed for the various distance metrics and for different numbers of quantization bins. The proposed method is tested on the Corel image database, and the experimental results show that it achieves robust image retrieval for various distance metrics with different histogram quantizations in the compressed domain.
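
    A minimal sketch of the kind of pipeline this abstract describes is given below, assuming a grayscale image, 8x8 DCT blocks, and a fixed bin count; the coefficient selection, bin counts and metric choices are illustrative assumptions, not the authors' exact settings.

```python
# Sketch: per-8x8-block DCT, keep the DC and first three AC coefficients,
# quantize each into histogram bins, and compare histograms with common
# distance metrics. Not the paper's implementation; parameters are assumptions.
import numpy as np

def dct2(block):
    """2D DCT-II (orthonormal) via the separable matrix form."""
    n = block.shape[0]
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C @ block @ C.T

def quantized_histograms(image, bins=32):
    """Histograms of the DC and first three AC coefficients over all 8x8 blocks."""
    h, w = image.shape
    coeffs = [[], [], [], []]                      # DC, AC(0,1), AC(1,0), AC(1,1)
    for y in range(0, h - 7, 8):
        for x in range(0, w - 7, 8):
            d = dct2(image[y:y+8, x:x+8].astype(float))
            for i, (r, c) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
                coeffs[i].append(d[r, c])
    hists = []
    for c in coeffs:
        hist, _ = np.histogram(c, bins=bins)
        hists.append(hist / max(hist.sum(), 1))    # normalize to a distribution
    return np.concatenate(hists)

# Example distance metrics used to match a query histogram against indexed ones.
def l1(a, b):         return np.abs(a - b).sum()
def l2(a, b):         return np.sqrt(((a - b) ** 2).sum())
def chi_square(a, b): return (((a - b) ** 2) / (a + b + 1e-12)).sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query, candidate = rng.random((64, 64)) * 255, rng.random((64, 64)) * 255
    fq, fc = quantized_histograms(query), quantized_histograms(candidate)
    print(l1(fq, fc), l2(fq, fc), chi_square(fq, fc))
```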

    CONTENT-BASED IMAGE RETRIEVAL USING ENHANCED HYBRID METHODS WITH COLOR AND TEXTURE FEATURES

    Content-based image retrieval (CBIR) automatically retrieves images similar to a query image by using the visual contents (features) of the image, such as color, texture and shape. Effective CBIR relies on efficient feature extraction for indexing and on effective matching of the query image against the indexed images for retrieval. The main issue in CBIR is how to extract features efficiently, because efficient features describe the image well and can be matched efficiently, yielding robust retrieval. This issue is the main inspiration for this thesis to develop a hybrid CBIR system with high performance in the spatial and frequency domains. We propose various approaches in which different techniques are fused to extract statistical color and texture features efficiently in both domains. In the spatial domain, statistical color histogram features are computed from the pixel distribution of Laplacian-filtered, sharpened images under different quantization schemes. However, the color histogram does not capture spatial information. This is addressed with a histogram refinement method, in which statistical features of the regions falling in the histogram bins of the filtered image are extracted; this incurs a high computational cost, which is reduced by dividing the image into sub-blocks of different sizes to extract the color and texture features. To further improve performance, color and texture features are combined over the sub-blocks, taking advantage of the lower computational cost.
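
    The sketch below illustrates the general idea of the spatial-domain stage under simple assumptions (a 4-neighbour Laplacian, fixed sub-block size and bin count); it is not the thesis implementation.

```python
# Sketch: sharpen the image with a Laplacian filter, split it into sub-blocks,
# and collect quantized per-channel histograms plus simple statistics per block.
# Block size, bin count and the exact sharpening kernel are illustrative choices.
import numpy as np

def laplacian_sharpen(gray):
    """image + 4-neighbour Laplacian, computed with shifted copies."""
    lap = (4 * gray
           - np.roll(gray, 1, axis=0) - np.roll(gray, -1, axis=0)
           - np.roll(gray, 1, axis=1) - np.roll(gray, -1, axis=1))
    return np.clip(gray + lap, 0, 255)

def sub_block_features(rgb, block=32, bins=8):
    """Per sub-block: quantized histogram of each channel plus mean and std."""
    feats = []
    h, w, _ = rgb.shape
    sharp = np.stack([laplacian_sharpen(rgb[..., c].astype(float)) for c in range(3)], -1)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = sharp[y:y+block, x:x+block]
            for c in range(3):
                hist, _ = np.histogram(patch[..., c], bins=bins, range=(0, 255))
                feats.extend(hist / hist.sum())          # color distribution
            feats.extend([patch.mean(), patch.std()])    # simple texture statistics
    return np.asarray(feats)

if __name__ == "__main__":
    img = np.random.default_rng(1).integers(0, 256, (64, 64, 3))
    print(sub_block_features(img).shape)   # feature vector used for indexing/matching
```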

    Irish Machine Vision and Image Processing Conference Proceedings 2017


    Analysis for Scalable Coding of Quality-Adjustable Sensor Data

    Ph.D. dissertation, Seoul National University Graduate School, Department of Electrical and Computer Engineering, February 2014. 신현식. Machine-generated data such as sensor data now comprise a major portion of available information. This thesis addresses two important problems: storing massive sensor data collections and sensing efficiently. We first propose quality-adjustable sensor data archiving, which compresses an entire collection of sensor data efficiently without compromising key features. Considering the data-aging aspect of sensor data, we make our archiving scheme capable of controlling data fidelity to exploit users' less frequent access to older data. This flexibility in quality adjustability leads to more efficient use of storage space. In order to store data from various sensor types cost-effectively, we study the optimal storage configuration strategy using analytical models that capture the characteristics of our scheme. This strategy stores sensor data blocks with the optimal configurations that maximize the data fidelity of various sensor data under a given storage budget. Next, we consider efficient sensing schemes and propose a quality-adjustable sensing scheme. We adopt compressive sensing (CS), which is well suited to resource-limited sensors because of its low computational complexity. We enhance the quality adjustability intrinsic to CS with quantization and, in particular, temporal downsampling. Our sensing architecture provides more rate-distortion operating points than previous schemes, which enables sensors to adapt data quality more efficiently with respect to overall performance. Moreover, the proposed temporal downsampling improves coding efficiency, a known weakness of CS, and, together with a sparse random measurement matrix, further reduces the computational complexity of the sensing devices. As a result, our quality-adjustable sensing can deliver gains to a wide variety of resource-constrained sensing techniques. Contents: 1 Introduction; 2 Archiving of Sensor Data; 3 Scalable Management of Storage; 4 Quality-Adjustable Sensing; 5 Conclusions.
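
    A minimal encoder-side sketch of the ingredients named above (sparse random measurement matrix, temporal downsampling, uniform quantization) is given below; all rates, step sizes and densities are assumptions, not the thesis configuration.

```python
# Sketch of quality-adjustable compressive sensing on the encoder side:
# downsample the input in time, project it with a sparse {-1,0,+1} matrix,
# and uniformly quantize the measurements. Illustrative parameters only.
import numpy as np

rng = np.random.default_rng(0)

def sparse_random_matrix(m, n, density=0.1):
    """{-1, 0, +1} measurement matrix with roughly `density` nonzeros."""
    mask = rng.random((m, n)) < density
    signs = rng.choice([-1.0, 1.0], size=(m, n))
    return mask * signs

def sense(x, m, downsample=2, q_step=0.05):
    """Temporal downsampling, sparse random projection, uniform quantization."""
    x_ds = x[::downsample]                         # temporal downsampling
    phi = sparse_random_matrix(m, x_ds.size)
    y = phi @ x_ds                                 # compressive measurements
    y_q = np.round(y / q_step).astype(int)         # quantized measurement indices
    return y_q, phi

if __name__ == "__main__":
    t = np.linspace(0, 1, 256)
    signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
    y_q, phi = sense(signal, m=32)
    print(y_q.shape, phi.shape)   # far fewer, coarser numbers to store or transmit
```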

    Face recognition using statistical adapted local binary patterns.

    Biometrics is the study of methods for recognizing humans based on their behavioral and physical characteristics or traits. Face recognition is one of the biometric modalities that has received a great amount of attention from researchers during the past few decades because of its potential applications in a variety of security domains. Face recognition, however, is not only concerned with recognizing human faces, but also with recognizing faces of non-biological entities, or avatars. The need for secure and affordable virtual worlds is attracting the attention of many researchers who seek fast, automatic and reliable ways to identify virtual worlds' avatars. In this work, I propose new techniques for recognizing avatar faces, which can also be applied to recognize human faces. The proposed methods are based mainly on a well-known and efficient local texture descriptor, the Local Binary Pattern (LBP). I apply different versions of LBP, such as Hierarchical Multi-scale Local Binary Patterns and the Adaptive Local Binary Pattern with Directional Statistical Features, in the wavelet space, and discuss the effect of this application on the performance of each LBP version. In addition, I use a new version of LBP called the Local Difference Pattern (LDP) together with other well-known descriptors and classifiers to differentiate between human and avatar face images. The original LBP achieves a high recognition rate if the tested images are clean, but its performance degrades if the images are corrupted by noise. To deal with this problem, I propose a new definition of the original LBP in which the descriptor does not threshold the neighborhood pixels against the central pixel value. Instead, a weight is computed for each pixel in the neighborhood, a new value is calculated for each pixel, and simple statistical operations are then used to compute the new threshold, which changes automatically based on the pixels' values. This threshold can be applied with the original LBP or any other version of LBP, and can be extended to the Local Ternary Pattern (LTP) or any version of LTP, to produce variants suited to recognizing noisy avatar and human face images.
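
    The sketch below contrasts the classic LBP with an adaptive-threshold variant of the kind described above; the specific weighting used here (closeness of each neighbour to the local mean) is a hypothetical stand-in, not the author's formula.

```python
# Sketch: 3x3 LBP codes with either the centre pixel as threshold (classic LBP)
# or a weighted local statistic as threshold (adaptive variant, illustrative).
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(patch, threshold):
    """8-bit LBP code of a 3x3 patch against the given threshold."""
    bits = [int(patch[1 + dy, 1 + dx] >= threshold) for dy, dx in OFFSETS]
    return sum(b << i for i, b in enumerate(bits))

def lbp_image(gray, adaptive=False):
    h, w = gray.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = gray[y-1:y+2, x-1:x+2].astype(float)
            if adaptive:
                neigh = np.array([patch[1 + dy, 1 + dx] for dy, dx in OFFSETS])
                w_n = 1.0 / (1.0 + np.abs(neigh - neigh.mean()))   # illustrative weights
                thr = np.sum(w_n * neigh) / w_n.sum()              # weighted local threshold
            else:
                thr = patch[1, 1]                                  # classic LBP: centre pixel
            out[y - 1, x - 1] = lbp_code(patch, thr)
    return out

if __name__ == "__main__":
    img = np.random.default_rng(2).integers(0, 256, (32, 32)).astype(float)
    noisy = img + np.random.default_rng(3).normal(0, 10, img.shape)
    # Histograms of the resulting codes would then be compared for recognition.
    print(lbp_image(img, adaptive=True)[:2, :2])
    print(lbp_image(noisy, adaptive=True)[:2, :2])
```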

    Machine learning techniques for music information retrieval

    Doctoral thesis, Informatics (Informatics Engineering), Universidade de Lisboa, Faculdade de Ciências, 2015. The advent of digital music has changed the rules of music consumption, distribution and sales. With it has emerged the need to effectively search and manage vast music collections. Music information retrieval is an interdisciplinary field of research that focuses on the development of new techniques with that aim in mind. This dissertation addresses a specific aspect of this field: methods that automatically extract musical information based exclusively on the audio signal. We propose a method for automatic music-based classification, label inference, and music similarity estimation. Our method consists of representing the audio with a finite set of symbols and then modeling the symbols' evolution over time. The symbols are obtained via vector quantization, in which a single codebook is used to quantize the audio descriptors; their time evolution is modeled by a first-order Markov process. Based on systematic evaluations carried out on publicly available datasets, we show that our method achieves performance on par with most techniques found in the literature. We also present and discuss the problems that appear when computers try to classify or annotate songs using the audio as the only source of information. In our method, the separation of the quantization process from the creation and training of the classification models helped in that analysis. It enabled us to examine how instantaneous sound attributes (henceforth, features) are distributed in terms of musical genre, and how designing codebooks specially tailored to these distributions affects the performance of our system and of other classification systems commonly used for this task. On this issue, we show that there is no apparent benefit in seeking a thorough representation of the feature space. This is somewhat unexpected, since it goes against the assumption, implicit in many genre recognition methods, that features carry equally relevant information and somehow capture the specificities of musical facets. Label inference is the task of automatically annotating songs with semantic words; this task is also known as autotagging. In this context, we illustrate the importance of a number of issues that, in our view, are often overlooked. We show that current techniques are fragile, in the sense that small alterations in the set of labels may lead to dramatically different results. Furthermore, through a series of experiments, we show that autotagging systems fail to learn tag models capable of generalizing to datasets of different origins. We also show that the performance achieved with these techniques is not sufficient to take advantage of the correlations between tags. Fundação para a Ciência e a Tecnologia (FCT).
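
    A minimal sketch of the symbol-plus-Markov idea described above is given below, under toy assumptions (random "frame descriptors" standing in for real audio features, plain k-means as the codebook trainer, and a naive matrix difference as the similarity score).

```python
# Sketch: vector-quantize per-frame audio descriptors with one shared codebook,
# then model each song's symbol sequence with a first-order Markov transition
# matrix that can be compared across songs. Not the thesis implementation.
import numpy as np

def train_codebook(frames, k=16, iters=20, seed=0):
    """Plain Lloyd/k-means codebook over descriptor vectors."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = frames[labels == j].mean(axis=0)
    return codebook

def symbol_sequence(frames, codebook):
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def markov_model(symbols, k):
    """Row-normalized first-order transition matrix, with add-one smoothing."""
    T = np.ones((k, k))
    for a, b in zip(symbols[:-1], symbols[1:]):
        T[a, b] += 1
    return T / T.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    song_a = rng.normal(size=(500, 13))       # e.g. MFCC-like frame descriptors
    song_b = rng.normal(size=(400, 13))
    cb = train_codebook(np.vstack([song_a, song_b]))
    Ta = markov_model(symbol_sequence(song_a, cb), len(cb))
    Tb = markov_model(symbol_sequence(song_b, cb), len(cb))
    print(np.abs(Ta - Tb).mean())             # a crude model (dis)similarity
```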

    Compression of 3D models with NURBS

    With recent progress in computing, algorithmics and telecommunications, 3D models are increasingly used in various multimedia applications. Examples include visualization, gaming, entertainment and virtual reality. In the multimedia domain, 3D models have traditionally been represented as polygonal meshes. This piecewise planar representation can be thought of as the analogue of bitmap images for 3D surfaces. Like bitmap images, meshes enjoy great flexibility and are particularly well suited to describing information captured from the real world, through, for instance, scanning processes. They suffer, however, from the same shortcomings, namely limited resolution and large storage size. The compression of polygonal meshes has been a very active field of research in the last decade, and rather efficient compression algorithms have been proposed in the literature that greatly mitigate the high storage costs. However, such a low-level description of a 3D shape has bounded performance; more efficient compression should be reachable through the use of higher-level primitives. This idea has been explored to a great extent in the context of model-based coding of visual information. In such an approach, a higher-level representation (e.g., a 3D model of a talking head) is obtained through analysis methods when the visual information is compressed. This can be seen as an inverse projection problem. Once this task is fulfilled, the resulting parameters of the model are coded instead of the original information. It is believed that, if the analysis module is efficient enough, the total cost of coding (in a rate-distortion sense) will be greatly reduced. The relatively poor performance and high complexity of currently available analysis methods (except for specific cases where a priori knowledge about the nature of the objects is available) have held back wide deployment of coding techniques based on such an approach. Progress in computer graphics has, however, changed this situation. Nowadays, an increasing amount of picture, video and 3D content is generated by synthesis processing rather than coming from a capture device such as a camera or a scanner. This means that the underlying model in the synthesis stage can be used for efficient coding without the need for a complex analysis module. In other words, it would be a mistake to compress a low-level description (e.g., a polygonal mesh) when a higher-level one is available from the synthesis process (e.g., a parametric surface). This is, however, what is usually done in the multimedia domain, where higher-level 3D model descriptions are converted to polygonal meshes, if only because of the lack of standard coded formats for the former. On a parallel but related path, the way we consume audio-visual information is changing. As opposed to the recent past and a large part of today's applications, interactivity is becoming a key element in the way we consume information. In the context of interest in this dissertation, this means that when coding visual information (an image or a video, for instance), previously obvious considerations such as the choice of sampling parameters are not so obvious anymore. In fact, since in an interactive environment the effective display resolution can be controlled by the user through zooming, there is no clear optimal setting for the sampling period.
This means that, because of interactivity, the representation used to code the scene should allow the display of objects at a variety of resolutions, ideally up to infinite resolution. One way to resolve this problem would be extensive over-sampling, but this approach is unrealistic and too expensive to implement in many situations. The alternative is to use a resolution-independent representation. In the realm of 3D modeling, such representations are usually available when the models are created by an artist on a computer. The scope of this dissertation is precisely the compression of 3D models in higher-level forms. Direct coding in such a form should yield improved rate-distortion performance while providing a large degree of resolution independence. There has not been, so far, any major attempt to efficiently compress these representations, such as parametric surfaces; this thesis proposes a solution to close that gap. A variety of higher-level 3D representations exist, of which parametric surfaces are a popular choice among designers. Within parametric surfaces, Non-Uniform Rational B-Splines (NURBS) enjoy great popularity, as a wide range of NURBS-based modeling tools is readily available. Recently, NURBS has been included in the Virtual Reality Modeling Language (VRML) and its next-generation descendant eXtensible 3D (X3D). The nice properties of NURBS and their widespread use have led us to choose them as the coded representation. The primary goal of this dissertation is the definition of a system for coding 3D NURBS models with guaranteed distortion. The basis of the system is entropy-coded differential pulse code modulation (DPCM). In the case of NURBS, guaranteeing the distortion is not trivial, as some of its parameters (e.g., knots) have a complicated influence on the overall surface distortion. To this end, a detailed distortion analysis is performed; in particular, previously unknown relations between the distortion of the knots and the resulting surface distortion are demonstrated. Compression efficiency is pursued at every stage, and simple yet efficient entropy coder realizations are defined. The special case of degenerate and closed surfaces with duplicate control points is addressed, and an efficient yet simple coding is proposed to compress the duplicate relationships. Encoder aspects are also analyzed: optimal predictors are found that perform well across a wide class of models, and simplification techniques are considered for improved compression efficiency at negligible distortion cost. Transmission over error-prone channels is also considered and an error-resilient extension is defined. The data stream is partitioned by independently coding small groups of surfaces and inserting the necessary resynchronization markers. Simple strategies for achieving the desired level of protection are proposed. The same extension also serves the purposes of random access and on-the-fly reordering of the data stream.
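
    A minimal DPCM sketch for a grid of control points is given below; the left-neighbour predictor, the open-loop structure and the quantization step are simplifications for illustration, not the predictors or entropy coder defined in the thesis.

```python
# Sketch: predict each NURBS control point from its left neighbour, quantize the
# prediction residual, and estimate the residual entropy to gauge the rate saving
# over coding raw coordinates. Open-loop for brevity; a real codec predicts from
# reconstructed points so that quantization error does not accumulate.
import numpy as np

def dpcm_encode(points, q_step=0.01):
    """Left-neighbour prediction along each row of an (rows, cols, 3) control grid."""
    pred = np.zeros_like(points)
    pred[:, 1:, :] = points[:, :-1, :]             # previous control point in the row
    residual = points - pred
    return np.round(residual / q_step).astype(int)

def dpcm_decode(symbols, q_step=0.01):
    residual = symbols * q_step
    return np.cumsum(residual, axis=1)             # invert the left-neighbour prediction

def entropy_bits(symbols):
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

if __name__ == "__main__":
    # A smooth synthetic control-point grid (rows x cols x xyz).
    u, v = np.meshgrid(np.linspace(0, 1, 20), np.linspace(0, 1, 20), indexing="ij")
    grid = np.stack([u, v, 0.2 * np.sin(3 * u) * np.cos(2 * v)], axis=-1)
    sym = dpcm_encode(grid)
    print("bits/symbol:", entropy_bits(sym))
    print("max abs error:", np.abs(dpcm_decode(sym) - grid).max())
```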

    Exploring information retrieval using image sparse representations: from circuit designs and acquisition processes to specific reconstruction algorithms

    New advances in the field of image sensors (especially in CMOS technology) tend to question the conventional methods used to acquire images. Compressive sensing (CS) plays a major role here, especially in relieving the analog-to-digital converters, which generally represent the bottleneck in this type of sensor. In addition, CS eliminates the traditional compression stages performed by embedded digital signal processors dedicated to this purpose. The interest is twofold: it both substantially reduces the amount of data to be converted and removes digital processing performed outside the sensor chip. For the moment, the main route of exploration for CS in image sensors, as well as the intended applications, aims at reducing the power consumption related to these components (the ADC and DSP represent 99% of the total power consumption). More broadly, the CS paradigm makes it possible to question, or at least to extend, the Nyquist-Shannon sampling theory. This thesis presents developments in the field of image sensors demonstrating that it is possible to consider alternative applications of CS. Indeed, advances are presented in the fields of hyperspectral imaging, super-resolution, high dynamic range, high-speed acquisition and non-uniform sampling. In particular, three research axes have been pursued, aiming to design suitable architectures and acquisition processes, together with their associated reconstruction techniques, that take advantage of sparse image representations. How can the on-chip implementation of compressive sensing relax sensor constraints and improve the acquisition characteristics (speed, dynamic range, power consumption)? How can CS be combined with simple analysis to provide useful image features for high-level applications (adding semantic information) and to improve the reconstructed image quality at a given compression ratio? Finally, how can CS push back physical limitations (spectral sensitivity and pixel pitch) of imaging systems without a major impact on either the sensing strategy or the optical elements involved? A CMOS image sensor was developed and manufactured during this Ph.D. to validate concepts such as high-dynamic-range CS. A new design approach was employed, resulting in innovative solutions for pixel addressing and conversion that perform specific acquisitions in a compressed mode. In addition, the principle of adaptive CS combined with non-uniform sampling has been developed, and possible implementations of this type of acquisition are proposed. Finally, preliminary work is presented on the use of liquid crystal devices to enable hyperspectral imaging combined with spatial super-resolution. The conclusion of this study can be summarized as follows: CS should now be considered a toolbox for more easily defining trade-offs between the different characteristics of a sensor: integration time, converter speed, dynamic range, resolution and digital processing resources. However, while CS relaxes some hardware constraints at the sensor level, the collected data may be difficult to interpret and process at the decoder side, requiring massive computational resources compared with conventional techniques. The application field is wide, implying that, for a targeted application, an accurate characterization of the constraints on both the sensor (encoder) and the decoder needs to be established.
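
    For the decoder side mentioned above, the sketch below shows a generic sparse reconstruction by iterative soft thresholding (ISTA), assuming a 1-D signal, a DCT sparsity basis and a dense Gaussian measurement matrix; it is a stand-in illustration, not the specific reconstruction algorithms developed in the thesis.

```python
# Sketch: recover a signal that is sparse in the DCT domain from a small number
# of random linear measurements, using ISTA to solve the l1-regularized problem
# 0.5*||y - A s||^2 + lam*||s||_1. All parameters are illustrative assumptions.
import numpy as np

def dct_matrix(n):
    k = np.arange(n)
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C                                        # orthonormal DCT-II

def ista(y, A, lam=0.05, iters=300):
    """Iterative soft thresholding for the l1-regularized least-squares problem."""
    L = np.linalg.norm(A, 2) ** 2                   # Lipschitz constant of the gradient
    s = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ s - y)                       # gradient of the data term
        s = s - g / L
        s = np.sign(s) * np.maximum(np.abs(s) - lam / L, 0.0)   # soft threshold
    return s

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    n, m = 256, 96
    C = dct_matrix(n)
    s_true = np.zeros(n)
    s_true[rng.choice(n, 8, replace=False)] = rng.normal(0, 1, 8)
    x_true = C.T @ s_true                           # signal, sparse in the DCT domain
    Phi = rng.normal(size=(m, n)) / np.sqrt(m)      # measurement matrix
    y = Phi @ x_true                                # compressive measurements
    s_hat = ista(y, Phi @ C.T)                      # x = C.T s  =>  y = (Phi C.T) s
    print("recovery error:", np.linalg.norm(C.T @ s_hat - x_true) / np.linalg.norm(x_true))
```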

    Distortion-constraint compression of three-dimensional CLSM images using image pyramid and vector quantization

    Confocal microscopy imaging techniques, which allow optical sectioning, have been successfully exploited in biomedical studies. Biomedical scientists can benefit from more realistic visualization and much more accurate diagnosis by processing and analysing three-dimensional image data. The lack of efficient image compression standards makes such large volumetric image data slow to transfer over limited-bandwidth networks; it also imposes large storage requirements and high costs in archiving and maintenance. Conventional two-dimensional image coders do not take into account inter-frame correlations in three-dimensional image data. Standard multi-frame coders, such as video coders, although they perform well in capturing motion information, are not designed to code efficiently the multiple frames representing a stack of optical planes of a real object. Therefore a truly three-dimensional image compression approach should be investigated. Moreover, reconstructed image quality is a very important concern in compressing medical images, because it can be directly related to diagnostic accuracy. Most state-of-the-art methods are based on transform coding; for instance, JPEG is based on the discrete cosine transform (DCT) and JPEG2000 on the discrete wavelet transform (DWT). However, in DCT and DWT methods, controlling the reconstructed image quality is inconvenient and computationally costly, since they are fundamentally rate-parameterized rather than distortion-parameterized methods. It is therefore very desirable to develop a transform-based, distortion-parameterized compression method that has high coding performance and is also able to control the final distortion conveniently and accurately according to a user-specified quality requirement. This thesis describes our work in developing a distortion-constrained three-dimensional image compression approach using vector quantization techniques combined with image pyramid structures. We expect our method to provide: 1. high coding performance in compressing three-dimensional microscopic image data, compared with state-of-the-art three-dimensional image coders and with standardized two-dimensional image coders and video coders; 2. distortion-control capability, a very desirable feature in medical image compression applications, superior to rate-parameterized methods in achieving a user-specified quality requirement. The result is a three-dimensional image compression method with outstanding, objectively measured compression performance for volumetric microscopic images. The distortion-constrained feature, by which users can target an image quality rather than a compressed file size, offers more flexible control of the reconstructed image quality than its rate-constrained counterparts in medical image applications. Additionally, it effectively reduces the artifacts present in other approaches at low bit rates and also attenuates noise in the images prior to compression. Furthermore, its advantages in progressive transmission and fast decoding make it suitable for bandwidth-limited telecommunications and web-based image browsing applications.
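
    A minimal two-dimensional sketch of the pyramid-plus-VQ idea with a distortion target is given below; the toy k-means quantizer, mean pyramid, nearest-neighbour upsampling and stopping rule are illustrative assumptions, not the thesis codec.

```python
# Sketch: approximate the image from a coarse pyramid level, then vector-quantize
# the residual and refine only while a user-specified distortion target is violated.
import numpy as np

def downsample(img):                    # 2x2 block average
    return img.reshape(img.shape[0]//2, 2, img.shape[1]//2, 2).mean(axis=(1, 3))

def upsample(img):                      # nearest-neighbour expansion
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def vq(blocks, k=32, iters=15, seed=0):
    """Toy k-means codebook; returns codewords and the index of each block."""
    rng = np.random.default_rng(seed)
    cb = blocks[rng.choice(len(blocks), k, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(blocks[:, None] - cb[None], axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                cb[j] = blocks[labels == j].mean(axis=0)
    labels = np.linalg.norm(blocks[:, None] - cb[None], axis=2).argmin(axis=1)
    return cb, labels

def code_residual(residual, block=4, k=32):
    """VQ-code the residual in block x block patches and return its decoded version."""
    h, w = residual.shape
    patches = residual.reshape(h//block, block, w//block, block).swapaxes(1, 2)
    flat = patches.reshape(-1, block * block)
    cb, labels = vq(flat, k)
    rec = cb[labels].reshape(h//block, w//block, block, block).swapaxes(1, 2).reshape(h, w)
    return rec                           # in a codec, the indices + codebook are stored

if __name__ == "__main__":
    img = np.random.default_rng(7).random((64, 64)) * 255
    target_mse = 50.0
    coarse = downsample(downsample(img))            # two-level pyramid
    rec = upsample(upsample(coarse))                # start from the coarse approximation
    mse = ((rec - img) ** 2).mean()
    if mse > target_mse:                            # refine only if the constraint is violated
        rec = rec + code_residual(img - rec)
        mse = ((rec - img) ** 2).mean()
    print("final MSE:", mse)
```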