
    Box Spline Wavelet Frames for Image Edge Analysis

    We present a new box spline wavelet frame and apply it to image edge analysis. The wavelet frame is constructed from an eight-direction box spline; it is tight and has seldom been used in applications. Owing to its eight directions, it can locate edges of various types in fine detail. In addition to step edges (local discontinuities in intensity), it is able to locate Dirac edges (momentary changes of intensity) and hidden edges (local discontinuities in intensity derivatives). The method is simple and robust to noise. Many numerical examples are presented to demonstrate the effectiveness of the method, and quantitative and qualitative comparisons with other edge detection techniques show the advantages of this wavelet frame. Our test images include synthetic images with known ground truth as well as natural and medical images with rich geometric information.
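
    For a minimal illustration of the three edge types named above (not a reconstruction of the paper's eight-direction box spline frame), the following Python sketch builds 1-D step, Dirac, and hidden edges and probes them with first- and second-difference responses, a crude stand-in for fine-scale wavelet coefficients; all signal parameters are illustrative:

        import numpy as np

        n = 256
        t = np.arange(n)
        step = (t >= n // 2).astype(float)         # step edge: jump in intensity
        dirac = np.zeros(n); dirac[n // 2] = 1.0   # Dirac edge: momentary spike
        hidden = np.abs(t - n // 2).astype(float)  # hidden edge: derivative jump

        c = n // 2
        for name, sig in [("step", step), ("dirac", dirac), ("hidden", hidden)]:
            d1 = np.diff(sig)       # responds to steps and spikes
            d2 = np.diff(sig, n=2)  # responds to kinks (hidden edges)
            print(name, "d1:", d1[c - 2:c + 2], "d2:", d2[c - 2:c + 2])

    Around the edge location, a step gives one isolated first-difference spike, a Dirac gives an opposing pair, and a hidden edge shows up only as an isolated second-difference spike.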

    Directional edge and texture representations for image processing

    An efficient representation for natural images is of fundamental importance in image processing and analysis. Commonly used separable transforms such as wavelets are not best suited to images because they cannot exploit directional regularities such as edges and oriented textural patterns, while most recently proposed directional schemes cannot represent these two types of feature in a unified transform. This thesis focuses on the development of directional representations for images that can capture both edges and textures in a multiresolution manner. The thesis first considers the problem of extracting linear features with the multiresolution Fourier transform (MFT). Building on a previous MFT-based linear feature model, the work extends the extraction method to the case where the image is corrupted by noise. The problem is tackled by combining a "signal + noise" frequency model, a refinement stage, and a robust classification scheme. As a result, the MFT is able to perform linear feature analysis on noisy images on which previous methods failed. A new set of transforms, the multiscale polar cosine transforms (MPCT), is also proposed in order to represent textures. The MPCT can be regarded as a real-valued MFT with similar basis functions of oriented sinusoids. It is shown that the transform can represent textural patches more efficiently than the conventional Fourier basis. With a directional best cosine basis, the MPCT packet (MPCPT) is shown to be an efficient representation for edges and textures, despite its high computational burden. The problem of representing edges and textures in a fixed transform with lower complexity is then considered. This is achieved by applying a Gaussian frequency filter, matched to the dispersion of the magnitude spectrum, to the local MFT coefficients. This is particularly effective in denoising natural images, owing to its ability to preserve both types of feature. Further improvements can be made by employing the information given by the linear feature extraction process in the filter's configuration. The denoising results compare favourably against other state-of-the-art directional representations.
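
    As a rough sketch of the dispersion-matched filtering idea, the Python snippet below applies a Gaussian frequency filter whose width is set from the second moment of a patch's magnitude spectrum. This is an isotropic simplification operating on a plain 2-D FFT rather than on local MFT coefficients, and sigma_scale is an illustrative parameter, not the thesis's configuration:

        import numpy as np

        def gaussian_frequency_denoise(patch, sigma_scale=0.5):
            # forward transform of a 2-D image patch, DC centred
            F = np.fft.fftshift(np.fft.fft2(patch))
            mag = np.abs(F)
            h, w = patch.shape
            yy, xx = np.mgrid[-(h // 2):h - h // 2, -(w // 2):w - w // 2]
            # dispersion (radial second moment) of the magnitude spectrum
            var = ((xx**2 + yy**2) * mag).sum() / (mag.sum() + 1e-12)
            sigma = sigma_scale * np.sqrt(var)
            # Gaussian frequency filter matched to that dispersion
            g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2 + 1e-12))
            return np.real(np.fft.ifft2(np.fft.ifftshift(F * g)))

    An oriented (anisotropic) Gaussian, configured from the linear feature extraction stage as the thesis suggests, would replace the radial term with a direction-dependent one.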

    Digital Painting Analysis: Authentication and Artistic Style from Digital Reproductions


    Cross View Action Recognition

    Cross View Action Recognition (CVAR) appraises a system's ability to recognise actions from viewpoints that are unfamiliar to it. State-of-the-art methods trained on large amounts of data rely on variation within the training data itself to improve their ability to handle viewpoint changes. These methods therefore require not only a large-scale dataset of appropriate classes each time they are trained, but also a correspondingly large amount of computational power, leading to high costs in time, effort, funds, and electrical energy. In this thesis, we propose a methodological pipeline that tackles changes in viewpoint while training on small datasets and employing sustainable amounts of resources. Our method feeds optical flow input to one stream of a pre-trained model, used as-is, to obtain a feature; this feature is then used to train a custom-designed classifier that promotes view-invariant properties. Our method uses only video information as input, in contrast to methods that approach CVAR using depth or pose input at the expense of increased sensor costs. We present a number of comparative analyses that guided the design of the pipeline and further assess the power of each of its components. The technique can also be adapted to existing trained classifiers with minimal fine-tuning, as this work demonstrates by comparing shallow classifiers, deep pre-trained classifiers, and our proposed classifier trained from scratch. Additionally, we present a set of qualitative results that further our understanding of the relationship between viewpoints in the feature space.
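
    A minimal sketch of the pipeline shape described above: dense optical flow, a frozen pre-trained stream used as-is for features, and a small trainable classifier head. The backbone choice (ResNet-18), the 3-channel flow encoding, the head sizes, and n_classes are all illustrative assumptions, not the thesis's exact configuration:

        import cv2
        import numpy as np
        import torch.nn as nn
        import torchvision.models as models

        def flow_image(prev_gray, next_gray):
            # dense Farneback optical flow, packed as a 3-channel image
            # (dx, dy, magnitude) so it fits a standard RGB backbone;
            # resize/normalise to the backbone's expected input before use
            f = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                             0.5, 3, 15, 3, 5, 1.2, 0)
            mag = np.linalg.norm(f, axis=2, keepdims=True)
            return np.concatenate([f, mag], axis=2)

        backbone = models.resnet18(weights="IMAGENET1K_V1")
        backbone.fc = nn.Identity()   # expose the 512-d penultimate feature
        backbone.eval()               # frozen: the pre-trained stream is used as-is

        n_classes = 10                # assumed number of action classes
        classifier = nn.Sequential(   # small custom head trained from scratch
            nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, n_classes))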

    DESIGN OF COMPACT AND DISCRIMINATIVE DICTIONARIES

    The objective of this research is to design compact and discriminative dictionaries for effective classification. The motivation stems from the fact that dictionaries inherently contain redundant atoms, because the aim of dictionary learning is reconstruction, not classification. In this thesis, we propose methods to obtain a minimum number of discriminative dictionary atoms for effective classification and reduced computational time. First, we propose a classification scheme in which an example is assigned to a class based on a weighted combination of maximum projection and minimum reconstruction error. Here, the input data is learned by K-SVD dictionary learning, which alternates between sparse coding and dictionary update: orthogonal matching pursuit (OMP) is used for sparse coding, and singular value decomposition for the dictionary update. Although this classification scheme is effective, there is still scope to improve the dictionary learning by removing redundant atoms, since our goal is not reconstruction. To remove such redundant atoms, we propose two information-theoretic approaches for obtaining compact discriminative dictionaries. In the first approach, we remove redundant atoms from the dictionary while maintaining discriminative information. Specifically, we pose a constrained optimization problem that minimizes the mutual information between the optimized dictionary and the initial dictionary while maximizing the mutual information between the class labels and the optimized dictionary. This helps determine the information loss between the dictionary before and after optimization. To compute the information loss, we use the Jensen-Shannon divergence with adaptive weights to compare the class distributions of each dictionary atom. The advantage of the Jensen-Shannon divergence is its computational efficiency compared with computing the information loss directly from mutual information.
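
    A hedged sketch of the weighted decision rule described above, using per-class dictionaries and scikit-learn's OMP solver; the weight alpha, the sparsity level k, and the use of scipy's Jensen-Shannon distance (squared to recover the divergence) are illustrative assumptions, not the thesis's exact formulation:

        import numpy as np
        from scipy.spatial.distance import jensenshannon
        from sklearn.linear_model import orthogonal_mp

        def classify(x, class_dicts, alpha=0.5, k=5):
            # score each class by weighting maximum projection against
            # the reconstruction error of a k-sparse OMP code
            scores = {}
            for c, D in class_dicts.items():  # D: (n_features, n_atoms), unit columns
                code = orthogonal_mp(D, x, n_nonzero_coefs=k)
                recon_err = np.linalg.norm(x - D @ code)
                max_proj = np.max(np.abs(D.T @ x))
                scores[c] = alpha * max_proj - (1.0 - alpha) * recon_err
            return max(scores, key=scores.get)

        def atom_divergence(p, q):
            # Jensen-Shannon divergence between two atoms' class
            # distributions; a near-zero value flags a redundant atom
            return jensenshannon(p, q) ** 2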

    Generalized Rate-Distortion Functions of Videos

    Consumers watch enormous amounts of digital video every day via various video services delivered through terrestrial, cable, and satellite communication systems or over-the-top Internet connections. To offer the best possible service with the limited capacity of video distribution systems, these services require a precise understanding of the relationship between the perceptual quality of a video and its media attributes, which we term the generalized rate-distortion (GRD) function. In this thesis, we focus on accurately estimating the GRD function with a minimal number of measurement queries. We first explore the GRD behavior of compressed digital videos in a two-dimensional space of bitrate and resolution. Our analysis of real-world GRD data reveals that all GRD functions share similar regularities, yet exhibit considerable variation across different combinations of content and encoder type. Based on this analysis, we define the theoretical space of GRD functions, which not only establishes the form a GRD model should take, but also determines the constraints such functions must satisfy. We propose two computational GRD models. In the first model, we assume that the quality scores are precise, and develop a robust axial-monotonic Clough-Tocher (RAMCT) interpolation method to approximate the GRD function from a moderate number of measurements. In the second model, we show that the GRD function space is a convex set residing in a Hilbert space, and that a GRD function can be estimated by solving a projection problem onto the convex set. By analyzing GRD functions that arise in practice, we approximate the infinite-dimensional theoretical space by a low-dimensional one, based on which an empirical GRD model with few parameters is proposed. To further reduce the number of queries, we present a novel sampling scheme based on a probabilistic model and an information measure: the proposed method generates a sequence of queries by minimizing the overall informativeness of the remaining samples. To evaluate the performance of the GRD estimation methods, we collect a large-scale database of more than 4,000 real-world GRD functions, the Waterloo generalized rate-distortion (Waterloo GRD) database. Extensive comparison experiments on this database attest, both quantitatively and visually, to the superiority of the two proposed GRD models over state-of-the-art approaches, and validate that the proposed sampling algorithm consistently reduces the number of queries needed by various GRD estimation algorithms. Finally, we show the broad application scope of the proposed GRD models through three example applications: rate-distortion curve prediction, per-title encoding profile generation, and video encoder comparison.
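
    As a hedged illustration of the interpolation-based model, the snippet below fits scipy's plain Clough-Tocher interpolant to sparse (bitrate, resolution, quality) measurements. Note that scipy's interpolant lacks the axial-monotonicity constraint of the proposed RAMCT method, so this is only the unconstrained core; the toy quality surface and sample counts are illustrative only:

        import numpy as np
        from scipy.interpolate import CloughTocher2DInterpolator

        rng = np.random.default_rng(0)
        bitrate = rng.uniform(0.3, 8.0, 40)             # Mbps, illustrative
        height = rng.choice([360, 540, 720, 1080], 40)  # encoding resolution
        # toy GRD surface: quality rises with both bitrate and resolution
        quality = 100.0 / (1.0 + np.exp(-(np.log(bitrate) + height / 1080.0)))

        # interpolate quality over the 2-D (log-bitrate, resolution) space
        pts = np.column_stack([np.log(bitrate), height / 1080.0])
        grd = CloughTocher2DInterpolator(pts, quality)
        print(grd(np.log(2.5), 720 / 1080.0))           # query: 2.5 Mbps at 720p

    In the thesis's setting, each query would instead trigger an actual encode-and-measure step, which is why minimizing the number of queries matters.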