
    Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning

    The success of deep learning in computer vision is rooted in the ability of deep networks to scale up model complexity as demanded by challenging visual tasks. As complexity is increased, so is the need for large amounts of labeled data to train the model. This is associated with a costly human annotation effort. To address this concern, with the long-term goal of leveraging the abundance of cheap unlabeled data, we explore methods of unsupervised "pre-training." In particular, we propose to use self-supervised automatic image colorization. We show that traditional methods for unsupervised learning, such as layer-wise clustering or autoencoders, remain inferior to supervised pre-training. In search of an alternative, we develop a fully automatic image colorization method. Our method sets a new state-of-the-art in revitalizing old black-and-white photography, without requiring human effort or expertise. Additionally, it gives us a method for self-supervised representation learning. In order for the model to appropriately re-color a grayscale object, it must first be able to identify it. This ability, learned entirely through self-supervision, can be used to improve other visual tasks, such as classification and semantic segmentation. As a future direction for self-supervision, we investigate whether multiple proxy tasks can be combined to improve generalization. This turns out to be a challenging open problem. We hope that our contributions to this endeavor will provide a foundation for future efforts in making self-supervision compete with supervised pre-training. Comment: Ph.D. thesis.
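    The colorization pretext task described above can be sketched as a minimal builder of self-supervision pairs: the grayscale image is the input and the missing color information is the target, so no human labels are needed. The luminance weights and residual-style target below are illustrative assumptions, not the thesis's actual formulation (which trains a deep network):

```python
import numpy as np

def make_colorization_pair(rgb):
    """Split an RGB image into a self-supervision pair: the grayscale
    luminance is the model input, and the per-channel color residual
    (color minus luminance) is the prediction target."""
    rgb = rgb.astype(np.float64)
    # Standard luma weights approximate perceived brightness; they sum to 1.
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    target = rgb - gray[..., None]  # the color information the model must predict
    return gray, target

# A 2x2 toy image: pure red, green, blue, and a neutral gray pixel.
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [128, 128, 128]]], dtype=np.uint8)
gray, target = make_colorization_pair(img)

# Adding the target back to the luminance recovers the image exactly,
# so the supervision signal comes entirely from the image itself.
recon = target + gray[..., None]
```

Note that a truly gray pixel has a zero residual target: the model only has work to do where color must be inferred from content.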

    Graph Spectral Image Processing

    The recent advent of graph signal processing (GSP) has spurred intensive studies of signals that live naturally on irregular data kernels described by graphs (e.g., social networks, wireless sensor networks). Though a digital image contains pixels that reside on a regularly sampled 2D grid, if one can design an appropriate underlying graph connecting pixels with weights that reflect the image structure, then one can interpret the image (or image patch) as a signal on a graph, and apply GSP tools for processing and analysis of the signal in the graph spectral domain. In this article, we overview recent graph spectral techniques in GSP specifically for image/video processing. The topics covered include image compression, image restoration, image filtering and image segmentation.
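    The core pipeline the article describes can be sketched in a few lines: build a pixel graph, take the Laplacian's eigenvectors as the graph Fourier basis, and filter in the spectral domain. The path graph and unit edge weights below are illustrative assumptions; in practice the weights would reflect image structure (e.g., intensity similarity):

```python
import numpy as np

# Four pixels on a path graph with unit edge weights (illustrative).
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W       # combinatorial graph Laplacian
evals, U = np.linalg.eigh(L)         # eigenvectors form the graph Fourier basis

x = np.array([10.0, 12.0, 11.0, 40.0])  # pixel intensities with one sharp "edge"
x_hat = U.T @ x                          # graph Fourier transform (GFT)

# A simple graph spectral low-pass filter: keep the two lowest graph
# frequencies and transform back to the pixel domain.
mask = np.zeros_like(evals)
mask[:2] = 1.0
x_smooth = U @ (mask * x_hat)
```

Because the lowest graph frequency is the constant eigenvector, this filter preserves the mean intensity while attenuating variation across edges of the graph.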

    Modified Golomb-Rice Algorithm for Color Image Compression

    Color images require substantial memory to store; we reduce this requirement using the Golomb-Rice algorithm, which consists of two steps. In the first step, the image is compressed using the discrete wavelet transform (DWT): the 8 × 8 image is divided into m × n sub-windows and converted into raster file format to produce m × n-1 differential data, which is then encoded with Golomb-Rice (GR) coding. After encoding, the code length, code word, and size are computed using GR coding. In the second step, GR decoding is performed based on the obtained length and code word, and the decoded image is then decompressed using the inverse discrete wavelet transform to recover the original image.
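    The GR coding step can be sketched as follows. A non-negative integer is split by a parameter k into a quotient, coded in unary, and a k-bit binary remainder; small values (common in differential data) get short codes. This is a generic Golomb-Rice sketch, not the paper's exact pipeline (the DWT stage is omitted):

```python
def gr_encode(n, k):
    """Golomb-Rice code of a non-negative integer n with parameter k."""
    q, r = n >> k, n & ((1 << k) - 1)
    code = '1' * q + '0'              # quotient in unary, zero-terminated
    if k:
        code += format(r, f'0{k}b')   # remainder as exactly k binary bits
    return code

def gr_decode(bits, k):
    """Inverse of gr_encode; returns (value, number_of_bits_consumed)."""
    q = 0
    while bits[q] == '1':             # read the unary quotient
        q += 1
    r = int(bits[q + 1:q + 1 + k], 2) if k else 0
    return (q << k) | r, q + 1 + k

# Example: n = 9, k = 2 gives quotient 2, remainder 1 -> '110' + '01'.
code = gr_encode(9, 2)  # '11001'
```

The choice of k trades off the unary and binary parts: a k matched to the magnitude of the differential data keeps both short.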

    Artificial Intelligence in the Creative Industries: A Review

    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided, including Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post-production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the 'creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human centric -- where it is designed to augment, rather than replace, human creativity.

    Conditional Normalizing Flows in Image Inverse Problems

    Learning-based methods have provided powerful tools for solving classification- and regression-related problems, yielding far superior results to classical handcrafted rule-based models. These models have proven efficient across many domains in many different fields. However, many common problems are inherently ill-posed and lack a unique answer, hence requiring a regularization pass or, alternatively, a probabilistic framework for successful modeling. While many families of models capable of learning distributions from samples exist, they commonly resort to approximations or surrogate training objectives. In this thesis we solve image-related inverse problems with a family of probabilistic models known as conditional normalizing flows. A normalizing flow consists of repeated applications of invertible transformations on a simple prior distribution, rendering it into a more complex distribution with direct and tractable probability density evaluation and efficient sampling. We show that a conditional normalizing flow is able to provide plausible, high-quality samples with visible benign variance from a conditional distribution in image super-resolution, denoising and colorization tasks. We quantify the success of the model as well as its shortcomings, and inspect how it internally addresses the conversion of white noise into a realistic image.
    Models that learn from observations via optimization can solve many problems far more efficiently than classical models based on static decision rules. Traditionally, however, models give only a single answer, even though many problems may have several equally acceptable answers; it is therefore appropriate to model the probability distribution over all possible answers rather than a single answer. This thesis studies the application of the normalizing-flow model family to inverse problems involving digital images. A normalizing flow transforms a simple probability distribution into a more complex one through invertible functions parametrized by neural networks, in such a way that an exact numerical value is still obtained for the likelihood of observations. Normalizing flows also enable efficient sampling from the complex distribution they model. The thesis assesses how well flow models succeed in this task and how they form plausible images from noise, and finds that conditional normalizing flows can produce high-quality samples in several image-related inverse problems.
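    The defining property of a normalizing flow, exact likelihood via the change-of-variables formula, can be shown with the smallest possible flow: a single invertible affine map applied to a standard normal base. The parameters and names here are illustrative, not from the thesis, where the invertible maps are parametrized by neural networks:

```python
import numpy as np

a, b = 2.0, 1.0  # illustrative flow parameters: x = a*z + b

def forward(z):
    """Push base samples through the (invertible) flow."""
    return a * z + b

def log_prob(x):
    """Exact density via change of variables:
    log p(x) = log p_base(f^{-1}(x)) - log|det df/dz|."""
    z = (x - b) / a                                # invert the flow
    log_base = -0.5 * (z**2 + np.log(2 * np.pi))   # standard normal log-density
    return log_base - np.log(abs(a))               # Jacobian correction

# Sampling is trivial: draw from the base and apply the flow.
rng = np.random.default_rng(0)
samples = forward(rng.standard_normal(100_000))
```

Stacking many such invertible layers (with learned, input-dependent parameters) gives a complex distribution whose density remains exactly computable, which is what distinguishes flows from models trained on surrogate objectives.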

    IST Austria Thesis

    Modern computer vision systems rely heavily on statistical machine learning models, which typically require large amounts of labeled data to be learned reliably. Moreover, computer vision research has recently adopted representation learning techniques widely, which further increases the demand for labeled data. However, for many important practical problems only a relatively small amount of labeled data is available, so it is difficult to leverage the full potential of representation learning methods. One way to overcome this obstacle is to invest substantial resources into producing large labeled datasets; unfortunately, this can be prohibitively expensive in practice. In this thesis we focus on an alternative way of tackling this issue: methods that make use of weakly-labeled or even unlabeled data. Specifically, the first half of the thesis is dedicated to the semantic image segmentation task. We develop a technique that achieves competitive segmentation performance while only requiring annotations in the form of global image-level labels instead of dense segmentation masks. Subsequently, we present a new methodology that further improves segmentation performance by leveraging tiny amounts of additional feedback from a human annotator. Using our methods, practitioners can greatly reduce the data annotation effort required to learn modern image segmentation models. In the second half of the thesis we focus on methods for learning from unlabeled visual data. We study a family of autoregressive models for modeling the structure of natural images and discuss potential applications of these models. Moreover, we conduct an in-depth study of one of these applications, developing a state-of-the-art model for the probabilistic image colorization task.
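    The autoregressive factorization underlying such models can be illustrated on a toy scale: the joint distribution over pixels is written as a chain of conditionals, p(x) = p(x1) p(x2|x1) p(x3|x1,x2), each predicted from the pixels before it. The three-pixel binary "images" and count-based conditionals below are illustrative assumptions; real models replace the counts with deep networks:

```python
from itertools import product

# Tiny illustrative dataset of 3-pixel binary "images".
data = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 1), (0, 1, 1), (1, 0, 0)]

def prob(x, data):
    """Autoregressive probability of x: multiply the conditional of each
    pixel given its prefix, with conditionals estimated by counting."""
    p = 1.0
    for i in range(len(x)):
        ctx = x[:i]
        matches = [d for d in data if d[:i] == ctx]  # images sharing the prefix
        p *= sum(d[i] == x[i] for d in matches) / len(matches)
    return p

# A valid autoregressive model assigns a proper distribution:
total = sum(prob(x, data) for x in product((0, 1), repeat=3))
```

Because each factor is a genuine conditional distribution, the product sums to one over all images, and sampling proceeds pixel by pixel in the same order.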

    Sparse modelling of natural images and compressive sensing

    This thesis concerns the study of the statistics of natural images and compressive sensing, with two main objectives: 1) to extend our understanding of the regularities exhibited by natural images of the visual world we regularly view around us, and 2) to incorporate this knowledge into image processing applications. Previous work on image statistics has uncovered remarkable behavior of the distributions obtained from filtering natural images. Typically we observe high-kurtosis, non-Gaussian distributions with sharp central cusps, which are called sparse in the literature. These results have become an accepted fact through empirical findings using zero-mean filters on many different databases of natural scenes. The observations have played an important role in computational and biological applications, where researchers have sought to understand visual processes through studying the statistical properties of the objects that are being observed. Interestingly, such results on sparse distributions also share elements with the emerging field of compressive sensing. This is a novel sampling protocol where one seeks to measure a signal in already compressed format through randomised projections, while the recovery algorithm consists of searching for a constrained solution with the sparsest transformed coefficients. In view of prior art, we extend our knowledge of image statistics from the monochrome domain into the colour domain. We study sparse response distributions of filters constructed on colour channels and observe the regularity of the distributions across diverse datasets of natural images. Several solutions to image processing problems emerge from the incorporation of colour statistics as prior information. We give a Bayesian treatment to the problem of colorizing natural gray images, and formulate image compression schemes using elements of compressive sensing and sparsity. We also propose a denoising algorithm that utilises the sparse filter responses as a regularisation function for the effective attenuation of Gaussian and impulse noise in images. The results emanating from this body of work illustrate how the statistics of natural images, when incorporated with Bayesian inference and sparse recovery, can have deep implications for image processing applications.
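    The high-kurtosis, sparse filter responses described above are easy to demonstrate: a zero-mean derivative filter applied to a piecewise-smooth image yields mostly near-zero values with occasional large edge responses, while the same filter on Gaussian noise does not. The synthetic image below is an illustrative stand-in for a natural-image dataset:

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_kurtosis(v):
    """Fourth standardized moment minus 3 (zero for a Gaussian)."""
    v = v - v.mean()
    return (v ** 4).mean() / (v ** 2).mean() ** 2 - 3.0

# Piecewise-constant rows plus mild noise mimic smooth regions and edges.
blocks = np.repeat(rng.normal(size=(64, 8)), 16, axis=1)  # 64 x 128 "image"
image = blocks + 0.01 * rng.normal(size=blocks.shape)
noise = rng.normal(size=image.shape)                      # Gaussian control

d_image = np.diff(image, axis=1).ravel()  # zero-mean horizontal derivative filter
d_noise = np.diff(noise, axis=1).ravel()
```

The derivative responses of the structured image are sharply peaked at zero with heavy tails (large excess kurtosis), whereas the filtered noise stays approximately Gaussian; this is the sparsity that both the image-statistics prior and compressive-sensing recovery exploit.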

    Densely-sampled light field reconstruction

    In this chapter, we motivate the use of densely-sampled light fields as the representation which can bring the required density of light rays for the correct recreation of 3D visual cues, such as focus and continuous parallax, and can serve as an intermediary between light field sensing and light field display. We consider the problem of reconstructing such a representation from a few camera views and approach it in a sparsification framework. More specifically, we demonstrate that the light field is well structured in the set of so-called epipolar images and can be sparsely represented by a dictionary of directional and multi-scale atoms called shearlets. We present the corresponding regularization method, along with its main algorithm and modifications for accelerating it. Finally, we illustrate its applicability for the cases of holographic stereograms and light field compression.
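    The structure that makes epipolar images sparse can be seen in a toy construction: a Lambertian scene point at fixed depth appears in each camera view shifted in proportion to the camera position, so stacking one scanline per view (an epipolar-plane image, EPI) turns the point into a straight line whose slope encodes depth. The synthetic 1D views below are illustrative only:

```python
import numpy as np

n_views, width, disparity = 8, 64, 2  # the point shifts 2 pixels per view

# Build the EPI: one row per camera view, one column per pixel.
epi = np.zeros((n_views, width))
for v in range(n_views):
    epi[v, 10 + disparity * v] = 1.0  # the scene point's position in view v

# The point traces a straight line across the EPI; its column in each
# view is 10 + 2*v, and the constant slope encodes the point's depth.
cols = [int(np.argmax(epi[v])) for v in range(n_views)]
```

Because the EPI consists of such oriented lines, directional multi-scale atoms like shearlets represent it with very few coefficients, which is precisely what the sparsification framework exploits to reconstruct dense light fields from few views.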