37 research outputs found
Image Manipulation and Image Synthesis
Image manipulation is of historic importance. Ever since the advent of photography, pictures have been manipulated for various reasons. Historic rulers often used image manipulation techniques for the purpose of self-portrayal or propaganda. In many cases, the goal is to manipulate human behaviour by spreading credible misinformation. Photographs, by their nature, portray the real world and as such are more credible to humans. However, image manipulation may not only serve evil purposes. In this thesis, we propose and analyse methods for image manipulation that serve a positive purpose. Specifically, we treat image manipulation as a tool for solving other tasks. For this, we model image manipulation as an image-to-image translation (I2I) task, i.e., a system that receives an image as input and outputs a manipulated version of the input. We propose multiple I2I based methods. We demonstrate that I2I based image manipulation methods can be used to reduce motion blur in videos. Second, we show that I2I based image manipulation methods can be used for domain adaptation and domain extension. Specifically, we present a method that significantly improves the learning of semantic segmentation from synthetic source data. The same technique can be applied to learning nighttime semantic segmentation from daylight images. Next, we show that I2I can be used to enable weakly supervised object segmentation.
We show that each individual task requires and allows for different levels of supervision during the training of deep models in order to achieve best performance. We discuss the importance of maintaining control over the output of such methods and show that, with reduced levels of supervision, methods for maintaining stability during training and for establishing control over the output of a system become increasingly important. We propose multiple methods that solve the issues that arise in such systems. Finally, we demonstrate that our proposed mechanisms for control can be adapted to synthesise images from scratch
Learning Depth from Focus in the Wild
For better photography, most recent commercial cameras including smartphones
have either adopted large-aperture lens to collect more light or used a burst
mode to take multiple images within short times. These interesting features
lead us to examine depth from focus/defocus.
In this work, we present a convolutional neural network-based depth
estimation from single focal stacks. Our method differs from relevant
state-of-the-art works with three unique features. First, our method allows
depth maps to be inferred in an end-to-end manner even with image alignment.
Second, we propose a sharp region detection module to reduce blur ambiguities
in subtle focus changes and weakly texture-less regions. Third, we design an
effective downsampling module to ease flows of focal information in feature
extractions. In addition, for the generalization of the proposed network, we
develop a simulator to realistically reproduce the features of commercial
cameras, such as changes in field of view, focal length and principal points.
By effectively incorporating these three unique features, our network
achieves the top rank in the DDFF 12-Scene benchmark on most metrics. We also
demonstrate the effectiveness of the proposed method on various quantitative
evaluations and real-world images taken from various off-the-shelf cameras
compared with state-of-the-art methods. Our source code is publicly available
at https://github.com/wcy199705/DfFintheWild
Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers
Recent years have seen significant developments in the field of License Plate
Recognition (LPR) through the integration of deep learning techniques and the
increasing availability of training data. Nevertheless, reconstructing license
plates (LPs) from low-resolution (LR) surveillance footage remains challenging.
To address this issue, we introduce a Single-Image Super-Resolution (SISR)
approach that integrates attention and transformer modules to enhance the
detection of structural and textural features in LR images. Our approach
incorporates sub-pixel convolution layers (also known as PixelShuffle) and a
loss function that uses an Optical Character Recognition (OCR) model for
feature extraction. We trained the proposed architecture on synthetic images
created by applying heavy Gaussian noise to high-resolution LP images from two
public datasets, followed by bicubic downsampling. As a result, the generated
images have a Structural Similarity Index Measure (SSIM) of less than 0.10. Our
results show that our approach for reconstructing these low-resolution
synthesized images outperforms existing ones in both quantitative and
qualitative measures. Our code is publicly available at
https://github.com/valfride/lpr-rsr-ext
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided including Convolutional Neural Network (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the `creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity
Super-resolution towards license plate recognition
Orientador: David MenottiDissertação (mestrado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa : Curitiba, 24/04/2023Inclui referências: p. 51-59Área de concentração: Ciência da ComputaçãoResumo: Nos últimos anos, houve avanços significativos no campo de Reconhecimento de placas de veiculares (LPR, do inglês License Plate Recognition) por meio da integração de técnicas de aprendizado profundo e do aumento da disponibilidade de dados para treinamento. No entanto, reconstruir placas veiculares a partir de imagens de sistemas de vigilância em baixa resolução ainda é um desafio. Para enfrentar essa dificuldade, apresentamos uma abordagem de Super Resolução de Imagem Única (SISR, do inglês Single-Image Super-Resolution) que integra módulos de atenção para aprimorar a detecção de característica estruturais e texturais em imagens de baixa resolução. Nossa abordagem utiliza camadas de convolução sub-pixel (também conhecidas como PixelShuffle) e uma função de perda que emprega um modelo de Reconhecimento Óptico de Caracteres (OCR, do inglês Optical Character Recognition) para extração de características. Treinamos a arquitetura proposta com imagens sintéticas criadas aplicando ruído gaussiano pesado à imagens de alta resolução de placas veiculares de dois conjuntos de dados públicos, seguido de redução de sua resolução com interpolação bicúbica. Como resultado, as imagens geradas têm um Índice de Similaridade Estrutural (SSIM, do inglês Structural Similarity Index Measure) inferior a 0,10. Nossos resultados experimentais mostram que a abordagem proposta para reconstruir essas imagens sintéticas de baixa resolução superou as existentes tanto em medidas quantitativas quanto qualitativas.Abstract: Recent years have seen significant developments in the field of License Plate Recognition (LPR) through the integration of deep learning techniques and the increasing availability of training data. Nevertheless, reconstructing license plates (LPs) from low-resolution (LR) surveillance footage remains challenging. To address this issue, we introduce a Single-Image Super-Resolution (SISR) approach that integrates attention and transformer modules to enhance the detection of structural and textural features in LR images. Our approach incorporates sub-pixel convolution layers (also known as PixelShuffle) and a loss function that uses an Optical Character Recognition (OCR) model for feature extraction. We trained the proposed architecture on synthetic images created by applying heavy Gaussian noise to high-resolution LP images from two public datasets, followed by bicubic downsampling. As a result, the generated images have a Structural Similarity Index Measure (SSIM) of less than 0.10. Our results show that our approach for reconstructing these low-resolution synthesized images outperforms existing ones in both quantitative and qualitative measures