    A Method for Automatic Colorization of Hand-Drawn Animation

    This paper addresses the colorization of black-and-white hand-drawn animation using neural networks. It considers several modifications of an existing prototype algorithm and examines their effectiveness under different combinations of loss functions. A new loss function for the neural network is proposed, based on segmenting the image by color. The effectiveness of the modified algorithm with the proposed loss function is then evaluated.

    UniColor: A Unified Framework for Multi-Modal Colorization with Transformer

    We propose the first unified framework UniColor to support colorization in multiple modalities, including both unconditional and conditional ones, such as stroke, exemplar, text, and even a mix of them. Rather than learning a separate model for each type of condition, we introduce a two-stage colorization framework for incorporating various conditions into a single model. In the first stage, multi-modal conditions are converted into a common representation of hint points. Particularly, we propose a novel CLIP-based method to convert the text to hint points. In the second stage, we propose a Transformer-based network composed of Chroma-VQGAN and Hybrid-Transformer to generate diverse and high-quality colorization results conditioned on hint points. Both qualitative and quantitative comparisons demonstrate that our method outperforms state-of-the-art methods in every control modality and further enables multi-modal colorization that was not feasible before. Moreover, we design an interactive interface showing the effectiveness of our unified framework in practical usage, including automatic colorization, hybrid-control colorization, local recolorization, and iterative color editing. Our code and models are available at https://luckyhzt.github.io/unicolor.

    Comment: Accepted by SIGGRAPH Asia 2022. Project page: https://luckyhzt.github.io/unicolo

    Style Injection by Whitening and Coloring in a Deep Generative Network

    In GAN-based image generation and manipulation, style injection by Adaptive Instance Normalization (AdaIN) has become the standard way to condition generation on a latent representation of the image domain. AdaIN works by modulating the statistics of the image features: it first normalizes the features by subtracting their mean and dividing by their standard deviation, then injects a style vector through the inverse of this operation. Although this method has been used successfully in a variety of image-to-image translation scenarios, the statistical representation of AdaIN is limited in that it does not account for correlations between features. In the style transfer literature, by contrast, the Whitening and Coloring Transformation (WCT) has become the preferred approach because it takes these correlations into account. Yet despite its good performance in style transfer, the use of WCT has so far not been explored in depth for style injection. In this work, we fill this gap by replacing AdaIN with an explicit WCT operation for style injection in GANs. More specifically, we introduce a module that can be used as a drop-in replacement for AdaIN blocks (without any additional change) in popular existing GAN architectures, and we present its impact on generation tasks. In conditional image generation, where the latent space is intended to represent the style of the images, we find that whitening helps ensure that the space encodes only stylistic information, which allows the content of the conditioning image to be more visible. We demonstrate the performance of our method in two scenarios: 1) a supervised training context using the Google Maps dataset and 2) an unsupervised training setup using the multi-domain, multi-modal StarGANv2 architecture on the Animal Faces-HQ (AFHQ) dataset.
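
    The difference between the two operations described above can be illustrated with a minimal NumPy sketch. It is not the module proposed in the thesis: it uses the classical closed-form whitening and coloring via eigendecomposition, and it assumes features flattened to shape (channels, positions). The function names `adain`, `whiten`, `color`, and `wct`, and the style statistics passed in (`style_mean`, `style_std`, `style_cov`), are hypothetical names chosen for illustration.

```python
import numpy as np

def adain(content, style_mean, style_std, eps=1e-5):
    """Per-channel normalization followed by per-channel rescaling.

    content: (C, N) feature matrix; style_mean, style_std: (C,) vectors.
    Only channel-wise means and standard deviations are matched.
    """
    mu = content.mean(axis=1, keepdims=True)
    sigma = content.std(axis=1, keepdims=True)
    normalized = (content - mu) / (sigma + eps)
    return normalized * style_std[:, None] + style_mean[:, None]

def whiten(content, eps=1e-5):
    """Remove the mean and decorrelate channels (covariance ~ identity)."""
    centered = content - content.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / (content.shape[1] - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return inv_sqrt @ centered

def color(whitened, style_cov, style_mean):
    """Impose a target covariance and mean on whitened features."""
    eigvals, eigvecs = np.linalg.eigh(style_cov)
    sqrt = eigvecs @ np.diag(np.sqrt(np.clip(eigvals, 0.0, None))) @ eigvecs.T
    return sqrt @ whitened + style_mean[:, None]

def wct(content, style_cov, style_mean):
    """Whitening followed by coloring: matches the full covariance."""
    return color(whiten(content), style_cov, style_mean)
```

    Under these assumptions, `adain` leaves the correlations between channels of the content features unchanged, whereas `wct` replaces them with those encoded in `style_cov`, which is the limitation of AdaIN that the abstract points to.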

    Colour technologies for content production and distribution of broadcast content

    Faithful colour reproduction has long been a priority driving the development of new colour imaging systems that maximise human perceptual plausibility. This thesis explores machine learning algorithms for colour processing to assist both content production and distribution. First, this research studies colourisation technologies with practical use cases in restoration and processing of archived content. The research targets practical deployable solutions, developing a cost-effective pipeline which integrates the activity of the producer into the processing workflow. In particular, a fully automatic image colourisation paradigm using Conditional GANs is proposed to improve content generalisation and colourfulness of existing baselines. Moreover, a more conservative solution is considered by providing references to guide the system towards more accurate colour predictions. A fast end-to-end architecture is proposed to improve existing exemplar-based image colourisation methods while decreasing the complexity and runtime. Finally, the proposed image-based methods are integrated into a video colourisation pipeline. A general framework is proposed to reduce the generation of temporal flickering or propagation of errors when such methods are applied frame-to-frame. The proposed model is jointly trained to stabilise the input video and to cluster its frames with the aim of learning scene-specific modes. Second, this research explores colour processing technologies for content distribution with the aim of effectively delivering the processed content to a broad audience. In particular, video compression is tackled by introducing a novel methodology for chroma intra prediction based on attention models. Although the proposed architecture helped to gain control over the reference samples and better understand the prediction process, the complexity of the underlying neural network significantly increased the encoding and decoding time. Therefore, aiming at efficient deployment within the latest video coding standards, this work also focuses on simplifying the proposed architecture to obtain a more compact and explainable model.