Exploiting the potential of unlabeled endoscopic video data with self-supervised learning
Surgical data science is a new research field that aims to observe all
aspects of the patient treatment process in order to provide the right
assistance at the right time. Due to the breakthrough successes of deep
learning-based solutions for automatic image annotation, the availability of
reference annotations for algorithm training is becoming a major bottleneck in
the field. The purpose of this paper was to investigate the concept of
self-supervised learning to address this issue.
Our approach is guided by the hypothesis that unlabeled video data can be
used to learn a representation of the target domain that boosts the performance
of state-of-the-art machine learning algorithms when used for pre-training.
The core of the method is an auxiliary task based on raw endoscopic video data of
the target domain that is used to initialize the convolutional neural network
(CNN) for the target task. In this paper, we propose the re-colorization of
medical images with a generative adversarial network (GAN)-based architecture
as the auxiliary task. A variant of the method involves a second pre-training step
based on labeled data for the target task from a related domain. We validate
both variants using medical instrument segmentation as the target task.
The proposed approach can be used to radically reduce the manual annotation
effort involved in training CNNs. Compared to the baseline approach of
generating annotated data from scratch, our method reduces the number of
labeled images required by up to 75% without sacrificing performance. Our
method also outperforms alternative methods for CNN pre-training, such as
pre-training on publicly available non-medical or medical data using the target
task (in this instance: segmentation).
As it makes efficient use of available (non-)public and (un-)labeled data,
the approach has the potential to become a valuable tool for CNN
(pre-)training.
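
To make the pre-training recipe concrete, here is a minimal PyTorch sketch of the general idea, assuming a toy encoder and an L1 recolorization loss rather than the paper's GAN-based architecture: an encoder is trained on a colorization auxiliary task over unlabeled frames, and its weights then initialize the segmentation CNN.

```python
import torch
import torch.nn as nn

# Toy sketch, not the paper's architecture: pre-train an encoder on a
# colorization auxiliary task, then reuse its weights for segmentation.
class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class ColorizationHead(nn.Module):
    """Auxiliary-task head: predict the two chrominance (ab) channels."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

encoder, head = Encoder(), ColorizationHead()
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()))

gray = torch.rand(4, 1, 64, 64)               # stand-in for unlabeled video frames
ab_target = torch.rand(4, 2, 64, 64) * 2 - 1  # their (pre-extracted) chrominance
loss = nn.functional.l1_loss(head(encoder(gray)), ab_target)
loss.backward()
opt.step()
# After pre-training, encoder.state_dict() initializes the segmentation CNN.
```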
Deep Exemplar-based Colorization
We propose the first deep learning approach for exemplar-based local
colorization. Given a reference color image, our convolutional neural network
directly maps a grayscale image to an output colorized image. Rather than using
hand-crafted rules as in traditional exemplar-based methods, our end-to-end
colorization network learns how to select, propagate, and predict colors from
large-scale data. The approach performs robustly and generalizes well even
when using reference images that are unrelated to the input grayscale image.
More importantly, as opposed to other learning-based colorization methods, our
network allows the user to achieve customizable results by simply feeding
different references. In order to further reduce manual effort in selecting the
references, the system automatically recommends references with our proposed
image retrieval algorithm, which considers both semantic and luminance
information. The colorization can be performed fully automatically by simply
picking the top reference suggestion. Our approach is validated through a user
study and favorable quantitative comparisons to state-of-the-art methods.
Furthermore, our approach can be naturally extended to video colorization. Our
code and models will be freely available for public use.
Comment: To appear in SIGGRAPH 201
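
As a rough illustration of the exemplar-based setup (a heavily simplified stand-in for the paper's network, which also learns to select and propagate reference colors), the sketch below conditions the prediction on a reference image, so feeding a different reference changes the colorization:

```python
import torch
import torch.nn as nn

# Simplified stand-in for the paper's network: the reference color image is
# simply concatenated with the grayscale input, so swapping the reference
# changes the predicted chrominance.
class ExemplarColorizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1), nn.Tanh(),  # ab channels
        )

    def forward(self, gray, reference):
        return self.net(torch.cat([gray, reference], dim=1))

model = ExemplarColorizer()
gray = torch.rand(1, 1, 128, 128)  # L channel of the target image
ref = torch.rand(1, 3, 128, 128)   # reference color image (assumed resized)
ab = model(gray, ref)              # predicted chrominance at full resolution
```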
Automatic Video Colorization using 3D Conditional Generative Adversarial Networks
In this work, we present a method for automatic colorization of grayscale
videos. The core of the method is a Generative Adversarial Network that is
trained and tested on sequences of frames in a sliding window manner. Network
convolutional and deconvolutional layers are three-dimensional, with frame
height, width and time as the dimensions taken into account. Multiple
chrominance estimates per frame are aggregated and combined with available
luminance information to recreate a colored sequence. Colorization trials are
run successfully on a dataset of old black-and-white films. The usefulness of
our method is also validated with numerical results, computed with a newly
proposed metric that measures colorization consistency over a frame sequence.
Comment: 5 pages, 4 figures
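
A minimal sketch of the 3D-convolutional generator idea follows (layer sizes are illustrative, not the paper's): Conv3d kernels span time as well as height and width, so a sliding window of grayscale frames is colorized jointly.

```python
import torch
import torch.nn as nn

# Illustrative 3D generator: kernels cover (time, height, width), so the
# chrominance of each frame depends on its neighbors in the window.
class VideoColorizer3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 2, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, frames):        # frames: (B, 1, T, H, W) luminance
        return self.net(frames)       # chrominance: (B, 2, T, H, W)

gen = VideoColorizer3D()
window = torch.rand(1, 1, 5, 64, 64)  # 5-frame sliding window of luminance
ab = gen(window)
# Overlapping windows yield multiple chrominance estimates per frame; these
# can be averaged and recombined with the luminance to form the color video.
```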
Automatic Temporally Coherent Video Colorization
Greyscale image colorization for applications in image restoration has seen
significant improvements in recent years. Many of these techniques that use
learning-based methods struggle to effectively colorize sparse inputs. With the
consistent growth of the anime industry, the ability to colorize sparse input
such as line art can significantly reduce cost and redundant work for production
studios by eliminating the in-between frame colorization process. Simply using
existing methods yields inconsistent colors between related frames resulting in
a flicker effect in the final video. In order to successfully automate key
areas of large-scale anime production, the colorization of line arts must be
temporally consistent between frames. This paper proposes a method to colorize
line art frames in an adversarial setting, to create temporally coherent video
of large anime by improving existing image-to-image translation methods. We
show that by adding an extra condition to the generator and discriminator, we
can effectively create temporally consistent video sequences from anime line
arts. Code and models available at: https://github.com/Harry-Thasarathan/TCV
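
The key mechanism can be sketched as follows, assuming the extra condition is the previously colorized frame (a simplified reading of the setup; the released code differs in depth and losses):

```python
import torch
import torch.nn as nn

# Simplified sketch: besides the current line-art frame, the generator also
# receives the previous colorized frame, discouraging color flicker.
class ConditionedGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, line_art, prev_frame):
        return self.net(torch.cat([line_art, prev_frame], dim=1))

gen = ConditionedGenerator()
line_art = torch.rand(1, 1, 64, 64)
prev_frame = torch.rand(1, 3, 64, 64)  # previously colorized frame
frame = gen(line_art, prev_frame)
# The discriminator would likewise see (frame, prev_frame) pairs, so judging
# realism implicitly includes temporal consistency.
```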
Interpreting Models by Allowing to Ask
Questions convey information about the questioner, namely what one does not
know. In this paper, we propose a novel approach to allow a learning agent to
ask about what it considers tricky to predict in the course of producing a final
output. By analyzing when and what it asks, we can make our model more
transparent and interpretable. We first develop this idea to propose a general
framework of deep neural networks that can ask questions, which we call asking
networks. A specific architecture and training process for an asking network is
proposed for the task of colorization, an exemplary one-to-many task and thus
one where asking questions helps in performing the task accurately. Our
results show that the model learns to generate meaningful
questions, asks difficult questions first, and utilizes the provided hint more
efficiently than baseline models. We conclude that the proposed asking
framework makes the learning agent reveal its weaknesses, which poses a
promising new direction in developing interpretable and interactive models.
Comment: 10 pages
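
One way to picture an asking mechanism for colorization (a loose sketch only; the paper's actual architecture and training process differ) is a model that outputs an uncertainty map alongside its prediction and requests a hint where it is least certain:

```python
import torch
import torch.nn as nn

class AskingColorizer(nn.Module):
    """Hypothetical asking model: predicts ab colors plus an uncertainty map."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(1 + 3, 16, 3, padding=1)
        self.color = nn.Conv2d(16, 2, 3, padding=1)
        self.uncertainty = nn.Conv2d(16, 1, 3, padding=1)

    def forward(self, gray, hints):
        h = torch.relu(self.backbone(torch.cat([gray, hints], dim=1)))
        return self.color(h), self.uncertainty(h)

model = AskingColorizer()
gray = torch.rand(1, 1, 32, 32)
hints = torch.zeros(1, 3, 32, 32)  # ab hint values + mask, empty at first
ab, unc = model(gray, hints)
q = unc.flatten().argmax().item()  # the "question": the hardest pixel
print("hint requested at", divmod(q, 32))
```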
Image Colorization with Generative Adversarial Networks
Over the last decade, the process of automatic image colorization has been of
significant interest for several application areas including restoration of
aged or degraded images. This problem is highly ill-posed due to the large
degrees of freedom during the assignment of color information. Many of the
recent developments in automatic colorization involve images that contain a
common theme or require highly processed data such as semantic maps as input.
In our approach, we attempt to fully generalize the colorization procedure
using a conditional Deep Convolutional Generative Adversarial Network (DCGAN),
extend current methods to high-resolution images and suggest training
strategies that speed up the process and greatly stabilize it. The network is
trained on publicly available datasets such as CIFAR-10 and Places365. The
results of the generative model and traditional deep neural
networks are compared.
Comment: Lecture Notes in Computer Science, Proceedings of the Tenth International Conference on Articulated Motion and Deformable Objects (AMDO), Palma, Mallorca, Spain, 12-13 July 201
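
A minimal conditional-GAN training step for colorization looks roughly like the following (toy layer sizes, not the paper's DCGAN): the generator is conditioned on the grayscale image, and the discriminator judges gray/color pairs.

```python
import torch
import torch.nn as nn

# Toy conditional GAN step: G maps grayscale to chrominance, D scores
# (grayscale, chrominance) pairs as real or fake.
gen = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 2, 3, padding=1), nn.Tanh(),
)
disc = nn.Sequential(
    nn.Conv2d(1 + 2, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 3, stride=2, padding=1),
)
bce = nn.BCEWithLogitsLoss()

gray = torch.rand(4, 1, 32, 32)
real_ab = torch.rand(4, 2, 32, 32) * 2 - 1
fake_ab = gen(gray)

# Discriminator loss: real pairs toward 1, fake pairs toward 0.
d_real = disc(torch.cat([gray, real_ab], dim=1))
d_fake = disc(torch.cat([gray, fake_ab.detach()], dim=1))
d_loss = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))

# Generator loss: fool the discriminator on its own outputs.
d_fake_for_g = disc(torch.cat([gray, fake_ab], dim=1))
g_loss = bce(d_fake_for_g, torch.ones_like(d_fake_for_g))
```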
Real-Time User-Guided Image Colorization with Learned Deep Priors
We propose a deep learning approach for user-guided image colorization. The
system directly maps a grayscale image, along with sparse, local user "hints"
to an output colorization with a Convolutional Neural Network (CNN). Rather
than using hand-defined rules, the network propagates user edits by fusing
low-level cues along with high-level semantic information, learned from
large-scale data. We train on a million images, with simulated user inputs. To
guide the user towards efficient input selection, the system recommends likely
colors based on the input image and current user inputs. The colorization is
performed in a single feed-forward pass, enabling real-time use. Even with
randomly simulated user inputs, we show that the proposed system helps novice
users quickly create realistic colorizations, and offers large improvements in
colorization quality with just a minute of use. In addition, we demonstrate
that the framework can incorporate other user "hints" to the desired
colorization, showing an application to color histogram transfer. Our code and
models are available at https://richzhang.github.io/ideepcolor.
Comment: Accepted to SIGGRAPH 2017. Project page: https://richzhang.github.io/ideepcolor
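
The input format can be sketched as follows: the grayscale L channel is stacked with a sparse ab-hint tensor and a binary mask marking where the user has clicked, and a single forward pass returns the full chrominance map. The network below is a toy stand-in; the released model is far deeper.

```python
import torch
import torch.nn as nn

# Toy hint-based colorizer: input is L channel + sparse ab hints + click mask.
net = nn.Sequential(
    nn.Conv2d(1 + 2 + 1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 2, 3, padding=1), nn.Tanh(),
)

L = torch.rand(1, 1, 64, 64)
hints = torch.zeros(1, 2, 64, 64)
mask = torch.zeros(1, 1, 64, 64)
hints[0, :, 20, 30] = torch.tensor([0.4, -0.2])  # one simulated user click
mask[0, 0, 20, 30] = 1.0

ab = net(torch.cat([L, hints, mask], dim=1))  # one feed-forward pass: real-time
```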
Point Cloud Colorization Based on Densely Annotated 3D Shape Dataset
This paper introduces DensePoint, a densely sampled and annotated point cloud
dataset containing over 10,000 single objects across 16 categories, created by
merging different kinds of information from two existing datasets. Each point cloud in
DensePoint contains 40,000 points, and each point is associated with two sorts
of information: RGB value and part annotation. In addition, we propose a method
for point cloud colorization by utilizing Generative Adversarial Networks
(GANs). The network makes it possible to generate colours for point clouds of
single objects given only the point cloud itself. Experiments on DensePoint
show that there exist clear boundaries in point clouds between different parts
of an object, suggesting that the proposed network is able to generate
reasonably good colours. Our dataset is publicly available on the project page.
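
An illustrative generator for this setting (not the paper's network) is a PointNet-style shared MLP that maps each xyz coordinate, together with a pooled global shape feature, to an RGB value, so color is produced from geometry alone:

```python
import torch
import torch.nn as nn

# Illustrative PointNet-style colorizer: per-point features plus a global
# max-pooled shape code are decoded to an RGB value per point.
class PointColorizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.local = nn.Sequential(nn.Linear(3, 64), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(64 + 64, 64), nn.ReLU(),
                                  nn.Linear(64, 3), nn.Sigmoid())

    def forward(self, xyz):            # xyz: (B, N, 3)
        feat = self.local(xyz)         # per-point features (B, N, 64)
        glob = feat.max(dim=1, keepdim=True).values.expand_as(feat)
        return self.head(torch.cat([feat, glob], dim=-1))  # RGB in [0, 1]

gen = PointColorizer()
cloud = torch.rand(2, 40000, 3)        # DensePoint-sized clouds
rgb = gen(cloud)                       # (2, 40000, 3)
```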
InverseNet: Solving Inverse Problems with Splitting Networks
We propose a new method that uses deep learning techniques to solve inverse
problems. The inverse problem is cast as learning an end-to-end mapping from
the observed data to the ground truth. Inspired by the
splitting strategy widely used in regularized iterative algorithms to tackle
inverse problems, the mapping is decomposed into two networks: one
handling the inversion of the physical forward model associated with the data
term and one handling the denoising of the output from the former network,
i.e., the inverted version, associated with the prior/regularization term. The
two networks are trained jointly to learn the end-to-end mapping, avoiding a
separate two-step training procedure. Training is effectively annealed: the
intermediate variable between the two networks bridges the gap between the
input (the degraded version of the output) and the output, and progressively
approaches the ground truth.
The proposed network, referred to as InverseNet, is flexible in the sense that
most of the existing end-to-end network structure can be leveraged in the first
network and most of the existing denoising network structure can be used in the
second one. Extensive experiments on both synthetic and real datasets for the
tasks of motion deblurring, super-resolution, and colorization demonstrate the
efficiency and accuracy of the proposed method compared with other image
processing algorithms.
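
The splitting idea can be sketched with two toy networks trained jointly end to end; an additional weighted loss on the intermediate variable (omitted here) would implement the annealing described above.

```python
import torch
import torch.nn as nn

# Toy split: one network inverts the forward model, a second denoises its
# output; both are trained jointly against the ground truth.
inversion_net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                              nn.Conv2d(16, 1, 3, padding=1))
denoise_net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                            nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam(list(inversion_net.parameters()) +
                       list(denoise_net.parameters()))

degraded = torch.rand(4, 1, 32, 32)     # observed data (e.g., blurred images)
clean = torch.rand(4, 1, 32, 32)        # ground truth

intermediate = inversion_net(degraded)  # rough inverse of the forward model
restored = denoise_net(intermediate)    # prior/regularization step
loss = nn.functional.mse_loss(restored, clean)  # joint end-to-end loss
loss.backward()
opt.step()
```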
DeepHist: Differentiable Joint and Color Histogram Layers for Image-to-Image Translation
We present DeepHist, a novel deep learning framework for augmenting a
network with histogram layers, and demonstrate its strength by addressing
image-to-image translation problems. Specifically, given an input image and a
reference color distribution we aim to generate an output image with the
structural appearance (content) of the input (source) yet with the colors of
the reference. The key idea is a new technique for a differentiable
construction of joint and color histograms of the output images. We further
define a color distribution loss based on the Earth Mover's Distance between
the output's and the reference's color histograms and a Mutual Information loss
based on the joint histograms of the source and the output images. Promising
results are shown for the tasks of color transfer, image colorization, and
edges-to-photo translation, where the color distribution of the output image
is controlled. Comparisons to Pix2Pix and CycleGAN are shown.
Comment: arXiv admin note: text overlap with arXiv:1912.0604
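
A differentiable histogram can be built by soft-binning, where each pixel contributes to nearby bins through a smooth kernel so that gradients flow back to the image. The sketch below is one such construction and only an assumption about how a DeepHist-like layer might look; the paper's exact formulation may differ.

```python
import torch

def soft_histogram(x, bins=16, sigma=0.05):
    """Differentiable histogram via Gaussian soft-binning.

    x: (B, N) values in [0, 1]; returns (B, bins) normalized histograms.
    """
    centers = torch.linspace(0.0, 1.0, bins, device=x.device)
    weights = torch.exp(-((x.unsqueeze(-1) - centers) ** 2) / (2 * sigma ** 2))
    hist = weights.sum(dim=1)
    return hist / hist.sum(dim=-1, keepdim=True)

img = torch.rand(2, 3, 64, 64, requires_grad=True)
red_hist = soft_histogram(img[:, 0].flatten(1))  # per-image red-channel histogram
target = torch.full_like(red_hist, 1.0 / 16)     # e.g., a reference distribution
loss = (red_hist - target).abs().sum()           # simple histogram-matching loss
loss.backward()                                  # differentiable end to end
print(img.grad.shape)
```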