Watching Videos with Certain and Constant Quality: PID-based Quality Control Method
In video coding, compressed videos with certain and constant quality can
ensure quality of experience (QoE). To this end, we propose in this paper a
novel PID-based quality control (PQC) method for video coding. Specifically, a
formulation is modelled to control quality of video coding with two objectives:
minimizing control error and quality fluctuation. Then, we apply the Laplace
domain analysis to model the relationship between quantization parameter (QP)
and control error in this formulation. Given the relationship between QP and
control error, we propose a solution to the PQC formulation, such that videos
can be compressed at certain and constant quality. Finally, experimental
results show that our PQC method is effective in both control accuracy and
quality fluctuation.
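
As a rough illustration of the idea in this abstract, a generic discrete PID loop that nudges QP toward a target quality might look like the sketch below. This is not the paper's PQC solution: the class name, the gains, the 0 to 51 QP range, and the use of a PSNR-like quality score are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PIDQualityController:
    """Hypothetical PID loop mapping quality error to a QP adjustment."""
    target_quality: float      # desired quality score (assumed PSNR-like, in dB)
    kp: float = 0.5            # proportional gain (assumed value)
    ki: float = 0.1            # integral gain (assumed value)
    kd: float = 0.05           # derivative gain (assumed value)
    qp_min: int = 0            # assumed QP range, as in H.264/HEVC
    qp_max: int = 51
    integral: float = 0.0
    prev_error: float = 0.0

    def next_qp(self, current_qp: int, measured_quality: float) -> int:
        # Positive error means quality is above target, so QP can be raised
        # (coarser quantization); negative error lowers QP.
        error = measured_quality - self.target_quality
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        delta = self.kp * error + self.ki * self.integral + self.kd * derivative
        qp = round(current_qp + delta)
        # Clamp to the valid QP range.
        return max(self.qp_min, min(self.qp_max, qp))
```

In use, `next_qp` would be called once per frame (or per group of frames) with the quality measured after encoding, and the returned QP would be applied to the next unit.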
Summable Reparameterizations of Wasserstein Critics in the One-Dimensional Setting
Generative adversarial networks (GANs) are an exciting alternative to
algorithms for solving density estimation problems---using data to assess how
likely samples are to be drawn from the same distribution. Instead of
explicitly computing these probabilities, GANs learn a generator that can match
the given probabilistic source. This paper looks particularly at this matching
capability in the context of problems with one-dimensional outputs. We identify
a class of function decompositions with properties that make them well suited
to the critic role in a leading approach to GANs known as Wasserstein GANs. We
show that Taylor and Fourier series decompositions belong to our class, provide
examples of these critics outperforming standard GAN approaches, and suggest
how they can be scaled to higher dimensional problems in the future.
Generalization Tower Network: A Novel Deep Neural Network Architecture for Multi-Task Learning
Deep learning (DL) advances state-of-the-art reinforcement learning (RL), by
incorporating deep neural networks in learning representations from the input
to RL. However, the conventional deep neural network architecture is limited in
learning representations for multi-task RL (MT-RL), as multiple tasks can refer
to different kinds of representations. In this paper, we thus propose a novel
deep neural network architecture, namely generalization tower network (GTN),
which can achieve MT-RL within a single learned model. Specifically, the
architecture of GTN is composed of both horizontal and vertical streams. In our
GTN architecture, horizontal streams are used to learn representation shared in
similar tasks. In contrast, the vertical streams are better suited to handling
diverse tasks and encode hierarchical shared knowledge of these tasks. The
effectiveness of the introduced vertical streams is validated by experimental
results. Experimental results further verify that our GTN architecture
advances the state of the art in MT-RL, as tested on 51 Atari games.
Learning Approximate Stochastic Transition Models
We examine the problem of learning mappings from state to state, suitable for
use in a model-based reinforcement-learning setting, that simultaneously
generalize to novel states and can capture stochastic transitions. We show that
currently popular generative adversarial networks struggle to learn these
stochastic transition models, but a modification to their loss functions results
in a powerful learning algorithm for this class of problems.
Numerical Modeling on Thermal Loading of Diamond Crystal in X-ray FEL Oscillator
Due to high reflectivity and high resolution of X-ray pulses, diamond is one
of the most popular Bragg crystals serving as the reflecting mirror and
monochromator in the next generation of free-electron lasers (FELs). The
energy deposition of X-rays will result in thermal heating, and thus lattice
expansion of the diamond crystal, which may degrade the performance of X-ray
FELs. In this paper, the thermal loading effect of diamond crystal for X-ray
FEL oscillators has been systematically studied by combined simulation with
Geant4 and ANSYS, and its dependence on the environmental temperature, crystal
size, X-ray pulse repetition rate and pulse energy is presented. Our results
show that, even taking the thermal loading effects into account, X-ray FEL
oscillators are still robust and promising with an optimized design.
Comment: 6 pages, 9 figures, 1 table, To be published in Chinese Physics
Does Haze Removal Help CNN-based Image Classification?
Hazy images are common in real scenarios and many dehazing methods have been
developed to automatically remove the haze from images. Typically, the goal of
image dehazing is to produce clearer images from which human vision can better
identify the object and structural details present in the images. When the
ground-truth haze-free image is available for a hazy image, quantitative
evaluation of image dehazing is usually based on objective metrics, such as
Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM). However, in
many applications, large-scale images are collected not for visual examination
by humans. Instead, they are used for many high-level vision tasks, such as
automatic classification, recognition and categorization. One fundamental
problem here is whether various dehazing methods can produce clearer images
that can help improve the performance of the high-level tasks. In this paper,
we empirically study this problem in the important task of image classification
by using both synthetic and real hazy image datasets. From the experimental
results, we find that existing image-dehazing methods do not improve
image-classification performance much and sometimes even reduce it.
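
For reference, the PSNR metric mentioned in this abstract follows the standard definition PSNR = 10 log10(MAX^2 / MSE). The sketch below computes it over flat lists of pixel values; the function name and the default 8-bit peak value of 255 are assumptions for illustration, not details taken from the paper.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between the haze-free reference and the dehazed output.
    mse = sum((a - b) ** 2 for a, b in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the dehazed image is numerically closer to the ground truth, which, as the abstract points out, does not by itself imply better classification accuracy.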
Image Inpainting using Block-wise Procedural Training with Annealed Adversarial Counterpart
Recent advances in deep generative models have shown promising potential in
image inpainting, which refers to the task of predicting missing pixel values of
an incomplete image using the known context. However, existing methods can be
slow or generate unsatisfying results with easily detectable flaws. In
addition, there is often a perceivable discontinuity near the holes, which
requires further post-processing to blend the results. We present a new approach to
address the difficulty of training a very deep generative model to synthesize
high-quality photo-realistic inpainting. Our model uses conditional generative
adversarial networks (conditional GANs) as the backbone, and we introduce a
novel block-wise procedural training scheme to stabilize the training while we
increase the network depth. We also propose a new strategy called adversarial
loss annealing to reduce the artifacts. We further describe several losses
specifically designed for inpainting and show their effectiveness. Extensive
experiments and a user study show that our approach outperforms existing methods
in several tasks such as inpainting, face completion and image harmonization.
Finally, we show our framework can be easily used as a tool for interactive
guided inpainting, demonstrating its practical value for solving common
real-world challenges.
PortraitGAN for Flexible Portrait Manipulation
Previous methods have dealt with discrete manipulation of facial attributes
drawn from canonical expressions, such as smiling, sad, angry, and surprised;
they are not scalable and operate in a single modality. In this paper, we propose a
novel framework that supports continuous edits and multi-modality portrait
manipulation using adversarial learning. Specifically, we adapt
cycle-consistency into the conditional setting by leveraging additional facial
landmarks information. This has two effects: first, cycle mapping induces
bidirectional manipulation and identity preservation; second, paired samples
from different modalities can thus be utilized. To ensure high-quality
synthesis, we adopt a texture loss that enforces texture consistency and
multi-level adversarial supervision that facilitates gradient flow.
Quantitative and qualitative experiments show the effectiveness of our
framework in performing flexible and multi-modality portrait manipulation with
photo-realistic effects.
Predicting Head Movement in Panoramic Video: A Deep Reinforcement Learning Approach
Panoramic video provides immersive and interactive experience by enabling
humans to control the field of view (FoV) through head movement (HM). Thus, HM
plays a key role in modeling human attention on panoramic video. This paper
establishes a database collecting subjects' HM in panoramic video sequences.
From this database, we find that the HM data are highly consistent across
subjects. Furthermore, we find that deep reinforcement learning (DRL) can be
applied to predict HM positions, via maximizing the reward of imitating human
HM scanpaths through the agent's actions. Based on our findings, we propose a
DRL-based HM prediction (DHP) approach with offline and online versions, called
offline-DHP and online-DHP. In offline-DHP, multiple DRL workflows are run to
determine potential HM positions at each panoramic frame. Then, a heat map of
the potential HM positions, named the HM map, is generated as the output of
offline-DHP. In online-DHP, the next HM position of one subject is estimated
given the currently observed HM position, which is achieved by developing a DRL
algorithm upon the learned offline-DHP model. Finally, the experiments validate
that our approach is effective in both offline and online prediction of HM
positions for panoramic video, and that the learned offline-DHP model can
improve the performance of online-DHP.
Comment: 15 pages, 10 figures, published on TPAMI 201
Design Identification of Curve Patterns on Cultural Heritage Objects: Combining Template Matching and CNN-based Re-Ranking
The surfaces of many cultural heritage objects were embellished with various
patterns, especially curve patterns. In practice, most of the unearthed
cultural heritage objects are highly fragmented, e.g., sherds of pottery or
vessels, and each of them only shows a very small portion of the underlying
full design, with noise and deformations. The goal of this paper is to address
the challenging problem of automatically identifying the underlying full design
of curve patterns from such a sherd. Specifically, we formulate this problem as
template matching: curve structure segmented from the sherd is matched to each
location with each possible orientation of each known full design. In this
paper, we propose a new two-stage matching algorithm, with a different matching
cost in each stage. In Stage 1, we use traditional template matching, which
is highly computationally efficient, over the whole search space and identify a
small set of candidate matchings. In Stage 2, we derive a new matching cost by
training a dual-source Convolutional Neural Network (CNN) and apply it to
re-rank the candidate matchings identified in Stage 1. We collect 600 pottery
sherds with 98 full designs from the Woodland Period in Southeastern North
America for experiments, and the performance of the proposed algorithm is very
competitive.
Comment: 11 pages, 12 figure
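
The two-stage structure described in this abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it searches over locations only (no orientations), uses a sum of absolute differences as the Stage 1 cost, and stands in for the CNN-based Stage 2 cost with a hypothetical hand-written function. All names here are assumptions.

```python
def stage1_candidates(design, patch, k=3):
    """Slide `patch` over `design`; return the k lowest-cost (cost, row, col)."""
    ph, pw = len(patch), len(patch[0])
    dh, dw = len(design), len(design[0])
    scored = []
    for r in range(dh - ph + 1):
        for c in range(dw - pw + 1):
            # Cheap matching cost: sum of absolute pixel differences.
            cost = sum(
                abs(design[r + i][c + j] - patch[i][j])
                for i in range(ph)
                for j in range(pw)
            )
            scored.append((cost, r, c))
    return sorted(scored)[:k]


def rerank(candidates, design, patch, cost_fn):
    """Stage 2: reorder the Stage 1 candidates by a second, costlier score."""
    return sorted(candidates, key=lambda cand: cost_fn(design, patch, cand[1], cand[2]))


def foreground_mismatch(design, patch, r, c):
    """Hypothetical placeholder for a learned cost: count curve pixels of the
    patch that are missing from the design at offset (r, c)."""
    return sum(
        1
        for i, row in enumerate(patch)
        for j, value in enumerate(row)
        if value == 1 and design[r + i][c + j] != 1
    )


# Toy binary curve images: 1 marks a curve pixel.
design = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
patch = [
    [1, 0],
    [1, 1],
]
top = stage1_candidates(design, patch, k=3)
best = rerank(top, design, patch, foreground_mismatch)[0]
```

The point of the split is that the cheap cost prunes the full search space to a handful of candidates, so the expensive cost only has to be evaluated a few times.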