Learning Binary Residual Representations for Domain-specific Video Streaming
We study domain-specific video streaming. Specifically, we target a streaming
setting where the videos to be streamed from a server to a client are all in
the same domain and they have to be compressed to a small size for low-latency
transmission. Several popular video streaming services, such as the video game
streaming services of GeForce Now and Twitch, fall in this category. While
conventional video compression standards such as H.264 are commonly used for
this task, we hypothesize that one can leverage the property that the videos
are all in the same domain to achieve better video quality. Based on this
hypothesis, we propose a novel video compression pipeline. Specifically, we
first apply H.264 to compress domain-specific videos. We then train a novel
binary autoencoder to encode the leftover domain-specific residual information
frame-by-frame into binary representations. These binary representations are
then compressed and sent to the client together with the H.264 stream. In our
experiments, we show that our pipeline yields consistent gains over standard
H.264 compression across several benchmark datasets while using the same
channel bandwidth.
Comment: Accepted in AAAI'18. Project website at
https://research.nvidia.com/publication/2018-02_Learning-Binary-Residua
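To make the pipeline concrete, below is a minimal PyTorch sketch of the binary residual idea, not the authors' released code: the residual between the original frame and its H.264 decode passes through a small autoencoder whose bottleneck is binarized with a straight-through gradient estimator. All layer sizes, names, and the loss choice are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Binarizer(torch.autograd.Function):
    """Sign binarization with a straight-through gradient estimator."""
    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)  # binary code in {-1, +1} (0 only if x == 0)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output  # pass gradients straight through

class BinaryResidualAE(nn.Module):
    def __init__(self, channels=3, code_channels=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, code_channels, 4, stride=2, padding=1), nn.Tanh(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(code_channels, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, channels, 4, stride=2, padding=1),
        )

    def forward(self, residual):
        bits = Binarizer.apply(self.encoder(residual))  # binary code to transmit
        return self.decoder(bits), bits

# Usage: residual = original_frame - h264_decoded_frame (hypothetical
# tensors); train with a reconstruction loss on the residual.
model = BinaryResidualAE()
residual = torch.randn(1, 3, 64, 64)  # stand-in for a real residual frame
recon, bits = model(residual)
loss = F.l1_loss(recon, residual)
```

The binarized bottleneck is what would be compressed and sent alongside the H.264 stream; on the client side, the decoder reconstructs the residual and adds it back to the decoded frame.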
SegFlow: Joint Learning for Video Object Segmentation and Optical Flow
This paper proposes an end-to-end trainable network, SegFlow, for
simultaneously predicting pixel-wise object segmentation and optical flow in
videos. The proposed SegFlow has two branches where useful information of
object segmentation and optical flow is propagated bidirectionally in a unified
framework. The segmentation branch is based on a fully convolutional network,
which has proved effective for image segmentation, and the optical
flow branch takes advantage of the FlowNet model. The unified framework is
trained iteratively offline to learn a generic notion of video objects, and
fine-tuned online for specific objects. Extensive experiments on both video
object segmentation and optical flow datasets demonstrate that introducing
optical flow improves segmentation performance and vice versa, and that the
joint model performs favorably against state-of-the-art algorithms.
Comment: Accepted in ICCV'17. Code is available at
https://sites.google.com/site/yihsuantsai/research/iccv17-segflo
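As a rough illustration of the two-branch design, the sketch below (an assumption-laden simplification, not the released SegFlow code) lets each task head consume the concatenation of both branches' features, so segmentation and flow information propagate bidirectionally:

```python
# Not the released SegFlow code: a toy two-branch network where the
# segmentation and flow heads each see both branches' features.
import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        # Stand-in feature extractors; SegFlow uses a fully convolutional
        # segmentation network and a FlowNet-style flow network.
        self.seg_feat = nn.Sequential(nn.Conv2d(3, feat, 3, padding=1), nn.ReLU())
        self.flow_feat = nn.Sequential(nn.Conv2d(6, feat, 3, padding=1), nn.ReLU())
        self.seg_head = nn.Conv2d(2 * feat, 1, 1)   # per-pixel object logit
        self.flow_head = nn.Conv2d(2 * feat, 2, 1)  # per-pixel (dx, dy) flow

    def forward(self, frame_t, frame_t1):
        s = self.seg_feat(frame_t)                             # appearance features
        f = self.flow_feat(torch.cat([frame_t, frame_t1], 1))  # motion features
        # Bidirectional propagation: each head consumes both feature maps.
        return (self.seg_head(torch.cat([s, f], 1)),
                self.flow_head(torch.cat([f, s], 1)))

net = TwoBranchNet()
seg, flow = net(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
```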
Generative Face Completion
In this paper, we propose an effective face completion algorithm using a deep
generative model. Different from well-studied background completion, the face
completion task is more challenging, as it often requires generating
semantically new pixels for the missing key components (e.g., eyes and mouths)
that contain large appearance variations. Unlike existing nonparametric
algorithms that search for patches to synthesize, our algorithm directly
generates contents for missing regions based on a neural network. The model is
trained with a combination of a reconstruction loss, two adversarial losses and
a semantic parsing loss, which together ensure pixel faithfulness and
local-global content consistency. Extensive experimental results demonstrate,
qualitatively and quantitatively, that our model can handle large areas of
missing pixels in arbitrary shapes and generate realistic face completion
results.
Comment: Accepted by CVPR 2017
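The training objective described above combines four terms. A hedged sketch of how such a combination might look is given below; the discriminators, the frozen face parser, the non-saturating GAN form, and all weights are illustrative assumptions rather than the paper's exact formulation.

```python
# A hedged sketch of the combined objective: reconstruction, two
# adversarial terms, and a semantic parsing term. The discriminators,
# parser, weights, and the non-saturating GAN form are assumptions.
import torch.nn.functional as F

def completion_loss(completed, patch, target, local_d, global_d,
                    parser, parsing_target, w_adv=0.001, w_parse=0.01):
    # Reconstruction loss: keep the completed face close to ground truth.
    rec = F.l1_loss(completed, target)
    # Two adversarial losses: a global discriminator judges the whole
    # face, a local one judges the filled region (non-saturating form).
    adv = (F.softplus(-global_d(completed)).mean()
           + F.softplus(-local_d(patch)).mean())
    # Semantic parsing loss: a frozen face parser should label the
    # completed face the same way it labels the real one.
    parse = F.cross_entropy(parser(completed), parsing_target)
    return rec + w_adv * adv + w_parse * parse
```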
PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection
Contexts play an important role in the saliency detection task. However,
given a context region, not all contextual information is helpful for the final
task. In this paper, we propose a novel pixel-wise contextual attention
network, i.e., the PiCANet, to learn to selectively attend to informative
context locations for each pixel. Specifically, for each pixel the network
generates an attention map whose weights indicate the relevance of each
context location. An attended contextual feature can then be
constructed by selectively aggregating the contextual information. We formulate
the proposed PiCANet in both global and local forms to attend to global and
local contexts, respectively. Both models are fully differentiable and can be
embedded into CNNs for joint training. We also incorporate the proposed models
with the U-Net architecture to detect salient objects. Extensive experiments
show that the proposed PiCANets can consistently improve saliency detection
performance. The global and local PiCANets facilitate learning global contrast
and homogeneity, respectively. As a result, our saliency model detects
salient objects more accurately and uniformly, performing favorably against
state-of-the-art methods.
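For intuition, here is a minimal PyTorch sketch of the global form of pixel-wise contextual attention, under the simplifying assumption that the per-pixel attention logits come from a 1x1 convolution (the paper itself uses recurrent ReNet layers for the global variant):

```python
# A minimal sketch of global pixel-wise contextual attention. Assumption:
# a 1x1 conv predicts the per-pixel attention logits over all locations.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalPixelAttention(nn.Module):
    def __init__(self, channels, height, width):
        super().__init__()
        # H*W attention logits at every pixel, one per context location.
        self.att = nn.Conv2d(channels, height * width, 1)

    def forward(self, feat):                            # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        logits = self.att(feat).view(b, h * w, h * w)   # (B, HW_ctx, HW_query)
        weights = F.softmax(logits, dim=1)              # normalize over context
        context = feat.view(b, c, h * w)                # (B, C, HW_ctx)
        # Each pixel's output is its attention-weighted sum over all locations.
        attended = torch.einsum('bkq,bck->bcq', weights, context)
        return attended.view(b, c, h, w)

attn = GlobalPixelAttention(channels=16, height=8, width=8)
out = attn(torch.randn(2, 16, 8, 8))  # same shape as the input features
```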