Deep SR-ITM: Joint Learning of Super-Resolution and Inverse Tone-Mapping for 4K UHD HDR Applications
Modern displays can now render high dynamic range (HDR), high
resolution (HR) videos of up to 8K UHD (Ultra High Definition). Consequently,
UHD HDR broadcasting and streaming have emerged as high quality premium
services. However, due to the lack of original UHD HDR video content,
appropriate conversion technologies are urgently needed to transform the legacy
low resolution (LR) standard dynamic range (SDR) videos into UHD HDR versions.
In this paper, we propose a joint super-resolution (SR) and inverse
tone-mapping (ITM) framework, called Deep SR-ITM, which learns the direct
mapping from LR SDR videos to their HR HDR versions. Joint SR and ITM is an
intricate task in which high-frequency details must be restored for SR
jointly with the local contrast for ITM. Our network restores fine details
by decomposing the input image and focusing on the separate base (low
frequency) and detail (high frequency) layers. Moreover, the proposed
modulation blocks apply location-variant operations to enhance local contrast.
The Deep SR-ITM shows good subjective quality with increased contrast and
details, outperforming the previous joint SR-ITM method.
Comment: Accepted at ICCV 2019 (Oral).
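As a rough illustration of the base/detail split described above (not the paper's code), the sketch below uses a Gaussian low-pass as a stand-in for an edge-preserving filter; the multiplicative decomposition and all parameter values are assumptions:

    # Sketch of a base/detail decomposition for an H x W x 3 image in [0, 1].
    # A Gaussian blur stands in for the edge-preserving filter; the network
    # would then process the two layers in separate branches.
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def decompose(img, sigma=2.0, eps=1e-6):
        base = gaussian_filter(img, sigma=(sigma, sigma, 0))  # low-frequency base
        detail = img / (base + eps)                           # high-frequency detail
        return base, detail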
Switchable Temporal Propagation Network
Videos contain highly redundant information between frames. Such redundancy
has been extensively studied in video compression and encoding, but is less
explored for more advanced video processing. In this paper, we propose a
learnable unified framework for propagating a variety of visual properties of
video images, including but not limited to color, high dynamic range (HDR), and
segmentation information, where the properties are available for only a few
key-frames. Our approach is based on a temporal propagation network (TPN),
which models the transition-related affinity between a pair of frames in a
purely data-driven manner. We theoretically prove two essential factors for
TPN: (a) by regularizing the global transformation matrix as orthogonal, the
"style energy" of the property can be well preserved during propagation; (b)
such regularization can be achieved by the proposed switchable TPN with
bi-directional training on pairs of frames. We apply the switchable TPN to
three tasks: colorizing a gray-scale video based on a few color key-frames,
generating an HDR video from a low dynamic range (LDR) video and a few HDR
frames, and propagating a segmentation mask from the first frame in videos.
Experimental results show that our approach is significantly more accurate and
efficient than the state-of-the-art methods.
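To make the orthogonality argument concrete, here is a minimal numpy sketch (illustrative, not the paper's implementation) of a global linear propagation step and the penalty that keeps the transformation near-orthogonal, so the property's "style energy" ||v||^2 is preserved:

    import numpy as np

    def propagate(v_key, G):
        # Carry a vectorized property map v_key from a key frame to a
        # target frame via a learned global transformation matrix G.
        return G @ v_key

    def orthogonality_penalty(G):
        # Encourages G^T G = I; an orthogonal G leaves ||v||^2 unchanged,
        # which is the "style energy" preservation stated above.
        d = G.shape[0]
        return np.linalg.norm(G.T @ G - np.eye(d), ord='fro') ** 2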
Fully-automatic inverse tone mapping algorithm based on dynamic mid-level tone mapping
High Dynamic Range (HDR) displays can show images with higher color contrast levels and peak luminosities than common Low Dynamic Range (LDR) displays. However, most existing video content is recorded and/or graded in LDR format. To show LDR content on HDR displays, it needs to be up-scaled using a so-called inverse tone mapping algorithm. Several techniques for inverse tone mapping have been proposed in recent years, ranging from simple approaches based on global and local operators to more advanced algorithms such as neural networks. Drawbacks of existing inverse tone mapping techniques include the need for human intervention, the high computation time of the more advanced algorithms, limited peak brightness, and the failure to preserve artistic intent. In this paper, we propose a fully-automatic inverse tone mapping operator based on mid-level mapping that is capable of real-time video processing. Our proposed algorithm expands LDR images into HDR images with peak brightness over 1000 nits while preserving the artistic intent inherent to the HDR domain. We assessed our results using the full-reference objective quality metrics HDR-VDP-2.2 and DRIM, and by carrying out a subjective pair-wise comparison experiment. We compared our results with those obtained with the most recent methods in the literature. Experimental results demonstrate that our proposed method outperforms the current state of the art among simple inverse tone mapping methods, and that its performance is close to that of more complex and time-consuming advanced techniques.
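As a baseline for intuition only (the paper's dynamic mid-level operator is more involved), a minimal global LDR-to-HDR expansion might look like the following; the gamma value and the 1000-nit peak are example assumptions:

    import numpy as np

    def expand_ldr(ldr, peak_nits=1000.0, gamma=2.4):
        # Map normalized LDR values in [0, 1] to absolute HDR luminance:
        # undo the display gamma, then stretch to the target peak.
        lin = np.power(np.clip(ldr, 0.0, 1.0), gamma)
        return lin * peak_nits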
Applications of Fog Computing in Video Streaming
The purpose of this paper is to show the viability of fog computing for video streaming in vehicles. With the rise of autonomous vehicles, there needs to be a viable entertainment option for users. The cloud fails to meet this need because of the latency experienced during periods of high internet traffic. To improve video streaming speeds, fog computing appears to be the best option. Fog computing brings the cloud closer to the user through intermediary devices known as fog nodes; it does not attempt to replace the cloud but to improve it by allowing faster upload and download of information. This paper explores two algorithms that would work well with vehicles and video streaming. The algorithms are simulated in a Java application and the results are presented graphically. The results showed that the simulation was an accurate model and that the best algorithm for request history maintenance was the variable model.
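As a toy sketch of what request-history maintenance at a fog node could look like (the paper's "variable model" is not specified here; the LRU policy and all names below are assumptions):

    from collections import OrderedDict

    class FogNode:
        def __init__(self, capacity=100):
            self.capacity = capacity
            self.cache = OrderedDict()  # segment_id -> data, most recent last

        def request(self, segment_id, fetch_from_cloud):
            if segment_id in self.cache:          # fog hit: served locally
                self.cache.move_to_end(segment_id)
                return self.cache[segment_id]
            data = fetch_from_cloud(segment_id)   # fog miss: fall back to cloud
            self.cache[segment_id] = data
            if len(self.cache) > self.capacity:   # evict least recently used
                self.cache.popitem(last=False)
            return data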
Temporal Image Fusion
This paper introduces temporal image fusion. The proposed technique builds
upon previous research in exposure fusion and expands it to deal with the
limited temporal dynamic range of existing sensors and camera technologies. In
particular, temporal image fusion enables the rendering of long-exposure
effects on full frame-rate video, as well as the generation of arbitrarily long
exposures from a sequence of images of the same scene taken over time. We
explore the problem of temporal under-exposure, and show how it can be
addressed by selectively enhancing dynamic structure. Finally, we show that the
use of temporal image fusion together with content-selective image filters can
produce a range of striking visual effects on a given input sequence.
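A minimal sketch of the core idea: synthesizing a long exposure from a temporal stack of frames by weighted averaging. The optional weights only hint at the paper's content-selective enhancement of dynamic structure and are an assumption:

    import numpy as np

    def long_exposure(frames, weights=None):
        # frames: list of H x W x C arrays in [0, 1]; returns their
        # weighted temporal average, i.e. a simulated long exposure.
        stack = np.stack(frames).astype(np.float64)
        w = np.ones(len(frames)) if weights is None else np.asarray(weights, float)
        w = w / w.sum()
        return np.tensordot(w, stack, axes=1)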
An Online Learning Approach to Model Predictive Control
Model predictive control (MPC) is a powerful technique for solving dynamic
control tasks. In this paper, we show that there exists a close connection
between MPC and online learning, an abstract theoretical framework for
analyzing online decision making in the optimization literature. This new
perspective provides a foundation for leveraging powerful online learning
algorithms to design MPC algorithms. Specifically, we propose a new algorithm
based on dynamic mirror descent (DMD), an online learning algorithm that is
designed for non-stationary setups. Our algorithm, Dynamic Mirror Descent Model
Predictive Control (DMD-MPC), represents a general family of MPC algorithms
that includes many existing techniques as special instances. DMD-MPC also
provides a fresh perspective on previous heuristics used in MPC and suggests a
principled way to design new MPC algorithms. In the experimental section of
this paper, we demonstrate the flexibility of DMD-MPC, presenting a set of new
MPC algorithms on a simple simulated cartpole and a simulated and real-world
aggressive driving task. Videos of the real-world experiments can be found at
https://youtu.be/vZST3v0_S9w and https://youtu.be/MhuqiHo2t98.
Comment: First two authors contributed equally.
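For intuition, with a Euclidean mirror map dynamic mirror descent reduces to a gradient step on a time-shifted control sequence, giving a minimal DMD-MPC-style update like the sketch below (the shift operator, step size, and cost_grad callback are illustrative assumptions, not the paper's components):

    import numpy as np

    def dmd_mpc_step(theta, cost_grad, eta=0.1):
        # theta: (T, u_dim) planned control sequence over the horizon.
        shifted = np.roll(theta, -1, axis=0)        # advance the plan one step
        shifted[-1] = shifted[-2]                   # guess for the new last control
        return shifted - eta * cost_grad(shifted)   # Euclidean mirror descent step

Executing only the first control of the returned sequence each round recovers the usual receding-horizon MPC pattern.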
Generation of High Dynamic Range Illumination from a Single Image for the Enhancement of Undesirably Illuminated Images
This paper presents an algorithm that enhances undesirably illuminated images
by generating and fusing multi-level illuminations from a single image. The
input image is first decomposed into illumination and reflectance components by
using an edge-preserving smoothing filter. Then the reflectance component is
scaled up to improve the image details in bright areas. The illumination
component is scaled up and down to generate several illumination images that
correspond to certain camera exposure values different from the original. The
virtual multi-exposure illuminations are blended into an enhanced illumination,
where we also propose a method to generate appropriate weight maps for the tone
fusion. Finally, an enhanced image is obtained by multiplying the equalized
illumination and enhanced reflectance. Experiments show that the proposed
algorithm produces visually pleasing output and also yields comparable
objective results to the conventional enhancement methods, while requiring
modest computational loads.
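A compact sketch of this pipeline for a single-channel image in [0, 1]; a Gaussian blur stands in for the edge-preserving filter, and the exposure ratios, Gaussian well-exposedness weights, and reflectance scale factor are assumptions:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def enhance(img, ratios=(0.5, 1.0, 2.0), eps=1e-6):
        illum = gaussian_filter(img, sigma=5)                    # illumination estimate
        reflect = img / (illum + eps) * 1.1                      # reflectance, scaled up (assumed factor)
        exposures = [np.clip(illum * r, 0, 1) for r in ratios]   # virtual multi-exposures
        weights = [np.exp(-((e - 0.5) ** 2) / 0.08) for e in exposures]
        wsum = np.sum(weights, axis=0) + eps
        fused = np.sum([w * e for w, e in zip(weights, exposures)], axis=0) / wsum
        return np.clip(fused * reflect, 0, 1)                    # recombine components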
Kernelized Low Rank Representation on Grassmann Manifolds
Low rank representation (LRR) has recently attracted great interest due to
its pleasing efficacy in exploring low-dimensional subspace structures embedded
in data. One of its successful applications is subspace clustering which means
data are clustered according to the subspaces they belong to. In this paper, at
a higher level, we intend to cluster subspaces into classes of subspaces. This
is naturally described as a clustering problem on the Grassmann manifold. The
novelty of this paper is to generalize LRR from Euclidean space to an LRR model
on the Grassmann manifold within a unified kernelized framework. The new methods have
many applications in computer vision tasks. Several clustering experiments are
conducted on handwritten digit images, dynamic textures, human face clips and
traffic scene sequences. The experimental results show that the proposed
methods outperform a number of state-of-the-art subspace clustering methods.
Comment: 13 pages.
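For reference, the standard Euclidean LRR model that is lifted to the manifold is commonly written as (textbook formulation, not quoted from this paper):

    \min_{Z,E} \; \|Z\|_* + \lambda \|E\|_{2,1}
    \quad \text{s.t.} \quad X = XZ + E

Since the data matrix enters the solution only through inner products, a Grassmann kernel such as the projection kernel k(X_i, X_j) = ||X_i^T X_j||_F^2 can replace them; this is the kind of kernelization the abstract refers to, though the specific kernel used in the paper is not stated here.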
Time Series Classification using the Hidden-Unit Logistic Model
We present a new model for time series classification, called the hidden-unit
logistic model, that uses binary stochastic hidden units to model latent
structure in the data. The hidden units are connected in a chain structure that
models temporal dependencies in the data. Compared to the prior models for time
series classification such as the hidden conditional random field, our model
can model very complex decision boundaries because the number of latent states
grows exponentially with the number of hidden units. We demonstrate the strong
performance of our model in experiments on a variety of (computer vision)
tasks, including handwritten character recognition, speech recognition, facial
expression recognition, and action recognition. We also present a
state-of-the-art system for facial action unit detection based on the
hidden-unit logistic model.
Comment: 17 pages, 4 figures, 3 tables.
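The capacity claim is easy to quantify: with H binary hidden units per time step, the number of joint latent states is 2^H, for example:

    for H in (4, 8, 16):
        print(f'{H} hidden units -> {2 ** H} joint latent states')
    # 4 hidden units -> 16 joint latent states
    # 8 hidden units -> 256 joint latent states
    # 16 hidden units -> 65536 joint latent states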
Delay-aware Fountain Codes for Video Streaming with Optimal Sampling Strategy
The explosive demand of on-line video from smart mobile devices poses
unprecedented challenges to delivering high quality of experience (QoE) over
wireless networks. Streaming high-definition video with low delay is difficult
mainly due to (i) the stochastic nature of wireless channels and (ii) the
fluctuating video bit rate. To address this, we propose a novel delay-aware
fountain coding (DAF) technique that integrates channel coding and video
coding. In this paper, we reveal that the fluctuation of video bit rate can
also be exploited to further improve fountain codes for wireless video
streaming. Specifically, we develop two coding techniques: the time-based
sliding window and the optimal window-wise sampling strategy. By adaptively
selecting the window length and optimally adjusting the sampling pattern
according to the ongoing video bit rate, the proposed schemes deliver
significantly higher video quality than existing schemes, with low delay and
constant data rate. To validate our design, we implement the protocols of DAF,
DAF-L (a low-complexity version) and the existing delay-aware video streaming
schemes by streaming H.264/AVC standard videos over an 802.11b network on the
CORE emulation platform. The results show that the decoding ratio of our scheme
is 15% to 100% higher than that of state-of-the-art techniques.
Comment: 12 pages, 15 figures.
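A toy sketch of the time-based sliding window idea: form each fountain-coded symbol by XOR-ing source packets drawn from a window selected by timestamp rather than by packet count. The degree choice, window length, and uniform sampling below are illustrative assumptions, not the DAF design (which adapts both to the ongoing bit rate):

    import random

    def encode_symbol(packets, timestamps, now, window_sec=0.5, degree=3):
        # packets: list of bytes; timestamps: arrival times in seconds.
        # Assumes at least one packet falls inside the time window.
        window = [p for p, t in zip(packets, timestamps) if now - t <= window_sec]
        chosen = random.sample(window, min(degree, len(window)))
        out = bytearray(max(len(p) for p in chosen))
        for p in chosen:
            for i, b in enumerate(p):
                out[i] ^= b                 # XOR-combine the source packets
        return bytes(out)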