An Efficient Approach to Automatic Generation of Time-lapse Video Sequences
Time-lapse video sequences have recently become a widely used asset for marketing and advertising, particularly within the field of construction and landscape development. However, manually generating these videos at a quality suitable for marketing purposes can be quite time-consuming. In this paper, a novel application for generating time-lapse videos is proposed. It automatically selects the optimal frames for time-lapse video generation, enhances these frames by applying a number of image pre-processing and machine learning techniques, such as FAST super-resolution, to improve the frames' quality, and finally provides an intuitive user interface that allows users to customise the time-lapse video with company branding. The application uses techniques such as Laplacian filtering and temporal smoothing filtering to determine inactivity within the video sequence and to classify day or night, and uses optical character recognition to remove unwanted artefacts such as the date and time stamp captured in the video. The proposed approach produces video sequences comparable to those produced manually, but with the advantage of being generated much faster and without requiring specialised video editing skills.
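The abstract names Laplacian filtering, temporal smoothing and day/night classification without giving formulas. A minimal sketch of one plausible way to combine them, assuming greyscale frames stored as 2D lists of grey levels: the activity between two frames is taken as the variance of the Laplacian of their absolute difference, temporal smoothing is a centred moving average, and night is a mean-brightness threshold. The function names (laplacian, activity, smooth, is_night, inactive_frames) and both thresholds are hypothetical, not the authors' pipeline or values.

```python
from statistics import mean, pvariance

# 3x3 discrete Laplacian kernel (4-neighbour form).
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def laplacian(img):
    """Apply the 3x3 Laplacian to the interior pixels of a 2D list of grey levels."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(LAPLACIAN[j][i] * img[y + j - 1][x + i - 1]
                for j in range(3) for i in range(3))
            for x in range(1, w - 1)
        ]
        for y in range(1, h - 1)
    ]


def activity(prev, curr):
    """Hypothetical activity score: variance of the Laplacian of |curr - prev|."""
    diff = [[abs(a - b) for a, b in zip(r0, r1)] for r0, r1 in zip(prev, curr)]
    return pvariance([v for row in laplacian(diff) for v in row])


def smooth(values, radius=1):
    """Temporal smoothing: centred moving average, truncated at the sequence ends."""
    return [mean(values[max(0, i - radius): i + radius + 1])
            for i in range(len(values))]


def is_night(img, threshold=60):
    """Hypothetical day/night rule: mean grey level below a fixed threshold."""
    return mean(v for row in img for v in row) < threshold


def inactive_frames(frames, threshold=1.0, radius=1):
    """Indices t >= 1 whose smoothed activity relative to frame t-1 is below threshold."""
    scores = [activity(frames[t - 1], frames[t]) for t in range(1, len(frames))]
    return [t + 1 for t, s in enumerate(smooth(scores, radius)) if s < threshold]


if __name__ == "__main__":
    still = [[100] * 5 for _ in range(5)]
    moved = [row[:] for row in still]
    moved[2][2] = 200  # a small change appears between frames 2 and 3
    frames = [still, still, still, moved, moved, moved]
    print(inactive_frames(frames))  # [1, 5]
    print(is_night(still))  # False
```

Note that smoothing spreads the single burst of activity to its neighbours, so only frames 1 and 5 are flagged as inactive. In practice the thresholds would need tuning per camera, and a library routine such as OpenCV's cv2.Laplacian would replace the hand-written loop.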
UG^2: a Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition
Advances in image restoration and enhancement techniques have led to
discussion about how such algorithms can be applied as a pre-processing step to
improve automatic visual recognition. In principle, techniques like deblurring
and super-resolution should yield improvements by de-emphasizing noise and
increasing signal in an input image. But the historically divergent goals of
the computational photography and visual recognition communities have created a
significant need for more work in this direction. To facilitate new research,
we introduce a new benchmark dataset called UG^2, which contains three
difficult real-world scenarios: uncontrolled videos taken by UAVs and manned
gliders, as well as controlled videos taken on the ground. Over 160,000
annotated frames for hundreds of ImageNet classes are available, which are used
for baseline experiments that assess the impact of known and unknown image
artifacts and other conditions on common deep learning-based object
classification approaches. Further, current image restoration and enhancement
techniques are evaluated by determining whether or not they improve baseline
classification performance. Results show that there is plenty of room for
algorithmic innovation, making this dataset a useful tool going forward.
Comment: Supplemental material: https://goo.gl/vVM1xe, Dataset:
https://goo.gl/AjA6En, CVPR 2018 Prize Challenge: ug2challenge.or
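The evaluation protocol described above comes down to comparing a classifier's accuracy on the same annotated frames before and after restoration or enhancement. A minimal sketch, assuming each prediction is a list of class labels ranked by confidence; top_k_accuracy and enhancement_effect are hypothetical helper names, not part of the UG^2 tooling, and the labels in the example are toy data.

```python
from typing import Hashable, Sequence


def top_k_accuracy(ranked_preds: Sequence[Sequence[Hashable]],
                   labels: Sequence[Hashable], k: int = 1) -> float:
    """Fraction of frames whose ground-truth label is among the top-k predictions."""
    if not labels or len(ranked_preds) != len(labels):
        raise ValueError("need one ranked prediction list per labelled frame")
    hits = sum(label in list(preds)[:k] for preds, label in zip(ranked_preds, labels))
    return hits / len(labels)


def enhancement_effect(raw_preds, enhanced_preds, labels, k: int = 1) -> float:
    """Change in top-k accuracy when the classifier sees enhanced instead of raw frames.

    Positive: the restoration/enhancement step helped recognition; negative: it hurt.
    """
    return (top_k_accuracy(enhanced_preds, labels, k)
            - top_k_accuracy(raw_preds, labels, k))


# Toy example: four labelled frames, ranked predictions on raw and enhanced versions.
labels = ["car", "boat", "truck", "car"]
raw = [["car", "truck"], ["truck", "boat"], ["car", "truck"], ["boat", "car"]]
enhanced = [["car", "truck"], ["boat", "truck"], ["truck", "car"], ["boat", "car"]]
print(enhancement_effect(raw, enhanced, labels, k=1))  # 0.5
print(enhancement_effect(raw, enhanced, labels, k=2))  # 0.0
```

The toy example shows why reporting more than one k can matter: the enhancement changes top-1 accuracy but leaves top-2 accuracy unchanged.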
Learning Matchable Image Transformations for Long-term Metric Visual Localization
Long-term metric self-localization is an essential capability of autonomous
mobile robots, but remains challenging for vision-based systems due to
appearance changes caused by lighting, weather, or seasonal variations. While
experience-based mapping has proven to be an effective technique for bridging
the 'appearance gap', the number of experiences required for reliable metric
localization over days or months can be very large, and methods for reducing
the necessary number of experiences are needed for this approach to scale.
Taking inspiration from color constancy theory, we learn a nonlinear
RGB-to-grayscale mapping that explicitly maximizes the number of inlier feature
matches for images captured under different lighting and weather conditions,
and use it as a pre-processing step in a conventional single-experience
localization pipeline to improve its robustness to appearance change. We train
this mapping by approximating the target non-differentiable localization
pipeline with a deep neural network, and find that incorporating a learned
low-dimensional context feature can further improve cross-appearance feature
matching. Using synthetic and real-world datasets, we demonstrate substantial
improvements in localization performance across day-night cycles, enabling
continuous metric localization over a 30-hour period using a single mapping
experience, and allowing experience-based localization to scale to long
deployments with dramatically reduced data requirements.
Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the
IEEE International Conference on Robotics and Automation (ICRA'20), Paris,
France, May 31-June 4, 202
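As an illustration of a color-constancy-inspired nonlinear RGB-to-grayscale map, the sketch below uses the log-chromaticity form of the illumination-invariant transform, I = 0.5 + log(G) - alpha*log(B) - (1 - alpha)*log(R), generalized to three free weights. This family is an assumption chosen for illustration: the paper learns its mapping by training through a network that approximates the localization pipeline, its exact parameterization may differ, and that training step (maximizing inlier feature matches) is not shown. The names log_gray, invariant_weights and to_gray are hypothetical.

```python
import math
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]  # linear (R, G, B) intensities, assumed > 0


def log_gray(pixel: Pixel, weights: Sequence[float], bias: float = 0.5,
             eps: float = 1e-6) -> float:
    """Nonlinear RGB-to-grayscale map: bias + sum_c weights[c] * log(channel_c).

    In a learned setting the weights (and bias) would be the trainable parameters.
    eps guards against log(0) for fully dark channels.
    """
    return bias + sum(w * math.log(max(c, eps)) for w, c in zip(weights, pixel))


def invariant_weights(alpha: float) -> Tuple[float, float, float]:
    """(w_R, w_G, w_B) reproducing I = 0.5 + log(G) - alpha*log(B) - (1 - alpha)*log(R)."""
    return (-(1.0 - alpha), 1.0, -alpha)


def to_gray(image, weights, bias=0.5):
    """Apply the per-pixel map to an image given as rows of (R, G, B) tuples."""
    return [[log_gray(p, weights, bias) for p in row] for row in image]


w = invariant_weights(0.48)   # alpha is camera-dependent; 0.48 is only an example
pixel = (0.2, 0.4, 0.3)
dimmer = tuple(0.5 * c for c in pixel)   # same surface under half the light
print(abs(log_gray(pixel, w) - log_gray(dimmer, w)) < 1e-9)  # True
```

Because these weights sum to zero, scaling all three channels by a common factor shifts every log term by the same amount and the shifts cancel, so the output ignores a uniform brightness change. Weights that do not sum to zero, as a freely learned mapping might use, trade this exact invariance for other properties.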
Sparsity Invariant CNNs
In this paper, we consider convolutional neural networks operating on sparse
inputs with an application to depth upsampling from sparse laser scan data.
First, we show that traditional convolutional networks perform poorly when
applied to sparse data even when the location of missing data is provided to
the network. To overcome this problem, we propose a simple yet effective sparse
convolution layer which explicitly considers the location of missing data
during the convolution operation. We demonstrate the benefits of the proposed
network architecture in synthetic and real experiments with respect to various
baseline approaches. Compared to dense baselines, the proposed sparse
convolution network generalizes well to novel datasets and is invariant to the
level of sparsity in the data. For our evaluation, we derive a novel dataset
from the KITTI benchmark, comprising 93k depth-annotated RGB images. Our
dataset allows for training and evaluating depth upsampling and depth
prediction techniques in challenging real-world settings and will be made
available upon publication.
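The sparse convolution layer can be sketched as a normalized convolution. Only observed pixels contribute, the weighted sum is divided by the number of observed pixels in the window, and the validity mask is propagated with a max over the window. This follows the Sparsity Invariant CNNs formulation as we understand it. The pure-Python loops, the sparse_conv2d name, the binary mask and the zero 'same' padding are illustrative choices rather than the authors' implementation.

```python
from typing import List, Tuple

Grid = List[List[float]]


def sparse_conv2d(x: Grid, mask: List[List[int]], weights: Grid,
                  bias: float = 0.0, eps: float = 1e-8) -> Tuple[Grid, List[List[int]]]:
    """Sparsity-aware convolution with zero 'same' padding and an odd k x k kernel.

    out[u][v]   = sum(x * w over observed pixels in the window) / (count observed + eps) + bias
    mask'[u][v] = max of the binary mask over the window (1 if any input was observed)
    Values of x where mask == 0 never contribute, whatever they contain.
    """
    h, w = len(x), len(x[0])
    k = len(weights)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    new_mask = [[0] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc, observed, any_valid = 0.0, 0, 0
            for i in range(k):
                for j in range(k):
                    y, z = u + i - r, v + j - r
                    if 0 <= y < h and 0 <= z < w and mask[y][z]:
                        acc += x[y][z] * weights[i][j]
                        observed += 1
                        any_valid = 1
            out[u][v] = acc / (observed + eps) + bias
            new_mask[u][v] = any_valid
    return out, new_mask


ones = [[1.0] * 3 for _ in range(3)]
depth = [[5.0, 999.0, 999.0], [999.0, 999.0, 999.0], [999.0, 999.0, 5.0]]  # 999 = garbage
mask = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]
out, new_mask = sparse_conv2d(depth, mask, ones)
print(round(out[1][1], 6), new_mask)  # 5.0 [[1, 1, 0], [1, 1, 1], [0, 1, 1]]
```

With an all-ones kernel, each output pixel whose window contains any measurement is the mean of the observed depths in that window, whether one or nine of them are present. This is the sense in which the layer is invariant to the level of sparsity. A plain convolution would instead mix in the garbage values, or zeros, at the missing locations.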
DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs
We present a novel deep learning architecture for fusing static
multi-exposure images. Current multi-exposure fusion (MEF) approaches use
hand-crafted features to fuse the input sequence. However, the weak hand-crafted
representations are not robust to varying input conditions. Moreover, they
perform poorly for extreme exposure image pairs. Thus, it is highly desirable
to have a method that is robust to varying input conditions and capable of
handling extreme exposure without artifacts. Deep representations are known to
be robust to input conditions and have shown phenomenal performance in a
supervised setting. However, the stumbling block in using deep learning for MEF
was the lack of sufficient training data and an oracle to provide the
ground-truth for supervision. To address the above issues, we have gathered a
large dataset of multi-exposure image stacks for training, and to circumvent the
need for ground-truth images, we propose an unsupervised deep learning
framework for MEF utilizing a no-reference quality metric as the loss function. The
proposed approach uses a novel CNN architecture trained to learn the fusion
operation without a reference ground-truth image. The model fuses a set of common
low-level features extracted from each image to generate artifact-free
perceptually pleasing results. We perform extensive quantitative and
qualitative evaluation and show that the proposed technique outperforms
existing state-of-the-art approaches for a variety of natural images.
Comment: ICCV 201
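The idea of fusing common low-level features extracted from each image can be illustrated with a toy network in three stages. The same (tied) convolution kernels are applied to every exposure, the resulting feature maps are fused by element-wise addition, and a 1x1 layer reconstructs a single image. This is a hypothetical miniature, assuming grayscale luminance inputs as 2D lists. The kernel values, the function names (conv2d_same, extract_features, fuse_exposures) and addition-based fusion are assumptions, not the DeepFuse architecture or its trained weights. The no-reference quality loss used for training is not shown.

```python
from typing import List, Sequence

Image = List[List[float]]


def conv2d_same(img: Image, kernel: Sequence[Sequence[float]]) -> Image:
    """Zero-padded 'same' 2D cross-correlation with a square, odd-sized kernel."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for i, krow in enumerate(kernel):
                for j, kv in enumerate(krow):
                    yy, xx = y + i - r, x + j - r
                    if 0 <= yy < h and 0 <= xx < w:
                        s += kv * img[yy][xx]
            out[y][x] = s
    return out


def extract_features(img: Image, kernels) -> List[Image]:
    """Low-level feature maps (conv + ReLU); the SAME kernels serve every exposure."""
    return [[[max(0.0, v) for v in row] for row in conv2d_same(img, k)] for k in kernels]


def fuse_exposures(exposures: Sequence[Image], kernels, out_weights) -> Image:
    """Toy feature-level fusion: tied extraction, element-wise sum, 1x1 reconstruction."""
    h, w = len(exposures[0]), len(exposures[0][0])
    feature_sets = [extract_features(e, kernels) for e in exposures]
    # Fuse corresponding feature maps across exposures by element-wise addition.
    fused = [[[sum(fs[c][y][x] for fs in feature_sets) for x in range(w)]
              for y in range(h)] for c in range(len(kernels))]
    # Reconstruct one image as a per-pixel weighted sum of the fused channels.
    return [[sum(out_weights[c] * fused[c][y][x] for c in range(len(kernels)))
             for x in range(w)] for y in range(h)]


under = [[0.1, 0.2], [0.1, 0.2]]   # hypothetical under-exposed luminance
over = [[0.8, 0.9], [0.9, 1.0]]    # hypothetical over-exposed luminance
identity = [[1.0]]                 # 1x1 kernel: feature = the pixel itself
print(fuse_exposures([under, over], [identity], [0.5]))
```

Because the extraction weights are shared and the fusion is a sum, the result does not depend on the order in which the exposures are supplied. This is one reason tied weights are a natural fit for fusing an unordered image pair.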