
    An Efficient Approach to Automatic Generation of Time-lapse Video Sequences

    Time-lapse video sequences have recently become a highly utilised asset for marketing and advertising, particularly within the field of construction and landscape development. However, manually generating these videos at a quality suitable for marketing purposes can be quite time-consuming. In this paper, a novel application for generating time-lapse videos is proposed. It automatically selects the optimal frames for time-lapse video generation, enhances these frames by applying a number of image pre-processing and machine learning techniques, such as FAST super-resolution, to improve frame quality, and finally provides an intuitive user interface that allows users to customise the time-lapse video with company branding. The auto-generated time-lapse videos use techniques such as Laplacian filtering and temporal smoothing filtering to determine inactivity within the video sequence and to classify day or night and, by use of optical character recognition, can remove unwanted artefacts such as the captured video date and time stamp. The proposed approach produces video sequences comparable to those produced manually, with the advantage of being generated much faster and without requiring specialised video editing skills.
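The abstract names Laplacian filtering and temporal smoothing for inactivity detection but does not describe how they are combined. The following is a minimal sketch of one plausible arrangement, not the authors' implementation: consecutive frames are compared after a 4-neighbour Laplacian filter (which cancels uniform brightness shifts), the resulting change signal is smoothed with a moving average, and transitions below a threshold are flagged as inactive. The function names, the thresholds, and the mean-brightness day/night test are all assumptions made for illustration. Frames are plain nested lists of grayscale values.

```python
def laplacian(frame):
    """4-neighbour Laplacian of the interior pixels of a grayscale frame."""
    h, w = len(frame), len(frame[0])
    return [
        [
            frame[y - 1][x] + frame[y + 1][x] + frame[y][x - 1]
            + frame[y][x + 1] - 4 * frame[y][x]
            for x in range(1, w - 1)
        ]
        for y in range(1, h - 1)
    ]


def edge_change(a, b):
    """Mean absolute difference between the Laplacian responses of two frames."""
    la, lb = laplacian(a), laplacian(b)
    diffs = [abs(p - q) for ra, rb in zip(la, lb) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def moving_average(values, window=3):
    """Centred moving average; windows are truncated at the sequence edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def inactive_transitions(frames, threshold=5.0, window=3):
    """Flag each consecutive-frame transition whose smoothed change is small.

    The threshold and window size are hypothetical values, not from the paper.
    """
    changes = [edge_change(a, b) for a, b in zip(frames, frames[1:])]
    return [c < threshold for c in moving_average(changes, window)]


def is_night(frame, brightness_threshold=60):
    """Crude day/night test on mean brightness (an assumed heuristic)."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels) < brightness_threshold
```

Comparing Laplacian responses rather than raw pixels means that a global exposure change between two otherwise identical frames registers as no activity, while a newly appearing object does register. Temporal smoothing then suppresses isolated spikes, at the cost of blurring the boundaries of active periods.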

    UG^2: a Video Benchmark for Assessing the Impact of Image Restoration and Enhancement on Automatic Visual Recognition

    Advances in image restoration and enhancement techniques have led to discussion about how such algorithms can be applied as a pre-processing step to improve automatic visual recognition. In principle, techniques like deblurring and super-resolution should yield improvements by de-emphasizing noise and increasing signal in an input image. But the historically divergent goals of the computational photography and visual recognition communities have created a significant need for more work in this direction. To facilitate new research, we introduce a new benchmark dataset called UG^2, which contains three difficult real-world scenarios: uncontrolled videos taken by UAVs and manned gliders, as well as controlled videos taken on the ground. Over 160,000 annotated frames for hundreds of ImageNet classes are available, which are used for baseline experiments that assess the impact of known and unknown image artifacts and other conditions on common deep learning-based object classification approaches. Further, current image restoration and enhancement techniques are evaluated by determining whether or not they improve baseline classification performance. Results show that there is plenty of room for algorithmic innovation, making this dataset a useful tool going forward.
    Comment: Supplemental material: https://goo.gl/vVM1xe, Dataset: https://goo.gl/AjA6En, CVPR 2018 Prize Challenge: ug2challenge.or

    Learning Matchable Image Transformations for Long-term Metric Visual Localization

    Long-term metric self-localization is an essential capability of autonomous mobile robots, but remains challenging for vision-based systems due to appearance changes caused by lighting, weather, or seasonal variations. While experience-based mapping has proven to be an effective technique for bridging the 'appearance gap,' the number of experiences required for reliable metric localization over days or months can be very large, and methods for reducing the necessary number of experiences are needed for this approach to scale. Taking inspiration from color constancy theory, we learn a nonlinear RGB-to-grayscale mapping that explicitly maximizes the number of inlier feature matches for images captured under different lighting and weather conditions, and use it as a pre-processing step in a conventional single-experience localization pipeline to improve its robustness to appearance change. We train this mapping by approximating the target non-differentiable localization pipeline with a deep neural network, and find that incorporating a learned low-dimensional context feature can further improve cross-appearance feature matching. Using synthetic and real-world datasets, we demonstrate substantial improvements in localization performance across day-night cycles, enabling continuous metric localization over a 30-hour period using a single mapping experience, and allowing experience-based localization to scale to long deployments with dramatically reduced data requirements.
    Comment: In IEEE Robotics and Automation Letters (RA-L) and presented at the IEEE International Conference on Robotics and Automation (ICRA'20), Paris, France, May 31-June 4, 202

    Sparsity Invariant CNNs

    In this paper, we consider convolutional neural networks operating on sparse inputs with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. We demonstrate the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches. Compared to dense baselines, the proposed sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. For our evaluation, we derive a novel dataset from the KITTI benchmark, comprising 93k depth-annotated RGB images. Our dataset allows for training and evaluating depth upsampling and depth prediction techniques in challenging real-world settings and will be made available upon publication.
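The abstract does not give the formula for the sparse convolution layer. One way to make a convolution account for missing data, sketched below under that caveat, is to sum only over observed inputs and divide by the number of observed inputs in each window, then propagate a validity mask marking where at least one input was observed. The function name `sparse_conv2d`, the zero-padding behaviour, and the `eps` constant are assumptions for illustration, and the example works on a single channel with plain nested lists rather than a deep learning framework.

```python
def sparse_conv2d(values, mask, kernel, bias=0.0, eps=1e-8):
    """Mask-normalised 2D convolution on a single-channel grid.

    values: 2D list of floats (entries at unobserved positions are ignored)
    mask:   2D list of 0/1 flags, 1 where a value was observed
    kernel: square 2D list with odd side length
    Returns (output, output_mask), both the same size as the input.
    """
    h, w = len(values), len(values[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    out_mask = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            valid = 0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    # Positions outside the grid or unobserved contribute nothing.
                    if 0 <= yy < h and 0 <= xx < w and mask[yy][xx]:
                        acc += kernel[dy + r][dx + r] * values[yy][xx]
                        valid += 1
            # Normalise by the count of observed inputs, not the kernel size.
            out[y][x] = acc / (valid + eps) + bias
            # The output is valid wherever the window saw any observation.
            out_mask[y][x] = 1 if valid > 0 else 0
    return out, out_mask
```

With an all-ones kernel and a constant depth map, this sketch returns approximately the same output whether every pixel or only every other pixel is observed, which illustrates the kind of invariance to sparsity level that the abstract describes. A standard convolution on the zero-filled input would instead scale its output with the fraction of observed pixels.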

    DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs

    We present a novel deep learning architecture for fusing static multi-exposure images. Current multi-exposure fusion (MEF) approaches use hand-crafted features to fuse the input sequence. However, the weak hand-crafted representations are not robust to varying input conditions. Moreover, they perform poorly for extreme exposure image pairs. Thus, it is highly desirable to have a method that is robust to varying input conditions and capable of handling extreme exposure without artifacts. Deep representations are known to be robust to input conditions and have shown phenomenal performance in a supervised setting. However, the stumbling block in using deep learning for MEF was the lack of sufficient training data and of an oracle to provide the ground truth for supervision. To address these issues, we have gathered a large dataset of multi-exposure image stacks for training and, to circumvent the need for ground-truth images, we propose an unsupervised deep learning framework for MEF utilizing a no-reference quality metric as the loss function. The proposed approach uses a novel CNN architecture trained to learn the fusion operation without a reference ground-truth image. The model fuses a set of common low-level features extracted from each image to generate artifact-free, perceptually pleasing results. We perform extensive quantitative and qualitative evaluation and show that the proposed technique outperforms existing state-of-the-art approaches for a variety of natural images.
    Comment: ICCV 201