
    Learn to Model Motion from Blurry Footages

    It is difficult to recover the motion field from real-world footage given a mixture of camera shake and other photometric effects. In this paper we propose a hybrid framework that interleaves a Convolutional Neural Network (CNN) with a traditional optical flow energy. We first construct a CNN architecture using a novel learnable directional filtering layer. This layer encodes the angle and distance similarity matrix between blur and camera motion, which enhances the blur features of camera-shake footage. The proposed CNNs are then integrated into an iterative optical flow framework, which enables modelling and solving both the blind deconvolution and the optical flow estimation problems simultaneously. Our framework is trained end-to-end on a synthetic dataset and yields competitive precision and performance against state-of-the-art approaches. Comment: Preprint of our paper accepted by Pattern Recognition
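    The interleaved structure described above can be sketched as an alternation between a deblurring update and a flow update. The functions below are placeholders standing in for the learned CNN and the optical flow energy minimization, not the paper's actual networks; only the alternating control flow is illustrated.

```python
import numpy as np

def deblur_step(blurry, flow):
    # Placeholder for the CNN-based blind deconvolution conditioned on the
    # current flow estimate; here an identity pass stands in for the network.
    return blurry

def flow_step(sharp_pair):
    # Placeholder for one optical-flow energy minimization update over the
    # current sharp estimates; returns a zero flow field of shape (H, W, 2).
    return np.zeros(sharp_pair[0].shape + (2,))

def interleaved_estimation(frame_pair, n_iters=3):
    """Alternate between deblurring and flow estimation, mirroring the
    iterative hybrid framework (all names here are illustrative)."""
    flow = np.zeros(frame_pair[0].shape + (2,))
    sharp = list(frame_pair)
    for _ in range(n_iters):
        sharp = [deblur_step(f, flow) for f in frame_pair]
        flow = flow_step(sharp)
    return sharp, flow
```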

    Bridging the Gap Between Computational Photography and Visual Recognition

    What is the current state-of-the-art for image restoration and enhancement applied to degraded images acquired under less than ideal circumstances? Can such algorithms, applied as a pre-processing step, improve image interpretability for manual analysis or automatic visual recognition of scene content? While there have been important advances in the area of computational photography to restore or enhance the visual quality of an image, the capabilities of such techniques have not always translated in a useful way to visual recognition tasks. Consequently, there is a pressing need for the development of algorithms that are designed for the joint problem of improving visual appearance and recognition, which will be an enabling factor for the deployment of visual recognition tools in many real-world scenarios. To address this, we introduce the UG^2 dataset as a large-scale benchmark composed of video imagery captured under challenging conditions, and two enhancement tasks designed to test algorithmic impact on visual quality and automatic object recognition. Furthermore, we propose a set of metrics to evaluate the joint improvement of such tasks as well as individual algorithmic advances, including a novel psychophysics-based evaluation regime for human assessment and a realistic set of quantitative measures for object recognition performance. We introduce six new algorithms for image restoration or enhancement, which were created as part of the IARPA-sponsored UG^2 Challenge workshop held at CVPR 2018. Under the proposed evaluation regime, we present an in-depth analysis of these algorithms and a host of deep learning-based and classic baseline approaches. From the observed results, it is evident that we are in the early days of building a bridge between computational photography and visual recognition, leaving many opportunities for innovation in this area. Comment: CVPR Prize Challenge: http://www.ug2challenge.or
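    A joint metric of the kind described above has to combine a visual-quality gain with a recognition gain. The toy score below illustrates one such combination; the linear weighting and the choice of PSNR and accuracy as the two components are assumptions for illustration, not the UG^2 metrics themselves.

```python
def joint_improvement(psnr_before, psnr_after, acc_before, acc_after, alpha=0.5):
    """Toy joint score: weighted sum of the image-quality gain (PSNR, in dB)
    and the recognition gain (top-1 accuracy, in [0, 1]).
    The weighting alpha is an illustrative assumption."""
    quality_gain = psnr_after - psnr_before
    recognition_gain = acc_after - acc_before
    return alpha * quality_gain + (1 - alpha) * recognition_gain
```

    A negative score would flag an enhancement algorithm that improves appearance while hurting recognition, the failure mode the benchmark is designed to expose.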

    Identifying Most Walkable Direction for Navigation in an Outdoor Environment

    We present an approach for identifying the most walkable direction for navigation using a hand-held camera. Our approach extracts semantically rich contextual information from the scene using a custom encoder-decoder architecture for semantic segmentation and models the spatial and temporal behavior of objects in the scene using a spatio-temporal graph. The system learns to minimize a cost function over the spatial and temporal object attributes to identify the most walkable direction. We construct a new annotated navigation dataset collected using a hand-held mobile camera in an unconstrained outdoor environment, which includes challenging settings such as highly dynamic scenes, occlusion between objects, and distortions. Our system achieves an accuracy of 84% on predicting a safe direction. We also show that our custom segmentation network is both fast and accurate, achieving mIOU (mean intersection over union) scores of 81 and 44.7 on the PASCAL VOC and the PASCAL Context datasets, respectively, while running at about 21 frames per second.
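    The "minimize a cost over object attributes" step can be sketched as scoring each candidate direction against the objects in view and taking the argmin. The attribute set (angle, distance, speed) and the cost terms below are hypothetical stand-ins for the learned cost, chosen only to show the selection mechanism.

```python
def direction_cost(direction, objects):
    """Hypothetical cost of heading toward `direction` (degrees).
    objects: list of (angle_deg, distance_m, speed) tuples."""
    cost = 0.0
    for angle, distance, speed in objects:
        proximity = 1.0 / max(distance, 1e-6)          # closer objects cost more
        alignment = max(0.0, 1.0 - abs(direction - angle) / 90.0)  # in-path penalty
        cost += alignment * proximity * (1.0 + speed)  # moving objects cost more
    return cost

def most_walkable(candidates, objects):
    # Pick the candidate direction with the lowest obstruction cost.
    return min(candidates, key=lambda d: direction_cost(d, objects))
```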

    Physics-Based Generative Adversarial Models for Image Restoration and Beyond

    We present an algorithm to directly solve numerous image restoration problems (e.g., image deblurring, image dehazing, image deraining, etc.). These problems are highly ill-posed, and the common assumptions of existing methods are usually based on heuristic image priors. In this paper, we find that these problems can be solved by generative models with adversarial learning. However, the basic formulation of generative adversarial networks (GANs) does not generate realistic images, and some structures of the estimated images are usually not preserved well. Motivated by the observation that the estimated results should be consistent with the observed inputs under the physics models, we propose a physics-model-constrained learning algorithm, so that the physics model can guide the estimation for the specific task within the conventional GAN framework. The proposed algorithm is trained in an end-to-end fashion and can be applied to a variety of image restoration and related low-level vision problems. Extensive experiments demonstrate that our method performs favorably against state-of-the-art algorithms. Comment: IEEE TPAMI
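    The physics constraint above amounts to a consistency term: re-degrading the estimated clean image under the known physics model should reproduce the observation. The sketch below uses a 1-D convolution as a stand-in for the degradation operator and an L2 penalty as the loss; both choices are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def degrade(sharp, kernel):
    # Simplified physics model: 1-D convolution stands in for blur/haze/rain.
    return np.convolve(sharp, kernel, mode="same")

def physics_consistency_loss(estimated_sharp, observed, kernel):
    """Penalize estimates whose re-degradation disagrees with the observed
    input; this term would be added to the adversarial loss during training."""
    return float(np.mean((degrade(estimated_sharp, kernel) - observed) ** 2))
```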

    Image and Depth from a Single Defocused Image Using Coded Aperture Photography

    Depth from defocus and defocus deblurring from a single image are two challenging problems that derive from the finite depth of field in conventional cameras. Coded aperture imaging is one of the techniques used for improving the results of these two problems. Up to now, different methods have been proposed for improving the results of either defocus deblurring or depth estimation. In this paper, a multi-objective function is proposed for evaluating and designing aperture patterns with the aim of improving the results of both depth from defocus and defocus deblurring. Pattern evaluation is performed by considering the scene illumination condition and camera system specification. Based on the proposed criteria, a single asymmetric pattern is designed that is used for restoring a sharp image and a depth map from a single input. Since the designed pattern is asymmetric, defocused objects on the two sides of the focal plane can be distinguished. Depth estimation is performed using a new algorithm, which is based on image quality assessment criteria and can distinguish between blurred objects lying in front of or behind the focal plane. Extensive simulations as well as experiments on a variety of real scenes are conducted to compare our aperture with previously proposed ones. Comment: 18 pages, 14 figures, submitted
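    A multi-objective aperture evaluation of the kind described can be sketched as scoring each candidate pattern on both criteria and selecting the best. The scalar weighting and the noise penalty below are illustrative assumptions; the paper's actual criteria account for scene illumination and camera specifications.

```python
def aperture_score(deblur_quality, depth_discrimination, noise_level, w=0.5):
    """Toy multi-objective score for an aperture pattern: trade off
    deblurring quality against depth discrimination, penalized by noise.
    The weighting w and the linear form are illustrative assumptions."""
    return w * deblur_quality + (1 - w) * depth_discrimination - noise_level

def best_pattern(patterns):
    # patterns: dict name -> (deblur_quality, depth_discrimination, noise_level)
    return max(patterns, key=lambda name: aperture_score(*patterns[name]))
```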

    Structural and object detection for phosphene images

    Prosthetic vision based on phosphenes is a promising way to provide visual perception to some blind people. However, phosphene images are very limited in terms of spatial resolution (e.g., a 32 x 32 phosphene array) and luminance levels (e.g., 8 gray levels), which results in the subject receiving very limited information about the scene. This requires high-level processing to extract more information from the scene and present it to the subject within the phosphene limitations. In this work, we study the recognition of indoor environments under simulated prosthetic vision. Most research in simulated prosthetic vision is based on static images, while very few researchers have addressed the problem of scene recognition through video sequences. We propose a new approach to build a schematic representation of indoor environments for phosphene images. Our schematic representation relies on two parallel CNNs for the extraction of structural informative edges of the room and the relevant object silhouettes based on mask segmentation. We performed a study with twelve normally sighted subjects to evaluate how well our methods support room recognition when phosphene images and videos are presented. We show how our method increases the recognition ability of the user from 75% using alternative methods to 90% using our approach.
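    The stated limitations (a 32 x 32 phosphene array with 8 gray levels) can be simulated directly by downsampling and quantizing an intensity image. The block-averaging downsampling below is an assumption about the simulation; real simulators typically also apply a Gaussian phosphene profile per site.

```python
import numpy as np

def phosphene_render(image, grid=32, levels=8):
    """Reduce an intensity image to a grid x grid phosphene map quantized to
    `levels` gray levels, matching the limitations cited in the abstract."""
    h, w = image.shape
    bh, bw = h // grid, w // grid
    # Block-average into the phosphene grid (crop any remainder pixels).
    small = image[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    peak = max(float(small.max()), 1e-9)
    return np.round(small / peak * (levels - 1))
```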

    Single Image Non-uniform Blur Kernel Estimation via Adaptive Basis Decomposition

    Characterizing and removing motion blur caused by camera shake or object motion remains an important task for image restoration. In recent years, removal of motion blur in photographs has seen impressive progress in the hands of deep learning-based methods, trained to map directly from blurry to sharp images. Characterization of motion blur, on the other hand, has received less attention, and progress in model-based methods for restoration lags behind that of data-driven end-to-end approaches. In this paper, we propose a general, non-parametric model for dense non-uniform motion blur estimation. Given a blurry image, we estimate a set of adaptive basis kernels as well as the mixing coefficients at pixel level, producing a per-pixel map of motion blur. This rich but efficient forward model of the degradation process allows the utilization of existing tools for solving inverse problems. We show that our method overcomes the limitations of existing non-uniform motion blur estimation and that it contributes to bridging the gap between model-based and data-driven approaches for deblurring real photographs.
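    The forward model above, basis kernels mixed by per-pixel coefficients, can be written as a single tensor contraction. The shapes and the convexity of the coefficients in this sketch are assumptions consistent with the description, not details taken from the paper.

```python
import numpy as np

def per_pixel_kernels(basis, coeffs):
    """Compose a per-pixel blur kernel field from B adaptive basis kernels
    and pixelwise mixing coefficients.
    basis:  (B, k, k) array of basis kernels (each assumed to sum to 1).
    coeffs: (H, W, B) array of mixing weights (assumed to sum to 1 per pixel).
    Returns an (H, W, k, k) field: one blur kernel per pixel."""
    return np.einsum("bij,hwb->hwij", basis, coeffs)
```

    Because each pixel's kernel is a convex combination of normalized bases, the composed kernels remain normalized, which keeps the forward model energy-preserving.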

    Scene Text Detection via Holistic, Multi-Channel Prediction

    Recently, scene text detection has become an active research topic in computer vision and document analysis, because of its great importance and significant challenge. However, the vast majority of existing methods detect text within local regions, typically through extracting character, word or line level candidates followed by candidate aggregation and false positive elimination, which potentially excludes the effect of wide-scope and long-range contextual cues in the scene. To take full advantage of the rich information available in the whole natural image, we propose to localize text in a holistic manner, by casting scene text detection as a semantic segmentation problem. The proposed algorithm runs directly on full images and produces global, pixel-wise prediction maps, in which detections are subsequently formed. To better make use of the properties of text, three types of information, regarding text regions, individual characters and their relationships, are estimated with a single Fully Convolutional Network (FCN) model. With such predictions of text properties, the proposed algorithm can simultaneously handle horizontal, multi-oriented and curved text in real-world natural images. Experiments on standard benchmarks, including ICDAR 2013, ICDAR 2015 and MSRA-TD500, demonstrate that the proposed algorithm substantially outperforms previous state-of-the-art approaches. Moreover, we report the first baseline result on the recently released, large-scale dataset COCO-Text. Comment: 10 pages, 9 figures, 5 tables
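    Turning the three pixel-wise prediction maps (text regions, characters, relationships) into detections requires a fusion step. The simple threshold-and-combine rule below is an illustrative assumption; the paper's actual grouping of predictions into word or line instances is more involved.

```python
import numpy as np

def fuse_text_maps(region_map, char_map, link_map, t=0.5):
    """Fuse the three FCN prediction maps into a binary text mask.
    A pixel is kept if it lies in a predicted text region AND is supported
    by either a character or a character-relationship response.
    This AND/OR rule is an illustrative assumption."""
    return (region_map > t) & ((char_map > t) | (link_map > t))
```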

    cvpaper.challenge in 2016: Futuristic Computer Vision through 1,600 Papers Survey

    The paper gives futuristic challenges discussed in the cvpaper.challenge. In 2015 and 2016, we thoroughly studied 1,600+ papers from several conferences/journals such as CVPR/ICCV/ECCV/NIPS/PAMI/IJCV.

    Common Representation Learning Using Step-based Correlation Multi-Modal CNN

    Deep learning techniques have been successfully used in learning a common representation for multi-view data, wherein the different modalities are projected onto a common subspace. In a broader perspective, the techniques used to investigate common representation learning fall under the categories of canonical correlation-based approaches and autoencoder-based approaches. In this paper, we investigate the performance of deep autoencoder-based methods on multi-view data. We propose a novel step-based correlation multi-modal CNN (CorrMCNN) which reconstructs one view of the data given the other while increasing the interaction between the representations at each hidden layer or every intermediate step. Finally, we evaluate the performance of the proposed model on two benchmark datasets - MNIST and XRMB. Through extensive experiments, we find that the proposed model achieves better performance than the current state-of-the-art techniques on joint common representation learning and transfer learning tasks. Comment: Accepted in Asian Conference of Pattern Recognition (ACPR-2017)
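    The per-step interaction described above amounts to a correlation objective applied at every paired hidden layer of the two view networks, not only at the final bottleneck. The sketch below sums negative Pearson correlations over paired layer activations; the exact loss in CorrMCNN may differ.

```python
import numpy as np

def correlation(x, y, eps=1e-8):
    # Pearson correlation between two (samples x features) activation matrices,
    # computed over all entries after centering each feature.
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    num = (xc * yc).sum()
    den = np.sqrt((xc ** 2).sum() * (yc ** 2).sum()) + eps
    return num / den

def step_correlation_loss(hidden_a, hidden_b):
    """Sum negative correlations over paired hidden layers of the two views;
    minimizing this increases cross-view interaction at every step
    (an illustrative stand-in for the paper's loss)."""
    return -sum(correlation(a, b) for a, b in zip(hidden_a, hidden_b))
```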