5,329 research outputs found
Structure Preserving Large Imagery Reconstruction
With the explosive growth of web-based cameras and mobile devices, billions
of photographs are uploaded to the internet. We can trivially collect a huge
number of photo streams for various goals, such as image clustering, 3D scene
reconstruction, and other big data applications. However, such tasks are not
easy because the retrieved photos can vary widely in their
view perspectives, resolutions, lighting, noise, and distortions.
Furthermore, with the occlusion of unexpected objects such as people and vehicles,
it is even more challenging to find feature correspondences and reconstruct
realistic scenes. In this paper, we propose a structure-based image completion
algorithm for object removal that produces visually plausible content with
consistent structure and scene texture. We use an edge matching technique to
infer the potential structure of the unknown region. Driven by the estimated
structure, texture synthesis is performed automatically along the estimated
curves. We evaluate the proposed method on different types of images, from
highly structured indoor environments to natural scenes. Our experimental
results demonstrate satisfactory performance, and the method could potentially serve
subsequent big data processing such as image localization, object retrieval,
and scene reconstruction. Our experiments show that this approach achieves
favorable results that outperform existing state-of-the-art techniques.
Light Field Salient Object Detection: A Review and Benchmark
Salient object detection (SOD) is a long-standing research topic in computer
vision and has drawn an increasing amount of research interest in the past
decade. This paper provides the first comprehensive review and benchmark for
light field SOD, which has long been lacking in the saliency community.
Firstly, we introduce preliminary knowledge on light fields, including theory
and data forms, and then review existing studies on light field SOD, covering
ten traditional models, seven deep learning-based models, one comparative
study, and one brief review. Existing datasets for light field SOD are also
summarized with detailed information and statistical analyses. Secondly, we
benchmark nine representative light field SOD models together with several
cutting-edge RGB-D SOD models on four widely used light field datasets, and
draw discussions and analyses from the results, including a comparison between light
field SOD and RGB-D SOD models. Besides, due to the inconsistency
of datasets in their current forms, we further generate complete data and
supplement focal stacks, depth maps and multi-view images for the inconsistent
datasets, making them consistent and unified. Our supplemental data makes a
universal benchmark possible. Lastly, because its diverse data
representations and high dependency on acquisition hardware make light field SOD
differ greatly from other
saliency detection tasks, we provide nine hints about the challenges and future
directions, and outline several open issues. We hope our review and
benchmarking will help advance research in this field. All the materials,
including collected models, datasets, benchmarking results, and supplemented
light field datasets, will be publicly available on our project site:
https://github.com/kerenfu/LFSOD-Survey
Video Salient Object Detection via Fully Convolutional Networks
This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) training a deep video saliency model in the absence of sufficiently large, pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules that capture spatial and temporal saliency information, respectively. The dynamic saliency model, which explicitly incorporates saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables our network to learn diverse saliency information and prevents overfitting with the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, thus producing accurate spatiotemporal saliency estimates. We advance the state-of-the-art on the densely annotated video segmentation data set (MAE of 0.06) and the Freiburg-Berkeley Motion Segmentation data set (MAE of 0.07), and do so with much improved speed (2 fps with all steps).
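For readers unfamiliar with the MAE figures quoted in this abstract: in saliency detection, MAE is commonly taken as the mean absolute per-pixel difference between a predicted saliency map and its ground-truth mask. The following is a minimal sketch under that assumption, with both maps already scaled to [0, 1]. The function name `saliency_mae` is illustrative and does not come from the paper, whose exact evaluation protocol is not given here.

```python
import numpy as np


def saliency_mae(pred, gt):
    """Mean absolute error between a predicted saliency map and a ground-truth mask.

    Assumption: both inputs have the same shape and hold values in [0, 1].
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if pred.shape != gt.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    # Average the absolute per-pixel difference over the whole map.
    return float(np.mean(np.abs(pred - gt)))


# Toy 2x2 example (hypothetical values, not data from the paper).
pred = [[0.9, 0.1], [0.2, 0.8]]
gt = [[1.0, 0.0], [0.0, 1.0]]
score = saliency_mae(pred, gt)
```

A data-set-level score would then usually be the average of this per-frame value over all annotated frames; lower is better.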