Deformable Kernel Networks for Joint Image Filtering
Joint image filters are used to transfer structural details from a guidance
image, used as a prior, to a target image, in tasks such as enhancing spatial
resolution and suppressing noise. Previous methods based on convolutional
neural networks (CNNs) combine nonlinear activations of spatially-invariant
kernels to estimate structural details and regress the filtering result. In
this paper, we instead learn explicitly sparse and spatially-variant kernels.
We propose a CNN architecture and its efficient implementation, called the
deformable kernel network (DKN), that outputs sets of neighbors and the
corresponding weights adaptively for each pixel. The filtering result is then
computed as a weighted average. We also propose a fast version of DKN that runs
about seventeen times faster for an image of size 640 x 480. We demonstrate the
effectiveness and flexibility of our models on the tasks of depth map
upsampling, saliency map upsampling, cross-modality image restoration, texture
removal, and semantic segmentation. In particular, we show that the weighted
averaging process with sparsely sampled 3 x 3 kernels outperforms the state of
the art by a significant margin in all cases.Comment: arXiv admin note: substantial text overlap with arXiv:1903.11286
(IJCV accepted
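To make the per-pixel weighted averaging concrete, below is a minimal PyTorch sketch of the final filtering step, assuming a backbone has already predicted K neighbor offsets and K normalized weights for each pixel, as the abstract describes. The function name, tensor shapes, and the use of `grid_sample` are illustrative assumptions, not the authors' implementation.

```python
# Sketch of DKN-style filtering: a per-pixel weighted average over K
# adaptively sampled neighbors. Assumes offsets/weights come from a backbone.
import torch
import torch.nn.functional as F

def weighted_average_filter(target, offsets, weights):
    """target:  (B, 1, H, W) image to filter (e.g., an upsampled depth map).
    offsets: (B, 2K, H, W) per-pixel (x, y) offsets of K sampled neighbors.
    weights: (B, K, H, W) per-pixel kernel weights (assumed to sum to 1,
             e.g., via a softmax in the backbone).
    """
    B, _, H, W = target.shape
    K = weights.shape[1]
    # Base sampling grid in normalized [-1, 1] coordinates for grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=target.device),
        torch.linspace(-1, 1, W, device=target.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1)                       # (H, W, 2)
    out = torch.zeros_like(target)
    for k in range(K):
        # Convert pixel offsets to normalized coordinates, shift the grid.
        off = offsets[:, 2 * k : 2 * k + 2].permute(0, 2, 3, 1)  # (B, H, W, 2)
        off = off * 2.0 / torch.tensor([W - 1, H - 1], device=target.device)
        grid = base.unsqueeze(0) + off
        sampled = F.grid_sample(target, grid, align_corners=True)
        out = out + weights[:, k : k + 1] * sampled            # weighted average
    return out
```

With K = 9 this reproduces the sparsely sampled 3 x 3 case highlighted in the abstract: a tiny kernel whose taps are placed adaptively rather than on a fixed regular grid.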
G2-MonoDepth: A General Framework of Generalized Depth Inference from Monocular RGB+X Data
Monocular depth inference is a fundamental problem for scene perception of
robots. A specific robot may be equipped with a camera plus an optional depth
sensor of any type, and may be located in scenes of various scales, whereas
recent advances split this problem into multiple individual sub-tasks. This
imposes the additional burden of fine-tuning models for each specific robot,
and thereby high-cost customization in large-scale industrialization. This
paper investigates a
unified task of monocular depth inference, which infers high-quality depth maps
from all kinds of input raw data from various robots in unseen scenes. A basic
benchmark G2-MonoDepth is developed for this task, which comprises four
components: (a) a unified data representation RGB+X to accommodate RGB plus raw
depth with diverse scene scale/semantics, depth sparsity ([0%, 100%]) and
errors (holes/noises/blurs), (b) a novel unified loss to adapt to diverse depth
sparsity/errors of input raw data and diverse scales of output scenes, (c) an
improved network that effectively propagates diverse scene scales from input to output,
and (d) a data augmentation pipeline to simulate all types of real artifacts in
raw depth maps for training. G2-MonoDepth is applied to three sub-tasks:
depth estimation, depth completion with varying sparsity, and depth
enhancement in unseen scenes, and it consistently outperforms SOTA baselines
on both real-world and synthetic data.
Comment: 18 pages, 16 figures.
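As a concrete illustration of components (a) and (d), here is a minimal NumPy sketch. Packing X as a raw-depth channel plus a validity mask, and the specific hole/noise augmentations, are assumptions made for illustration; the paper's exact representation and pipeline may differ.

```python
# Sketch of a unified RGB+X input and artifact simulation (illustrative only).
import numpy as np

def pack_rgbx(rgb, raw_depth=None):
    """Pack RGB plus optional raw depth into one unified network input.

    rgb:       (H, W, 3) float image in [0, 1].
    raw_depth: optional (H, W) depth map with zeros/NaNs at missing pixels;
               its sparsity may range from 0% (no depth) to 100% (dense).
    Returns an (H, W, 5) array: RGB + depth channel + validity mask.
    """
    H, W, _ = rgb.shape
    if raw_depth is None:                      # depth-estimation case: X is empty
        depth = np.zeros((H, W), dtype=np.float32)
        mask = np.zeros((H, W), dtype=np.float32)
    else:                                      # completion / enhancement cases
        depth = np.nan_to_num(raw_depth).astype(np.float32)
        mask = (depth > 0).astype(np.float32)  # 1 where raw depth is valid
    return np.concatenate(
        [rgb.astype(np.float32), depth[..., None], mask[..., None]], axis=-1)

def simulate_raw_artifacts(depth, hole_frac=0.3, noise_std=0.05, rng=None):
    """Illustrative training augmentation: add noise and punch random holes
    into a clean depth map, mimicking real raw-depth artifacts."""
    rng = np.random.default_rng() if rng is None else rng
    out = depth + rng.normal(0.0, noise_std, size=depth.shape)  # sensor noise
    out[rng.random(depth.shape) < hole_frac] = 0.0              # random holes
    return out.astype(np.float32)
```

In this reading, the same network consumes one fixed-shape input for all three sub-tasks: an all-zero depth channel and mask yield pure estimation, a sparse mask yields completion, and a dense but corrupted channel yields enhancement.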