Part Detector Discovery in Deep Convolutional Neural Networks
Current fine-grained classification approaches often rely on a robust
localization of object parts to extract localized feature representations
suitable for discrimination. However, part localization is a challenging task
due to the large variation of appearance and pose. In this paper, we show how
pre-trained convolutional neural networks can be used for robust and efficient
object part discovery and localization without the necessity to actually train
the network on the current dataset. Our approach called "part detector
discovery" (PDD) is based on analyzing the gradient maps of the network outputs
and finding activation centers spatially related to annotated semantic parts or
bounding boxes.
This allows us not just to obtain excellent performance on the CUB200-2011
dataset, but in contrast to previous approaches also to perform detection and
bird classification jointly without requiring a given bounding box annotation
during testing and ground-truth parts during training. The code is available at
http://www.inf-cv.uni-jena.de/part_discovery and
https://github.com/cvjena/PartDetectorDisovery.
Comment: Accepted for publication at the Asian Conference on Computer Vision
(ACCV) 201
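The selection step described in the abstract above (finding activation centers that are spatially related to annotated parts) might look roughly like the following. This is only a sketch under stated assumptions, not the authors' implementation: the gradient maps are assumed to be precomputed 2D arrays, one per channel and training image, the activation center is taken to be the location of the largest absolute gradient, and the names activation_center and select_part_detector are hypothetical.

```python
import math

def activation_center(grad_map):
    """Return (y, x) of the largest absolute value in a 2D gradient map."""
    best_val, best_pos = -1.0, (0, 0)
    for y, row in enumerate(grad_map):
        for x, g in enumerate(row):
            if abs(g) > best_val:
                best_val, best_pos = abs(g), (y, x)
    return best_pos

def select_part_detector(grad_maps, part_positions):
    """Pick the channel whose activation centers lie closest to an annotated part.

    grad_maps[c][i]   : gradient map of channel c for training image i (assumed given)
    part_positions[i] : annotated (y, x) position of the part in image i
    Returns (channel index, mean distance of that channel).
    """
    def mean_distance(c):
        dists = [math.dist(activation_center(m), p)
                 for m, p in zip(grad_maps[c], part_positions)]
        return sum(dists) / len(dists)

    best = min(range(len(grad_maps)), key=mean_distance)
    return best, mean_distance(best)
```

At test time, the activation center of the selected channel would then serve as the estimated part location, so no bounding box is needed in this simplified picture.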
Deep Depth From Focus
Depth from focus (DFF) is one of the classical ill-posed inverse problems in
computer vision. Most approaches recover the depth at each pixel based on the
focal setting which exhibits maximal sharpness. Yet, it is not obvious how to
reliably estimate the sharpness level, particularly in low-textured areas. In
this paper, we propose 'Deep Depth From Focus (DDFF)' as the first end-to-end
learning approach to this problem. One of the main challenges we face is the
hunger for data of deep neural networks. In order to obtain a significant
amount of focal stacks with corresponding groundtruth depth, we propose to
leverage a light-field camera with a co-calibrated RGB-D sensor. This allows us
to digitally create focal stacks of varying sizes. Compared to existing
benchmarks our dataset is 25 times larger, enabling the use of machine learning
for this inverse problem. We compare our results with state-of-the-art DFF
methods and we also analyze the effect of several key deep architectural
components. These experiments show that our proposed method 'DDFFNet' achieves
state-of-the-art performance in all scenes, reducing depth error by more than
75% compared to the classical DFF methods.
Comment: Accepted to the Asian Conference on Computer Vision (ACCV) 201
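The classical per-pixel rule mentioned in the abstract above (choose the focal setting with maximal sharpness) can be sketched as follows. This illustrates the classical baseline only, not DDFFNet. The sharpness measure (absolute 4-neighbour Laplacian), the clamped border handling, and the function names are assumptions made for the example.

```python
def laplacian_energy(img, y, x):
    """Absolute 4-neighbour Laplacian at (y, x); border pixels are clamped."""
    h, w = len(img), len(img[0])
    up = img[max(y - 1, 0)][x]
    down = img[min(y + 1, h - 1)][x]
    left = img[y][max(x - 1, 0)]
    right = img[y][min(x + 1, w - 1)]
    return abs(up + down + left + right - 4 * img[y][x])

def depth_from_focus(stack, focus_distances):
    """Assign each pixel the focus distance of its sharpest focal slice.

    stack           : list of 2D grayscale images (lists of rows), one per focal setting
    focus_distances : focus distance associated with each slice
    """
    h, w = len(stack[0]), len(stack[0][0])
    depth = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            scores = [laplacian_energy(img, y, x) for img in stack]
            # Ties are resolved in favour of the first slice.
            best = max(range(len(stack)), key=scores.__getitem__)
            depth[y][x] = focus_distances[best]
    return depth
```

In a textureless region every slice scores the same, so the chosen depth there is arbitrary. This is the low-texture ambiguity the abstract refers to.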
A Multi-scale Bilateral Structure Tensor Based Corner Detector
9th Asian Conference on Computer Vision, ACCV 2009, Xi'an, 23-27 September 2009
In this paper, a novel multi-scale nonlinear structure tensor based corner detection algorithm is proposed to effectively improve the classical Harris corner detector. By considering both the spatial and gradient distances of neighboring pixels, a nonlinear bilateral structure tensor is constructed to examine the local image pattern. The linear structure tensor used in the original Harris corner detector is a special case of the proposed bilateral one, obtained by considering only the spatial distance. Moreover, a multi-scale filtering scheme is developed to distinguish trivial structures from true corners based on their different characteristics across multiple scales. A comparison between the proposed approach and four representative state-of-the-art corner detectors shows that our method performs much better in terms of both detection rate and localization accuracy.
Department of Computing
Refereed conference paper
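As a rough illustration of the bilateral structure tensor idea in the entry above, the sketch below weights each neighbor by a Gaussian on spatial distance and, optionally, a Gaussian on gradient distance, then evaluates the standard Harris response det(M) - k * trace(M)^2. The exact weighting used in the paper is not given in the abstract, so the Gaussian form, the parameters sigma_s, sigma_g, radius and k, and the function names are assumptions. The multi-scale filtering step is omitted.

```python
import math

def gradients(img):
    """Central-difference gradients (gx, gy) with clamped borders."""
    h, w = len(img), len(img[0])
    gx = [[(img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
           for x in range(w)] for y in range(h)]
    gy = [[(img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
           for x in range(w)] for y in range(h)]
    return gx, gy

def corner_response(img, y, x, radius=2, sigma_s=1.5, sigma_g=None, k=0.04):
    """Harris-style response at (y, x) from a (bilateral) structure tensor.

    sigma_g=None keeps only the spatial weight, i.e. the linear structure tensor.
    """
    gx, gy = gradients(img)
    h, w = len(img), len(img[0])
    a = b = c = 0.0  # tensor entries: sum w*gx^2, sum w*gx*gy, sum w*gy^2
    for v in range(max(y - radius, 0), min(y + radius, h - 1) + 1):
        for u in range(max(x - radius, 0), min(x + radius, w - 1) + 1):
            wgt = math.exp(-((v - y) ** 2 + (u - x) ** 2) / (2 * sigma_s ** 2))
            if sigma_g is not None:
                # Assumed gradient-distance term: penalize neighbors whose
                # gradient vector differs from that of the center pixel.
                dg2 = (gx[v][u] - gx[y][x]) ** 2 + (gy[v][u] - gy[y][x]) ** 2
                wgt *= math.exp(-dg2 / (2 * sigma_g ** 2))
            a += wgt * gx[v][u] ** 2
            b += wgt * gx[v][u] * gy[v][u]
            c += wgt * gy[v][u] ** 2
    return a * c - b * b - k * (a + c) ** 2
```

With sigma_g left as None the function reduces to a Gaussian-weighted Harris response, which mirrors the special-case relationship stated in the abstract.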
Novel-View Human Action Synthesis
Novel-View Human Action Synthesis aims to synthesize the movement of a body
from a virtual viewpoint, given a video from a real viewpoint. We present a
novel 3D reasoning to synthesize the target viewpoint. We first estimate the 3D
mesh of the target body and transfer the rough textures from the 2D images to
the mesh. As this transfer may generate sparse textures on the mesh due to
frame resolution or occlusions, we produce a semi-dense textured mesh by
propagating the transferred textures both locally, within local geodesic
neighborhoods, and globally, across symmetric semantic parts. Next, we
introduce a context-based generator to learn how to correct and complete the
residual appearance information. This allows the network to independently focus
on learning the foreground and background synthesis tasks. We validate the
proposed solution on the public NTU RGB+D dataset. The code and resources are
available at https://bit.ly/36u3h4K.
Comment: Asian Conference on Computer Vision (ACCV) 202