Active occlusion-handling for appearance-based object recognition models

Author: Struwe Marvin
Publication venue: Universität Bielefeld
Publication date: 01/01/2017

Struwe M. Active occlusion-handling for appearance-based object recognition models. Bielefeld: Universität Bielefeld; 2017.

Despite extensive research, visual detection of objects in natural scenes is still not robustly solved. The reason is the large variation in appearance with which objects or classes occur. A particularly challenging variation is occlusion, which is caused by the constellation of objects in a scene. Occlusion reduces the number of visible features of an object, but also causes accidental features. Current object representations yield acceptable results at low to medium levels of occlusion, but fail for stronger occlusions. This thesis addresses single image-based object recognition during occlusion and proposes different occlusion-handling strategies. Initially, it describes a holistic discriminative car detection framework, which several chapters use as a reference system. Motivated by a label analysis of hand-annotated video traffic scenes, it then presents a car detector that takes car-car constellations into account. The following chapter illustrates a modification of the reference system to cope with more general occlusion constellations. Inspired by the fact that parts-based detection approaches are more robust against occlusion, the next chapter discusses a parts-based car detector with active occlusion-handling at the detection step. At first, this exploits a strategy that uses the mask of the occluding object to re-weight the score of possible car hypotheses. This is followed by the presentation of an extended version, which especially targets strongly occluded cars. Because hand-annotated video streams do not provide pixel-level information about the object instances, this thesis presents a rendered benchmark data set to resolve this issue. The pixel-level information permits intensive evaluation of occlusion-handling strategies.
An eye-tracker study also uses this rendered data set to explore how humans cope with the absence of visual object features, and which information they use to deal with occlusion.

Publications at Bielefeld University

What is Holding Back Convnets for Detection?

Authors: D Hoiem, H Li, J Xu, M Everingham, P Agrawal, Y Bengio
Publication date: 01/01/2015

Convolutional neural networks have recently shown excellent results in general object detection and many other tasks. Albeit very effective, they involve many user-defined design choices. In this paper we want to better understand these choices by inspecting two key aspects: "what did the network learn?" and "what can the network learn?". We exploit new annotations (Pascal3D+) to enable a new empirical analysis of the R-CNN detector. Contrary to common belief, our results indicate that existing state-of-the-art convnet architectures are not invariant to various appearance factors. In fact, all considered networks have similar weak points which cannot be mitigated by simply increasing the training data (architectural changes are needed). We show that overall performance can improve when using image renderings for data augmentation. We report the best known results on the Pascal3D+ detection and view-point estimation tasks.

arXiv.org e-Print Archive

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe