Joint Learning of Intrinsic Images and Semantic Segmentation
Semantic segmentation of outdoor scenes is problematic when imaging conditions vary. Albedo (reflectance) is known to be invariant to illumination effects, so using reflectance images for the semantic segmentation task can be favorable. Moreover, not only may segmentation benefit from reflectance, but segmentation may in turn be useful for reflectance computation. Therefore, in this paper, the tasks of semantic segmentation and intrinsic image decomposition are considered as a combined process by exploring their mutual relationship in a joint fashion. To that end, we propose a supervised end-to-end CNN architecture to jointly learn intrinsic image decomposition and semantic segmentation, and we analyze the gains of addressing the two problems jointly. Moreover, new cascade CNN architectures for intrinsic-for-segmentation and segmentation-for-intrinsic are proposed as single tasks. Furthermore, a dataset of 35K synthetic images of natural environments is created with corresponding albedo and shading (intrinsics), as well as semantic labels (segmentation) assigned to each object/scene. The experiments show that joint learning of intrinsic image decomposition and semantic segmentation is beneficial for both tasks on natural scenes. Dataset and models are available at: https://ivi.fnwi.uva.nl/cv/intrinseg
Comment: ECCV 201
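A joint objective of this kind is often written as a weighted sum of per-task losses. The sketch below is illustrative only: the function name, weights, and loss choices are assumptions, not the paper's actual architecture or training objective.

```python
import numpy as np

def joint_loss(albedo_pred, albedo_gt, shading_pred, shading_gt,
               seg_logits, seg_labels, w_alb=1.0, w_shad=1.0, w_seg=1.0):
    """Hypothetical multi-task loss: weighted sum of intrinsic-decomposition
    and segmentation terms. albedo/shading: (H, W, 3) floats;
    seg_logits: (H, W, C); seg_labels: (H, W) int class indices."""
    # MSE reconstruction losses for the two intrinsic components
    l_alb = np.mean((albedo_pred - albedo_gt) ** 2)
    l_shad = np.mean((shading_pred - shading_gt) ** 2)
    # Per-pixel softmax cross-entropy for semantic segmentation
    z = seg_logits - seg_logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    h, w = seg_labels.shape
    l_seg = -np.mean(log_probs[np.arange(h)[:, None],
                               np.arange(w)[None, :], seg_labels])
    return w_alb * l_alb + w_shad * l_shad + w_seg * l_seg
```

In practice each weight would be tuned (or learned) to balance the gradient scales of the three tasks.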
NODIS: Neural Ordinary Differential Scene Understanding
Semantic image understanding is a challenging topic in computer vision. It requires detecting all objects in an image, but also identifying all the relations between them. Detected objects, their labels, and the discovered relations can be used to construct a scene graph, which provides an abstract semantic interpretation of an image. In previous works, relations were identified by solving an assignment problem formulated as a Mixed-Integer Linear Program. In this work, we interpret that formulation as an Ordinary Differential Equation (ODE). The proposed architecture performs scene graph inference by solving a neural variant of an ODE through end-to-end learning. It achieves state-of-the-art results on all three benchmark tasks on the Visual Genome benchmark: scene graph generation (SGGen), classification (SGCls), and visual relationship detection (PredCls).
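As a toy illustration of the neural-ODE idea, a learned vector field f can be integrated with a fixed-step solver to evolve a hidden state. This is a generic Euler integrator, not the NODIS model itself:

```python
import numpy as np

def odeint_euler(f, h0, t0=0.0, t1=1.0, steps=100):
    """Fixed-step Euler integration of dh/dt = f(h, t) from t0 to t1.
    In a neural ODE, f would be a trained network; here it is any callable."""
    h, t = np.asarray(h0, dtype=float), t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t)
        t += dt
    return h

# Example: dh/dt = -h has the closed-form solution h(t) = h0 * exp(-t)
h1 = odeint_euler(lambda h, t: -h, np.array([1.0]), steps=1000)
```

In practice, adaptive solvers with the adjoint method replace this fixed-step loop so that gradients can flow through the integration end-to-end.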
Semantic Image Segmentation Using Visible and Near-Infrared Channels
Recent progress in computational photography has shown that we can acquire physical information beyond visible (RGB) image representations. In particular, we can acquire near-infrared (NIR) cues with only a slight modification to any standard digital camera. In this paper, we study whether this extra channel can improve semantic image segmentation. Based on a state-of-the-art segmentation framework and a novel manually segmented image database containing 4-channel images (RGB+NIR), we study how best to incorporate the specific characteristics of the NIR response. We show that it leads to improved performance for 7 classes out of 10 in the proposed dataset, and we discuss the results with respect to the physical properties of the NIR response.
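Forming the 4-channel input described above amounts to stacking the NIR response as an extra channel alongside RGB; a minimal sketch, assuming pixel-aligned arrays (the function name is hypothetical):

```python
import numpy as np

def stack_rgbn(rgb, nir):
    """Concatenate an RGB image (H, W, 3) with a single-channel NIR image
    (H, W) into a 4-channel array suitable as network input."""
    assert rgb.shape[:2] == nir.shape, "RGB and NIR must be pixel-aligned"
    return np.concatenate([rgb, nir[..., None]], axis=-1)
```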
A high performance CRF model for clothes parsing
In this paper we tackle the problem of clothing parsing: our goal is to segment and classify the different garments a person is wearing. We frame the problem as inference in a pose-aware Conditional Random Field (CRF) which exploits appearance, figure/ground segmentation, shape and location priors for each garment, as well as similarities between segments and symmetries between different human body parts. We demonstrate the effectiveness of our approach on the Fashionista dataset and show that we obtain a significant improvement over the state of the art.
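A minimal grid-CRF energy with unary costs and a Potts pairwise term, far simpler than the pose-aware model described but in the same spirit, might look like:

```python
import numpy as np

def crf_energy(unary, labels, pairwise_weight=1.0):
    """Energy of a labeling under a simple grid CRF: per-pixel unary costs
    plus a Potts term charging pairwise_weight for each pair of 4-connected
    neighbors that disagree. unary: (H, W, L) costs; labels: (H, W) ints."""
    h, w = labels.shape
    e = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    # Potts term over horizontal and vertical neighbor pairs
    e += pairwise_weight * (labels[:, 1:] != labels[:, :-1]).sum()
    e += pairwise_weight * (labels[1:, :] != labels[:-1, :]).sum()
    return float(e)
```

Inference then searches for the labeling minimizing this energy; richer models like the one in the paper add garment-specific priors and long-range (e.g. symmetry) potentials on top of such local terms.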
Joint optimisation for object class segmentation and dense stereo reconstruction
The problems of dense stereo reconstruction and object class segmentation can both be formulated as Conditional Random Field based labelling problems, in which every pixel in the image is assigned a label corresponding either to its disparity or to an object class such as road or building. While these two problems are mutually informative, no attempt has been made to jointly optimise their labellings. In this work we provide a principled energy minimisation framework that unifies the two problems and demonstrate that, by resolving ambiguities in real-world data, joint optimisation of the two problems substantially improves performance. To evaluate our method, we augment the street-view Leuven dataset, producing 70 hand-labelled object class and disparity maps. We hope that the release of these annotations will stimulate further work in the challenging domain of street-view analysis.
This work is supported by EPSRC research grants, HMGCC, a TUBITAK researcher exchange grant, and the IST Programme of the European Community under the PASCAL2 Network of Excellence, IST-2007-216886.
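The idea of a unifying energy can be sketched as per-task costs plus a coupling term between the two label fields. This is a simplified illustration under assumed cost arrays, not the paper's actual formulation:

```python
import numpy as np

def joint_energy(seg_cost, disp_cost, seg, disp, couple=1.0):
    """Toy joint energy E(c, d): segmentation costs + disparity costs + a
    coupling term. seg_cost: (H, W, C); disp_cost: (H, W, D);
    seg, disp: (H, W) integer label maps."""
    h, w = seg.shape
    rows, cols = np.arange(h)[:, None], np.arange(w)[None, :]
    e = seg_cost[rows, cols, seg].sum() + disp_cost[rows, cols, disp].sum()
    # Coupling: horizontal neighbors with the same class label but different
    # disparities are penalized -- disparity should be smooth within an
    # object and free to jump at object boundaries
    same_class = seg[:, 1:] == seg[:, :-1]
    disp_jump = disp[:, 1:] != disp[:, :-1]
    e += couple * (same_class & disp_jump).sum()
    return float(e)
```

The coupling term is what lets one task resolve ambiguities in the other: a confident object boundary licenses a disparity discontinuity, and vice versa.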
Image-Based Large-Scale Geo-localization in Mountainous Regions
Given a picture taken somewhere in the world, automatically geo-localizing the image is an extremely useful task, especially for historical and forensic sciences, documentation purposes, organization of the world’s photographs, and intelligence applications. While tremendous progress has been made over the last years in visual location recognition within a single city, localization in natural environments is much more difficult, since vegetation, illumination, and seasonal changes make appearance-only approaches impractical. In this chapter, we target mountainous terrain and use digital elevation models to extract representations for fast visual database lookup. We propose an automated approach to very large-scale visual localization that can efficiently exploit visual information (contours) and geometric constraints (consistent orientation) at the same time. We validate the system at the scale of Switzerland (40,000 km²) using over 1,000 landscape query images with ground-truth GPS positions.
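Contour-based lookup of this kind relies on comparing skyline profiles in a way that tolerates nuisance variation. A minimal sketch with a hypothetical normalization and distance, not the chapter's actual descriptor:

```python
import numpy as np

def skyline_descriptor(horizon):
    """Normalize a per-column horizon-height profile (1-D array) to zero
    mean and unit norm, making it invariant to vertical offset and scale."""
    h = np.asarray(horizon, dtype=float)
    h = h - h.mean()
    norm = np.linalg.norm(h)
    return h / norm if norm > 0 else h

def skyline_distance(query, candidate):
    """Euclidean distance between normalized skyline profiles; a small
    distance suggests the query was taken near the candidate viewpoint."""
    return float(np.linalg.norm(skyline_descriptor(query)
                                - skyline_descriptor(candidate)))
```

A real system would render candidate skylines from the digital elevation model for many viewpoints and orientations, then index their descriptors for fast nearest-neighbor lookup.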