Analyzing Modular CNN Architectures for Joint Depth Prediction and Semantic Segmentation
This paper addresses the task of designing a modular neural network
architecture that jointly solves different tasks. As an example, we use the
tasks of depth estimation and semantic segmentation given a single RGB image.
The main focus of this work is to analyze the cross-modality influence between
depth and semantic prediction maps on their joint refinement. While most
previous works solely focus on measuring improvements in accuracy, we propose a
way to quantify the cross-modality influence. We show that there is a
relationship between final accuracy and cross-modality influence, although not
a simple linear one. Hence a larger cross-modality influence does not
necessarily translate into an improved accuracy. We find that a beneficial
balance between the cross-modality influences can be achieved by network
architecture and conjecture that this relationship can be utilized to
understand different network design choices. Towards this end we propose a
Convolutional Neural Network (CNN) architecture that fuses state-of-the-art
approaches for depth estimation and semantic labeling. By balancing the
cross-modality influences between depth and semantic prediction, we achieve
improved results for both tasks on the NYU-Depth v2 benchmark.
Comment: Accepted to ICRA 201
A Projected Gradient Descent Method for CRF Inference allowing End-To-End Training of Arbitrary Pairwise Potentials
Are we using the right potential functions in the Conditional Random Field
models that are popular in the Vision community? Semantic segmentation and
other pixel-level labelling tasks have made significant progress recently due
to the deep learning paradigm. However, most state-of-the-art structured
prediction methods also include a random field model with a hand-crafted
Gaussian potential to model spatial priors, label consistencies and
feature-based image conditioning.
In this paper, we challenge this view by developing a new inference and
learning framework which can learn pairwise CRF potentials restricted only by
their dependence on the image pixel values and the size of the support. Both
standard spatial and high-dimensional bilateral kernels are considered. Our
framework is based on the observation that CRF inference can be achieved via
projected gradient descent and consequently, can easily be integrated in deep
neural networks to allow for end-to-end training. It is empirically
demonstrated that such learned potentials can improve segmentation accuracy and
that certain label class interactions are indeed better modelled by a
non-Gaussian potential. In addition, we compare our inference method to the
commonly used mean-field algorithm. Our framework is evaluated on several
public benchmarks for semantic segmentation with improved performance compared
to previous state-of-the-art CNN+CRF models.
Comment: Presented at the EMMCVPR 2017 conference
A Universalist strategy for the design of Assistive Technology
Assistive Technologies are specialized products aiming to partly compensate for the loss of autonomy experienced by disabled people. Because they address special needs in a highly segmented market, they are often considered niche products. To improve their design and move them toward Universality, we propose the EMFASIS framework (Extended Modularity, Functional Accessibility, and Social Integration Strategy). We first elaborate on how this strategy reconciles the niche and Universalist views, which may appear conflicting at first sight. We then present three examples illustrating its application to the design of Assistive Technologies: an overbed table, an upper-limb powered orthosis, and a powered wheelchair. We conclude with the expected outcomes of our strategy for the social integration and participation of disabled people.
Exploring Subtasks of Scene Understanding: Challenges and Cross-Modal Analysis
Scene understanding is one of the most important problems in computer vision. It consists of many subtasks, such as image classification for describing an image with one word, object detection for finding and localizing objects of interest in the image and assigning a category to each of them, semantic segmentation for assigning a category to each pixel of an image, instance segmentation for finding and localizing objects of interest and marking all the pixels belonging to each object, and depth estimation for estimating the distance of each pixel in the image from the camera. Each of these tasks has its advantages and limitations, but they share a common goal: to understand and describe a scene captured in an image or a set of images. One common question is whether there is any synergy between these tasks. Therefore, alongside single-task approaches, there is a line of research on how to learn multiple tasks jointly.
In this thesis, we explore different subtasks of scene understanding and propose mainly deep learning-based approaches to improve these tasks. First, we propose a modular Convolutional Neural Network (CNN) architecture for jointly training semantic segmentation and depth estimation tasks. We provide a setup suitable for analyzing the cross-modality influence between these tasks for different architecture designs.
Then, we utilize object detection and instance segmentation as auxiliary tasks to focus on target objects in the complex tasks of scene flow estimation and object 6D pose estimation.
Furthermore, we propose a novel deep approach for object co-segmentation, the task of segmenting common objects in a set of images.
Finally, we introduce a novel pooling layer that preserves spatial information while capturing a large receptive field. This pooling layer is designed to improve dense prediction tasks such as semantic segmentation and depth estimation.
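The tension the last paragraph describes, between a large receptive field and full-resolution output, can be made concrete with a small NumPy sketch. This is not the thesis's pooling layer, merely one generic way to reconcile the two goals: stride-1 max pooling over a large window, so each output position aggregates wide context while the spatial resolution is preserved:

```python
import numpy as np

def large_window_pool(x, k):
    """Stride-1 max pooling over a k x k window with same padding.

    Produces one output per input position (resolution preserved)
    while each output sees a k x k receptive field.
    """
    h, w = x.shape
    pad = k // 2
    # Pad with -inf so border maxima come only from real values.
    xp = np.pad(x, pad, constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = xp[i:i + k, j:j + k].max()
    return out

feat = np.arange(25, dtype=float).reshape(5, 5)  # toy feature map
pooled = large_window_pool(feat, 3)
print(pooled.shape)  # (5, 5): resolution preserved
```

Ordinary strided pooling would instead return a 2x2 or 3x3 map here, discarding exactly the per-pixel localization that dense prediction tasks like segmentation and depth estimation need.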