Double Refinement Network for Efficient Indoor Monocular Depth Estimation
Monocular depth estimation is the task of obtaining a measure of distance for
each pixel using a single image. It is an important problem in computer vision
and is usually solved using neural networks. Though recent works in this area
have shown significant improvement in accuracy, the state-of-the-art methods
tend to require massive amounts of memory and time to process an image. The
main purpose of this work is to improve the performance of the latest solutions
with no decrease in accuracy. To this end, we introduce the Double Refinement
Network architecture. The proposed method achieves state-of-the-art results on
the standard benchmark RGB-D dataset NYU Depth v2, while its frames per second
rate is significantly higher (up to 18 times speedup per image at batch size 1)
and the RAM usage per image is lower.
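The coarse-to-fine idea behind this kind of refinement decoder can be sketched as follows. This is a minimal, dependency-free illustration, not the paper's architecture: `refine` stands in for a learned residual branch, and nearest-neighbor upsampling stands in for the network's learned upsampling.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def refine(depth, feat, w=0.5):
    """Stand-in refinement step: blend the upsampled coarse depth with a
    same-resolution feature map. In a real network this residual would
    come from learned convolutions; the weight w is purely illustrative."""
    return depth + w * feat

def double_refinement_decoder(coarse_depth, residuals):
    """Coarse-to-fine decoding: start from the lowest-resolution depth
    estimate, then repeatedly upsample and refine it with backbone
    features at each successively finer resolution."""
    d = coarse_depth
    for r in residuals:  # ordered coarse -> fine, each 2x the previous size
        d = refine(upsample2x(d), r)
    return d

# Example: a 4x4 coarse estimate refined up to 16x16.
coarse = np.zeros((4, 4))
pyramid = [np.ones((8, 8)), np.ones((16, 16))]
depth = double_refinement_decoder(coarse, pyramid)
```

The decoder's per-level outputs are what make intermediate supervision (and early exit at lower resolutions) possible in designs of this kind.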
INDOOR SEMANTIC SEGMENTATION FROM RGB-D IMAGES BY INTEGRATING FULLY CONVOLUTIONAL NETWORK WITH HIGHER-ORDER MARKOV RANDOM FIELD
Indoor scenes have the characteristics of abundant semantic categories, illumination changes, and occlusions and overlaps among objects, which pose great challenges for indoor semantic segmentation. In this paper, we therefore develop a method based on a higher-order Markov random field model for indoor semantic segmentation from RGB-D images. Instead of directly using RGB-D images, we first train and apply a RefineNet model using only RGB information to generate high-level semantic information. Then, the spatial location relationships from the depth channel and the spectral information from the color channels are integrated as a prior for a marker-controlled watershed algorithm to obtain robust and accurate visually homogeneous regions. Finally, the higher-order Markov random field model encodes the short-range context among adjacent pixels and the long-range context within each visually homogeneous region to refine the semantic segmentations. To evaluate the effectiveness and robustness of the proposed method, experiments were conducted on the public SUN RGB-D dataset. Experimental results indicate that, compared with using RGB information alone, the proposed method remarkably improves the semantic segmentation results, especially at object boundaries.
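The energy that such a model minimizes combines three terms: a per-pixel unary cost, a short-range pairwise (Potts) cost between adjacent pixels, and a higher-order cost charged to each homogeneous region for pixels that disagree with the region's majority label. The sketch below is a simplified illustration of that structure, not the paper's exact formulation; the weights `beta` and `gamma` are hypothetical.

```python
import numpy as np

def mrf_energy(labels, unary, regions, beta=1.0, gamma=1.0):
    """Energy of a labeling under a simplified higher-order MRF.

    labels:  (H, W) integer label map
    unary:   (H, W, L) per-pixel cost of each of L labels
    regions: (H, W) integer map of visually homogeneous regions
             (e.g. produced by a marker-controlled watershed)
    """
    H, W = labels.shape
    # unary data term: cost of the chosen label at each pixel
    e = unary[np.arange(H)[:, None], np.arange(W)[None, :], labels].sum()
    # short-range Potts term over horizontal and vertical neighbors
    e += beta * (labels[:, 1:] != labels[:, :-1]).sum()
    e += beta * (labels[1:, :] != labels[:-1, :]).sum()
    # higher-order term: fraction of each region disagreeing with
    # that region's majority label
    for r in np.unique(regions):
        lab = labels[regions == r]
        majority = np.bincount(lab).argmax()
        e += gamma * (lab != majority).mean()
    return float(e)
```

A labeling that is consistent within regions and between neighbors scores a strictly lower energy than one that fragments a region, which is exactly the pressure that sharpens object boundaries.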
Non-local Aggregation for RGB-D Semantic Segmentation
Exploiting both RGB (2D appearance) and depth (3D geometry) information can improve the performance of semantic segmentation. However, due to the inherent difference between the RGB and depth modalities, integrating RGB-D features effectively remains a challenging problem. In this letter, to address this issue, we propose a Non-local Aggregation Network (NANet), with a well-designed Multi-modality Non-local Aggregation Module (MNAM), to better exploit the non-local context of RGB-D features at multiple stages. Compared with most existing RGB-D semantic segmentation schemes, which only exploit local RGB-D features, the MNAM enables the aggregation of non-local RGB-D information along both spatial and channel dimensions. The proposed NANet achieves performance comparable with state-of-the-art methods on the popular RGB-D benchmarks NYUDv2 and SUN-RGBD.
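The spatial non-local aggregation at the core of such modules is the generic self-attention form: every position attends to every other position, weighted by feature affinity. A minimal single-modality sketch (the paper's MNAM additionally fuses RGB and depth branches and adds a channel-dimension variant, which are omitted here):

```python
import numpy as np

def nonlocal_aggregate(feat):
    """Spatial non-local aggregation over a (H, W, C) feature map.

    Each spatial position gathers features from all positions,
    weighted by a softmax over pairwise dot-product affinities."""
    H, W, C = feat.shape
    x = feat.reshape(H * W, C)
    # pairwise affinities between all spatial positions, scaled by sqrt(C)
    logits = x @ x.T / np.sqrt(C)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # rows sum to 1
    # each position aggregates features from every other position
    return (attn @ x).reshape(H, W, C)
```

Because the affinity matrix is H*W by H*W, a position on one side of the image can directly borrow context from the opposite side, which is what "non-local" buys over stacked local convolutions.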
Semantic Understanding of Foggy Scenes with Purely Synthetic Data
This work addresses the problem of semantic scene understanding under foggy
road conditions. Although marked progress has been made in semantic scene
understanding over the recent years, it is mainly concentrated on clear weather
outdoor scenes. Extending semantic segmentation methods to adverse weather
conditions like fog is crucially important for outdoor applications such as
self-driving cars. In this paper, we propose a novel method, which uses purely
synthetic data to improve the performance on unseen real-world foggy scenes
captured in the streets of Zurich and its surroundings. Our results highlight
the potential and power of photo-realistic synthetic images for training and
especially fine-tuning deep neural nets. Our contributions are threefold: 1) we
created a purely synthetic, high-quality foggy dataset of 25,000 unique outdoor
scenes, which we call Foggy Synscapes and plan to release publicly; 2) we show
that with this data we outperform previous approaches on real-world foggy test
data; and 3) we show that a combination of our data and previously used data can
even further improve the performance on real-world foggy data.
Comment: independent class IoU scores corrected for BiSeNet architecture
RefineNet: multi-path refinement networks for dense prediction
Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense prediction problems such as semantic segmentation and depth estimation. However, repeated subsampling operations like pooling or convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments on semantic segmentation, which is a dense classification problem, and achieve good performance on seven public datasets. We further apply our method to depth estimation and demonstrate the effectiveness of our method on dense regression problems.
Guosheng Lin, Fayao Liu, Anton Milan, Chunhua Shen, and Ian Reid
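The chained residual pooling idea can be sketched schematically: a chain of pooling blocks, where each block pools the previous block's output and every pooled map is summed back onto the input through residual connections. The sketch below omits the learned convolutions RefineNet interleaves with each pool, to keep it dependency-free.

```python
import numpy as np

def max_pool_same(x, k=5):
    """Stride-1 max pooling with 'same' padding on a (H, W) map."""
    H, W = x.shape
    p = k // 2
    padded = np.pad(x, p, mode="constant", constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def chained_residual_pooling(x, n_blocks=2):
    """Chained residual pooling, schematically: successive pooling
    widens the effective receptive field, and each pooled map is fused
    back onto the running output via a residual sum."""
    out = x
    pooled = x
    for _ in range(n_blocks):
        pooled = max_pool_same(pooled)   # pool the previous block's output
        out = out + pooled               # residual fusion of wider context
    return out
```

Chaining the pools means the k-th block sees context roughly k windows wide while each block stays cheap, which is how the design captures large background regions efficiently.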