Double Refinement Network for Efficient Indoor Monocular Depth Estimation
Monocular depth estimation is the task of obtaining a measure of distance for
each pixel using a single image. It is an important problem in computer vision
and is usually solved using neural networks. Though recent works in this area
have shown significant improvement in accuracy, the state-of-the-art methods
tend to require massive amounts of memory and time to process an image. The
main purpose of this work is to improve the performance of the latest solutions
with no decrease in accuracy. To this end, we introduce the Double Refinement
Network architecture. The proposed method achieves state-of-the-art results on
the standard benchmark RGB-D dataset NYU Depth v2, while its frames per second
rate is significantly higher (up to 18 times speedup per image at batch size 1)
and the RAM usage per image is lower.
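The coarse-to-fine idea behind this kind of refinement decoder can be sketched as follows. This is a minimal, dependency-free illustration, not the paper's architecture: `refine` stands in for a learned residual branch, and nearest-neighbor upsampling stands in for the network's learned upsampling.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x upsampling of a (H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def refine(depth, feat, w=0.5):
    """Stand-in refinement step: blend the upsampled coarse depth with a
    same-resolution feature map. In a real network this residual would
    come from learned convolutions; the weight w is purely illustrative."""
    return depth + w * feat

def double_refinement_decoder(coarse_depth, residuals):
    """Coarse-to-fine decoding: start from the lowest-resolution depth
    estimate, then repeatedly upsample and refine it with backbone
    features at each successively finer resolution."""
    d = coarse_depth
    for r in residuals:  # ordered coarse -> fine, each 2x the previous size
        d = refine(upsample2x(d), r)
    return d

# Example: a 4x4 coarse estimate refined up to 16x16.
coarse = np.zeros((4, 4))
pyramid = [np.ones((8, 8)), np.ones((16, 16))]
depth = double_refinement_decoder(coarse, pyramid)
```

The decoder's per-level outputs are what make intermediate supervision (and early exit at lower resolutions) possible in designs of this kind.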
INDOOR SEMANTIC SEGMENTATION FROM RGB-D IMAGES BY INTEGRATING FULLY CONVOLUTIONAL NETWORK WITH HIGHER-ORDER MARKOV RANDOM FIELD
Indoor scenes have the characteristics of abundant semantic categories, illumination changes, and occlusions and overlaps among objects, which pose great challenges for indoor semantic segmentation. In this paper, we therefore develop a method based on a higher-order Markov random field model for indoor semantic segmentation from RGB-D images. Instead of directly using RGB-D images, we first train and apply a RefineNet model using only RGB information to generate high-level semantic information. Then, the spatial location relationships from the depth channel and the spectral information from the color channels are integrated as a prior for a marker-controlled watershed algorithm to obtain robust and accurate visually homogeneous regions. Finally, the higher-order Markov random field model encodes the short-range context among adjacent pixels and the long-range context within each visually homogeneous region to refine the semantic segmentations. To evaluate the effectiveness and robustness of the proposed method, experiments were conducted on the public SUN RGB-D dataset. Experimental results indicate that, compared with using RGB information alone, the proposed method remarkably improves the semantic segmentation results, especially at object boundaries.
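The energy that such a model minimizes combines three terms: a per-pixel unary cost, a short-range pairwise (Potts) cost between adjacent pixels, and a higher-order cost charged to each homogeneous region for pixels that disagree with the region's majority label. The sketch below is a simplified illustration of that structure, not the paper's exact formulation; the weights `beta` and `gamma` are hypothetical.

```python
import numpy as np

def mrf_energy(labels, unary, regions, beta=1.0, gamma=1.0):
    """Energy of a labeling under a simplified higher-order MRF.

    labels:  (H, W) integer label map
    unary:   (H, W, L) per-pixel cost of each of L labels
    regions: (H, W) integer map of visually homogeneous regions
             (e.g. produced by a marker-controlled watershed)
    """
    H, W = labels.shape
    # unary data term: cost of the chosen label at each pixel
    e = unary[np.arange(H)[:, None], np.arange(W)[None, :], labels].sum()
    # short-range Potts term over horizontal and vertical neighbors
    e += beta * (labels[:, 1:] != labels[:, :-1]).sum()
    e += beta * (labels[1:, :] != labels[:-1, :]).sum()
    # higher-order term: fraction of each region disagreeing with
    # that region's majority label
    for r in np.unique(regions):
        lab = labels[regions == r]
        majority = np.bincount(lab).argmax()
        e += gamma * (lab != majority).mean()
    return float(e)
```

A labeling that is consistent within regions and between neighbors scores a strictly lower energy than one that fragments a region, which is exactly the pressure that sharpens object boundaries.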
Non-local Aggregation for RGB-D Semantic Segmentation
Exploiting both RGB (2D appearance) and depth (3D geometry) information can improve the performance of semantic segmentation. However, due to the inherent difference between the RGB and depth modalities, integrating RGB-D features effectively remains a challenging problem. In this letter, to address this issue, we propose a Non-local Aggregation Network (NANet), with a well-designed Multi-modality Non-local Aggregation Module (MNAM), to better exploit the non-local context of RGB-D features at multiple stages. Compared with most existing RGB-D semantic segmentation schemes, which only exploit local RGB-D features, the MNAM enables the aggregation of non-local RGB-D information along both spatial and channel dimensions. The proposed NANet achieves performance comparable with state-of-the-art methods on the popular RGB-D benchmarks NYUDv2 and SUN-RGBD.
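The spatial non-local aggregation at the core of such modules is the generic self-attention form: every position attends to every other position, weighted by feature affinity. A minimal single-modality sketch (the paper's MNAM additionally fuses RGB and depth branches and adds a channel-dimension variant, which are omitted here):

```python
import numpy as np

def nonlocal_aggregate(feat):
    """Spatial non-local aggregation over a (H, W, C) feature map.

    Each spatial position gathers features from all positions,
    weighted by a softmax over pairwise dot-product affinities."""
    H, W, C = feat.shape
    x = feat.reshape(H * W, C)
    # pairwise affinities between all spatial positions, scaled by sqrt(C)
    logits = x @ x.T / np.sqrt(C)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # rows sum to 1
    # each position aggregates features from every other position
    return (attn @ x).reshape(H, W, C)
```

Because the affinity matrix is H*W by H*W, a position on one side of the image can directly borrow context from the opposite side, which is what "non-local" buys over stacked local convolutions.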
Semantic Understanding of Foggy Scenes with Purely Synthetic Data
This work addresses the problem of semantic scene understanding under foggy
road conditions. Although marked progress has been made in semantic scene
understanding over the recent years, it is mainly concentrated on clear weather
outdoor scenes. Extending semantic segmentation methods to adverse weather
conditions like fog is crucially important for outdoor applications such as
self-driving cars. In this paper, we propose a novel method, which uses purely
synthetic data to improve the performance on unseen real-world foggy scenes
captured in the streets of Zurich and its surroundings. Our results highlight
the potential and power of photo-realistic synthetic images for training and
especially fine-tuning deep neural nets. Our contributions are threefold: 1) we
created a purely synthetic, high-quality foggy dataset of 25,000 unique outdoor
scenes, which we call Foggy Synscapes and plan to release publicly; 2) we show
that with this data we outperform previous approaches on real-world foggy test
data; and 3) we show that a combination of our data and previously used data can
even further improve the performance on real-world foggy data.
Comment: independent class IoU scores corrected for BiSeNet architecture
RefineNet: multi-path refinement networks for dense prediction
Recently, very deep convolutional neural networks (CNNs) have shown outstanding performance in object recognition and have also been the first choice for dense prediction problems such as semantic segmentation and depth estimation. However, repeated subsampling operations like pooling or convolution striding in deep CNNs lead to a significant decrease in the initial image resolution. Here, we present RefineNet, a generic multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. The individual components of RefineNet employ residual connections following the identity mapping mindset, which allows for effective end-to-end training. Further, we introduce chained residual pooling, which captures rich background context in an efficient manner. We carry out comprehensive experiments on semantic segmentation, which is a dense classification problem, and achieve good performance on seven public datasets. We further apply our method to depth estimation and demonstrate the effectiveness of our method on dense regression problems.
Guosheng Lin, Fayao Liu, Anton Milan, Chunhua Shen, and Ian Reid
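The chained residual pooling idea can be sketched schematically: a chain of pooling blocks, where each block pools the previous block's output and every pooled map is summed back onto the input through residual connections. The sketch below omits the learned convolutions RefineNet interleaves with each pool, to keep it dependency-free.

```python
import numpy as np

def max_pool_same(x, k=5):
    """Stride-1 max pooling with 'same' padding on a (H, W) map."""
    H, W = x.shape
    p = k // 2
    padded = np.pad(x, p, mode="constant", constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out

def chained_residual_pooling(x, n_blocks=2):
    """Chained residual pooling, schematically: successive pooling
    widens the effective receptive field, and each pooled map is fused
    back onto the running output via a residual sum."""
    out = x
    pooled = x
    for _ in range(n_blocks):
        pooled = max_pool_same(pooled)   # pool the previous block's output
        out = out + pooled               # residual fusion of wider context
    return out
```

Chaining the pools means the k-th block sees context roughly k windows wide while each block stays cheap, which is how the design captures large background regions efficiently.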