Segmentation-Aware Convolutional Networks Using Local Attention Masks
We introduce an approach to integrate segmentation information within a
convolutional neural network (CNN). This counteracts the tendency of CNNs to
smooth information across regions and increases their spatial precision. To
obtain segmentation information, we set up a CNN to provide an embedding space
where region co-membership can be estimated based on Euclidean distance. We use
these embeddings to compute a local attention mask relative to every neuron
position. We incorporate such masks in CNNs and replace the convolution
operation with a "segmentation-aware" variant that allows a neuron to
selectively attend to inputs coming from its own region. We call the resulting
network a segmentation-aware CNN because it adapts its filters at each image
point according to local segmentation cues. We demonstrate the merit of our
method on two widely different dense prediction tasks that involve
classification (semantic segmentation) and regression (optical flow). Our
results show that in semantic segmentation we can match the performance of
DenseCRFs while being faster and simpler, and in optical flow we obtain clearly
sharper responses than networks that do not use local attention masks. In both
cases, segmentation-aware convolution yields systematic improvements over
strong baselines. Source code for this work is available online at
http://cs.cmu.edu/~aharley/segaware
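The region-selective attention described above can be illustrated with a minimal, dependency-free sketch. This is an assumption-laden toy: a 1-D signal, scalar per-pixel embeddings, and an exponential mask form stand in for the paper's learned CNN embeddings and feature maps; the function name is hypothetical.

```python
import math

def seg_aware_smooth(signal, embeddings, radius=1, lam=5.0):
    """Region-selective smoothing sketch (hypothetical helper).

    mask_ij = exp(-lam * |e_i - e_j|): neighbours whose embeddings are
    far from the centre's are suppressed, so each output attends mainly
    to inputs from its own region, as in segmentation-aware convolution.
    """
    out = []
    for i in range(len(signal)):
        num = den = 0.0
        for j in range(max(0, i - radius), min(len(signal), i + radius + 1)):
            m = math.exp(-lam * abs(embeddings[i] - embeddings[j]))
            num += m * signal[j]
            den += m
        out.append(num / den)  # mask-normalised weighted average
    return out
```

With a sharp mask (large lam), a step edge between two regions survives smoothing; with lam = 0 the mask is uniform and the filter degenerates to a plain box average that blurs across the boundary, which is exactly the CNN tendency the paper counteracts.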
Scene understanding from 3D point clouds and RGB images for autonomous driving
Autonomous cars are often equipped with 3D data acquisition sensors and devices, e.g., LiDAR, which provide a 3D point cloud that describes the surroundings. Direct acquisition of 3D data from these sensors is commonly used for obstacle avoidance and mapping. Analysing 3D point clouds is complex because point clouds are unstructured, unordered, and contain a varying number of points. The most common approach to scene understanding in images is the Convolutional Neural Network (CNN). Although CNNs achieve high performance in image analysis, they cannot be applied naturally to point clouds. Several methods for extending CNNs to 3D point cloud analysis have been proposed, such as rasterisation into a 3D voxel grid so that a CNN can be applied directly, or the use of a Graph Convolutional Network.
The main goal of this dissertation is to study and compare different approaches to scene understanding from 3D point clouds within the scope of driving automation systems. The project also covers sensor fusion approaches, namely how to combine 3D point clouds and images. In light of this, it uses a sensor fusion technique called PointPainting, which uses image segmentation to enhance 3D object detection in point clouds.
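The voxel-grid rasterisation mentioned above can be sketched in a few lines. The function name, voxel size, and the choice of a point count as the per-cell occupancy feature are assumptions for illustration, not the dissertation's actual pipeline:

```python
def voxelize(points, voxel_size):
    """Rasterise an unordered point cloud into a sparse 3-D voxel grid.

    Each (x, y, z) point is binned into an integer voxel coordinate;
    the grid maps occupied cells to the number of points they contain,
    giving a regular structure a 3-D CNN could consume.
    """
    grid = {}
    for x, y, z in points:
        key = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        grid[key] = grid.get(key, 0) + 1
    return grid
```

A sparse dictionary is used instead of a dense array because most voxels in a driving scene are empty; a dense grid would waste memory cubically in the scene extent.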
Context-Aware Single-Shot Detector
SSD is one of the state-of-the-art object detection algorithms, and it
combines high detection accuracy with real-time speed. However, it is widely
recognized that SSD is less accurate in detecting small objects compared to
large objects, because it ignores the context from outside the proposal boxes.
In this paper, we present CSSD, short for context-aware single-shot
multibox object detector. CSSD is built on top of SSD, with additional layers
modeling multi-scale contexts. We describe two variants of CSSD, which differ
in their context layers, using dilated convolution layers (DiCSSD) and
deconvolution layers (DeCSSD) respectively. The experimental results show that
the multi-scale context modeling significantly improves the detection accuracy.
In addition, we study the relationship between effective receptive fields
(ERFs) and the theoretical receptive fields (TRFs), particularly on a VGGNet.
The empirical results further strengthen our conclusion that SSD coupled with
context layers achieves better detection results especially for small objects
(on MS-COCO compared to the newest SSD), while
maintaining comparable runtime performance.
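The theoretical receptive field (TRF) grows with dilation, which is why dilated context layers such as those in DiCSSD widen context cheaply without extra parameters or downsampling. A minimal calculator for the standard TRF recurrence follows; the function name and the (kernel, stride, dilation) tuple format are assumptions for illustration:

```python
def theoretical_rf(layers):
    """Theoretical receptive field of a stack of convolution layers.

    Each layer is a (kernel_size, stride, dilation) tuple. Dilation
    inflates the effective kernel to d*(k-1)+1, and each layer's
    contribution is scaled by the cumulative stride (output jump).
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        eff_k = d * (k - 1) + 1      # effective kernel size under dilation
        rf += (eff_k - 1) * jump     # widen RF by the layer's reach
        jump *= s                    # cumulative stride of the output grid
    return rf
```

For example, two stacked 3x3 layers give a TRF of 5, but dilating the second layer by 2 raises it to 7, the same as achieved by striding, without losing resolution.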