Learning End-To-End Scene Flow by Distilling Single Tasks Knowledge
Scene flow is a challenging task aimed at jointly estimating the 3D structure
and motion of the sensed environment. Although deep learning solutions achieve
outstanding performance in terms of accuracy, these approaches divide the whole
problem into standalone tasks (stereo and optical flow), addressing them with
independent networks. Such a strategy dramatically increases the complexity of
the training procedure and requires power-hungry GPUs to infer scene flow at
barely 1 FPS. Conversely, we propose DWARF, a novel and lightweight
architecture able to infer full scene flow by jointly reasoning about depth and
optical flow, easily and elegantly trainable end-to-end from scratch. Moreover,
since ground truth labels for full scene flow are scarce, we propose to
leverage the knowledge learned by networks specialized in stereo or flow,
for which much more data are available, to distill proxy annotations.
Exhaustive experiments show that i) DWARF runs at about 10 FPS on a single
high-end GPU and at about 1 FPS on an NVIDIA Jetson TX2 embedded board at KITTI
resolution, with a moderate drop in accuracy compared to 10x deeper models, and
ii) learning from many distilled samples is more effective than learning from
the few annotated ones available. Code available at:
https://github.com/FilippoAleotti/Dwarf-Tensorflow
Comment: Accepted to AAAI 2020. Project page:
https://vision.disi.unibo.it/~faleotti/dwarf.htm
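The proxy-distillation idea described in this abstract can be pictured with a
short sketch. Below is a minimal, hypothetical PyTorch example (all class and
function names are ours, not DWARF's): frozen "teacher" networks specialized in
stereo and optical flow generate proxy disparity and flow maps, and a compact
student network is trained end-to-end against them.

```python
# Minimal sketch of proxy-label distillation, assuming frozen teacher
# networks for stereo and optical flow. Names here are hypothetical
# illustrations, not DWARF's actual code.
import torch
import torch.nn as nn

class TinySceneFlowNet(nn.Module):
    """Stand-in for a compact joint depth + optical flow network."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.disp_head = nn.Conv2d(32, 1, 3, padding=1)  # disparity
        self.flow_head = nn.Conv2d(32, 2, 3, padding=1)  # optical flow (u, v)

    def forward(self, frame_pair):
        feats = self.encoder(frame_pair)
        return self.disp_head(feats), self.flow_head(feats)

def distillation_step(student, stereo_teacher, flow_teacher,
                      left_t0, left_t1, right_t0, opt):
    """One training step supervised by teacher-generated proxy labels."""
    with torch.no_grad():  # teachers are frozen, used only to label data
        proxy_disp = stereo_teacher(left_t0, right_t0)
        proxy_flow = flow_teacher(left_t0, left_t1)
    pred_disp, pred_flow = student(torch.cat([left_t0, left_t1], dim=1))
    loss = ((pred_disp - proxy_disp).abs().mean()
            + (pred_flow - proxy_flow).abs().mean())
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage with dummy teachers, purely for illustration:
student = TinySceneFlowNet()
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
stereo_teacher = lambda l, r: torch.rand(1, 1, 64, 64)
flow_teacher = lambda a, b: torch.rand(1, 2, 64, 64)
l0 = l1 = r0 = torch.rand(1, 3, 64, 64)
distillation_step(student, stereo_teacher, flow_teacher, l0, l1, r0, opt)
```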
Deep Lidar CNN to Understand the Dynamics of Moving Vehicles
Perception technologies in Autonomous Driving are experiencing their golden
age due to the advances in Deep Learning. Yet, most of these systems rely on
the semantically rich information of RGB images. Deep Learning solutions
applied to data from the other sensors typically mounted on autonomous cars
(e.g., lidars or radars) remain far less explored. In this paper we propose a
novel solution to understand the dynamics of moving vehicles in the scene from
lidar information alone. The main challenge of this problem stems from the need
to disambiguate the proprio-motion of the 'observer' vehicle from that of the
external 'observed' vehicles. For this purpose, we devise a CNN
architecture which at testing time is fed with pairs of consecutive lidar
scans. However, in order to properly learn the parameters of this network,
during training we introduce a series of so-called pretext tasks which also
leverage image data. These tasks include semantic information about
vehicleness and a novel lidar-flow feature which combines standard image-based
optical flow with lidar scans. We obtain very promising results and show that
distilled image information, included only during training, improves the
inference results of the network at test time, even when image data is no
longer used.
Comment: Presented at IEEE ICRA 2018. IEEE Copyrights: Personal use of this
material is permitted. Permission from IEEE must be obtained for all other
uses. (V2 just corrected comments on arXiv submission)
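As a rough illustration of the train-time-only pretext supervision described
above, here is a hedged PyTorch sketch (layer sizes and names are our
assumptions, not the paper's architecture): auxiliary heads predict
"vehicleness" and a lidar-flow map from labels derived from RGB images during
training, while inference consumes only pairs of lidar range images.

```python
# Sketch of pretext-task training on lidar range images; hypothetical
# layers and names. Image-derived labels supervise the auxiliary heads
# at train time only; the network input is always lidar.
import torch
import torch.nn as nn

class LidarMotionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),  # two stacked scans
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
        )
        self.motion_head = nn.Conv2d(32, 2, 3, padding=1)     # main task
        self.vehicle_head = nn.Conv2d(32, 1, 3, padding=1)    # pretext task
        self.lidarflow_head = nn.Conv2d(32, 2, 3, padding=1)  # pretext task

    def forward(self, scan_pair):
        f = self.backbone(scan_pair)
        return (self.motion_head(f), self.vehicle_head(f),
                self.lidarflow_head(f))

def train_step(net, opt, scan_pair, motion_gt, vehicle_mask, lidar_flow):
    motion, vehicle_logit, lflow = net(scan_pair)
    loss = (motion - motion_gt).abs().mean()
    # pretext losses: labels come from image data, available only in training
    loss = loss + nn.functional.binary_cross_entropy_with_logits(
        vehicle_logit, vehicle_mask)
    loss = loss + (lflow - lidar_flow).abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

net = LidarMotionNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
scan_pair = torch.rand(1, 2, 64, 64)  # consecutive lidar range images
train_step(net, opt, scan_pair,
           torch.rand(1, 2, 64, 64),  # motion supervision
           torch.rand(1, 1, 64, 64),  # image-derived vehicleness mask
           torch.rand(1, 2, 64, 64))  # image-derived lidar-flow target
# At test time only lidar is needed: motion, _, _ = net(scan_pair)
```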
What Works at Scale? Distilling the Critical Success Factors for Scaling Up Rural Sanitation
This paper is based on the Knowledge Sharing Forum of the same name. It examines the conditions for success in rural sanitation programs and the strategies that lead to robust implementation across countries.