In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
In this work we present In-Place Activated Batch Normalization (InPlace-ABN),
a novel approach to drastically reduce the training memory footprint of
modern deep neural networks in a computationally efficient way. Our solution
substitutes the conventionally used succession of BatchNorm + Activation layers
with a single plugin layer, hence avoiding invasive framework surgery while
providing straightforward applicability for existing deep learning frameworks.
We obtain memory savings of up to 50% by dropping intermediate results and by
recovering required information during the backward pass through the inversion
of stored forward results, with only a minor increase (0.8-2%) in computation
time. Also, we demonstrate how frequently used checkpointing approaches can be
made computationally as efficient as InPlace-ABN. In our experiments on image
classification, we obtain results on ImageNet-1k on par with
state-of-the-art approaches. On the memory-demanding task of semantic
segmentation, we report results for COCO-Stuff, Cityscapes and Mapillary
Vistas, obtaining new state-of-the-art results on the latter without additional
training data and in a single-scale, single-model scenario. Code can be found at
https://github.com/mapillary/inplace_abn
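The memory saving rests on one property: if the activation is invertible, the normalized input can be recomputed from the stored output instead of being kept in memory. The pure-Python sketch below illustrates this for a single channel under stated assumptions: a leaky ReLU as the invertible activation (the abstract does not name one), a nonzero gamma, and hypothetical function names that are not the API of the linked repository. Only the output z and the scalar inv_std survive the forward pass; the backward pass rebuilds x_hat from z.

```python
import math
from typing import List, Tuple

# Assumption: a leaky ReLU with this (hypothetical) slope plays the role of the
# invertible activation; any strictly monotonic activation would work.
SLOPE = 0.01

def leaky_relu(v: float, slope: float = SLOPE) -> float:
    return v if v >= 0 else slope * v

def leaky_relu_inv(z: float, slope: float = SLOPE) -> float:
    # With slope > 0 the output keeps the sign of the input, so this is exact.
    return z if z >= 0 else z / slope

def abn_forward(x: List[float], gamma: float, beta: float,
                eps: float = 1e-5) -> Tuple[List[float], float]:
    """BatchNorm (one channel, batch statistics) followed by leaky ReLU.
    Returns the output z and inv_std; x and x_hat are not kept."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    inv_std = 1.0 / math.sqrt(var + eps)
    z = [leaky_relu(gamma * (v - mean) * inv_std + beta) for v in x]
    return z, inv_std

def recover_normalized(z: List[float], gamma: float, beta: float) -> List[float]:
    """Rebuild x_hat from the stored output by inverting the activation and the
    affine step (requires gamma != 0)."""
    return [(leaky_relu_inv(v) - beta) / gamma for v in z]

def abn_backward(z: List[float], dz: List[float], gamma: float, beta: float,
                 inv_std: float, slope: float = SLOPE):
    """Gradients w.r.t. x, gamma and beta computed from z alone."""
    n = len(z)
    x_hat = recover_normalized(z, gamma, beta)
    # Leaky ReLU derivative, read off the sign of the stored output.
    dy = [g if v >= 0 else slope * g for v, g in zip(z, dz)]
    dbeta = sum(dy)
    dgamma = sum(g * xh for g, xh in zip(dy, x_hat))
    # Standard BatchNorm input gradient, expressed through x_hat.
    dx = [gamma * inv_std / n * (n * g - dbeta - xh * dgamma)
          for g, xh in zip(dy, x_hat)]
    return dx, dgamma, dbeta
```

The gradient expression is the usual BatchNorm backward pass; the only difference is that x_hat comes from recover_normalized(z, ...) rather than from a saved buffer.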
AutoDIAL: Automatic DomaIn Alignment Layers
Classifiers trained on given databases perform poorly when tested on data
acquired in different settings. In domain adaptation, this is explained by
a shift between the distributions of the source and target domains. Attempts to
align them have traditionally reduced the domain shift by adding to the
objective function appropriate loss terms that measure the discrepancies between
source and target distributions. Here we take a different
route, proposing to align the learned representations by embedding in any given
network specific Domain Alignment Layers, designed to match the source and
target feature distributions to a reference one. Unlike previous works,
which define a priori in which layers adaptation should be performed, our
method is able to automatically learn the degree of feature alignment required
at different levels of the deep network. Thorough experiments on different
public benchmarks, in the unsupervised setting, confirm the power of our
approach.
Comment: arXiv admin note: substantial text overlap with arXiv:1702.06332;
added supplementary material
Towards Generalization Across Depth for Monocular 3D Object Detection
While expensive LiDAR and stereo camera rigs have enabled the development of
successful 3D object detection methods, monocular RGB-only approaches lag far
behind. This work advances the state of the art by introducing MoVi-3D, a
novel, single-stage deep architecture for monocular 3D object detection.
MoVi-3D builds upon a novel approach which leverages geometrical information to
generate, both at training and test time, virtual views where the object
appearance is normalized with respect to distance. These virtually generated
views facilitate the detection task as they significantly reduce the visual
appearance variability associated with objects placed at different distances from
the camera. As a consequence, the deep model is relieved from learning
depth-specific representations and its complexity can be significantly reduced.
In particular, in this work we show that, thanks to our virtual view
generation process, a lightweight, single-stage architecture suffices to set
new state-of-the-art results on the popular KITTI3D benchmark.
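The distance normalization rests on pinhole geometry: an object of height H at depth Z projects to about f * H / Z pixels. The sketch below is a simplification, not the MoVi-3D pipeline: it sizes one virtual view per depth so that a window of fixed metric height is resampled to a fixed pixel height, after which the apparent object size no longer depends on Z. The focal length, window size and all names are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class VirtualView:
    depth_m: float         # depth the virtual view is centred on
    crop_height_px: float  # height of the real-image region it samples
    scale: float           # resize factor mapping that region to the fixed view size

def plan_virtual_views(focal_px: float, depths_m: List[float],
                       window_height_m: float, view_height_px: int) -> List[VirtualView]:
    """Hypothetical helper: one view per depth, each covering window_height_m metres."""
    views = []
    for z in depths_m:
        if z <= 0:
            raise ValueError("depths must be positive")
        # Pinhole projection: a window window_height_m tall at depth z spans this many pixels.
        crop = focal_px * window_height_m / z
        views.append(VirtualView(z, crop, view_height_px / crop))
    return views

def apparent_height_px(focal_px: float, object_height_m: float, depth_m: float) -> float:
    return focal_px * object_height_m / depth_m

if __name__ == "__main__":
    # Illustrative numbers: f = 700 px, 3 m windows rendered at 100 px, a 1.5 m object.
    for view in plan_virtual_views(700.0, [10.0, 20.0, 40.0], 3.0, 100):
        h = apparent_height_px(700.0, 1.5, view.depth_m) * view.scale
        print(f"{view.depth_m:5.1f} m -> {h:.1f} px")  # prints 50.0 px for every depth
```

The depth factor cancels in the product of apparent height and scale (f * H / Z times P * Z / (f * W)), which is why a single-scale detector can operate on the views.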
3D CNNs on distance matrices for human action recognition
In this paper we are interested in recognizing human actions from sequences of 3D skeleton data. For this purpose we combine a 3D Convolutional Neural Network with body representations based on Euclidean Distance Matrices (EDMs), which have recently been shown to be very effective at capturing the geometric structure of the human pose. One inherent limitation of EDMs, however, is that they are defined up to a permutation of the skeleton joints, i.e., randomly shuffling the ordering of the joints yields many different representations. In order to address this issue we introduce a novel architecture that simultaneously, and in an end-to-end manner, learns an optimal transformation of the joints while optimizing the rest of the parameters of the convolutional network. The proposed approach achieves state-of-the-art results on 3 benchmarks, including the recent NTU RGB-D dataset, for which we improve on previous LSTM-based methods by more than 10 percentage points, also surpassing other CNN-based methods while using almost 1000 times fewer parameters. Peer Reviewed. Postprint (author's final draft).
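A short sketch of the representation and of the permutation problem: the EDM of a skeleton with joints p_1..p_n has entries D[i][j] = ||p_i - p_j||, and reordering the joints permutes both the rows and the columns of D, so the same pose yields a different input "image" for the 3D CNN. The transform_joints helper is a hypothetical stand-in for the learned joint transformation: a permutation matrix is the special case that only reorders joints, whereas a general matrix would be learned end to end. Plain (non-squared) distances are assumed.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]

def edm(joints: Sequence[Joint]) -> List[List[float]]:
    """Euclidean distance matrix: entry (i, j) is the distance between joints i and j."""
    return [[math.dist(a, b) for b in joints] for a in joints]

def transform_joints(weights: Sequence[Sequence[float]],
                     joints: Sequence[Joint]) -> List[Joint]:
    """Hypothetical joint transformation: output joint k = sum_j weights[k][j] * joints[j].
    A permutation matrix reorders the joints; a general (learned) matrix mixes them."""
    return [
        tuple(sum(w * joint[c] for w, joint in zip(row, joints)) for c in range(3))
        for row in weights
    ]

def permutation_matrix(order: Sequence[int]) -> List[List[float]]:
    """Row k selects joint order[k]."""
    n = len(order)
    return [[1.0 if j == order[k] else 0.0 for j in range(n)] for k in range(n)]
```

Applying permutation_matrix(order) before edm gives D'[i][j] = D[order[i]][order[j]]: the same distances rearranged, which a convolution with local receptive fields treats as a different input.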
Iatrogenic Hypoglycemia Induced by Valproic Acid in an Adult Patient
Literature on antiepileptic-induced iatrogenic hypoglycemia is scanty. Due to its broad spectrum of activity and mechanisms of action, valproic acid (VPA), a fatty acid, is the most widely prescribed epilepsy treatment worldwide. Herein, we describe an adult epileptic patient in whom persistent, otherwise unexplained hypoglycemia was most likely induced by VPA, as suggested by the time course of VPA and blood glucose levels. Indeed, no further hypoglycemic episodes occurred after VPA discontinuation, and the diagnostic work-up ruled out other possible causes of hypoglycemia. This case supports the hypothesis that VPA may induce hypoglycemia through metabolic mechanisms that are not yet well defined. Moreover, it emphasizes that an iatrogenic pathogenesis should be considered if apparently unexplained hypoglycemia occurs in a patient on chronic antiepileptic therapy, even at a therapeutic dosage.
Learning depth-aware deep representations for robotic perception
© 20xx IEEE. Exploiting RGB-D data by means of Convolutional Neural Networks (CNNs) is at the core of a number of robotics applications, including object detection, scene semantic segmentation and grasping. Most existing approaches, however, exploit RGB-D data by simply considering depth as an additional input channel for the network. In this paper we show that the performance of deep architectures can be boosted by introducing DaConv, a novel, general-purpose CNN block which exploits depth to learn scale-aware feature representations. We demonstrate the benefits of DaConv on a variety of robotics-oriented tasks, involving affordance detection, object coordinate regression and contour detection in RGB-D images. In each of these experiments we show the potential of the proposed block and how it can be readily integrated into existing CNN architectures. Peer Reviewed. Postprint (author's final draft).
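The abstract does not describe how DaConv uses depth. One simple, hand-designed way for a convolution to become scale-aware, shown here only as an assumed illustration, is to set the dilation per pixel so that the kernel covers roughly the same metric extent at every depth: with a pinhole camera, extent_m metres at depth Z span about focal_px * extent_m / Z pixels. All names and parameters are hypothetical; the actual DaConv block learns its representation and may differ substantially.

```python
from typing import List, Sequence

Grid = List[List[float]]

def dilation_for_depth(depth_m: float, focal_px: float, extent_m: float,
                       kernel_size: int = 3, max_dilation: int = 8) -> int:
    """Pixel spacing between kernel taps so that an (odd, >= 3) kernel spans roughly
    extent_m metres at the given depth, clamped to [1, max_dilation]."""
    span_px = focal_px * extent_m / depth_m  # pinhole projection of extent_m
    d = round(span_px / (kernel_size - 1))
    return max(1, min(max_dilation, d))

def depth_aware_conv(image: Sequence[Sequence[float]], depth: Sequence[Sequence[float]],
                     kernel: Sequence[Sequence[float]], focal_px: float, extent_m: float,
                     max_dilation: int = 8) -> Grid:
    """Single-channel 2-D correlation whose dilation varies per pixel with depth:
    near pixels (small depth) get widely spaced taps, far pixels get narrow ones.
    Samples falling outside the image contribute zero (zero padding)."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = dilation_for_depth(depth[y][x], focal_px, extent_m, k, max_dilation)
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + (i - r) * d, x + (j - r) * d
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out
```

When the depth is large enough for the dilation to clamp to 1 everywhere, this reduces to an ordinary 3x3 convolution, so a fixed-dilation layer is the degenerate case of the sketch.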
- …