In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
In this work we present In-Place Activated Batch Normalization (InPlace-ABN),
a novel approach to drastically reduce the training memory footprint of
modern deep neural networks in a computationally efficient way. Our solution
substitutes the conventionally used succession of BatchNorm + Activation layers
with a single plugin layer, hence avoiding invasive framework surgery while
providing straightforward applicability for existing deep learning frameworks.
We obtain memory savings of up to 50% by dropping intermediate results and by
recovering required information during the backward pass through the inversion
of stored forward results, with only a minor increase (0.8-2%) in computation
time. Also, we demonstrate how frequently used checkpointing approaches can be
made computationally as efficient as InPlace-ABN. In our experiments on image
classification, we demonstrate on-par results on ImageNet-1k with
state-of-the-art approaches. On the memory-demanding task of semantic
segmentation, we report results for COCO-Stuff, Cityscapes and Mapillary
Vistas, obtaining new state-of-the-art results on the latter without additional
training data but in a single-scale and -model scenario. Code can be found at
https://github.com/mapillary/inplace_abn
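The inversion idea in the abstract above can be illustrated with a minimal sketch. It assumes an invertible activation (a leaky ReLU with positive slope) and a nonzero BatchNorm scale; the function names `abn_forward` and `recover_normalized` are hypothetical and are not taken from the linked repository. Only the layer output is kept, and the normalized input is recomputed from it when needed.

```python
import math


def abn_forward(xs, gamma, beta, slope=0.01, eps=1e-5):
    """BatchNorm followed by leaky ReLU over one channel (a list of floats).

    Returns the activated outputs and the batch standard deviation.
    The inputs and the normalized values are deliberately not stored.
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    std = math.sqrt(var + eps)
    zs = []
    for x in xs:
        y = gamma * (x - mean) / std + beta      # BatchNorm with scale and shift
        zs.append(y if y >= 0 else slope * y)    # leaky ReLU
    return zs, std


def recover_normalized(zs, gamma, beta, slope=0.01):
    """Recover the normalized inputs from the stored outputs.

    Assumes slope > 0 and gamma != 0 so that both steps are invertible.
    """
    x_hats = []
    for z in zs:
        y = z if z >= 0 else z / slope           # invert leaky ReLU
        x_hats.append((y - beta) / gamma)        # invert scale and shift
    return x_hats
```

In a backward pass, the recovered normalized values would stand in for the buffer that a conventional BatchNorm + activation pair keeps in memory; this sketch covers only the recovery step, not the gradient computation.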
TernausNetV2: Fully Convolutional Network for Instance Segmentation
The most common approaches to instance segmentation are complex and use
two-stage networks with object proposals, conditional random-fields, template
matching or recurrent neural networks. In this work we present TernausNetV2, a
simple fully convolutional network that allows extracting objects from
high-resolution satellite imagery on an instance level. The network has the popular
encoder-decoder type of architecture with skip connections but has a few
essential modifications that allow it to be used for semantic as well as for instance
segmentation tasks. This approach is universal and allows extending any network
that has been successfully applied for semantic segmentation to perform
the instance segmentation task. In addition, we generalize a network encoder that was
pre-trained for RGB images to use additional input channels. This makes it possible
to use transfer learning from the visual to a wider spectral range. For
the DeepGlobe-CVPR 2018 building detection sub-challenge, based on public
leaderboard score, our approach shows superior performance in comparison to
other methods. The source code and corresponding pre-trained weights are publicly
available at https://github.com/ternaus/TernausNetV
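One way to generalize an RGB-pretrained encoder to more input channels is to widen the first convolution's weight tensor. The sketch below assumes the weights are laid out as (out_channels, in_channels, height, width) and that the extra channels start at zero; the abstract does not state how the new channels are initialized, so the zero initialization and the helper name `expand_input_channels` are assumptions made for illustration.

```python
import numpy as np


def expand_input_channels(rgb_weight, in_channels):
    """Widen a first-layer conv weight from its pretrained channels to `in_channels`.

    rgb_weight: array of shape (out_channels, rgb_channels, kh, kw).
    The pretrained filters are copied unchanged; the added channels are
    zero-initialized (an assumption, not the authors' stated method).
    """
    out_c, rgb_c, kh, kw = rgb_weight.shape
    if in_channels < rgb_c:
        raise ValueError("in_channels must be at least the pretrained channel count")
    expanded = np.zeros((out_c, in_channels, kh, kw), dtype=rgb_weight.dtype)
    expanded[:, :rgb_c] = rgb_weight  # keep the pretrained RGB filters
    return expanded
```

With zero-initialized extra channels, the widened layer initially produces the same output as the pretrained layer applied to the RGB bands alone, and fine-tuning can then learn to use the additional spectral bands.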
Improving Traffic Safety Through Video Analysis in Jakarta, Indonesia
This project presents the results of a partnership between the Data Science
for Social Good fellowship, Jakarta Smart City and Pulse Lab Jakarta to create
a video analysis pipeline for the purpose of improving traffic safety in
Jakarta. The pipeline transforms raw traffic video footage into databases that
are ready to be used for traffic analysis. By analyzing these patterns, the
city of Jakarta will better understand how human behavior and built
infrastructure contribute to traffic challenges and safety risks. The results
of this work should also be broadly applicable to smart city initiatives around
the globe as they improve urban planning and sustainability through data
science approaches.
Comment: 6 pages; LaTeX; Presented at NeurIPS 2018 Workshop on Machine
Learning for the Developing World; Presented at NeurIPS 2018 Workshop on AI
for Social Goo
GPU-friendly neural networks for remote sensing scene classification
Convolutional neural networks (CNNs) have proven to be very efficient for the analysis of remote sensing (RS) images. Due to the inherent complexity of extracting features from these images, along with the increasing amount of data to be processed (and the diversity of applications), there is a clear tendency to develop and employ increasingly deep and complex CNNs. In this regard, graphics processing units (GPUs) are frequently used to optimize their execution, both for the training and inference stages, optimizing the performance of neural models through their many-core architecture. Hence, the efficient use of the GPU resources should be at the core of optimizations. This letter analyzes the possibilities of using a new family of CNNs, denoted as TResNets, to provide an efficient solution to the RS scene classification problem. Moreover, the considered models have been combined with mixed precision to enhance their training performance. Our experimental results, conducted over three publicly available RS data sets, show that the proposed networks achieve better accuracy and more efficient use of GPU resources than other state-of-the-art networks. Source code is available at https://github.com/mhaut/GPUfriendlyRS