    In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

    In this work we present In-Place Activated Batch Normalization (InPlace-ABN), a novel approach to drastically reduce the training memory footprint of modern deep neural networks in a computationally efficient way. Our solution replaces the conventional succession of BatchNorm + Activation layers with a single plugin layer, thus avoiding invasive framework surgery while remaining straightforward to apply in existing deep learning frameworks. We obtain memory savings of up to 50% by dropping intermediate results and by recovering the required information during the backward pass through the inversion of stored forward results, with only a minor increase (0.8-2%) in computation time. We also demonstrate how frequently used checkpointing approaches can be made as computationally efficient as InPlace-ABN. In our experiments on image classification, we obtain results on ImageNet-1k on par with state-of-the-art approaches. On the memory-demanding task of semantic segmentation, we report results for COCO-Stuff, Cityscapes and Mapillary Vistas, obtaining new state-of-the-art results on the latter without additional training data, in a single-scale, single-model setting. Code can be found at https://github.com/mapillary/inplace_abn
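
    The mechanism the abstract relies on, recovering what the backward pass needs by inverting stored forward results, can be illustrated with a minimal pure-Python sketch. It assumes a leaky ReLU activation with slope 0.01 and a non-zero BatchNorm scale gamma, so both layers are invertible and the normalized input can be rebuilt from the stored output alone. All function names are hypothetical; this is not the code from the inplace_abn repository.

```python
"""Sketch of the inversion idea behind InPlace-ABN (illustrative only).

Assumptions: leaky ReLU with slope 0.01 and a non-zero BatchNorm scale gamma.
"""
import math
from typing import List, Tuple

SLOPE = 0.01  # assumed leaky-ReLU negative slope


def leaky_relu(y: float) -> float:
    return y if y >= 0 else SLOPE * y


def leaky_relu_inverse(z: float) -> float:
    # With a positive slope, the sign of z tells which branch produced it.
    return z if z >= 0 else z / SLOPE


def forward(x: List[float], gamma: float, beta: float,
            eps: float = 1e-5) -> Tuple[List[float], List[float]]:
    """BatchNorm (batch statistics) followed by leaky ReLU on a 1-D batch."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    x_hat = [(v - mean) * inv_std for v in x]
    z = [leaky_relu(gamma * h + beta) for h in x_hat]
    # Only z would be kept for the backward pass; x_hat is returned here
    # solely so the reconstruction below can be checked.
    return z, x_hat


def recover_normalized(z: List[float], gamma: float, beta: float) -> List[float]:
    """Rebuild x_hat from the stored output z by inverting both layers."""
    return [(leaky_relu_inverse(v) - beta) / gamma for v in z]


if __name__ == "__main__":
    batch = [0.5, -1.2, 3.0, 0.0, -0.3]
    z, x_hat = forward(batch, gamma=1.5, beta=-0.2)
    recovered = recover_normalized(z, gamma=1.5, beta=-0.2)
    # Largest reconstruction error (floating-point rounding only).
    print(max(abs(a - b) for a, b in zip(x_hat, recovered)))
```

    Because only z has to stay in memory, the intermediate BatchNorm output never needs a separate buffer, which is the kind of saving the abstract describes.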

    TernausNetV2: Fully Convolutional Network for Instance Segmentation

    The most common approaches to instance segmentation are complex and use two-stage networks with object proposals, conditional random fields, template matching or recurrent neural networks. In this work we present TernausNetV2, a simple fully convolutional network that extracts objects from high-resolution satellite imagery at the instance level. The network has a popular encoder-decoder architecture with skip connections, plus a few essential modifications that allow it to be used for semantic as well as instance segmentation. This approach is universal: any network that has been successfully applied to semantic segmentation can be extended to perform instance segmentation. In addition, we generalize a network encoder that was pre-trained on RGB images to accept additional input channels, which makes it possible to transfer learning from the visible range to a wider spectral range. For the DeepGlobe-CVPR 2018 building detection sub-challenge, our approach outperforms other methods based on the public leaderboard score. The source code and corresponding pre-trained weights are publicly available at https://github.com/ternaus/TernausNetV
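
    Generalizing an RGB-pretrained encoder to extra input channels typically comes down to widening the weights of its first convolution. The sketch below is hypothetical: it stores weights as nested lists shaped [out][in][kh][kw], keeps the pretrained RGB kernels untouched, and initializes the new channels either with zeros or with the mean of the RGB kernels. The abstract does not say which initialization TernausNetV2 actually uses.

```python
"""Hypothetical sketch: widen the first convolution of an RGB-pretrained encoder.

Assumptions: nested-list weights shaped [out][in][kernel_h][kernel_w]; new
channels are zero- or mean-initialized (not confirmed by the abstract).
"""
from copy import deepcopy
from typing import List

Kernel = List[List[float]]          # [kernel_h][kernel_w]
ConvWeights = List[List[Kernel]]    # [out_channels][in_channels][kernel_h][kernel_w]


def extend_input_channels(weights: ConvWeights, extra_channels: int,
                          init: str = "zeros") -> ConvWeights:
    """Return first-layer weights widened from C to C + extra_channels inputs."""
    widened: ConvWeights = []
    for filters in weights:  # one list of input-channel kernels per output channel
        new_filters = deepcopy(filters)  # pretrained RGB kernels are kept unchanged
        kh, kw = len(filters[0]), len(filters[0][0])
        for _ in range(extra_channels):
            if init == "zeros":
                # New bands contribute nothing until fine-tuning updates them.
                kernel = [[0.0] * kw for _ in range(kh)]
            elif init == "mean":
                # Start new bands from the average of the pretrained kernels.
                kernel = [[sum(f[i][j] for f in filters) / len(filters)
                           for j in range(kw)] for i in range(kh)]
            else:
                raise ValueError(f"unknown init scheme: {init}")
            new_filters.append(kernel)
        widened.append(new_filters)
    return widened


if __name__ == "__main__":
    # Toy 1x1 convolution with 2 output channels and 3 (RGB) input channels.
    rgb = [[[[1.0]], [[2.0]], [[3.0]]],
           [[[0.5]], [[0.5]], [[2.0]]]]
    wide = extend_input_channels(rgb, extra_channels=2)
    print(len(wide), len(wide[0]))  # 2 output channels, 5 input channels
```

    With zero initialization, the widened layer initially produces exactly the same response as the pretrained RGB layer, so fine-tuning starts from the transferred features and learns how much to rely on the additional spectral bands.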

    Improving Traffic Safety Through Video Analysis in Jakarta, Indonesia

    This project presents the results of a partnership between the Data Science for Social Good fellowship, Jakarta Smart City and Pulse Lab Jakarta to create a video analysis pipeline for improving traffic safety in Jakarta. The pipeline transforms raw traffic video footage into databases that are ready to be used for traffic analysis. By analyzing the traffic patterns in these databases, the city of Jakarta will better understand how human behavior and built infrastructure contribute to traffic challenges and safety risks. The results of this work should also be broadly applicable to smart city initiatives around the globe that aim to improve urban planning and sustainability through data science approaches.
    Comment: 6 pages; LaTeX; Presented at NeurIPS 2018 Workshop on Machine Learning for the Developing World; Presented at NeurIPS 2018 Workshop on AI for Social Good

    GPU-friendly neural networks for remote sensing scene classification

    Convolutional neural networks (CNNs) have proven to be very efficient for the analysis of remote sensing (RS) images. Given the inherent complexity of extracting features from these images, the growing amount of data to be processed and the diversity of applications, there is a clear tendency to develop and employ increasingly deep and complex CNNs. Graphics processing units (GPUs) are frequently used to accelerate their execution in both the training and inference stages, exploiting their many-core architecture. Hence, the efficient use of GPU resources should be at the core of any optimization. This letter explores a new family of CNNs, called TResNets, as an efficient solution to the RS scene classification problem. Moreover, the considered models are trained with mixed precision to improve their training performance. Our experimental results, obtained on three publicly available RS data sets, show that the proposed networks achieve better accuracy and more efficient use of GPU resources than other state-of-the-art networks. Source code is available at https://github.com/mhaut/GPUfriendlyRS
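
    The abstract does not detail its mixed-precision setup; a common ingredient of mixed-precision training is loss scaling, sketched below under that assumption. The loss is multiplied by a scale factor so that small gradients stay representable in float16, the gradients are then divided by the same factor in higher precision, and steps whose gradients overflow are skipped while the scale is reduced. float16 rounding is emulated with Python's struct "e" format, and DynamicLossScaler is a hypothetical class, not a library API.

```python
"""Sketch of dynamic loss scaling for mixed-precision training (generic technique).

Assumption: this is not the letter's exact configuration; DynamicLossScaler is
hypothetical, and float16 is emulated with struct's 'e' format.
"""
import math
import struct
from typing import List, Optional


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (inf on overflow)."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the float16 range; mimic fp16 overflow.
        return math.copysign(math.inf, x)


class DynamicLossScaler:
    """Hypothetical dynamic loss scaler (illustrative, not a framework API)."""

    def __init__(self, init_scale: float = 2.0 ** 10, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000) -> None:
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def unscale(self, fp16_grads: List[float]) -> Optional[List[float]]:
        """Return full-precision gradients divided by the scale, or None to skip."""
        if any(math.isinf(g) or math.isnan(g) for g in fp16_grads):
            # Overflow: shrink the scale and tell the caller to skip this update.
            self.scale *= self.backoff_factor
            self._good_steps = 0
            return None
        # Unscale with the scale that was actually applied to the loss.
        grads = [g / self.scale for g in fp16_grads]
        self._good_steps += 1
        if self._good_steps == self.growth_interval:
            # A long run without overflow: try a larger scale to reduce underflow.
            self.scale *= self.growth_factor
            self._good_steps = 0
        return grads


if __name__ == "__main__":
    true_grads = [1e-8, 3e-3, -2e-7]
    # Without scaling, the smallest gradient underflows to 0.0 in float16.
    print([to_fp16(g) for g in true_grads])
    scaler = DynamicLossScaler()
    # Gradients of the scaled loss fit in float16 and are then unscaled.
    print(scaler.unscale([to_fp16(g * scaler.scale) for g in true_grads]))
```

    Growing the scale only after a long overflow-free run, and halving it on any overflow, keeps it near the largest value the gradients tolerate without manual tuning.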