
    ComBiNet: Compact Convolutional Bayesian Neural Network for Image Segmentation

    Fully convolutional U-shaped neural networks have largely been the dominant approach for pixel-wise image segmentation. In this work, we tackle two defects that hinder their deployment in real-world applications: 1) predictions lack uncertainty quantification, which may be crucial to many decision-making systems; 2) their large memory footprint and computational cost demand extensive hardware resources. To address these issues and improve their practicality, we demonstrate a compact, few-parameter Bayesian convolutional architecture that achieves a marginal improvement in accuracy over related work while using significantly fewer parameters and compute operations. The architecture combines parameter-efficient operations, such as separable convolutions, bilinear interpolation and multi-scale feature propagation, with Bayesian inference for per-pixel uncertainty quantification through Monte Carlo Dropout. The best-performing configurations required fewer than 2.5 million parameters on diverse, challenging datasets with few observations.
    Comment: Accepted for publication at ICANN 2021. Code at: https://github.com/martinferianc/ComBiNe
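    The per-pixel uncertainty quantification mentioned above can be illustrated with a short sketch of Monte Carlo Dropout: dropout is kept active at inference time, and the spread across repeated stochastic forward passes is read as uncertainty. Below is a minimal PyTorch sketch; the tiny dropout-plus-separable-convolution model is illustrative only and is not the ComBiNet architecture.

    import torch
    import torch.nn as nn

    class TinySegNet(nn.Module):
        """Toy segmentation model: a depthwise-separable conv block with dropout.
        Illustrative only; not the ComBiNet architecture."""
        def __init__(self, in_ch=3, classes=4, p=0.2):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch),  # depthwise
                nn.Conv2d(in_ch, 32, 1),                              # pointwise
                nn.ReLU(),
                nn.Dropout2d(p),  # stays active at test time for MC Dropout
                nn.Conv2d(32, classes, 1),
            )

        def forward(self, x):
            return self.block(x)

    @torch.no_grad()
    def mc_dropout_predict(model, x, samples=20):
        """Average `samples` stochastic forward passes; return mean class
        probabilities and a per-pixel uncertainty map (predictive entropy)."""
        model.train()  # keeps dropout on; safe here since the model has no BatchNorm
        probs = torch.stack(
            [model(x).softmax(dim=1) for _ in range(samples)]
        ).mean(dim=0)                                                # (N, C, H, W)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)  # (N, H, W)
        return probs, entropy

    model = TinySegNet()
    x = torch.randn(1, 3, 64, 64)
    mean_probs, uncertainty = mc_dropout_predict(model, x)
    print(mean_probs.shape, uncertainty.shape)

    Pixels where the sampled predictions disagree receive high entropy, giving the kind of per-pixel uncertainty map a downstream decision-making system could consult.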

    Semantic Image Segmentation and Other Dense Per-Pixel Tasks: Practical Approaches

    Computer vision-based and deep learning-driven applications and devices are now a part of our everyday life: from modern smartphones with an ever-increasing number of cameras and other sensors to autonomous vehicles such as driverless cars and self-piloting drones. Even though a large portion of the algorithms behind those systems has been known for ages, both the computational power and the abundance of labelled data were lacking until recently. Now, following the principle of Occam's razor, we should start re-thinking those algorithms and strive towards their further simplification, both to improve our own understanding and to expand the realm of their practical applications.

    With those goals in mind, in this work we will concentrate on a particular type of computer vision task that predicts a certain quantity of interest for each pixel in the input image – so-called dense per-pixel tasks. This choice is not by chance: while a huge amount of work has concentrated on per-image tasks such as image classification, with levels of performance reaching nearly 100%, dense per-pixel tasks bring a different set of challenges that traditionally require more computational resources and more complicated approaches. Throughout this thesis, our focus will be on reducing these computational requirements while presenting simple approaches to build practical vision systems that can be used in a variety of settings – e.g. indoors or outdoors, on low-resolution or high-resolution images, solving a single task or multiple tasks at once, running on modern GPU cards or on embedded devices such as the Jetson TX.

    In the first part of the manuscript we will adapt an existing powerful but slow semantic segmentation network into a faster and competitive one through a manual re-design and analysis of its building blocks. With this approach, we will achieve a nearly 3× decrease in both the number of parameters and the runtime of the network with equally high accuracy. In the second part we will alter this compact network in order to solve multiple dense per-pixel tasks at once, still in real time. We will also demonstrate the value of predicting multiple quantities at once, as an example creating a 3D semantic reconstruction of the scene. In the third part, we will move away from manual design and instead rely on reinforcement learning to automatically traverse the search space of compact semantic segmentation architectures. While the majority of architecture search methods are extremely computationally expensive even for image classification, we will present a solution that requires only 2 generic GPU cards. Finally, in the last part we will extend our automatic architecture search solution to discover tiny but still competitive networks with fewer than 300K parameters, taking only 1.5 MB of disk space.

    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 202
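    To make the distinction between task types concrete: a per-image task maps the whole image to a single prediction vector, while a dense per-pixel task outputs one prediction per input pixel. The following minimal PyTorch sketch uses made-up layer sizes (not taken from the thesis) and contrasts the two output shapes, including the kind of bilinear upsampling commonly used to restore full resolution:

    import torch
    import torch.nn as nn

    # Per-image task: one prediction vector per image.
    classifier = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
    )

    # Dense per-pixel task: one prediction per pixel (logits computed at
    # 1/2 resolution, then bilinearly upsampled back to the input size).
    segmenter = nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 10, 1),
        nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
    )

    x = torch.randn(1, 3, 64, 64)
    print(classifier(x).shape)  # torch.Size([1, 10])         -- per image
    print(segmenter(x).shape)   # torch.Size([1, 10, 64, 64]) -- per pixel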

    FMAS: Fast Multi-Objective SuperNet Architecture Search for Semantic Segmentation

    We present FMAS, a fast multi-objective neural architecture search framework for semantic segmentation. FMAS subsamples the structure and pre-trained parameters of DeepLabV3+, without fine-tuning, dramatically reducing training time during search. To further reduce candidate evaluation time, we use a subset of the validation dataset during the search. Only the final, Pareto non-dominated candidates are ultimately fine-tuned using the complete training set. We evaluate FMAS by searching for models that effectively trade off accuracy against computational cost on the PASCAL VOC 2012 dataset. FMAS finds competitive designs quickly, e.g., taking just 0.5 GPU-days to discover a DeepLabV3+ variant that reduces FLOPs and parameters by 10% and 20% respectively, for less than 3% increased error. We also search on an edge device called GAP8, using its latency as the metric. FMAS is capable of finding a 2.2× faster network with a 7.61% MIoU loss.
    Comment: Accepted as a full paper by the TinyML Research Symposium 202
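    The Pareto non-dominated filtering step mentioned above keeps only those candidates that no other candidate beats on every objective at once (e.g. both lower error and lower cost). Below is a small self-contained Python sketch, assuming each candidate is scored by an (error, FLOPs) pair to be minimized; the candidate names and numbers are hypothetical:

    def pareto_front(candidates):
        """Return the candidates not dominated by any other.

        Each candidate is (name, error, flops); both objectives are minimized.
        A candidate is dominated if another is no worse on both objectives
        and strictly better on at least one.
        """
        front = []
        for name, err, fl in candidates:
            dominated = any(
                (e2 <= err and f2 <= fl) and (e2 < err or f2 < fl)
                for _, e2, f2 in candidates
            )
            if not dominated:
                front.append((name, err, fl))
        return front

    # Hypothetical search results: B is dominated by A, so only A and C survive.
    print(pareto_front([("A", 0.05, 8.0), ("B", 0.06, 9.0), ("C", 0.08, 5.0)]))
    # [('A', 0.05, 8.0), ('C', 0.08, 5.0)]

    Only the survivors of this quadratic-time filter would then be fine-tuned on the complete training set, which is what keeps the overall search cheap.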