16 research outputs found

    Spatially-Adaptive Filter Units for Compact and Efficient Deep Neural Networks

    Convolutional neural networks excel in a number of computer vision tasks. One of their most crucial architectural elements is the effective receptive field size, which has to be set manually to suit a specific task. Standard solutions involve large kernels, down/up-sampling and dilated convolutions. These require testing a variety of dilation and down/up-sampling factors, and result in non-compact representations and an excessive number of parameters. We address this issue by proposing a new convolution filter composed of displaced aggregation units (DAUs). DAUs learn spatial displacements and adapt the receptive field sizes of individual convolution filters to a given problem, thus eliminating the need for hand-crafted modifications. DAUs provide a seamless substitution for convolutional filters in existing state-of-the-art architectures, which we demonstrate on AlexNet, ResNet50, ResNet101, DeepLab and SRN-DeblurNet. The benefits of this design are demonstrated on a variety of computer vision tasks and datasets, such as image classification (ILSVRC 2012), semantic segmentation (PASCAL VOC 2011, Cityscapes) and blind image de-blurring (GOPRO). Results show that DAUs allocate parameters efficiently, resulting in up to four times more compact networks at similar or better performance.
    Comment: Accepted for publication in International Journal of Computer Vision, Jan 02 202

    Spatially-adaptive filter units for deep neural networks

    Classical deep convolutional networks increase the receptive field size either by gradual resolution reduction or by applying hand-crafted dilated convolutions, to prevent an increase in the number of parameters. In this paper we propose a novel displaced aggregation unit (DAU) that does not require hand-crafting. In contrast to classical filters with units (pixels) placed on a fixed regular grid, the displacements of the DAUs are learned, which enables filters to spatially adapt their receptive field to a given problem. We extensively demonstrate the strength of DAUs on classification and semantic segmentation tasks. Compared to ConvNets with regular filters, ConvNets with DAUs achieve comparable performance with faster convergence and up to a three-fold reduction in parameters. Furthermore, DAUs allow us to study deep networks from novel perspectives. We study the spatial distributions of DAU filters and analyze the number of parameters allocated for spatial coverage in a filter.
    Comment: Accepted to Computer Vision and Pattern Recognition 201

    Towards deep compositional networks

    Hierarchical feature learning based on convolutional neural networks (CNNs) has recently shown significant potential in various computer vision tasks. While CNNs allow high-quality discriminative feature learning, their downside is the lack of explicit structure in the features, which often leads to overfitting, an inability to reconstruct from partial observations, and limited generative abilities. Explicit structure is inherent in hierarchical compositional models; however, these lack the ability to optimize a well-defined cost function. We propose a novel analytic model of a basic unit in a layered hierarchical model with both explicit compositional structure and a well-defined discriminative cost function. Our experiments on two datasets show that the proposed compositional model performs on a par with standard CNNs on discriminative tasks while, due to explicit modeling of the structure in the feature units, affording a straightforward visualization of parts and faster inference due to the separability of the units.
    Comment: Published in proceedings of 23rd International Conference on Pattern Recognition (ICPR 2016)

    Towards deep compositional networks


    Representation of visual entities with deep hierarchical and compositional models (Reprezentacija vizualnih entitet z globokimi hierarhičnimi in kompozicionalnimi modeli)

    The doctoral thesis explores two prominent hierarchical approaches for the modeling of visual entities: (a) compositional hierarchies and (b) deep neural networks. Both approaches are explored in detail together with their advantages and disadvantages. In compositional hierarchies, poor discriminative power is identified as a major limiting factor, which is addressed with a novel discriminative feature, termed Histogram of Compositions (HoC), proposed in the first part of this thesis. HoC is shown to successfully capture important discriminative information to improve classification accuracy. The second part of the thesis highlights the lack of a spatial relationship between features as an important limitation of deep convolutional networks (ConvNets). This limitation leads to rigid and non-learnable receptive field sizes, poor utilization of parameters and low flexibility of deep architectures. All of these problems are addressed by introducing explicit compositional structure into deep neural networks, implemented with the proposed novel filter unit for ConvNets, termed the Displaced Aggregation Unit (DAU). DAUs enable novel properties for deep models: (a) the decoupling of the parameters from the receptive field, (b) the learning of the receptive field sizes and (c) the automatic adjustment of the spatial focus of features. The benefits of DAUs are demonstrated on three practical problems: image classification, semantic segmentation and blind image de-blurring. In all cases, the inclusion of DAUs into modern architectures enables simpler networks with fewer operations and parameters and significantly reduces the manual modification of architectures for specific tasks and domains, while retaining or even improving the overall prediction accuracy.

    Describing visual categories by attributes


    Describing visual categories by attributes (Opisovanje vizualnih kategorij z atributi)
