1,591 research outputs found

    Resolution learning in deep convolutional networks using scale-space theory

    Full text link
    Resolution in deep convolutional neural networks (CNNs) is typically bounded by the receptive field size, through filter sizes and subsampling layers or strided convolutions on feature maps. The optimal resolution may vary significantly depending on the dataset. Modern CNNs hard-code their resolution hyper-parameters in the network architecture, which makes tuning such hyper-parameters cumbersome. We propose to do away with hard-coded resolution hyper-parameters and aim to learn the appropriate resolution from data. We use scale-space theory to obtain a self-similar parametrization of filters and make use of the N-Jet: a truncated Taylor series that approximates a filter by a learned combination of Gaussian derivative filters. The parameter sigma of the Gaussian basis controls both the amount of detail the filter encodes and the spatial extent of the filter. Since sigma is a continuous parameter, we can optimize it with respect to the loss. The proposed N-Jet layer achieves comparable performance when used in state-of-the-art architectures, while automatically learning the correct resolution in each layer. We evaluate the N-Jet layer on both classification and segmentation, and show that learning sigma is especially beneficial for inputs at multiple sizes.
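    The core mechanism lends itself to a compact illustration: filters are built as learned combinations of Gaussian derivative filters, and sigma is a continuous parameter optimized by gradient descent. The sketch below is a minimal PyTorch rendering under assumptions made here (a second-order basis, a log-sigma parametrization to keep sigma positive, one shared sigma per layer); the NJetConv2d name and all details are illustrative, not the authors' implementation.

        import math
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class NJetConv2d(nn.Module):
            """Convolution whose filters are learned combinations of Gaussian
            derivatives, with a learnable scale sigma (illustrative sketch)."""
            def __init__(self, in_channels, out_channels, order=2, init_sigma=1.0):
                super().__init__()
                assert order <= 2, "this sketch implements derivatives up to order 2"
                # One basis function per derivative-order pair (dx, dy), dx+dy <= order.
                self.orders = [(dx, dy) for dx in range(order + 1)
                               for dy in range(order + 1) if dx + dy <= order]
                # Learned mixing coefficients: one per basis function per filter.
                self.alpha = nn.Parameter(
                    0.1 * torch.randn(out_channels, in_channels, len(self.orders)))
                # Learn log(sigma) so sigma stays positive during optimization.
                self.log_sigma = nn.Parameter(torch.tensor(math.log(init_sigma)))

            def _basis(self, sigma):
                # Kernel support grows with sigma (roughly +/- 3 sigma).
                half = max(1, int(torch.ceil(3.0 * sigma).item()))
                x = torch.arange(-half, half + 1, dtype=sigma.dtype, device=sigma.device)
                g = torch.exp(-x ** 2 / (2 * sigma ** 2))
                g = g / g.sum()
                # Closed-form 1D Gaussian derivatives up to second order.
                d1 = -x / sigma ** 2 * g
                d2 = (x ** 2 - sigma ** 2) / sigma ** 4 * g
                derivs = [g, d1, d2]
                # 2D separable basis: outer products of the 1D derivatives.
                return torch.stack([torch.outer(derivs[dy], derivs[dx])
                                    for (dx, dy) in self.orders])

            def forward(self, x):
                sigma = self.log_sigma.exp()
                basis = self._basis(sigma)                       # (n_basis, k, k)
                # Each filter is a learned linear combination of the basis;
                # gradients reach sigma through the basis itself.
                weight = torch.einsum('oib,bkl->oikl', self.alpha, basis)
                return F.conv2d(x, weight, padding=basis.shape[-1] // 2)

    Because the kernel is rebuilt from sigma on every forward pass, the loss gradient flows into sigma, so both the spatial extent and the level of detail of each layer's filters are tuned by ordinary training.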

    Delineation of line patterns in images using B-COSFIRE filters

    Get PDF
    Delineation of line patterns in images is a basic step required in various applications such as blood vessel detection in medical images, segmentation of rivers or roads in aerial images, and detection of cracks in walls or pavements. In this paper we present trainable B-COSFIRE filters, which model some neurons in area V1 of the primary visual cortex, and apply them to the delineation of line patterns in different kinds of images. B-COSFIRE filters are trainable because their selectivity is determined in an automatic configuration process given a prototype pattern of interest. They are configurable to detect any preferred line structure (e.g. segments, corners, cross-overs), which makes them usable for automatic data representation learning. We carried out experiments on two data sets, namely a line-network data set from INRIA and a data set of retinal fundus images named IOSTAR. The results we achieved confirm the robustness of the proposed approach and its effectiveness in the delineation of line structures in different kinds of images. Comment: International Work Conference on Bioinspired Intelligence, July 10-13, 201
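    To make the configuration idea concrete, the sketch below hardwires a vertical-line prototype as three collinear difference-of-Gaussians (DoG) support points and combines their blurred, shifted responses by a geometric mean. This is a simplified illustration under assumptions made here, not the authors' code: real B-COSFIRE filters determine the support points automatically from a prototype pattern and use a weighted geometric mean with tolerance to rotation and deformation.

        import numpy as np
        from scipy.ndimage import gaussian_filter, shift

        def dog(image, sigma, ratio=0.5):
            # Center-surround (DoG) response feeding the filter's input units.
            return gaussian_filter(image, ratio * sigma) - gaussian_filter(image, sigma)

        def b_cosfire_vertical(image, sigma=2.0, rho=4.0, blur=1.0):
            # Support points (dx, dy) of a hard-coded vertical-line prototype;
            # the real method configures these from a user-supplied prototype.
            points = [(0.0, -rho), (0.0, 0.0), (0.0, rho)]
            d = np.maximum(dog(image.astype(float), sigma), 0.0)  # rectified DoG
            responses = []
            for dx, dy in points:
                # Blurring tolerates small deformations; shifting aligns each
                # support point with the filter center before combination.
                r = gaussian_filter(d, blur)
                responses.append(shift(r, (-dy, -dx), order=1))
            # Geometric mean: the filter fires only where all parts respond.
            out = np.ones_like(d)
            for r in responses:
                out *= np.maximum(r, 1e-12)
            return out ** (1.0 / len(responses))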

    Image-to-Image Translation with Conditional Adversarial Networks

    Full text link
    We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Indeed, since the release of the pix2pix software associated with this paper, a large number of internet users (many of them artists) have posted their own experiments with our system, further demonstrating its wide applicability and ease of adoption without the need for parameter tweaking. As a community, we no longer hand-engineer our mapping functions, and this work suggests we can achieve reasonable results without hand-engineering our loss functions either. Comment: Website: https://phillipi.github.io/pix2pix/, CVPR 201
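    The "learned loss" here is the conditional GAN objective, which the paper combines with an L1 reconstruction term (weighted by lambda = 100 in their experiments). Below is a minimal PyTorch sketch of that objective, assuming user-supplied generator and discriminator modules, with the discriminator conditioned on the input by channel concatenation.

        import torch
        import torch.nn.functional as F

        def generator_loss(D, x, y, fake, lambda_l1=100.0):
            # D judges (input, output) pairs, so G must produce outputs that
            # are both plausible and consistent with the conditioning input x.
            pred_fake = D(torch.cat([x, fake], dim=1))
            adv = F.binary_cross_entropy_with_logits(
                pred_fake, torch.ones_like(pred_fake))
            return adv + lambda_l1 * F.l1_loss(fake, y)

        def discriminator_loss(D, x, y, fake):
            pred_real = D(torch.cat([x, y], dim=1))
            pred_fake = D(torch.cat([x, fake.detach()], dim=1))
            loss_real = F.binary_cross_entropy_with_logits(
                pred_real, torch.ones_like(pred_real))
            loss_fake = F.binary_cross_entropy_with_logits(
                pred_fake, torch.zeros_like(pred_fake))
            return 0.5 * (loss_real + loss_fake)

    In each training step, fake = G(x) is generated once, D is updated on discriminator_loss, and G on generator_loss; the adversarial term plays the role of the hand-engineered loss the abstract argues against.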

    Segmentation-guided privacy preservation in visual surveillance monitoring

    Full text link
    Bachelor's thesis, Computer Engineering, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2022. Advisors: Sergio Escalera Guerrero, Zenjie Li, and Kamal Nasrollahi. Video surveillance has become a necessity to ensure safety and security. Today, with the advancement of technology, video surveillance has become more accessible and widely available, and it is useful in an enormous number of applications and situations. For instance, it can help ensure public safety by preventing vandalism, robbery, and shoplifting. The same applies to more intimate settings, like home monitoring to detect unusual behavior of residents, or to similar situations in hospitals and assisted living facilities. Thus, cameras are installed in public places like malls, metro stations, and on roads for traffic control, as well as in sensitive settings like hospitals, embassies, and private homes. Video surveillance has always been associated with the loss of privacy. We therefore developed a real-time visualization of privacy-protected video surveillance data that applies a segmentation mask to protect privacy while still making it possible to identify risk behaviors; this replaces existing privacy safeguards such as blanking, masking, pixelation, blurring, and scrambling. We aim to protect visual personal data such as appearance, physical information, clothing, skin, eye and hair color, and facial gestures. The main aim of this work is to analyze and compare the most successful deep-learning-based state-of-the-art approaches for semantic segmentation. We perform an efficiency-accuracy comparison to determine which segmentation methods yield accurate results while running at the speed required for real-life application scenarios. Furthermore, we provide a modified dataset made from a combination of three existing datasets, COCO_stuff164K, PASCAL VOC 2012, and ADE20K, to make our comparison fair and to generate privacy-protecting human segmentation masks.
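    As an illustration of segmentation-guided masking (assumptions made here: an off-the-shelf torchvision DeepLabV3 stands in for whichever model the efficiency-accuracy comparison favors, and class index 15 is "person" in its VOC-style label set), a single frame could be processed as follows; the thesis's own pipeline, models, and mask style may differ.

        import torch
        from torchvision import transforms
        from torchvision.models.segmentation import deeplabv3_resnet50

        model = deeplabv3_resnet50(weights="DEFAULT").eval()
        preprocess = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225]),
        ])

        def privacy_mask(frame_pil):
            x = preprocess(frame_pil).unsqueeze(0)
            with torch.no_grad():
                logits = model(x)["out"][0]       # (num_classes, H, W)
            person = logits.argmax(0) == 15       # boolean person mask
            img = transforms.functional.pil_to_tensor(frame_pil).clone()
            # Replace person pixels with a solid silhouette: appearance is
            # hidden, but pose and motion remain visible for risk assessment.
            img[:, person] = 0
            return img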