Resolution learning in deep convolutional networks using scale-space theory
Resolution in deep convolutional neural networks (CNNs) is typically bounded
by the receptive field size through filter sizes, and subsampling layers or
strided convolutions on feature maps. The optimal resolution may vary
significantly depending on the dataset. Modern CNNs hard-code their resolution
hyper-parameters in the network architecture which makes tuning such
hyper-parameters cumbersome. We propose to do away with hard-coded resolution
hyper-parameters and aim to learn the appropriate resolution from data. We use
scale-space theory to obtain a self-similar parametrization of filters and make
use of the N-Jet: a truncated Taylor series to approximate a filter by a
learned combination of Gaussian derivative filters. The parameter sigma of the
Gaussian basis controls both the amount of detail the filter encodes and the
spatial extent of the filter. Since sigma is a continuous parameter, we can
optimize it with respect to the loss. The proposed N-Jet layer achieves
comparable performance when used in state-of-the-art architectures, while
learning the correct resolution in each layer automatically. We evaluate our
N-Jet layer on both classification and segmentation, and we show that learning
sigma is especially beneficial for inputs at multiple sizes.
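The core idea of the abstract above can be sketched in a few lines: build a truncated Taylor (N-Jet) basis of Gaussian derivative filters at a continuous scale sigma, and express a filter as a learned linear combination of that basis. This is a minimal 1-D illustration, not the authors' implementation; the function names and the 2nd-order truncation are assumptions for the sketch.

```python
import numpy as np

def gaussian_derivative_basis(sigma, order_max=2, radius=None):
    """1-D Gaussian derivative filters up to order_max at scale sigma.

    sigma controls both the detail the basis encodes and the spatial
    extent of the filters (the support grows with sigma).
    """
    if radius is None:
        radius = int(np.ceil(3 * sigma))      # support grows with sigma
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x**2 / (2 * sigma**2))
    g /= g.sum()                              # 0th order: normalized Gaussian
    g1 = -x / sigma**2 * g                    # 1st derivative of the Gaussian
    g2 = (x**2 - sigma**2) / sigma**4 * g     # 2nd derivative
    return np.stack([g, g1, g2][: order_max + 1])

def njet_filter(coeffs, sigma):
    """Approximate a filter as a learned combination of the basis.

    In the paper, both `coeffs` and `sigma` are optimized against the
    loss; here they are given explicitly.
    """
    basis = gaussian_derivative_basis(sigma, order_max=len(coeffs) - 1)
    return np.tensordot(coeffs, basis, axes=1)

# The same coefficients at a larger sigma give the same filter shape at
# a coarser scale, with a larger spatial support.
f_fine = njet_filter(np.array([1.0, 0.5, -0.2]), sigma=1.0)
f_coarse = njet_filter(np.array([1.0, 0.5, -0.2]), sigma=3.0)
assert f_coarse.size > f_fine.size
```

Because sigma enters the basis as a continuous parameter, a gradient of the loss with respect to sigma is well defined, which is what lets the resolution be learned rather than hard-coded.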
Delineation of line patterns in images using B-COSFIRE filters
Delineation of line patterns in images is a basic step required in various
applications such as blood vessel detection in medical images, segmentation of
rivers or roads in aerial images, detection of cracks in walls or pavements,
etc. In this paper we present trainable B-COSFIRE filters, which are a model of
some neurons in area V1 of the primary visual cortex, and apply them to the
delineation of line patterns in different kinds of images. B-COSFIRE filters
are trainable as their selectivity is determined in an automatic configuration
process given a prototype pattern of interest. They are configurable to detect
any preferred line structure (e.g. segments, corners, cross-overs, etc.), so
usable for automatic data representation learning. We carried out experiments
on two data sets, namely a line-network data set from INRIA and a data set of
retinal fundus images named IOSTAR. The results that we achieved confirm the
robustness of the proposed approach and its effectiveness in the delineation of
line structures in different kinds of images.
Comment: International Work Conference on Bioinspired Intelligence, July
10-13, 201
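The combination step behind filters of this kind can be sketched compactly: the response at a point is a geometric mean of center-surround (DoG) responses taken at a set of offsets configured from a prototype line. This is a hedged illustration of that combination only; in B-COSFIRE the offsets are determined automatically from the prototype, and the DoG response map is computed from the image, whereas here both are supplied directly.

```python
import numpy as np

def bcosfire_response(dog_map, offsets):
    """Geometric mean of DoG responses at configured offsets.

    dog_map : 2-D array of precomputed center-surround responses.
    offsets : list of (dy, dx) positions along the prototype line
              (in B-COSFIRE these come from the automatic configuration
              process; here they are passed in by hand).
    """
    resps = []
    for dy, dx in offsets:
        # Shift the response map so each configured point lines up.
        resps.append(np.roll(np.roll(dog_map, dy, axis=0), dx, axis=1))
    resps = np.clip(np.stack(resps), 1e-9, None)  # avoid log(0)
    # Geometric mean: the filter fires only if ALL points respond.
    return np.exp(np.mean(np.log(resps), axis=0))

# A vertical ridge with collinear vertical offsets responds strongly on
# the line and weakly off it.
line = np.zeros((16, 16))
line[:, 8] = 1.0
resp = bcosfire_response(line, [(-2, 0), (0, 0), (2, 0)])
assert resp[5, 8] > resp[5, 7]
```

The geometric mean is what makes the filter selective: a single missing point along the configured pattern suppresses the whole response, which suits delineation of thin, continuous structures such as vessels or cracks.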
Image-to-Image Translation with Conditional Adversarial Networks
We investigate conditional adversarial networks as a general-purpose solution
to image-to-image translation problems. These networks not only learn the
mapping from input image to output image, but also learn a loss function to
train this mapping. This makes it possible to apply the same generic approach
to problems that traditionally would require very different loss formulations.
We demonstrate that this approach is effective at synthesizing photos from
label maps, reconstructing objects from edge maps, and colorizing images, among
other tasks. Indeed, since the release of the pix2pix software associated with
this paper, a large number of internet users (many of them artists) have posted
their own experiments with our system, further demonstrating its wide
applicability and ease of adoption without the need for parameter tweaking. As
a community, we no longer hand-engineer our mapping functions, and this work
suggests we can achieve reasonable results without hand-engineering our loss
functions either.
Comment: Website: https://phillipi.github.io/pix2pix/, CVPR 201
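The "learned loss" idea above comes down to a simple objective: the generator must both fool a discriminator conditioned on the input image and stay close to the target in L1. A minimal numpy sketch of that objective follows; the function names are assumptions, and the lambda = 100 weighting is the value reported with pix2pix.

```python
import numpy as np

def pix2pix_generator_loss(d_fake, fake, target, lam=100.0):
    """Conditional GAN generator objective as used in pix2pix.

    d_fake : discriminator probabilities on (input, generated) pairs.
    The GAN term pushes outputs toward the real-image manifold; the L1
    term (weighted by lam, 100 in the paper) keeps them near the target.
    """
    eps = 1e-9
    gan = -np.mean(np.log(d_fake + eps))   # non-saturating GAN loss
    l1 = np.mean(np.abs(fake - target))    # structured reconstruction term
    return gan + lam * l1

def pix2pix_discriminator_loss(d_real, d_fake):
    """Discriminator: score (input, real) pairs 1 and (input, fake) pairs 0."""
    eps = 1e-9
    return -np.mean(np.log(d_real + eps)) - np.mean(np.log(1 - d_fake + eps))
```

The GAN term is what replaces a hand-engineered loss: the discriminator learns which output structures look implausible for a given input, so the same objective transfers across label-to-photo, edges-to-photo, and colorization without per-task loss design.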
Segmentation-guided privacy preservation in visual surveillance monitoring
Bachelor's thesis in Computer Engineering, Facultat de Matemàtiques, Universitat de Barcelona, Year: 2022, Advisors: Sergio Escalera Guerrero, Zenjie Li, and Kamal Nasrollahi.
Video surveillance has become a necessity to ensure safety and security. With the advancement
of technology, video surveillance has become more accessible and widely available, and it is useful
in an enormous range of applications and situations. For instance, it can help ensure public safety by
preventing vandalism, robbery, and shoplifting. The same applies to more intimate situations, such as home monitoring to detect unusual behavior of residents, or similar settings like hospitals and assisted living facilities. Thus, cameras are installed in public places like malls, metro stations, and roads for traffic control, as well as in sensitive settings like hospitals, embassies, and private homes. Video surveillance has always been associated with a loss of privacy. We therefore developed a real-time visualization of privacy-protected video
surveillance data that applies a segmentation mask to protect privacy while still making it possible to identify
risk behaviors. This replaces existing privacy safeguards such as blanking, masking, pixelation, blurring, and
scrambling, since we want to protect visual personal data such as appearance, physical information, clothing, skin, eye and hair color, and facial gestures. The main aim of this work is to analyze and compare the most successful deep-learning-based state-of-the-art approaches to semantic segmentation. In this study, we
perform an efficiency-accuracy comparison to determine which segmentation methods yield accurate segmentation results while running at the speed required for real-life application scenarios. Furthermore, we provide a modified dataset built from a combination of three existing datasets, COCO_stuff164K, PASCAL VOC 2012, and ADE20K, to make our comparison fair and to generate privacy-protecting human segmentation masks.
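The privacy-preservation step described above can be sketched simply: given a per-pixel label map from any semantic-segmentation model, replace every pixel belonging to the person class with a flat mask color, hiding appearance while keeping pose and behavior visible. This is a minimal illustration under assumptions: `anonymize`, the `person_id` index, and the fill color are all hypothetical, not the thesis's actual interface.

```python
import numpy as np

def anonymize(frame, seg_labels, person_id=1, fill=(0, 255, 0)):
    """Replace pixels labeled as a person with a flat color mask.

    frame      : H x W x 3 uint8 image.
    seg_labels : H x W integer label map from a segmentation model
                 (person_id is that model's class index for "person",
                 assumed here to be 1).
    Appearance, clothing, and skin/eye/hair color are hidden, while the
    silhouette, and thus the behavior, remains visible.
    """
    out = frame.copy()
    out[seg_labels == person_id] = fill
    return out

# Usage: a 2x2 frame where only the top-left pixel is labeled "person".
frame = np.zeros((2, 2, 3), dtype=np.uint8)
labels = np.array([[1, 0], [0, 0]])
masked = anonymize(frame, labels)
```

Unlike pixelation or blurring, which can sometimes be partially inverted, a flat segmentation mask discards the person's appearance entirely, which is why the thesis's efficiency-accuracy comparison of segmentation models is the critical component for real-time use.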