Collaborative Receptive Field Learning
The challenge of object categorization in images is largely due to arbitrary translations and scales of the foreground objects. To address this difficulty, we propose a new approach called collaborative receptive field learning to extract specific receptive fields (RFs) or regions from multiple images; the selected RFs are expected to focus on the foreground objects of a common category. To this end, we solve the problem by maximizing a submodular function over a similarity graph constructed from a pool of RF candidates. However, measuring the pairwise distance of RFs for building the similarity graph is a nontrivial problem. Hence, we introduce a similarity metric called pyramid-error distance (PED) to measure their pairwise distances by summing pyramid-like matching errors over a set of low-level features. In addition, consistent with the proposed PED, we construct a simple nonparametric classifier for classification. Experimental results show that our method effectively discovers the foreground objects in images and improves classification performance.
Comment: 16 pages, 8 figures
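The selection step above, maximizing a submodular function over a pool of RF candidates, admits a classic greedy approximation. A minimal sketch follows, using a facility-location objective and a toy similarity function as stand-ins (the paper's actual objective and its pyramid-error distance are not reproduced here; all names are illustrative):

```python
# Greedy maximization of a facility-location submodular objective over a
# pool of receptive-field (RF) candidates. `sim` is a toy stand-in for a
# PED-derived similarity; this is a schematic, not the authors' code.

def facility_location(selected, pool, sim):
    """f(S) = sum over every candidate of its best similarity to S."""
    return sum(max(sim(v, s) for s in selected) for v in pool) if selected else 0.0

def greedy_select(pool, sim, k):
    """Pick k RFs by greedy marginal-gain maximization (1 - 1/e guarantee)."""
    selected = []
    for _ in range(k):
        best = max((c for c in pool if c not in selected),
                   key=lambda c: facility_location(selected + [c], pool, sim))
        selected.append(best)
    return selected

# Toy pool: 1-D "features"; similarity decays with distance, so the
# greedy picks land in the two dense clusters.
pool = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0]
sim = lambda a, b: 1.0 / (1.0 + abs(a - b))
print(greedy_select(pool, sim, 2))  # → [0.1, 5.1]
```

Greedy selection on a monotone submodular objective carries the standard (1 - 1/e) approximation guarantee, which is what makes this formulation attractive.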
Dilated Point Convolutions: On the Receptive Field Size of Point Convolutions on 3D Point Clouds
In this work, we propose Dilated Point Convolutions (DPC). In a thorough ablation study, we show that receptive field size is directly related to performance on 3D point cloud processing tasks, including semantic segmentation and object classification. Point convolutions are widely used to efficiently process 3D data representations such as point clouds or graphs. However, we observe that the receptive field size of recent point convolutional networks is inherently limited. Our dilated point convolutions alleviate this issue by significantly increasing the receptive field size of point convolutions. Importantly, our dilation mechanism can easily be integrated into most existing point convolutional networks. To evaluate the resulting network architectures, we visualize the receptive field and report competitive scores on popular point cloud benchmarks.
Comment: ICRA 2020. Video: https://www.youtube.com/watch?v=JDfFmuOvMkM Project: https://francisengelmann.github.io/DPC
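The dilation mechanism for point convolutions can be sketched in a few lines: instead of aggregating over the k nearest neighbors, gather the k·d nearest and keep every d-th one, which widens the receptive field without adding kernel weights. This is a schematic re-implementation, not the authors' code:

```python
# Dilated neighbor selection for point convolutions (schematic):
# widen the receptive field by subsampling a larger neighborhood.

def knn(points, query, k):
    """Indices of the k nearest points to `query` (brute force)."""
    d2 = lambda p: sum((a - b) ** 2 for a, b in zip(p, query))
    return sorted(range(len(points)), key=lambda i: d2(points[i]))[:k]

def dilated_knn(points, query, k, d):
    """Dilated kNN: take the k*d nearest, then keep every d-th neighbor."""
    return knn(points, query, k * d)[::d]

pts = [(float(i), 0.0) for i in range(10)]   # points spaced along a line
print(knn(pts, (0.0, 0.0), 3))               # → [0, 1, 2]
print(dilated_knn(pts, (0.0, 0.0), 3, 2))    # → [0, 2, 4], twice the reach
```

The same number of neighbors (and hence the same convolution cost) now spans roughly d times the spatial extent.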
Minimal Embedding Dimensions of Connected Neural Codes
In the past few years, the study of receptive field codes has been of great interest to mathematicians. Here we give a complete characterization of receptive field codes realizable by connected receptive fields, and we give the minimal embedding dimensions of these codes. In particular, we show that all connected codes are realizable in dimension at most 3. To our knowledge, this is the first family of receptive field codes for which the exact characterization and minimal embedding dimension are known.
Comment: 9 pages, 4 figures
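The notion of a receptive field code can be made concrete with a toy one-dimensional example: take connected receptive fields to be intervals on the line, and record which subsets of fields each stimulus point activates. The fields below are chosen arbitrarily for illustration:

```python
# Toy connected receptive-field code: three intervals on the real line.
# Each stimulus point x activates the set of fields containing it; the
# code is the set of distinct activation patterns (codewords).

def codeword(x, fields):
    """Indices (1-based) of the receptive fields containing x."""
    return tuple(sorted(i for i, (a, b) in enumerate(fields, 1) if a <= x <= b))

fields = [(0, 2), (1, 3), (2.5, 4)]            # illustrative intervals
points = [i / 10 for i in range(0, 41)]        # sample stimuli on [0, 4]
code = sorted({codeword(x, fields) for x in points})
print(code)  # → [(1,), (1, 2), (2,), (2, 3), (3,)]
```

This code is realized by connected fields in dimension 1; the result summarized above says every connected code needs at most dimension 3.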
Depth Adaptive Deep Neural Network for Semantic Segmentation
In this work, we present a depth-adaptive deep neural network that uses a depth map for semantic segmentation. Typical deep neural networks receive inputs at predetermined locations regardless of the distance from the camera. This fixed receptive field makes it difficult to generalize the features of objects at various distances. Specifically, the predetermined receptive fields are too small at a short distance, and vice versa. To overcome this challenge, we develop a neural network that can adapt the receptive field not only for each layer but also for each neuron at each spatial location. To adjust the receptive field, we propose the depth-adaptive multiscale (DaM) convolution layer, consisting of the adaptive perception neuron and the in-layer multiscale neuron. The adaptive perception neuron adjusts the receptive field at each spatial location using the corresponding depth information. The in-layer multiscale neuron applies a different receptive field size in each feature space to learn features at multiple scales. The proposed DaM convolution is applied to two fully convolutional neural networks. We demonstrate the effectiveness of the proposed neural networks on a publicly available RGB-D dataset for semantic segmentation and a novel hand segmentation dataset for hand-object interaction. The experimental results show that the proposed method outperforms the state-of-the-art methods without any additional layers or pre/post-processing.
Comment: IEEE Transactions on Multimedia, 201
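The core idea, a per-pixel receptive field scaled by depth, can be sketched with a simple depth-to-dilation mapping: near pixels (where objects appear large) sample with a wide stride, far pixels with a narrow one. The mapping below is purely illustrative; the paper's DaM layer learns this adaptation per neuron:

```python
# Depth-adaptive receptive-field sketch: pick a sampling dilation per
# pixel from its depth. Near objects look big -> wide dilation; far
# objects look small -> narrow dilation. Constants are illustrative.

def dilation_from_depth(depth, base=1.0, max_rate=4):
    """Map a depth value (metres, assumed) to an integer dilation rate."""
    return max(1, min(max_rate, round(base / max(depth, 1e-6))))

def adaptive_offsets(depth, kernel=3):
    """3x3 sampling offsets scaled by the depth-dependent dilation."""
    r = dilation_from_depth(depth)
    half = kernel // 2
    return [(r * dy, r * dx) for dy in range(-half, half + 1)
                             for dx in range(-half, half + 1)]

print(dilation_from_depth(0.5))   # near pixel  → 2
print(dilation_from_depth(5.0))   # far pixel   → 1
print(len(adaptive_offsets(0.5))) # still 9 taps, just spread wider
```

Note that the kernel keeps its 9 weights regardless of depth; only the spatial spread of the taps changes, which is what lets one layer cover multiple scales.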
Stochastic Training of Graph Convolutional Networks with Variance Reduction
Graph convolutional networks (GCNs) are powerful deep neural networks for graph-structured data. However, a GCN computes the representation of a node recursively from its neighbors, making the receptive field size grow exponentially with the number of layers. Previous attempts to reduce the receptive field size by subsampling neighbors have no convergence guarantee, and their receptive field size per node is still on the order of hundreds. In this paper, we develop control variate based algorithms that allow sampling an arbitrarily small neighbor size. Furthermore, we prove new theoretical guarantees for our algorithms to converge to a local optimum of GCN. Empirical results show that our algorithms converge similarly to the exact algorithm while using only two neighbors per node. The runtime of our algorithms on a large Reddit dataset is only one seventh that of previous neighbor sampling algorithms.
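The control variate idea can be sketched as follows: keep cheap historical activations for every node, aggregate the full neighborhood over those stale values, and correct with the sampled difference between fresh and stale activations. The variance then scales with how much activations have drifted, not with their magnitude. A minimal numerical sketch (variable names are illustrative, not the authors' implementation):

```python
# Control-variate neighbor aggregation (schematic). Estimates the mean
# of current activations h over a neighborhood using a small sample,
# with stale historical activations h_hist as the control variate.

def cv_aggregate(neighbors, h, h_hist, sample):
    """full mean of h_hist  +  sampled mean of (h - h_hist)."""
    n = len(neighbors)
    hist_part = sum(h_hist[v] for v in neighbors) / n
    delta = sum(h[v] - h_hist[v] for v in sample) / len(sample)
    return hist_part + delta

h_hist = {1: 0.9, 2: 1.9, 3: 3.1}   # stale activations
h      = {1: 1.0, 2: 2.0, 3: 3.0}   # current activations (exact mean 2.0)
print(cv_aggregate([1, 2, 3], h, h_hist, sample=[2]))  # close to 2.0
```

When h equals h_hist the estimator is exact for any sample, so as training converges and activations stop changing, the sampling variance vanishes, which is the intuition behind the convergence guarantee.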
Extraclassical receptive field phenomena & short-range connectivity in V1
Neural mechanisms of extraclassical receptive field phenomena in V1 are commonly assumed to result from long-range lateral connections and/or extrastriate feedback. We address two such phenomena: surround suppression and contrast-dependent receptive field size. We present rigorous computational support for the hypothesis that these phenomena largely result from local short-range (< 0.5 mm) cortical connections and LGN input. Surround suppression in our simulations results from (A) direct cortical inhibition, (B) suppression of recurrent cortical excitation, or (C) the action of both mechanisms simultaneously. Mechanisms B and C are substantially more prevalent than A. We observe an average growth in the range of spatial summation of excitatory and inhibitory synaptic inputs at low contrast. However, we find this is neither sufficient nor necessary to explain contrast-dependent receptive field size, which usually involves additional changes in the relative gain of these inputs.
Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields
The visual world is vast and varied, but its variations divide into structured and unstructured factors. We compose free-form filters and structured Gaussian filters, optimized end-to-end, to factorize deep representations and learn both local features and their degree of locality. Our semi-structured composition is strictly more expressive than free-form filtering: changes in its structured parameters would require changes in free-form architecture. In effect, this optimizes over receptive field size and shape, tuning locality to the data and task. Dynamic inference, in which the Gaussian structure varies with the input, adapts receptive field size to compensate for local scale variation. Optimizing receptive field size improves semantic segmentation accuracy on Cityscapes by 1-2 points for strong dilated and skip architectures and by up to 10 points for suboptimal designs. Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency.
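The composition can be sketched in 1-D: convolving a small free-form kernel with a Gaussian whose standard deviation is a learnable parameter yields a composed filter whose support, and hence receptive field, grows with sigma while the free-form part keeps its few expressive weights. A minimal sketch under these assumptions (the truncation rule and sizes are illustrative):

```python
import math

# Semi-structured filter composition (schematic, 1-D): a free-form
# kernel convolved with a Gaussian of learnable sigma. Sigma acts as a
# differentiable knob on receptive-field size.

def gaussian_kernel(sigma, radius):
    """Normalized Gaussian taps on [-radius, radius]."""
    k = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def compose(free_form, sigma):
    """Full convolution of a free-form kernel with a Gaussian; the result
    spans len(free_form) + 2*radius taps, so sigma controls locality."""
    radius = max(1, int(3 * sigma))          # ~3-sigma truncation
    g = gaussian_kernel(sigma, radius)
    out = [0.0] * (len(free_form) + len(g) - 1)
    for i, a in enumerate(free_form):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

print(len(compose([1.0, -2.0, 1.0], sigma=0.5)))  # → 5 taps (tight)
print(len(compose([1.0, -2.0, 1.0], sigma=2.0)))  # → 15 taps (wide)
```

Because the composed filter is differentiable in sigma, receptive-field size becomes just another parameter for the optimizer, which is the point of the semi-structured factorization.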
Role of zero synapses in unsupervised feature learning
Synapses in real neural circuits can take discrete values, including zero (silent or potential) synapses. The computational role of zero synapses in unsupervised feature learning from unlabeled noisy data remains unclear; it is therefore important to understand how the sparseness of synaptic activity is shaped during learning and its relationship with receptive field formation. Here, we formulate this kind of sparse feature learning with a statistical mechanics approach. We find that learning decreases the fraction of zero synapses, and when the fraction decreases rapidly around a critical data size, an intrinsically structured receptive field starts to develop. Further increasing the data size refines the receptive field, while a very small fraction of zero synapses remains to act as contour detectors. This phenomenon is observed not only when learning a handwritten digits dataset, but also when learning retinal neural activity measured in a natural-movie-stimuli experiment.
Comment: 6 pages, 4 figures, to appear in J. Phys. A as a Letter
ACNN: a Full Resolution DCNN for Medical Image Segmentation
Deep Convolutional Neural Networks (DCNNs) are used extensively in medical image segmentation and hence in 3D navigation for robot-assisted Minimally Invasive Surgeries (MISs). However, current DCNNs usually use downsampling layers to increase the receptive field and gain abstract semantic information. These downsampling layers decrease the spatial dimension of feature maps, which can be detrimental to image segmentation. Atrous convolution is an alternative to the downsampling layer: it increases the receptive field whilst maintaining the spatial dimension of feature maps. In this paper, a method for effective atrous rate setting is proposed to achieve the largest and fully-covered receptive field with a minimum number of atrous convolutional layers. Furthermore, a new full-resolution DCNN, the Atrous Convolutional Neural Network (ACNN), which incorporates cascaded atrous II-blocks, residual learning, and Instance Normalization (IN), is proposed. Application of the proposed ACNN to Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) image segmentation demonstrates that the proposed ACNN can achieve higher segmentation Intersections over Union (IoUs) than U-Net and Deeplabv3+, but with fewer trainable parameters.
Comment: 7 pages, 2 tables, 6 figures
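The receptive field of a stack of atrous convolutions grows additively with the dilation rates: for kernel size k and per-layer rates r_i, RF = 1 + sum over layers of r_i * (k - 1). A quick sketch of this arithmetic (the "fully-covered" condition additionally constrains the rate sequence so no input position between taps is skipped; the paper's exact rate-setting rule is not reproduced here):

```python
# Receptive-field size of stacked atrous (dilated) convolutions:
#   RF = 1 + sum_i rate_i * (kernel - 1)
# Larger rates buy a bigger field with the same number of layers,
# but the rate sequence must avoid gridding to keep coverage full.

def receptive_field(kernel, rates):
    rf = 1
    for r in rates:
        rf += r * (kernel - 1)
    return rf

print(receptive_field(3, [1, 2, 5]))  # → 17 with three 3-tap layers
print(receptive_field(3, [1, 1, 1]))  # → 7 with plain convolutions
```

The comparison shows why atrous rate setting matters: the same three layers cover 17 input positions instead of 7 once the rates are increased.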
Perceptual Extreme Super Resolution Network with Receptive Field Block
Perceptual extreme super-resolution for a single image is extremely difficult because the texture details of different images vary greatly. To tackle this difficulty, we develop a super-resolution network with receptive field block based on Enhanced SRGAN. We call our network RFB-ESRGAN. The key contributions are as follows. First, for the purpose of extracting multi-scale information and enhancing feature discriminability, we apply the receptive field block (RFB) to super-resolution. RFB has achieved competitive results in object detection and classification. Second, instead of using large convolution kernels in the multi-scale receptive field block, several small kernels are used in RFB, which enables us to extract detailed features and reduce computational complexity. Third, we alternately use different upsampling methods in the upsampling stage to reduce the high computational complexity while maintaining satisfactory performance. Fourth, we use an ensemble of 10 models from different iterations to improve the robustness of the model and reduce the noise introduced by each individual model. Our experimental results show the superior performance of RFB-ESRGAN. According to the preliminary results of the NTIRE 2020 Perceptual Extreme Super-Resolution Challenge, our solution ranks first among all participants.
Comment: CVPRW 2020 accepted oral, 8 pages, 45 figures
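The "alternating upsampling" idea can be sketched in 1-D: alternate a cheap interpolation step with a learned sub-pixel (channel-interleaving) step, each doubling resolution. This is a toy illustration of the alternation pattern, not the network's actual upsampling stages:

```python
# Alternating upsampling (schematic, 1-D): one cheap nearest-neighbor
# stage, then one "pixel shuffle" stage that interleaves two channels
# into a double-length signal. Each stage doubles resolution.

def nearest_up(x):
    """Nearest-neighbor 2x upsampling: repeat each sample."""
    return [v for v in x for _ in range(2)]

def pixel_shuffle(ch0, ch1):
    """Sub-pixel step: interleave two channels into one 2x signal."""
    return [v for pair in zip(ch0, ch1) for v in pair]

x = [1, 2, 3]
up1 = nearest_up(x)                       # → [1, 1, 2, 2, 3, 3]
up2 = pixel_shuffle(up1, nearest_up(x))   # 12 samples after two stages
print(up1)
print(len(up2))
```

Alternating a parameter-free stage with a learned stage is what keeps the 16x upscaling pipeline cheap while the learned stages supply the detail.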