32,021 research outputs found

    Collaborative Receptive Field Learning

    Full text link
    The challenge of object categorization in images is largely due to arbitrary translations and scales of the foreground objects. To attack this difficulty, we propose a new approach called collaborative receptive field learning to extract specific receptive fields (RF's) or regions from multiple images, and the selected RF's are supposed to focus on the foreground objects of a common category. To this end, we solve the problem by maximizing a submodular function over a similarity graph constructed by a pool of RF candidates. However, measuring pairwise distance of RF's for building the similarity graph is a nontrivial problem. Hence, we introduce a similarity metric called pyramid-error distance (PED) to measure their pairwise distances through summing up pyramid-like matching errors over a set of low-level features. Besides, in consistent with the proposed PED, we construct a simple nonparametric classifier for classification. Experimental results show that our method effectively discovers the foreground objects in images, and improves classification performance.Comment: 16 pages, 8 figure

    Dilated Point Convolutions: On the Receptive Field Size of Point Convolutions on 3D Point Clouds

    Full text link
    In this work, we propose Dilated Point Convolutions (DPC). In a thorough ablation study, we show that the receptive field size is directly related to the performance of 3D point cloud processing tasks, including semantic segmentation and object classification. Point convolutions are widely used to efficiently process 3D data representations such as point clouds or graphs. However, we observe that the receptive field size of recent point convolutional networks is inherently limited. Our dilated point convolutions alleviate this issue, they significantly increase the receptive field size of point convolutions. Importantly, our dilation mechanism can easily be integrated into most existing point convolutional networks. To evaluate the resulting network architectures, we visualize the receptive field and report competitive scores on popular point cloud benchmarks.Comment: ICRA 2020 Video https://www.youtube.com/watch?v=JDfFmuOvMkM Project https://francisengelmann.github.io/DPC

    Minimal Embedding Dimensions of Connected Neural Codes

    Full text link
    In the past few years, the study of receptive field codes has been of large interest to mathematicians. Here we give a complete characterization of receptive field codes realizable by connected receptive fields and we give the minimal embedding dimensions of these codes. In particular, we show that all connected codes are realizable in dimension at most 3. To our knowledge, this is the first family of receptive field codes for which the exact characterization and minimal embedding dimension is known.Comment: 9 pages, 4 figure

    Depth Adaptive Deep Neural Network for Semantic Segmentation

    Full text link
    In this work, we present the depth-adaptive deep neural network using a depth map for semantic segmentation. Typical deep neural networks receive inputs at the predetermined locations regardless of the distance from the camera. This fixed receptive field presents a challenge to generalize the features of objects at various distances in neural networks. Specifically, the predetermined receptive fields are too small at a short distance, and vice versa. To overcome this challenge, we develop a neural network which is able to adapt the receptive field not only for each layer but also for each neuron at the spatial location. To adjust the receptive field, we propose the depth-adaptive multiscale (DaM) convolution layer consisting of the adaptive perception neuron and the in-layer multiscale neuron. The adaptive perception neuron is to adjust the receptive field at each spatial location using the corresponding depth information. The in-layer multiscale neuron is to apply the different size of the receptive field at each feature space to learn features at multiple scales. The proposed DaM convolution is applied to two fully convolutional neural networks. We demonstrate the effectiveness of the proposed neural networks on the publicly available RGB-D dataset for semantic segmentation and the novel hand segmentation dataset for hand-object interaction. The experimental results show that the proposed method outperforms the state-of-the-art methods without any additional layers or pre/post-processing.Comment: IEEE Transactions on Multimedia, 201

    Stochastic Training of Graph Convolutional Networks with Variance Reduction

    Full text link
    Graph convolutional networks (GCNs) are powerful deep neural networks for graph-structured data. However, GCN computes the representation of a node recursively from its neighbors, making the receptive field size grow exponentially with the number of layers. Previous attempts on reducing the receptive field size by subsampling neighbors do not have a convergence guarantee, and their receptive field size per node is still in the order of hundreds. In this paper, we develop control variate based algorithms which allow sampling an arbitrarily small neighbor size. Furthermore, we prove new theoretical guarantee for our algorithms to converge to a local optimum of GCN. Empirical results show that our algorithms enjoy a similar convergence with the exact algorithm using only two neighbors per node. The runtime of our algorithms on a large Reddit dataset is only one seventh of previous neighbor sampling algorithms

    Extraclassical receptive field phenomena & short-range connectivity in V1

    Full text link
    Neural mechanisms of extraclassical receptive field phenomena in V1 are commonly assumed to result from long-range lateral connections and/or extrastriate feedback. We address two such phenomena: surround suppression and contrast dependent receptive field size. We present rigorous computational support for the hypothesis that the phenomena largely result from local short-range (< 0.5 mm) cortical connections and LGN input. Surround suppression in our simulations results from (A) direct cortical inhibition or (B) suppression of recurrent cortical excitation, or (C) action of both these mechanisms simultaneously. Mechanisms B and C are substantially more prevalent than A. We observe an average growth in the range of spatial summation of excitatory and inhibitory synaptic inputs for low contrast. However, we find this is neither sufficient nor necessary to explain contrast dependent receptive field size, which usually involves additional changes in the relative gain of these inputs

    Blurring the Line Between Structure and Learning to Optimize and Adapt Receptive Fields

    Full text link
    The visual world is vast and varied, but its variations divide into structured and unstructured factors. We compose free-form filters and structured Gaussian filters, optimized end-to-end, to factorize deep representations and learn both local features and their degree of locality. Our semi-structured composition is strictly more expressive than free-form filtering, and changes in its structured parameters would require changes in free-form architecture. In effect this optimizes over receptive field size and shape, tuning locality to the data and task. Dynamic inference, in which the Gaussian structure varies with the input, adapts receptive field size to compensate for local scale variation. Optimizing receptive field size improves semantic segmentation accuracy on Cityscapes by 1-2 points for strong dilated and skip architectures and by up to 10 points for suboptimal designs. Adapting receptive fields by dynamic Gaussian structure further improves results, equaling the accuracy of free-form deformation while improving efficiency

    Role of zero synapses in unsupervised feature learning

    Full text link
    Synapses in real neural circuits can take discrete values, including zero (silent or potential) synapses. The computational role of zero synapses in unsupervised feature learning of unlabeled noisy data is still unclear, thus it is important to understand how the sparseness of synaptic activity is shaped during learning and its relationship with receptive field formation. Here, we formulate this kind of sparse feature learning by a statistical mechanics approach. We find that learning decreases the fraction of zero synapses, and when the fraction decreases rapidly around a critical data size, an intrinsically structured receptive field starts to develop. Further increasing the data size refines the receptive field, while a very small fraction of zero synapses remain to act as contour detectors. This phenomenon is discovered not only in learning a handwritten digits dataset, but also in learning retinal neural activity measured in a natural-movie-stimuli experiment.Comment: 6 pages, 4 figures, to appear in J. Phys A as a LETTE

    ACNN: a Full Resolution DCNN for Medical Image Segmentation

    Full text link
    Deep Convolutional Neural Networks (DCNNs) are used extensively in medical image segmentation and hence 3D navigation for robot-assisted Minimally Invasive Surgeries (MISs). However, current DCNNs usually use down sampling layers for increasing the receptive field and gaining abstract semantic information. These down sampling layers decrease the spatial dimension of feature maps, which can be detrimental to image segmentation. Atrous convolution is an alternative for the down sampling layer. It increases the receptive field whilst maintains the spatial dimension of feature maps. In this paper, a method for effective atrous rate setting is proposed to achieve the largest and fully-covered receptive field with a minimum number of atrous convolutional layers. Furthermore, a new and full resolution DCNN - Atrous Convolutional Neural Network (ACNN), which incorporates cascaded atrous II-blocks, residual learning and Instance Normalization (IN) is proposed. Application results of the proposed ACNN to Magnetic Resonance Imaging (MRI) and Computed Tomography (CT) image segmentation demonstrate that the proposed ACNN can achieve higher segmentation Intersection over Unions (IoUs) than U-Net and Deeplabv3+, but with reduced trainable parameters.Comment: 7 pages, 2 tables, 6 figure

    Perceptual Extreme Super Resolution Network with Receptive Field Block

    Full text link
    Perceptual Extreme Super-Resolution for single image is extremely difficult, because the texture details of different images vary greatly. To tackle this difficulty, we develop a super resolution network with receptive field block based on Enhanced SRGAN. We call our network RFB-ESRGAN. The key contributions are listed as follows. First, for the purpose of extracting multi-scale information and enhance the feature discriminability, we applied receptive field block (RFB) to super resolution. RFB has achieved competitive results in object detection and classification. Second, instead of using large convolution kernels in multi-scale receptive field block, several small kernels are used in RFB, which makes us be able to extract detailed features and reduce the computation complexity. Third, we alternately use different upsampling methods in the upsampling stage to reduce the high computation complexity and still remain satisfactory performance. Fourth, we use the ensemble of 10 models of different iteration to improve the robustness of model and reduce the noise introduced by each individual model. Our experimental results show the superior performance of RFB-ESRGAN. According to the preliminary results of NTIRE 2020 Perceptual Extreme Super-Resolution Challenge, our solution ranks first among all the participants.Comment: CVPRW 2020 accepted oral, 8 pages,45 figure
    • …
    corecore