An Adaptive Locally Connected Neuron Model: Focusing Neuron
This paper presents a new artificial neuron model capable of learning its
receptive field in the topological domain of inputs. The model provides
adaptive and differentiable local connectivity (plasticity) applicable to any
domain. It requires no tool other than the backpropagation algorithm to learn
its parameters, which control the receptive field locations and apertures. This
research explores whether this ability makes the neuron focus on informative
inputs and yields any advantage over fully connected neurons. The experiments
include tests of focusing neuron networks of one or two hidden layers on
synthetic and well-known image recognition data sets. The results demonstrated
that the focusing neurons can move their receptive fields towards more
informative inputs. In the simple two-hidden layer networks, the focusing
layers outperformed the dense layers in the classification of the 2D spatial
data sets. Moreover, the focusing networks performed better than the dense
networks even when 70% of the weights were pruned. The tests on
convolutional networks revealed that using focusing layers instead of dense
layers for the classification of convolutional features may work better on some
data sets.
Comment: 45 pages, a national patent filed, submitted to Turkish Patent
Office, No: -2017/17601, Date: 09.11.201
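A minimal sketch of what an adaptive, differentiable receptive field could look like. The abstract only states that learnable parameters control the field's location and aperture; the Gaussian envelope used here, and the names `focus_mask`, `focusing_neuron`, `mu`, and `sigma`, are illustrative assumptions rather than the paper's confirmed formulation.

```python
import math
from typing import List, Sequence


def focus_mask(n_inputs: int, mu: float, sigma: float) -> List[float]:
    """Gaussian envelope over input positions normalised to [0, 1].

    ASSUMPTION: a Gaussian shape is used only for illustration.
    mu plays the role of the receptive field location, sigma its aperture.
    """
    if n_inputs < 1:
        raise ValueError("n_inputs must be at least 1")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Spread the inputs evenly over [0, 1]; a single input sits at 0.5.
    positions = [i / (n_inputs - 1) if n_inputs > 1 else 0.5
                 for i in range(n_inputs)]
    return [math.exp(-((p - mu) ** 2) / (2.0 * sigma ** 2)) for p in positions]


def focusing_neuron(x: Sequence[float], w: Sequence[float], b: float,
                    mu: float, sigma: float) -> float:
    """Weighted sum in which each weight is scaled by the focus mask."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    mask = focus_mask(len(x), mu, sigma)
    return sum(m * wi * xi for m, wi, xi in zip(mask, w, x)) + b


# A narrow aperture centred on the middle input nearly ignores the edges.
mask = focus_mask(5, mu=0.5, sigma=0.1)
output = focusing_neuron([1.0] * 5, [1.0] * 5, 0.0, mu=0.5, sigma=0.1)
```

Because the mask is a smooth function of `mu` and `sigma`, gradients with respect to both exist, which is the property that would let plain backpropagation move and resize the field.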
A Taxonomy of Deep Convolutional Neural Nets for Computer Vision
Traditional architectures for solving computer vision problems and the degree
of success they enjoyed have been heavily reliant on hand-crafted features.
However, of late, deep learning techniques have offered a compelling
alternative -- that of automatically learning problem-specific features. With
this new paradigm, every problem in computer vision is now being re-examined
from a deep learning perspective. Therefore, it has become important to
understand what kind of deep networks are suitable for a given problem.
Although general surveys of this fast-moving paradigm (i.e. deep-networks)
exist, a survey specific to computer vision is missing. We specifically
consider one form of deep networks widely used in computer vision -
convolutional neural networks (CNNs). We start with "AlexNet" as our base CNN
and then examine the broad variations proposed over time to suit different
applications. We hope that our recipe-style survey will serve as a guide,
particularly for novice practitioners intending to use deep-learning techniques
for computer vision.
Comment: Published in Frontiers in Robotics and AI (http://goo.gl/6691Bm)
Precise Single-stage Detector
There are still two problems in SSD causing some inaccurate results: (1) In
the process of feature extraction, with the layer-by-layer acquisition of
semantic information, local information is gradually lost, resulting in less
representative feature maps; (2) During the Non-Maximum Suppression (NMS)
algorithm, due to inconsistency between the classification and regression
tasks, the classification confidence and predicted detection position cannot
accurately indicate the position of the prediction boxes. Methods: To address
these issues, we propose a new architecture, a modified version
of Single Shot Multibox Detector (SSD), named Precise Single Stage Detector
(PSSD). Firstly, we improve the features by adding extra layers to SSD.
Secondly, we construct a simple and effective feature enhancement module to
expand the receptive field step by step for each layer and enhance its local
and semantic information. Finally, we design a more efficient loss function to
predict the IOU between the prediction boxes and ground truth boxes, and the
threshold IOU guides classification training and attenuates the scores, which
are used by the NMS algorithm. Main Results: Benefiting from the above
optimization, the proposed model PSSD achieves exciting performance in
real-time. Specifically, on a Titan Xp with an input size of
320 pixels, PSSD achieves 33.8 mAP at 45 FPS on the MS COCO benchmark and 81.28
mAP at 66 FPS on Pascal VOC 2007, outperforming state-of-the-art object
detection models. The proposed model also performs well with a
larger input size. At 512 pixels, PSSD obtains 37.2 mAP at 27 FPS on MS
COCO and 82.82 mAP at 40 FPS on Pascal VOC 2007. The experimental results show
that the proposed model has a better trade-off between speed and accuracy.
Comment: We will submit it soon to the IEEE transaction. Due to the character
limit, we cannot upload the full abstract. Please read the PDF file for
more details.
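The two quantities this abstract relies on, IoU and NMS, can be sketched as follows. This is the standard greedy formulation, not PSSD's own code. The helper `attenuate_score` is hypothetical: the abstract says a predicted IoU attenuates the classification scores before NMS but does not give the rule, so simple multiplication is assumed here purely for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def area(box: Box) -> float:
    """Area of a box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def attenuate_score(cls_score: float, predicted_iou: float) -> float:
    """HYPOTHETICAL: scale the class score by a predicted localisation IoU."""
    return cls_score * predicted_iou


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.1), (5.0, 5.0, 6.0, 6.0)]
cls_scores = [0.8, 0.9, 0.7]
predicted_ious = [0.9, 0.5, 0.8]  # made-up values for the example
scores = [attenuate_score(s, q) for s, q in zip(cls_scores, predicted_ious)]
kept = nms(boxes, scores)
```

In this toy input the second box has the highest raw class score but a low predicted IoU, so after attenuation the first box ranks higher and is the one that survives suppression.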
Dynamic texture recognition using time-causal and time-recursive spatio-temporal receptive fields
This work presents a first evaluation of using spatio-temporal receptive
fields from a recently proposed time-causal spatio-temporal scale-space
framework as primitives for video analysis. We propose a new family of video
descriptors based on regional statistics of spatio-temporal receptive field
responses and evaluate this approach on the problem of dynamic texture
recognition. Our approach generalises a previously used method, based on joint
histograms of receptive field responses, from the spatial to the
spatio-temporal domain and from object recognition to dynamic texture
recognition. The time-recursive formulation enables computationally efficient
time-causal recognition. The experimental evaluation demonstrates competitive
performance compared to the state of the art. In particular, it is shown that
binary versions of our dynamic texture descriptors achieve improved performance
compared to a large range of similar methods using different primitives, either
handcrafted or learned from data. Further, our qualitative and quantitative
investigation into parameter choices and the use of different sets of receptive
fields highlights the robustness and flexibility of our approach. Together,
these results support the descriptive power of this family of time-causal
spatio-temporal receptive fields, validate our approach for dynamic texture
recognition and point towards the possibility of designing a range of video
analysis methods based on these new time-causal spatio-temporal primitives.
Comment: 29 pages, 16 figures