76,725 research outputs found
Invariance of visual operations at the level of receptive fields
Receptive field profiles registered by cell recordings have shown that
mammalian vision has developed receptive fields tuned to different sizes and
orientations in the image domain as well as to different image velocities in
space-time. This article presents a theoretical model by which families of
idealized receptive field profiles can be derived mathematically from a small
set of basic assumptions that correspond to structural properties of the
environment. The article also presents a theory for how basic invariance
properties to variations in scale, viewing direction and relative motion can be
obtained from the output of such receptive fields, using complementary
selection mechanisms that operate over the output of families of receptive
fields tuned to different parameters. Thereby, the theory shows how basic
invariance properties of a visual system can be obtained already at the level
of receptive fields, and we can explain the different shapes of receptive field
profiles found in biological vision from a requirement that the visual system
should be invariant to the natural types of image transformations that occur in
its environment.Comment: 40 pages, 17 figure
Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks
Recognizing arbitrary multi-character text in unconstrained natural
photographs is a hard problem. In this paper, we address an equally hard
sub-problem in this domain viz. recognizing arbitrary multi-digit numbers from
Street View imagery. Traditional approaches to solve this problem typically
separate out the localization, segmentation, and recognition steps. In this
paper we propose a unified approach that integrates these three steps via the
use of a deep convolutional neural network that operates directly on the image
pixels. We employ the DistBelief implementation of deep neural networks in
order to train large, distributed neural networks on high quality images. We
find that the performance of this approach increases with the depth of the
convolutional network, with the best performance occurring in the deepest
architecture we trained, with eleven hidden layers. We evaluate this approach
on the publicly available SVHN dataset and achieve over accuracy in
recognizing complete street numbers. We show that on a per-digit recognition
task, we improve upon the state-of-the-art, achieving accuracy. We
also evaluate this approach on an even more challenging dataset generated from
Street View imagery containing several tens of millions of street number
annotations and achieve over accuracy. To further explore the
applicability of the proposed system to broader text recognition tasks, we
apply it to synthetic distorted text from reCAPTCHA. reCAPTCHA is one of the
most secure reverse turing tests that uses distorted text to distinguish humans
from bots. We report a accuracy on the hardest category of reCAPTCHA.
Our evaluations on both tasks indicate that at specific operating thresholds,
the performance of the proposed system is comparable to, and in some cases
exceeds, that of human operators
Content-based Video Retrieval by Integrating Spatio-Temporal and Stochastic Recognition of Events
As amounts of publicly available video data grow the need to query this data efficiently becomes significant. Consequently content-based retrieval of video data turns out to be a challenging and important problem. We address the specific aspect of inferring semantics automatically from raw video data. In particular, we introduce a new video data model that supports the integrated use of two different approaches for mapping low-level features to high-level concepts. Firstly, the model is extended with a rule-based approach that supports spatio-temporal formalization of high-level concepts, and then with a stochastic approach. Furthermore, results on real tennis video data are presented, demonstrating the validity of both approaches, as well us advantages of their integrated us
Precision Learning: Towards Use of Known Operators in Neural Networks
In this paper, we consider the use of prior knowledge within neural networks.
In particular, we investigate the effect of a known transform within the
mapping from input data space to the output domain. We demonstrate that use of
known transforms is able to change maximal error bounds.
In order to explore the effect further, we consider the problem of X-ray
material decomposition as an example to incorporate additional prior knowledge.
We demonstrate that inclusion of a non-linear function known from the physical
properties of the system is able to reduce prediction errors therewith
improving prediction quality from SSIM values of 0.54 to 0.88.
This approach is applicable to a wide set of applications in physics and
signal processing that provide prior knowledge on such transforms. Also maximal
error estimation and network understanding could be facilitated within the
context of precision learning.Comment: accepted on ICPR 201
Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking
With efficient appearance learning models, Discriminative Correlation Filter
(DCF) has been proven to be very successful in recent video object tracking
benchmarks and competitions. However, the existing DCF paradigm suffers from
two major issues, i.e., spatial boundary effect and temporal filter
degradation. To mitigate these challenges, we propose a new DCF-based tracking
method. The key innovations of the proposed method include adaptive spatial
feature selection and temporal consistent constraints, with which the new
tracker enables joint spatial-temporal filter learning in a lower dimensional
discriminative manifold. More specifically, we apply structured spatial
sparsity constraints to multi-channel filers. Consequently, the process of
learning spatial filters can be approximated by the lasso regularisation. To
encourage temporal consistency, the filter model is restricted to lie around
its historical value and updated locally to preserve the global structure in
the manifold. Last, a unified optimisation framework is proposed to jointly
select temporal consistency preserving spatial features and learn
discriminative filters with the augmented Lagrangian method. Qualitative and
quantitative evaluations have been conducted on a number of well-known
benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and
VOT2018. The experimental results demonstrate the superiority of the proposed
method over the state-of-the-art approaches
- …