On the Importance of Visual Context for Data Augmentation in Scene Understanding
Performing data augmentation for learning deep neural networks is known to be
important for training visual recognition systems. By artificially increasing
the number of training examples, it helps reduce overfitting and improve
generalization. While simple image transformations can already improve
predictive performance in most vision tasks, larger gains can be obtained by
leveraging task-specific prior knowledge. In this work, we consider object
detection, semantic segmentation, and instance segmentation, and augment the training images
by blending objects in existing scenes, using instance segmentation
annotations. We observe that randomly pasting objects on images hurts the
performance, unless the object is placed in the right context. To resolve this
issue, we propose an explicit context model by using a convolutional neural
network, which predicts whether an image region is suitable for placing a given
object or not. In our experiments, we show that our approach is able to improve
object detection, semantic and instance segmentation on the PASCAL VOC12 and
COCO datasets, with significant gains in a limited annotation scenario, i.e.
when only one category is annotated. We also show that the method is not
limited to datasets that come with expensive pixel-wise instance annotations
and can be used when only bounding boxes are available, by employing
weakly-supervised learning for instance mask approximation.
Comment: Updated the experimental section. arXiv admin note: substantial text overlap with arXiv:1807.0742
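The abstract describes pasting segmented objects only where a context model predicts they fit. The sketch below shows that select-then-blend step in NumPy. It assumes a hypothetical `context_score(image, position)` callable standing in for the paper's context CNN, a fixed list of candidate positions, and a simple mask-based alpha blend; the function names and the 0.5 threshold are illustrative, not taken from the paper.

```python
import numpy as np
from typing import Callable, List, Optional, Tuple

Position = Tuple[int, int]  # (top, left) corner of a candidate placement


def paste_object(image: np.ndarray, obj: np.ndarray, mask: np.ndarray,
                 top: int, left: int) -> np.ndarray:
    """Alpha-blend `obj` into a float copy of `image` at (top, left).

    `mask` has shape (h, w) with values in [0, 1]; 1 keeps the object pixel,
    0 keeps the background pixel.
    """
    out = image.astype(np.float32).copy()
    h, w = mask.shape
    alpha = mask[..., None].astype(np.float32)  # broadcast over color channels
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * obj + (1.0 - alpha) * region
    return out


def context_guided_paste(image: np.ndarray, obj: np.ndarray, mask: np.ndarray,
                         candidates: List[Position],
                         context_score: Callable[[np.ndarray, Position], float],
                         threshold: float = 0.5) -> Tuple[np.ndarray, Optional[Position]]:
    """Paste `obj` at the candidate the (hypothetical) context model rates highest.

    Candidates that would overflow the image are ignored. If no remaining
    candidate reaches `threshold`, the image is returned unchanged, since
    out-of-context pasting was observed to hurt performance.
    """
    H, W = image.shape[:2]
    h, w = mask.shape
    valid = [(t, l) for t, l in candidates
             if t >= 0 and l >= 0 and t + h <= H and l + w <= W]
    if not valid:
        return image.astype(np.float32), None
    scores = [context_score(image, pos) for pos in valid]
    best = int(np.argmax(scores))
    if scores[best] < threshold:
        return image.astype(np.float32), None
    top, left = valid[best]
    return paste_object(image, obj, mask, top, left), valid[best]
```

Returning `None` when no placement is plausible lets a training loop fall back to the unmodified image instead of forcing an out-of-context paste.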
Deep Object-Centric Representations for Generalizable Robot Learning
Robotic manipulation in complex open-world scenarios requires both reliable
physical manipulation skills and effective and generalizable perception. In
this paper, we propose a method where general purpose pretrained visual models
serve as an object-centric prior for the perception system of a learned policy.
We devise an object-level attentional mechanism that can be used to determine
relevant objects from a few trajectories or demonstrations, and then
immediately incorporate those objects into a learned policy. A task-independent
meta-attention locates possible objects in the scene, and a task-specific
attention identifies which objects are predictive of the trajectories. The
scope of the task-specific attention is easily adjusted by showing
demonstrations with distractor objects or with diverse relevant objects. Our
results indicate that this approach exhibits good generalization across object
instances using very few samples, and can be used to learn a variety of
manipulation tasks using reinforcement learning.
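The two-stage attention can be illustrated with a small NumPy sketch. It assumes the task-independent stage has already produced one feature vector per candidate object (for example from a pretrained detector) and models the task-specific stage as a hypothetical learned query vector scored by dot product and softmax; the names and the dot-product form are assumptions, not the paper's exact architecture.

```python
import numpy as np
from typing import Tuple


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def attend_objects(object_features: np.ndarray, task_query: np.ndarray,
                   temperature: float = 1.0) -> Tuple[np.ndarray, np.ndarray]:
    """Task-specific soft attention over object proposals.

    object_features: (num_objects, dim) features for objects located by the
        task-independent stage (assumed given here).
    task_query: (dim,) hypothetical task-specific parameter vector.
    Returns the attention weights (num_objects,) and the attended feature
    (dim,) that a policy could consume.
    """
    logits = object_features @ task_query / temperature
    weights = softmax(logits)
    attended = weights @ object_features
    return weights, attended
```

In this sketch `task_query` is the only task-specific parameter, so adapting to a new task from a few demonstrations means fitting a single low-dimensional vector; lowering `temperature` pushes the attention toward a hard choice of one object.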
Gaussian Processes with Context-Supported Priors for Active Object Localization
We devise an algorithm using a Bayesian optimization framework in conjunction
with contextual visual data for the efficient localization of objects in still
images. Recent research has demonstrated substantial progress in object
localization and related tasks for computer vision. However, many current
state-of-the-art object localization procedures still suffer from inaccuracy
and inefficiency, in addition to failing to provide a principled and
interpretable system amenable to high-level vision tasks. We address these
issues in this work.
Our method encompasses an active search procedure that uses contextual data
to generate initial bounding-box proposals for a target object. We train a
convolutional neural network to approximate an offset distance from the target
object. Next, we use a Gaussian Process to model this offset response signal
over the search space of the target. We then employ a Bayesian active search
for accurate localization of the target.
In experiments, we compare our approach to a state-of-the-art bounding-box
regression method for a challenging pedestrian localization task. Our method
exhibits a substantial improvement over this baseline regression method.
Comment: 10 pages, 4 figures
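The GP-plus-active-search loop can be sketched with an RBF-kernel Gaussian Process in NumPy. The observations `y_train` stand in for the CNN's predicted offset distances at already-visited locations; the kernel hyperparameters and the lower-confidence-bound rule for choosing the next location are assumptions made for illustration, since the abstract does not specify the acquisition function.

```python
import numpy as np


def rbf_kernel(a: np.ndarray, b: np.ndarray, length_scale: float = 1.0,
               variance: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel between row vectors of a (n, d) and b (m, d)."""
    sq = np.sum(a ** 2, axis=1)[:, None] + np.sum(b ** 2, axis=1)[None, :] - 2 * a @ b.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale ** 2)


def gp_posterior(x_train, y_train, x_query, length_scale=1.0, variance=1.0, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_query."""
    K = rbf_kernel(x_train, x_train, length_scale, variance) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query, length_scale, variance)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.maximum(variance - np.sum(v ** 2, axis=0), 0.0)
    return mean, np.sqrt(var)


def next_search_location(x_train, y_train, candidates, kappa=2.0,
                         length_scale=1.0, variance=1.0, noise=1e-4):
    """Pick the candidate minimizing the lower confidence bound of the offset."""
    mean, std = gp_posterior(x_train, y_train, candidates, length_scale, variance, noise)
    lcb = mean - kappa * std
    return candidates[int(np.argmin(lcb))]
```

Because a small offset means the search is close to the target, the rule minimizes `mean - kappa * std`: it favors locations predicted to be near the target while still exploring regions the GP is uncertain about.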
Event Prediction and Object Motion Estimation in the Development of Visual Attention
A model of gaze control is described that includes mechanisms for predictive control using a forward model and event-driven expectations of target behavior. The model passes through stages roughly similar to those of human infants when the influence of the predictive systems is gradually increased.
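A minimal sketch of the predictive component, assuming a constant-velocity forward model and a single hypothetical `predictive_weight` that blends reactive tracking (aim at the target's current position) with predictive tracking (aim where the forward model expects the target to be). Raising the weight over time mimics the gradual increase in predictive influence described above; event-driven expectations are not modeled here.

```python
from typing import Sequence, Tuple


def gaze_target(prev_pos: Sequence[float], curr_pos: Sequence[float],
                predictive_weight: float, dt: float = 1.0,
                lookahead: float = 1.0) -> Tuple[float, ...]:
    """Blend reactive and predictive gaze targets.

    prev_pos, curr_pos: target positions at two consecutive time steps.
    predictive_weight: 0 = purely reactive, 1 = purely predictive (clipped).
    lookahead: how far ahead (in time units) the forward model extrapolates,
        e.g. to compensate for processing latency (an assumption).
    """
    w = min(max(predictive_weight, 0.0), 1.0)
    velocity = [(c - p) / dt for p, c in zip(prev_pos, curr_pos)]
    predicted = [c + v * lookahead for c, v in zip(curr_pos, velocity)]
    return tuple((1.0 - w) * c + w * q for c, q in zip(curr_pos, predicted))


# Hypothetical developmental schedule: predictive influence grows over stages.
for stage, weight in enumerate([0.0, 0.5, 1.0]):
    print(stage, gaze_target((0.0, 0.0), (1.0, 0.0), weight))
```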
Identification of Invariant Sensorimotor Structures as a Prerequisite for the Discovery of Objects
Perceiving the surrounding environment in terms of objects is useful for any
general purpose intelligent agent. In this paper, we investigate a fundamental
mechanism making object perception possible, namely the identification of
spatio-temporally invariant structures in the sensorimotor experience of an
agent. We take inspiration from the Sensorimotor Contingencies Theory to define
a computational model of this mechanism through a sensorimotor, unsupervised
and predictive approach. Our model is based on processing the unsupervised
interaction of an artificial agent with its environment. We show how
spatio-temporally invariant structures in the environment induce regularities
in the sensorimotor experience of an agent, and how this agent, while building
a predictive model of its sensorimotor experience, can capture them as densely
connected subgraphs in a graph of sensory states connected by motor commands.
Our approach is focused on elementary mechanisms, and is illustrated with a set
of simple experiments in which an agent interacts with an environment. We show
how the agent can build an internal model of moving but spatio-temporally
invariant structures by performing spectral clustering on the graph modeling
its overall sensorimotor experience. We systematically examine properties of
the model, shedding light on how this paradigm differs from methods based on
the supervised processing of collections of static images.
Comment: 24 pages, 10 figures, published in Frontiers Robotics and A
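The clustering step can be sketched as normalized spectral clustering on a weighted adjacency matrix whose nodes are sensory states and whose edge weights count motor-command transitions between them. How that matrix is built, the symmetric normalized Laplacian, the row normalization, and the small k-means routine are standard choices assumed here; the paper's exact variant may differ.

```python
import numpy as np


def spectral_clustering(adjacency: np.ndarray, k: int, n_iter: int = 50) -> np.ndarray:
    """Cluster graph nodes into k densely connected groups."""
    A = np.asarray(adjacency, dtype=float)
    A = 0.5 * (A + A.T)  # treat transitions in either direction as one undirected edge
    deg = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    U = eigvecs[:, :k]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)

    def assign(centers: np.ndarray) -> np.ndarray:
        d2 = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return np.argmin(d2, axis=1)

    # Deterministic farthest-point initialization, then plain k-means.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = assign(centers)
        new = np.array([U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(centers)
```

Densely connected subgraphs produce near-zero eigenvalues of the Laplacian, so the first `k` eigenvectors separate them; k-means on the embedded rows then recovers the groups.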
Trajectory recognition as the basis for object individuation: A functional model of object file instantiation and object token encoding
The perception of persisting visual objects is mediated by transient intermediate representations, object files, that are instantiated in response to some, but not all, visual trajectories. The standard object file concept does not, however, provide a mechanism sufficient to account for all experimental data on visual object persistence, object tracking, and the ability to perceive spatially-disconnected stimuli as coherent objects. Based on relevant anatomical, functional, and developmental data, a functional model is developed that bases object individuation on the specific recognition of visual trajectories. This model is shown to account for a wide range of data, and to generate a variety of testable predictions. Individual variations of the model parameters are expected to generate distinct trajectory and object recognition abilities. Over-encoding of trajectory information in stored object tokens in early infancy, in particular, is expected to disrupt the ability to re-identify individuals across perceptual episodes, and lead to developmental outcomes with characteristics of autism spectrum disorders.
Saccadic Predictive Vision Model with a Fovea
We propose a model that emulates saccades, the rapid movements of the eye,
called the Error Saccade Model, based on the prediction error of the Predictive
Vision Model (PVM). The Error Saccade Model moves the model's field of view
to the regions with the highest prediction error. Comparisons
of the Error Saccade Model on Predictive Vision Models with and without a fovea
show that a fovea-like structure in the input level of the PVM improves the
Error Saccade Model's ability to pursue detailed objects in its view. We
hypothesize that the improvement is due to poorer resolution in the periphery
causing higher prediction error when an object passes, triggering a saccade to
the next location.
Comment: 10 pages, 6 figures, Accepted at International Conference of
Neuromorphic Computing (2018)
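The saccade rule itself (move toward the region with the highest prediction error) can be sketched in NumPy. It assumes the PVM's predicted frame and the actual frame are available as arrays and divides the view into a hypothetical grid of regions; the squared-error measure and the grid size are illustrative assumptions, and the fovea is not modeled.

```python
import numpy as np
from typing import Tuple


def region_errors(predicted: np.ndarray, actual: np.ndarray,
                  grid: Tuple[int, int]) -> np.ndarray:
    """Mean squared prediction error in each cell of a (rows, cols) grid."""
    err = (np.asarray(predicted, dtype=float) - np.asarray(actual, dtype=float)) ** 2
    if err.ndim == 3:  # average over color channels if present
        err = err.mean(axis=2)
    gh, gw = grid
    h, w = err.shape[0] // gh, err.shape[1] // gw
    err = err[:gh * h, :gw * w]  # drop leftover pixels that do not fill a cell
    return err.reshape(gh, h, gw, w).mean(axis=(1, 3))


def next_saccade(predicted: np.ndarray, actual: np.ndarray,
                 grid: Tuple[int, int] = (4, 4)) -> Tuple[int, int]:
    """Grid cell (row, col) with the highest prediction error."""
    errs = region_errors(predicted, actual, grid)
    r, c = np.unravel_index(int(np.argmax(errs)), errs.shape)
    return int(r), int(c)
```

Pooling the error per region rather than picking the single worst pixel makes the saccade target less sensitive to isolated noisy pixels.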