Toward a Taxonomy and Computational Models of Abnormalities in Images
The human visual system can spot an abnormal image, and reason about what
makes it strange. This task has not received enough attention in computer
vision. In this paper we study various types of atypicalities in images in a
more comprehensive way than has been done before. We propose a new dataset of
abnormal images showing a wide range of atypicalities. We design human subject
experiments to discover a coarse taxonomy of the reasons for abnormality. Our
experiments reveal three major categories of abnormality: object-centric,
scene-centric, and contextual. Based on this taxonomy, we propose a
comprehensive computational model that can predict all different types of
abnormality in images and outperform prior art in abnormality recognition.
Comment: To appear in the Thirtieth AAAI Conference on Artificial Intelligence
(AAAI 2016)
Feedback-prop: Convolutional Neural Network Inference under Partial Evidence
We propose an inference procedure for deep convolutional neural networks
(CNNs) when partial evidence is available. Our method consists of a general
feedback-based propagation approach (feedback-prop) that boosts the prediction
accuracy for an arbitrary set of unknown target labels when the values for a
non-overlapping arbitrary set of target labels are known. We show that existing
models trained in a multi-label or multi-task setting can readily take
advantage of feedback-prop without any retraining or fine-tuning. Our
feedback-prop inference procedure is general, simple, reliable, and works on
different challenging visual recognition tasks. We present two variants of
feedback-prop based on layer-wise and residual iterative updates. We experiment
using several multi-task models and show that feedback-prop is effective in all
of them. Our results unveil a previously unreported but interesting dynamic
property of deep CNNs. We also present an associated technical approach that
takes advantage of this property for inference under partial evidence in
general visual recognition tasks.
Comment: Accepted to CVPR 201
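The idea of inference under partial evidence can be illustrated with a toy sketch. It is not the authors' implementation: it assumes a single linear output layer, a squared-error loss on the known labels, and plain gradient descent on the intermediate activation, and the names `forward` and `feedback_prop` are hypothetical. The weights stay fixed, so no retraining is involved; only the activation is adjusted, after which the unknown labels are re-predicted.

```python
def forward(W, h):
    """Linear output layer: one score per label (assumed toy model)."""
    return [sum(w * a for w, a in zip(row, h)) for row in W]


def feedback_prop(W, h, known, steps=200, lr=0.05):
    """Adjust the intermediate activation h so that the outputs agree with
    the known labels, then return the refreshed predictions for all labels.

    known maps a label index to its observed value. W is never modified.
    """
    h = list(h)
    for _ in range(steps):
        y = forward(W, h)
        # Gradient of 0.5 * sum_k (y_k - t_k)^2 over known labels, w.r.t. h.
        grad = [0.0] * len(h)
        for k, target in known.items():
            err = y[k] - target
            for j in range(len(h)):
                grad[j] += err * W[k][j]
        h = [a - lr * g for a, g in zip(h, grad)]
    return forward(W, h), h


# Label 1 shares most of its evidence with label 0; label 2 does not.
W = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
scores, _ = feedback_prop(W, [0.2, 0.5], known={0: 1.0})
```

In this toy setup, fixing label 0 raises the score of the correlated label 1 and leaves the unrelated label 2 unchanged, which is the qualitative behavior the abstract describes.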
Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks
It is common to implicitly assume access to intelligently captured inputs
(e.g., photos from a human photographer), yet autonomously capturing good
observations is itself a major challenge. We address the problem of learning to
look around: if a visual agent has the ability to voluntarily acquire new views
to observe its environment, how can it learn efficient exploratory behaviors to
acquire informative observations? We propose a reinforcement learning solution,
where the agent is rewarded for actions that reduce its uncertainty about the
unobserved portions of its environment. Based on this principle, we develop a
recurrent neural network-based approach to perform active completion of
panoramic natural scenes and 3D object shapes. Crucially, the learned policies
are not tied to any recognition task nor to the particular semantic content
seen during training. As a result, 1) the learned "look around" behavior is
relevant even for new tasks in unseen environments, and 2) training data
acquisition involves no manual labeling. Through tests in diverse settings, we
demonstrate that our approach learns useful generic policies that transfer to
new unseen tasks and environments. Completion episodes are shown at
https://goo.gl/BgWX3W
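A minimal sketch of the reward principle described above, under loud assumptions: the scene is a short list of numbers, the "completion" simply fills unobserved views with a fixed prior (a stand-in for the learned recurrent network), and uncertainty is measured as mean squared completion error. The function names are hypothetical. The scene itself supplies the training signal, so no manual labels are needed.

```python
def complete(observed, n_views, prior=0.0):
    """Guess the full scene: observed views as seen, the rest set to a prior."""
    return [observed.get(i, prior) for i in range(n_views)]


def completion_error(scene, observed, prior=0.0):
    """Mean squared error between the true scene and the completed guess."""
    guess = complete(observed, len(scene), prior)
    return sum((s - g) ** 2 for s, g in zip(scene, guess)) / len(scene)


def step_reward(scene, observed, action, prior=0.0):
    """Reward for looking at view `action`: the drop in completion error."""
    before = completion_error(scene, observed, prior)
    updated = dict(observed)
    updated[action] = scene[action]
    after = completion_error(scene, updated, prior)
    return before - after, updated


scene = [0.0, 2.0, 0.0, 1.0]
reward_informative, _ = step_reward(scene, {}, action=1)
reward_uninformative, _ = step_reward(scene, {}, action=0)
```

Looking at a view that the prior already predicts well earns no reward, while looking at a surprising view earns a positive one, so a policy trained on this signal is pushed toward informative observations.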
Learning Object Categories From Internet Image Searches
In this paper, we describe a simple approach to learning models of visual object categories from images gathered from Internet image search engines. The images for a given keyword are typically highly variable, with a large fraction being unrelated to the query term, and thus pose a challenging environment from which to learn. By training our models directly from Internet images, we remove the need to laboriously compile training data sets, required by most other recognition approaches; this opens up the possibility of learning object category models “on-the-fly.” We describe two simple approaches, derived from the probabilistic latent semantic analysis (pLSA) technique for text document analysis, that can be used to automatically learn object models from these data. We show two applications of the learned model: first, to rerank the images returned by the search engine, thus improving the quality of the search engine; and second, to recognize objects in other image data sets.
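For readers unfamiliar with pLSA, the following is a sketch of the standard EM fit on a document-by-word count matrix (here a "document" would be an image and a "word" a quantized visual feature). It shows plain pLSA only, not the two derived approaches of the paper. The function name, the random initialization, and the small `EPS` smoothing term that guards against division by zero are assumptions of this sketch, and every document is assumed to contain at least one word.

```python
import random

EPS = 1e-12  # assumed smoothing to keep posteriors strictly positive


def normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row)
    return [v / total for v in row]


def plsa(counts, n_topics, iters=50, seed=0):
    """Fit pLSA by EM. Returns (P(z|d) per document, P(w|z) per topic)."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(iters):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z | d, w) for this document-word pair.
                post = normalize([p_z_d[d][z] * p_w_z[z][w] + EPS
                                  for z in range(n_topics)])
                # M-step accumulators: expected counts per topic.
                for z in range(n_topics):
                    c = counts[d][w] * post[z]
                    new_zd[d][z] += c
                    new_wz[z][w] += c
        p_z_d = [normalize(row) for row in new_zd]
        p_w_z = [normalize(row) for row in new_wz]
    return p_z_d, p_w_z
```

Given the fitted P(z|d), reranking as described in the abstract could amount to sorting images by their weight on the topic chosen to represent the object category; how that topic is selected is not specified here.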
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is very important
to obtain a computational understanding of how humans can form a symbol system
and obtain semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
communicate smoothly with human users over the long term are crucially
important, and both require an understanding of the dynamics of symbol systems. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-the-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and a
double articulation analysis, that enable a robot to obtain words and their
embodied meanings from raw sensory-motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER.
Comment: submitted to Advanced Robotic
Category-specific incremental visual codebook training for scene categorization
In this paper, we propose a category-specific incremental visual codebook training method for scene categorization. In this method, based on a preliminary codebook trained from a subset of training samples, we incrementally introduce the remaining training samples to enrich the content of the visual codebook. Then, the incrementally learned codebook is used to encode the images for scene categorization. The advantages of the proposed method are that (1) it is computationally efficient compared with batch-mode clustering; (2) the number of visual words is determined automatically in the incremental learning procedure; and (3) scene categorization performance improves with the enriched codebook compared with the codebook trained from a subset of training samples. The experimental results show the effectiveness of the proposed method. © 2010 IEEE.
The 17th IEEE International Conference on Image Processing (ICIP 2010), Hong Kong, China, 26-29 September 2010. In Proceedings of 17th ICIP, 2010, p. 1501-150
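One way to picture incremental codebook growth is the sketch below. The abstract does not give the update rule, so this uses a simple leader-follower scheme as a stand-in: a new descriptor either updates its nearest visual word by a running mean or, if it lies farther than an assumed `radius`, becomes a new word. The function names and the `radius` parameter are hypothetical. The sketch handles the descriptors of one category; a category-specific setup would keep one such codebook per category.

```python
import math


def nearest(codebook, x):
    """Return (index, distance) of the visual word closest to descriptor x."""
    best_i, best_d = -1, math.inf
    for i, (center, _count) in enumerate(codebook):
        d = math.dist(center, x)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def grow_codebook(initial_centers, descriptors, radius):
    """Enrich a preliminary codebook with the remaining descriptors.

    The number of visual words is not fixed in advance: it grows whenever
    a descriptor is not covered by any existing word.
    """
    codebook = [(list(c), 1) for c in initial_centers]
    for x in descriptors:
        i, d = nearest(codebook, x)
        if d > radius:
            codebook.append((list(x), 1))  # uncovered: add a new visual word
        else:
            center, n = codebook[i]
            # Running mean keeps the word at the centroid of its members.
            center = [(c * n + v) / (n + 1) for c, v in zip(center, x)]
            codebook[i] = (center, n + 1)
    return [center for center, _count in codebook]


words = grow_codebook([[0.0, 0.0]],
                      [[0.2, 0.0], [5.0, 5.0], [5.2, 5.0]],
                      radius=1.0)
```

Each descriptor is visited once, which is where the cost advantage over repeated batch clustering would come from in a scheme of this kind.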
Human-Machine CRFs for Identifying Bottlenecks in Holistic Scene Understanding
Recent trends in image understanding have pushed for holistic scene
understanding models that jointly reason about various tasks such as object
detection, scene recognition, shape analysis, contextual reasoning, and local
appearance based classifiers. In this work, we are interested in understanding
the roles of these different tasks in improved scene understanding, in
particular semantic segmentation, object detection and scene recognition.
Towards this goal, we "plug-in" human subjects for each of the various
components in a state-of-the-art conditional random field model. Comparisons
among various hybrid human-machine CRFs give us indications of how much "head
room" there is to improve scene understanding by focusing research efforts on
various individual tasks.