Knowledge representation for basic visual categories
This paper reports work on a model of machine learning based on the psychological theory of prototypical concepts. According to this theory, concepts learnt naturally through interaction with the environment (basic categories) are not structured or defined in logical terms but are clustered according to their similarity to a central prototype representing the "most typical" member of the category.
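A minimal sketch of the prototype idea this abstract describes, assuming mean-of-exemplars prototypes and Euclidean similarity; the paper's actual representation and similarity measure are not given in the abstract, so all names and features below are illustrative.

```python
# Illustrative sketch (not the paper's exact model): each basic category is
# summarized by a central prototype, and a new instance is assigned to the
# category whose prototype it most resembles, rather than being matched
# against a logical definition.
import numpy as np

def learn_prototypes(exemplars_by_category):
    """exemplars_by_category: dict mapping category name -> array of feature
    vectors. The prototype is taken here as the mean exemplar (one common
    choice; the paper's model may use a different summary)."""
    return {cat: np.mean(X, axis=0) for cat, X in exemplars_by_category.items()}

def categorize(x, prototypes):
    """Assign x to the category with the closest (most similar) prototype."""
    return min(prototypes, key=lambda cat: np.linalg.norm(x - prototypes[cat]))

# Toy usage with made-up 2-D features:
exemplars = {
    "bird": np.array([[1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]),
    "fish": np.array([[-1.0, -0.8], [-0.9, -1.2]]),
}
prototypes = learn_prototypes(exemplars)
print(categorize(np.array([0.9, 1.0]), prototypes))  # -> "bird"
```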
Multi-View Task-Driven Recognition in Visual Sensor Networks
Nowadays, distributed smart cameras are deployed for a wide range of tasks in several application scenarios, including object recognition, image retrieval, and forensic applications. Because bandwidth in distributed systems is limited, efficient coding of local visual features has been an active topic of research. In this paper, we propose a novel approach to obtain a compact representation of high-dimensional visual data using sensor fusion techniques. We cast visual analysis in resource-limited scenarios as a multi-view representation learning problem, and we show that the key to finding a properly compressed representation is to exploit the position of the cameras with respect to each other as a norm-based regularization in the sparse-coding signal representation. Learning the representation of each camera is treated as an individual task, and multi-task learning with joint sparsity across all nodes is employed. The proposed representation learning scheme is referred to as multi-view task-driven learning for visual sensor networks (MT-VSN). We demonstrate that MT-VSN outperforms the state of the art in various surveillance recognition tasks.
Comment: 5 pages, accepted at the International Conference on Image Processing, 201
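The abstract formulates each camera's coding as one task coupled to the others through joint sparsity. A minimal sketch of that coupling, assuming fixed per-view dictionaries and a row-wise l2,1 (group) penalty in place of the paper's full task-driven, camera-position-aware regularization; the function names and parameters below are illustrative, not from the paper.

```python
# Joint-sparsity coding across camera views via proximal gradient (ISTA):
# minimize 0.5 * sum_v ||x_v - D_v a_v||^2 + lam * sum_j ||A[j, :]||_2,
# where column v of A is the code for view v and the row-wise l2,1 penalty
# encourages all views to activate the same dictionary atoms.
import numpy as np

def row_group_soft_threshold(A, tau):
    """Proximal operator of the l2,1 norm: shrink each row of A jointly,
    driving whole rows (atoms shared across views) to zero."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return A * scale

def joint_sparse_codes(X, D, lam=0.1, n_iter=200):
    """X: (d, V) observations, one column per camera view.
    D: list of V dictionaries, each of shape (d, k).
    Returns A: (k, V) codes with row-wise joint sparsity."""
    V = X.shape[1]
    k = D[0].shape[1]
    A = np.zeros((k, V))
    # Step size from the largest dictionary Lipschitz constant.
    L = max(np.linalg.norm(Dv, 2) ** 2 for Dv in D)
    for _ in range(n_iter):
        G = np.column_stack([D[v].T @ (D[v] @ A[:, v] - X[:, v]) for v in range(V)])
        A = row_group_soft_threshold(A - G / L, lam / L)
    return A

# Toy usage with random per-view dictionaries (illustrative only):
rng = np.random.default_rng(0)
D = [rng.standard_normal((20, 50)) for _ in range(3)]   # 3 camera views
X = rng.standard_normal((20, 3))                        # one observation per view
A = joint_sparse_codes(X, D, lam=0.5)
print("atoms active across views:", int((np.abs(A).sum(axis=1) > 1e-8).sum()))
```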
Drawing the Representation
This article argues that the Representation is drawn by the perceiver: it does not arrive at the visual cortex fully formed. Rather, colour arrives at the visual cortex, and the Representation is drawn from that.
Transitive Invariance for Self-supervised Visual Representation Learning
Learning visual representations with self-supervised learning has become popular in computer vision. The idea is to design auxiliary tasks for which labels can be obtained for free. Most of these tasks end up providing data to learn specific kinds of invariance useful for recognition. In this paper, we propose to exploit different self-supervised approaches to learn representations invariant to (i) inter-instance variations (two objects in the same class should have similar features) and (ii) intra-instance variations (viewpoint, pose, deformations, illumination, etc.). Instead of combining the two approaches with multi-task learning, we argue for organizing and reasoning over the data with multiple variations. Specifically, we propose to generate a graph with millions of objects mined from hundreds of thousands of videos. The objects are connected by two types of edges corresponding to two types of invariance: "different instances but a similar viewpoint and category" and "different viewpoints of the same instance". By applying simple transitivity on the graph with these edges, we can obtain pairs of images exhibiting richer visual invariance. We use this data to train a Triplet-Siamese network with VGG16 as the base architecture and apply the learned representations to different recognition tasks. For object detection, we achieve 63.2% mAP on PASCAL VOC 2007 using Fast R-CNN (compared to 67.3% with ImageNet pre-training). On the challenging COCO dataset, our method is surprisingly close (23.5%) to the ImageNet-supervised counterpart (24.4%) using the Faster R-CNN framework. We also show that our network can perform significantly better than the ImageNet network on the surface normal estimation task.
Comment: ICCV 201
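The transitivity step described in this abstract can be made concrete with a small sketch: composing an inter-instance edge with intra-instance edges yields positive pairs that differ in both instance and viewpoint. The edge sets and node names below are toy placeholders, not the paper's mined graph.

```python
# Rough sketch of transitivity over the two edge types: walking
# intra-instance -> inter-instance -> intra-instance produces pairs that
# span both a different object and a different viewpoint.
from itertools import product

def transitive_pairs(inter_edges, intra_edges):
    """inter_edges: set of (a, b) linking similar-looking distinct objects.
    intra_edges: set of (a, a2) linking viewpoints of the same object.
    Returns pairs reachable by intra -> inter -> intra composition."""
    # Collect alternative viewpoints per node (including the node itself).
    views = {}
    for a, a2 in intra_edges:
        views.setdefault(a, {a}).add(a2)
        views.setdefault(a2, {a2}).add(a)
    pairs = set()
    for a, b in inter_edges:
        for a2, b2 in product(views.get(a, {a}), views.get(b, {b})):
            pairs.add((a2, b2))
    return pairs

# Toy usage: A/A' are two viewpoints of one object, B/B' of another.
inter = {("A", "B")}                 # similar viewpoint, different instances
intra = {("A", "A'"), ("B", "B'")}   # same instance, different viewpoints
print(transitive_pairs(inter, intra))
# Includes ("A'", "B'"): different instances AND different viewpoints.
```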
