8,340 research outputs found
Exploiting Image-trained CNN Architectures for Unconstrained Video Classification
We conduct an in-depth exploration of different strategies for doing event
detection in videos using convolutional neural networks (CNNs) trained for
image classification. We study different ways of performing spatial and
temporal pooling, feature normalization, choice of CNN layers as well as choice
of classifiers. Making judicious choices along these dimensions led to a very
significant increase in performance over more naive approaches that have been
used till now. We evaluate our approach on the challenging TRECVID MED'14
dataset with two popular CNN architectures pretrained on ImageNet. On this
MED'14 dataset, our methods, based entirely on image-trained CNN features, can
outperform several state-of-the-art non-CNN models. Our proposed late fusion of
CNN- and motion-based features can further increase the mean average precision
(mAP) on MED'14 from 34.95% to 38.74%. The fusion approach achieves the
state-of-the-art classification performance on the challenging UCF-101 dataset
Multi-view Convolutional Neural Networks for 3D Shape Recognition
A longstanding question in computer vision concerns the representation of 3D
shapes for recognition: should 3D shapes be represented with descriptors
operating on their native 3D formats, such as voxel grid or polygon mesh, or
can they be effectively represented with view-based descriptors? We address
this question in the context of learning to recognize 3D shapes from a
collection of their rendered views on 2D images. We first present a standard
CNN architecture trained to recognize the shapes' rendered views independently
of each other, and show that a 3D shape can be recognized even from a single
view at an accuracy far higher than using state-of-the-art 3D shape
descriptors. Recognition rates further increase when multiple views of the
shapes are provided. In addition, we present a novel CNN architecture that
combines information from multiple views of a 3D shape into a single and
compact shape descriptor offering even better recognition performance. The same
architecture can be applied to accurately recognize human hand-drawn sketches
of shapes. We conclude that a collection of 2D views can be highly informative
for 3D shape recognition and is amenable to emerging CNN architectures and
their derivatives.Comment: v1: Initial version. v2: An updated ModelNet40 training/test split is
used; results with low-rank Mahalanobis metric learning are added. v3 (ICCV
2015): A second camera setup without the upright orientation assumption is
added; some accuracy and mAP numbers are changed slightly because a small
issue in mesh rendering related to specularities is fixe
Place recognition: An Overview of Vision Perspective
Place recognition is one of the most fundamental topics in computer vision
and robotics communities, where the task is to accurately and efficiently
recognize the location of a given query image. Despite years of wisdom
accumulated in this field, place recognition still remains an open problem due
to the various ways in which the appearance of real-world places may differ.
This paper presents an overview of the place recognition literature. Since
condition invariant and viewpoint invariant features are essential factors to
long-term robust visual place recognition system, We start with traditional
image description methodology developed in the past, which exploit techniques
from image retrieval field. Recently, the rapid advances of related fields such
as object detection and image classification have inspired a new technique to
improve visual place recognition system, i.e., convolutional neural networks
(CNNs). Thus we then introduce recent progress of visual place recognition
system based on CNNs to automatically learn better image representations for
places. Eventually, we close with discussions and future work of place
recognition.Comment: Applied Sciences (2018
- …