3,481 research outputs found
Slow and steady feature analysis: higher order temporal coherence in video
How can unlabeled video augment visual learning? Existing methods perform
"slow" feature analysis, encouraging the representations of temporally close
frames to exhibit only small differences. While this standard approach captures
the fact that high-level visual signals change slowly over time, it fails to
capture *how* the visual content changes. We propose to generalize slow feature
analysis to "steady" feature analysis. The key idea is to impose a prior that
higher order derivatives in the learned feature space must be small. To this
end, we train a convolutional neural network with a regularizer on tuples of
sequential frames from unlabeled video. It encourages feature changes over time
to be smooth, i.e., similar to the most recent changes. Using five diverse
datasets, including unlabeled YouTube and KITTI videos, we demonstrate our
method's impact on object, scene, and action recognition tasks. We further show
that our features learned from unlabeled video can even surpass a standard
heavily supervised pretraining approach.Comment: in Computer Vision and Pattern Recognition (CVPR) 2016, Las Vegas,
NV, June 201
Automatic vehicle tracking and recognition from aerial image sequences
This paper addresses the problem of automated vehicle tracking and
recognition from aerial image sequences. Motivated by its successes in the
existing literature focus on the use of linear appearance subspaces to describe
multi-view object appearance and highlight the challenges involved in their
application as a part of a practical system. A working solution which includes
steps for data extraction and normalization is described. In experiments on
real-world data the proposed methodology achieved promising results with a high
correct recognition rate and few, meaningful errors (type II errors whereby
genuinely similar targets are sometimes being confused with one another).
Directions for future research and possible improvements of the proposed method
are discussed
Learning Human Pose Estimation Features with Convolutional Networks
This paper introduces a new architecture for human pose estimation using a
multi- layer convolutional network architecture and a modified learning
technique that learns low-level features and higher-level weak spatial models.
Unconstrained human pose estimation is one of the hardest problems in computer
vision, and our new architecture and learning schema shows significant
improvement over the current state-of-the-art results. The main contribution of
this paper is showing, for the first time, that a specific variation of deep
learning is able to outperform all existing traditional architectures on this
task. The paper also discusses several lessons learned while researching
alternatives, most notably, that it is possible to learn strong low-level
feature detectors on features that might even just cover a few pixels in the
image. Higher-level spatial models improve somewhat the overall result, but to
a much lesser extent then expected. Many researchers previously argued that the
kinematic structure and top-down information is crucial for this domain, but
with our purely bottom up, and weak spatial model, we could improve other more
complicated architectures that currently produce the best results. This mirrors
what many other researchers, like those in the speech recognition, object
recognition, and other domains have experienced
CURE-OR: Challenging Unreal and Real Environments for Object Recognition
In this paper, we introduce a large-scale, controlled, and multi-platform
object recognition dataset denoted as Challenging Unreal and Real Environments
for Object Recognition (CURE-OR). In this dataset, there are 1,000,000 images
of 100 objects with varying size, color, and texture that are positioned in
five different orientations and captured using five devices including a webcam,
a DSLR, and three smartphone cameras in real-world (real) and studio (unreal)
environments. The controlled challenging conditions include underexposure,
overexposure, blur, contrast, dirty lens, image noise, resizing, and loss of
color information. We utilize CURE-OR dataset to test recognition APIs-Amazon
Rekognition and Microsoft Azure Computer Vision- and show that their
performance significantly degrades under challenging conditions. Moreover, we
investigate the relationship between object recognition and image quality and
show that objective quality algorithms can estimate recognition performance
under certain photometric challenging conditions. The dataset is publicly
available at https://ghassanalregib.com/cure-or/.Comment: 8 pages, 7 figures, 4 table
Semi-supervised Tuning from Temporal Coherence
Recent works demonstrated the usefulness of temporal coherence to regularize
supervised training or to learn invariant features with deep architectures. In
particular, enforcing smooth output changes while presenting temporally-closed
frames from video sequences, proved to be an effective strategy. In this paper
we prove the efficacy of temporal coherence for semi-supervised incremental
tuning. We show that a deep architecture, just mildly trained in a supervised
manner, can progressively improve its classification accuracy, if exposed to
video sequences of unlabeled data. The extent to which, in some cases, a
semi-supervised tuning allows to improve classification accuracy (approaching
the supervised one) is somewhat surprising. A number of control experiments
pointed out the fundamental role of temporal coherence.Comment: Under review as a conference paper at ICLR 201
- …