Discriminatively Trained Latent Ordinal Model for Video Classification
We study the problem of video classification for facial analysis and human
action recognition. We propose a novel weakly supervised learning method that
models the video as a sequence of automatically mined, discriminative
sub-events (e.g., onset and offset phases for "smile", running and jumping for
"highjump"). The proposed model is inspired by the recent works on Multiple
Instance Learning and latent SVM/HCRF -- it extends such frameworks to
approximately model the ordinal structure of videos. We obtain consistent
improvements over relevant competitive baselines on four challenging and
publicly available video-based facial analysis datasets for prediction of
expression, clinical pain and intent in dyadic conversations and on three
challenging human action datasets. We also validate the method with qualitative
results and show that they largely support the intuitions behind the method.
Comment: Accepted in IEEE TPAMI.
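The ordinal scoring idea can be made concrete with a short sketch. The following is a minimal illustration, not the authors' code: a video is scored by the best temporally ordered assignment of K learned sub-event templates to frames, found by dynamic programming. The names (score_video, X, W) are illustrative assumptions.

```python
import numpy as np

def score_video(X, W):
    """X: (T, d) per-frame features; W: (K, d) sub-event templates.
    Returns max over frames t_1 < t_2 < ... < t_K of sum_k W[k] . X[t_k],
    i.e. the best scoring assignment that respects the ordinal constraint."""
    T, K = X.shape[0], W.shape[0]
    S = X @ W.T                          # (T, K) per-frame template scores
    dp = np.full((T, K), -np.inf)        # dp[t, k]: best score with
    dp[:, 0] = S[:, 0]                   # sub-event k placed at frame t
    for k in range(1, K):
        best_prev = -np.inf
        for t in range(1, T):
            best_prev = max(best_prev, dp[t - 1, k - 1])  # best earlier placement
            dp[t, k] = best_prev + S[t, k]
    return dp[:, -1].max()
```

In a latent-SVM-style learner, this maximization would supply the latent sub-event locations for each training video, with the templates W updated discriminatively.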
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing use of prior information. Next, we examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses entropy regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less frequently encountered states. The third category exploits hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration informed by prior knowledge is a promising research direction and suggest topics of potential impact.
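To ground the second category, here is a hedged sketch of count-based exploration via hashing in the SimHash style: states are projected onto random hyperplanes, the sign pattern serves as a discrete hash code, and the bonus shrinks with that code's visit count. The class name, the beta / sqrt(n) bonus form, and the hyperparameters are illustrative assumptions, not the thesis's implementation.

```python
from collections import defaultdict
import numpy as np

class HashingBonus:
    def __init__(self, state_dim, n_bits=32, beta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.normal(size=(n_bits, state_dim))  # random hyperplanes
        self.beta = beta
        self.counts = defaultdict(int)                 # visits per hash code

    def bonus(self, state):
        code = tuple((self.A @ state > 0).astype(np.int8))  # sign-pattern hash
        self.counts[code] += 1
        return self.beta / np.sqrt(self.counts[code])  # rarer code -> larger bonus
```

The bonus is simply added to the environment reward during training, so states whose hash codes have been seen less often are pursued more aggressively.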
TransNet: A Transfer Learning-Based Network for Human Action Recognition
Human action recognition (HAR) is a high-level and significant research area
in computer vision due to its ubiquitous applications. The main limitations of
the current HAR models are their complex structures and lengthy training time.
In this paper, we propose a simple yet versatile and effective end-to-end deep
learning architecture, coined TransNet, for HAR. TransNet decomposes the
complex 3D-CNNs into 2D- and 1D-CNNs, where the 2D- and 1D-CNN components
extract spatial features and temporal patterns in videos, respectively.
Benefiting from its concise architecture, TransNet is readily compatible with
any pretrained state-of-the-art 2D-CNN model from other fields, which can be
transferred to serve the HAR task. In other words, it naturally leverages the
power and success of transfer learning for HAR, bringing huge advantages in
terms of efficiency and effectiveness. Extensive experimental results and the
comparison with the state-of-the-art models demonstrate the superior
performance of the proposed TransNet in HAR in terms of flexibility, model
complexity, training speed, and classification accuracy.
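A minimal sketch of the 2D + 1D decomposition described above, assuming a torchvision ResNet-18 as the pretrained 2D backbone; this illustrates the idea rather than the paper's exact TransNet architecture, and num_classes, feature sizes, and layer choices are assumptions.

```python
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class TransNetSketch(nn.Module):
    def __init__(self, num_classes, feat_dim=512):
        super().__init__()
        backbone = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
        self.spatial = nn.Sequential(*list(backbone.children())[:-1])  # 2D-CNN, fc dropped
        self.temporal = nn.Sequential(                  # 1D-CNN over the time axis
            nn.Conv1d(feat_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(256, num_classes)

    def forward(self, clips):                           # clips: (B, T, 3, H, W)
        B, T = clips.shape[:2]
        f = self.spatial(clips.flatten(0, 1))           # (B*T, feat_dim, 1, 1)
        f = f.view(B, T, -1).transpose(1, 2)            # (B, feat_dim, T)
        return self.head(self.temporal(f).squeeze(-1))  # (B, num_classes)
```

Swapping in a different pretrained 2D backbone only changes the spatial module, which is exactly the transfer-learning property the abstract highlights.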
Oops! Predicting Unintentional Action in Video
From just a short glance at a video, we can often tell whether a person's
action is intentional or not. Can we train a model to recognize this? We
introduce a dataset of in-the-wild videos of unintentional action, as well as a
suite of tasks for recognizing, localizing, and anticipating its onset. We
train a supervised neural network as a baseline and analyze its performance
compared to human consistency on the tasks. We also investigate self-supervised
representations that leverage natural signals in our dataset, and show the
effectiveness of an approach that uses the intrinsic speed of video to perform
competitively with highly-supervised pretraining. However, a significant gap
between machine and human performance remains. The project website is available
at https://oops.cs.columbia.edu.
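The speed-based self-supervision can be illustrated with a hedged sketch: clips subsampled at different frame strides become pseudo-labeled examples, and a network trained to predict the stride must become sensitive to the intrinsic speed of motion. The function name, strides, and clip length are illustrative; this is one natural instantiation of a speed pretext task, not necessarily the paper's exact formulation.

```python
import random

STRIDES = [1, 2, 4]               # playback speeds used as pseudo-labels

def make_speed_example(frames, clip_len=16):
    """frames: sequence of video frames, assumed long enough that
    len(frames) >= clip_len * max(STRIDES). Returns (clip, speed_label)."""
    label = random.randrange(len(STRIDES))
    stride = STRIDES[label]
    start = random.randrange(len(frames) - clip_len * stride + 1)
    clip = frames[start : start + clip_len * stride : stride]
    return clip, label            # train any video encoder + classifier on these
```

Because the labels come for free from the videos themselves, the encoder can be pretrained at scale before fine-tuning on the recognition, localization, and anticipation tasks.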