Search CORE

175,089 research outputs found

Deep Motion Features for Visual Tracking

Author: Danelljan Martin
Felsberg Michael
Gladh Susanna
Khan Fahad Shahbaz
Publication venue
Publication date: 01/01/2016
Field of study

Robust visual tracking is a challenging computer vision problem, with many real-world applications. Most existing approaches employ hand-crafted appearance features, such as HOG or Color Names. Recently, deep RGB features extracted from convolutional neural networks have been successfully applied for tracking. Despite their success, these features only capture appearance information. On the other hand, motion cues provide discriminative and complementary information that can improve tracking performance. Contrary to visual tracking, deep motion features have been successfully applied for action recognition and video classification tasks. Typically, the motion features are learned by training a CNN on optical flow images extracted from large amounts of labeled videos. This paper presents an investigation of the impact of deep motion features in a tracking-by-detection framework. We further show that hand-crafted, deep RGB, and deep motion features contain complementary information. To the best of our knowledge, we are the first to propose fusing appearance information with deep motion features for visual tracking. Comprehensive experiments clearly suggest that our fusion approach with deep motion features outperforms standard methods relying on appearance information alone.Comment: ICPR 2016. Best paper award in the "Computer Vision and Robot Vision" trac

arXiv.org e-Print Archive

Publikationer från Linköpings universitet

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Evaluation of Deep Learning based Pose Estimation for Sign Language Recognition

Author: Chen X.
Cooper H.
Tokui S.
Tompson J. J.
Tompson J. J.
Yosinski J.
Publication venue
Publication date: 19/04/2016
Field of study

Human body pose estimation and hand detection are two important tasks for systems that perform computer vision-based sign language recognition(SLR). However, both tasks are challenging, especially when the input is color videos, with no depth information. Many algorithms have been proposed in the literature for these tasks, and some of the most successful recent algorithms are based on deep learning. In this paper, we introduce a dataset for human pose estimation for SLR domain. We evaluate the performance of two deep learning based pose estimation methods, by performing user-independent experiments on our dataset. We also perform transfer learning, and we obtain results that demonstrate that transfer learning can improve pose estimation accuracy. The dataset and results from these methods can create a useful baseline for future works

arXiv.org e-Print Archive

Crossref

A Study of Actor and Action Semantic Retention in Video Supervoxel Segmentation

Author: Corso Jason J.
Doell Richard F.
Hanson Catherine
Hanson Stephen José
Xu Chenliang
Publication venue
Publication date: 13/11/2013
Field of study

Existing methods in the semantic computer vision community seem unable to deal with the explosion and richness of modern, open-source and social video content. Although sophisticated methods such as object detection or bag-of-words models have been well studied, they typically operate on low level features and ultimately suffer from either scalability issues or a lack of semantic meaning. On the other hand, video supervoxel segmentation has recently been established and applied to large scale data processing, which potentially serves as an intermediate representation to high level video semantic extraction. The supervoxels are rich decompositions of the video content: they capture object shape and motion well. However, it is not yet known if the supervoxel segmentation retains the semantics of the underlying video content. In this paper, we conduct a systematic study of how well the actor and action semantics are retained in video supervoxel segmentation. Our study has human observers watching supervoxel segmentation videos and trying to discriminate both actor (human or animal) and action (one of eight everyday actions). We gather and analyze a large set of 640 human perceptions over 96 videos in 3 different supervoxel scales. Furthermore, we conduct machine recognition experiments on a feature defined on supervoxel segmentation, called supervoxel shape context, which is inspired by the higher order processes in human perception. Our ultimate findings suggest that a significant amount of semantics have been well retained in the video supervoxel segmentation and can be used for further video analysis.Comment: This article is in review at the International Journal of Semantic Computin

arXiv.org e-Print Archive

CiteSeerX

Facial Expression Analysis via Transfer Learning

Author: Zhang Xiao
Publication venue: Digital Commons @ DU
Publication date: 01/01/2015
Field of study

Automated analysis of facial expressions has remained an interesting and challenging research topic in the field of computer vision and pattern recognition due to vast applications such as human-machine interface design, social robotics, and developmental psychology. This dissertation focuses on developing and applying transfer learning algorithms - multiple kernel learning (MKL) and multi-task learning (MTL) - to resolve the problems of facial feature fusion and the exploitation of multiple facial action units (AUs) relations in designing robust facial expression recognition systems. MKL algorithms are employed to fuse multiple facial features with different kernel functions and tackle the domain adaption problem at the kernel level within support vector machines (SVM). lp-norm is adopted to enforce both sparse and nonsparse kernel combination in our methods. We further develop and apply MTL algorithms for simultaneous detection of multiple related AUs by exploiting their inter-relationships. Three variants of task structure models are designed and investigated to obtain fine depiction of AU relations. lp-norm MTMKL and TD-MTMKL (Task-Dependent MTMKL) are group-sensitive MTL methodsthat model the co-occurrence relations among AUs. On the other hand, our proposed hierarchical multi-task structural learning (HMTSL) includes a latent layer to learn a hierarchical structure to exploit all possible AU interrelations for AU detection. Extensive experiments on public face databases show that our proposed transfer learning methods have produced encouraging results compared to several state-of-the-art methods for facial expression recognition and AU detection

University of Denver