NTU RGB+D 120: A Large-Scale Benchmark for 3D Human Activity Understanding
Research on depth-based human activity analysis achieved outstanding
performance and demonstrated the effectiveness of 3D representation for action
recognition. The existing depth-based and RGB+D-based action recognition
benchmarks have a number of limitations, including the lack of large-scale
training samples, realistic number of distinct class categories, diversity in
camera views, varied environmental conditions, and variety of human subjects.
In this work, we introduce a large-scale dataset for RGB+D human action
recognition, which is collected from 106 distinct subjects and contains more
than 114 thousand video samples and 8 million frames. This dataset contains 120
different action classes including daily, mutual, and health-related
activities. We evaluate the performance of a series of existing 3D activity
analysis methods on this dataset, and show the advantage of applying deep
learning methods for 3D-based human action recognition. Furthermore, we
investigate a novel one-shot 3D activity recognition problem on our dataset,
and a simple yet effective Action-Part Semantic Relevance-aware (APSR)
framework is proposed for this task, which yields promising results for
recognition of the novel action classes. We believe the introduction of this
large-scale dataset will enable the community to apply, adapt, and develop
various data-hungry learning techniques for depth-based and RGB+D-based human
activity understanding. [The dataset is available at:
http://rose1.ntu.edu.sg/Datasets/actionRecognition.asp]
Comment: IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
Attribute Interactions in Medical Data Analysis
There is much empirical evidence about the success of naive Bayesian classification (NBC) in medical applications of attribute-based machine learning. NBC assumes conditional independence between attributes. In classification, such classifiers sum up the pieces of class-related evidence from individual attributes, independently of other attributes. The performance, however, deteriorates significantly when the "interactions" between attributes become critical. We propose an approach to handling attribute interactions within the framework of "voting" classifiers, such as NBC. We propose an operational test for detecting interactions in learning data and a procedure that takes the detected interactions into account while learning. This approach induces a structuring of the domain of attributes; it may lead to improved classifier's performance and may provide useful novel information for the domain expert when interpreting the results of learning. We report on its application in data analysis and model construction for the prediction of clinical outcome in hip arthroplasty.
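The "voting" behaviour described in this abstract can be illustrated with a minimal sketch. It assumes categorical attributes, Laplace smoothing, and an XOR-style toy dataset; the function names (`train_nbc`, `vote`) are hypothetical. Joining the two attributes into one compound attribute is used here only as a simple illustration of taking an interaction into account, and is not necessarily the procedure proposed in the paper.

```python
import math
from collections import Counter, defaultdict

def train_nbc(rows, labels, alpha=1.0):
    """Fit a naive Bayes model on categorical rows (Laplace smoothing, assumed alpha)."""
    values = [{r[i] for r in rows} for i in range(len(rows[0]))]
    class_counts = Counter(labels)
    cond = defaultdict(int)  # (attribute index, value, class) -> count
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, v, c)] += 1

    def evidence(i, v, c):
        # Log-likelihood contributed by attribute i alone, ignoring all others.
        return math.log((cond[(i, v, c)] + alpha)
                        / (class_counts[c] + alpha * len(values[i])))

    log_prior = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    return log_prior, evidence

def vote(model, row):
    """Sum the prior and the per-attribute evidence for every class."""
    log_prior, evidence = model
    return {c: lp + sum(evidence(i, v, c) for i, v in enumerate(row))
            for c, lp in log_prior.items()}

# Toy data where the class depends only on the interaction of the two attributes.
xor_rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [0, 1, 1, 0]

independent = train_nbc(xor_rows, xor_labels)             # two separate attributes
joined = train_nbc([(r,) for r in xor_rows], xor_labels)  # one compound attribute
```

On this toy data each attribute alone carries no class evidence, so `vote(independent, row)` scores both classes equally, whereas `vote(joined, (row,))` separates them.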
Automatic summarization of rushes video using bipartite graphs
In this paper we present a new approach for automatic summarization of rushes video. Our approach is composed of three main steps. First, based on a temporal segmentation, we filter sub-shots with low information content not likely to be useful in a summary. Second, a method using maximal matching in a bipartite graph is adapted to measure similarity between the remaining shots and to minimize inter-shot redundancy by removing repetitive retake shots common in rushes content. Finally, the presence of faces and the motion intensity are characterised in each sub-shot. A measure of how representative the sub-shot is in the context of the overall video is then proposed. Video summaries composed of keyframe slideshows are then generated. In order to evaluate the effectiveness of this approach we re-run the evaluation carried out by the TREC, using the same dataset and evaluation metrics used in the TRECVID video summarization task in 2007 but with our own assessors. Results show that our approach leads to a significant improvement in terms of the fraction of the TRECVID summary ground truth included and is competitive with other approaches in TRECVID 2007
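The second step, measuring shot similarity through a maximum matching in a bipartite graph, might look roughly like the sketch below. It is a sketch under stated assumptions: each shot is a list of hypothetical one-dimensional keyframe descriptors, an edge links two keyframes whose descriptors differ by at most `max_dist`, and the normalisation by the shorter shot is an illustrative choice. None of these details come from the abstract.

```python
def max_bipartite_matching(adj, n_right):
    """Size of a maximum matching via augmenting paths (Kuhn's algorithm).

    adj[u] lists the right-side nodes adjacent to left-side node u.
    """
    match_right = [-1] * n_right  # right node -> matched left node, or -1

    def try_augment(u, seen):
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            # Take v if it is free, or if its current partner can be re-matched.
            if match_right[v] == -1 or try_augment(match_right[v], seen):
                match_right[v] = u
                return True
        return False

    return sum(try_augment(u, set()) for u in range(len(adj)))

def shot_similarity(shot_a, shot_b, max_dist=0.1):
    """Fraction of keyframes of the shorter shot that can be paired one-to-one."""
    adj = [[j for j, fb in enumerate(shot_b) if abs(fa - fb) <= max_dist]
           for fa in shot_a]
    matched = max_bipartite_matching(adj, len(shot_b))
    return matched / min(len(shot_a), len(shot_b))

# Hypothetical descriptors: two retakes of the same scene and one unrelated shot.
take_1 = [0.10, 0.50, 0.90]
take_2 = [0.12, 0.48, 0.95]
other = [0.30, 0.70]
```

Under these assumptions a pair such as `take_1` and `take_2` scores high and could be treated as a redundant retake, while `take_1` and `other` share no matchable keyframes.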
Spatial-Aware Object Embeddings for Zero-Shot Localization and Classification of Actions
We aim for zero-shot localization and classification of human actions in
video. Where traditional approaches rely on global attribute or object
classification scores for their zero-shot knowledge transfer, our main
contribution is a spatial-aware object embedding. To arrive at spatial
awareness, we build our embedding on top of freely available actor and object
detectors. Relevance of objects is determined in a word embedding space and
further enforced with estimated spatial preferences. Besides local object
awareness, we also embed global object awareness into our embedding to maximize
actor and object interaction. Finally, we exploit the object positions and
sizes in the spatial-aware embedding to demonstrate a new spatio-temporal
action retrieval scenario with composite queries. Action localization and
classification experiments on four contemporary action video datasets support
our proposal. Apart from state-of-the-art results in the zero-shot localization
and classification settings, our spatial-aware embedding is even competitive
with recent supervised action localization alternatives.
Comment: ICC
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition
Motion representation plays a vital role in human action recognition in
videos. In this study, we introduce a novel compact motion representation for
video action recognition, named Optical Flow guided Feature (OFF), which
enables the network to distill temporal information through a fast and robust
approach. The OFF is derived from the definition of optical flow and is
orthogonal to the optical flow. The derivation also provides theoretical
support for using the difference between two frames. By directly calculating
pixel-wise spatiotemporal gradients of the deep feature maps, the OFF could be
embedded in any existing CNN based video action recognition framework with only
a slight additional cost. It enables the CNN to extract spatiotemporal
information, especially the temporal information between frames simultaneously.
This simple but powerful idea is validated by experimental results. The network
with OFF fed only by RGB inputs achieves a competitive accuracy of 93.3% on
UCF-101, which is comparable with the result obtained by two streams (RGB and
optical flow), but is 15 times faster in speed. Experimental results also show
that OFF is complementary to other motion modalities such as optical flow. When
the proposed method is plugged into the state-of-the-art video action
recognition framework, it achieves 96.0% and 74.2% accuracy on UCF-101 and HMDB-51,
respectively. The code for this project is available at
https://github.com/kevin-ssy/Optical-Flow-Guided-Feature.
Comment: CVPR 2018. Code available at
https://github.com/kevin-ssy/Optical-Flow-Guided-Featur
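The orthogonality claim in this abstract follows from the standard brightness constancy constraint, which says the spatiotemporal gradient (gx, gy, gt) satisfies gx*vx + gy*vy + gt = 0 for a flow (vx, vy). A minimal sketch on a single-channel map is given below. It assumes central differences with clamped borders as a stand-in for whatever gradient operator the authors use, and the function name is hypothetical.

```python
def spatiotemporal_gradients(feat_t, feat_next):
    """Pixel-wise spatial and temporal gradients of a 2-D map (lists of rows).

    Spatial gradients use central differences with indices clamped at the
    borders (a simplification); the temporal gradient is the frame difference.
    """
    h, w = len(feat_t), len(feat_t[0])
    gx = [[0.0] * w for _ in range(h)]
    gy = [[0.0] * w for _ in range(h)]
    gt = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            xl, xr = max(x - 1, 0), min(x + 1, w - 1)
            yu, yd = max(y - 1, 0), min(y + 1, h - 1)
            gx[y][x] = (feat_t[y][xr] - feat_t[y][xl]) / 2.0
            gy[y][x] = (feat_t[yd][x] - feat_t[yu][x]) / 2.0
            gt[y][x] = feat_next[y][x] - feat_t[y][x]
    return gx, gy, gt

# Toy input: a horizontal ramp that moves one pixel to the right between frames,
# so the true flow at every pixel is (vx, vy) = (1, 0).
frame_t = [[float(x) for x in range(4)] for _ in range(3)]
frame_next = [[float(x - 1) for x in range(4)] for _ in range(3)]

gx, gy, gt = spatiotemporal_gradients(frame_t, frame_next)
vx, vy = 1.0, 0.0
residual = gx[1][1] * vx + gy[1][1] * vy + gt[1][1]  # dot product with (vx, vy, 1)
```

At the interior pixel used here the dot product of the gradient vector with (vx, vy, 1) vanishes, which is the sense in which such a feature is orthogonal to the flow.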
Predicting Human Interaction via Relative Attention Model
Predicting human interaction is challenging as the on-going activity has to
be inferred based on a partially observed video. Essentially, a good algorithm
should effectively model the mutual influence between the two interacting
subjects. Also, only a small region in the scene is discriminative for
identifying the on-going interaction. In this work, we propose a relative
attention model to explicitly address these difficulties. Built on a
tri-coupled deep recurrent structure representing both interacting subjects and
global interaction status, the proposed network collects spatio-temporal
information from each subject, rectified with global interaction information,
yielding effective interaction representation. Moreover, the proposed network
also unifies an attention module to assign higher importance to the regions
which are relevant to the on-going action. Extensive experiments have been
conducted on two public datasets, and the results demonstrate that the proposed
relative attention network successfully predicts informative regions between
interacting subjects, which in turn yields superior human interaction
prediction accuracy.
Comment: To appear in IJCAI 201