27,370 research outputs found
DAP3D-Net: Where, What and How Actions Occur in Videos?
Action parsing in videos with complex scenes is an interesting but
challenging task in computer vision. In this paper, we propose a generic 3D
convolutional neural network in a multi-task learning manner for effective Deep
Action Parsing (DAP3D-Net) in videos. Particularly, in the training phase,
action localization, classification and attributes learning can be jointly
optimized on our appearancemotion data via DAP3D-Net. For an upcoming test
video, we can describe each individual action in the video simultaneously as:
Where the action occurs, What the action is and How the action is performed. To
well demonstrate the effectiveness of the proposed DAP3D-Net, we also
contribute a new Numerous-category Aligned Synthetic Action dataset, i.e.,
NASA, which consists of 200; 000 action clips of more than 300 categories and
with 33 pre-defined action attributes in two hierarchical levels (i.e.,
low-level attributes of basic body part movements and high-level attributes
related to action motion). We learn DAP3D-Net using the NASA dataset and then
evaluate it on our collected Human Action Understanding (HAU) dataset.
Experimental results show that our approach can accurately localize, categorize
and describe multiple actions in realistic videos
Deep Sketch Hashing: Fast Free-hand Sketch-Based Image Retrieval
Free-hand sketch-based image retrieval (SBIR) is a specific cross-view
retrieval task, in which queries are abstract and ambiguous sketches while the
retrieval database is formed with natural images. Work in this area mainly
focuses on extracting representative and shared features for sketches and
natural images. However, these can neither cope well with the geometric
distortion between sketches and images nor be feasible for large-scale SBIR due
to the heavy continuous-valued distance computation. In this paper, we speed up
SBIR by introducing a novel binary coding method, named \textbf{Deep Sketch
Hashing} (DSH), where a semi-heterogeneous deep architecture is proposed and
incorporated into an end-to-end binary coding framework. Specifically, three
convolutional neural networks are utilized to encode free-hand sketches,
natural images and, especially, the auxiliary sketch-tokens which are adopted
as bridges to mitigate the sketch-image geometric distortion. The learned DSH
codes can effectively capture the cross-view similarities as well as the
intrinsic semantic correlations between different categories. To the best of
our knowledge, DSH is the first hashing work specifically designed for
category-level SBIR with an end-to-end deep architecture. The proposed DSH is
comprehensively evaluated on two large-scale datasets of TU-Berlin Extension
and Sketchy, and the experiments consistently show DSH's superior SBIR
accuracies over several state-of-the-art methods, while achieving significantly
reduced retrieval time and memory footprint.Comment: This paper will appear as a spotlight paper in CVPR201
- …