SSAH: Semi-supervised Adversarial Deep Hashing with Self-paced Hard Sample Generation
Deep hashing methods have proven effective and efficient for large-scale Web
media search. The success of these data-driven methods largely depends on
collecting sufficient labeled data, which is often a crucial limitation in
practice. Current solutions to this issue utilize Generative Adversarial
Networks (GANs) to augment data for semi-supervised learning. However, existing
GAN-based methods treat image generation and hash learning as two isolated
processes, leading to ineffective generation. Moreover, most works fail to
exploit the semantic information in unlabeled data. In this paper, we propose a
novel Semi-supervised Self-paced Adversarial Hashing method, named SSAH, to
solve the above problems in a unified framework. SSAH consists of an
adversarial network (A-Net) and a hashing network (H-Net). To improve the
quality of generated images, the A-Net first learns hard samples with
multi-scale occlusions and multi-angle rotation deformations that compete
against the learning of accurate hash codes. Second, we design a novel
self-paced hard-generation policy that gradually increases the hashing
difficulty of generated samples. To exploit the semantic information in
unlabeled data, we propose a semi-supervised consistency loss. Experimental
results show that our method significantly improves state-of-the-art models on
both widely used hashing datasets and fine-grained datasets.
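The two ingredients named in the abstract can be illustrated with a minimal
sketch. This is not the authors' implementation: the function names, the linear
difficulty schedule, and the use of a mean-squared disagreement between
continuous hash codes are all assumptions made for illustration.

```python
import numpy as np

def self_paced_difficulty(epoch, total_epochs, max_occlusion=0.5, max_angle=45.0):
    """Hypothetical self-paced schedule: the hardness of generated samples
    (occlusion ratio, rotation angle) grows linearly with training progress."""
    progress = min(epoch / total_epochs, 1.0)
    return {"occlusion_ratio": max_occlusion * progress,
            "rotation_angle": max_angle * progress}

def consistency_loss(codes_a, codes_b):
    """Hypothetical semi-supervised consistency loss: penalize disagreement
    between the continuous hash codes of an unlabeled image and its
    generated (occluded/rotated) counterpart."""
    return float(np.mean((np.asarray(codes_a) - np.asarray(codes_b)) ** 2))
```

Under this sketch, early epochs produce nearly unmodified samples and the
difficulty ramps up toward the configured maxima as training proceeds.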
S3Aug: Segmentation, Sampling, and Shift for Action Recognition
Action recognition is a well-established area of research in computer vision.
In this paper, we propose S3Aug, a video data augmentation method for action
recognition. Unlike conventional video data augmentation methods that cut and
paste regions from two videos, the proposed method generates new videos from a
single training video through segmentation and label-to-image transformation.
Furthermore, the proposed method modifies certain categories of label images by
sampling to generate a variety of videos, and shifts intermediate features to
enhance the temporal coherency between frames of the generated videos.
Experimental results on the UCF101, HMDB51, and Mimetics datasets demonstrate
the effectiveness of the proposed method, particularly for out-of-context
videos of the Mimetics dataset.
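The "shift" step described above can be sketched as follows. This is a
hedged illustration, not the S3Aug implementation: the function name, the
per-frame feature layout, and the fraction of shifted channels are assumptions,
loosely following the general idea of rolling part of the channels across
adjacent frames so that consecutive generated frames share features.

```python
import numpy as np

def temporal_feature_shift(features, shift_fraction=0.25):
    """Hypothetical temporal feature shift.
    features: array of shape (T, C) -- T frames, C intermediate channels.
    A fraction of channels in each frame is replaced by the same channels
    from the previous frame, coupling adjacent frames."""
    t, c = features.shape
    n = int(c * shift_fraction)
    shifted = features.copy()
    # frame t borrows its first n channels from frame t-1; frame 0 is unchanged
    shifted[1:, :n] = features[:-1, :n]
    return shifted
```

The design intent is that independently generated frames become more
temporally coherent because a quarter of each frame's features are inherited
from its predecessor.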
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision
community because it plays a key role in video surveillance. Many algorithms
have been proposed to handle this task. The goal of this paper is to review
existing works, whether based on traditional methods or on deep learning
networks. Firstly, we introduce the background of pedestrian attribute
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and the corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criteria. Thirdly, we
analyse the concepts of multi-task learning and multi-label learning, and
explain the relations between these two learning paradigms and pedestrian
attribute recognition. We also review popular network architectures that have
been widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attribute grouping, part-based models,
\emph{etc}. Fifthly, we show some applications that take pedestrian attributes
into consideration and achieve better performance. Finally, we summarize this
paper and give several possible research directions for pedestrian attribute
recognition. The project page of this paper can be found at the following
website:
\url{https://sites.google.com/view/ahu-pedestrianattributes/}.