Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in the computer vision
community because it plays a key role in video surveillance, and many
algorithms have been proposed to handle it. The goal of this paper is to
review existing works, covering both traditional methods and those based on
deep learning networks. Firstly, we introduce the background of pedestrian
attribute recognition (PAR, for short), including the fundamental concepts of
pedestrian attributes and the corresponding challenges. Secondly, we introduce
existing benchmarks, including popular datasets and evaluation criteria.
Thirdly, we analyse the concepts of multi-task learning and multi-label
learning, and explain the relations between these two learning paradigms and
pedestrian attribute recognition; we also review some popular network
architectures that have been widely applied in the deep learning community.
Fourthly, we analyse popular solutions for this task, such as attribute
grouping, part-based models, \emph{etc}. Fifthly, we show some applications
that take pedestrian attributes into consideration and achieve better
performance. Finally, we summarize this paper and give several possible
research directions for pedestrian attribute recognition. The project page of
this paper can be found at
\url{https://sites.google.com/view/ahu-pedestrianattributes/}.
Comment: Check our project page for a high-resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
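The survey's connection between pedestrian attribute recognition and multi-label learning can be made concrete with a small sketch: each attribute gets its own independent sigmoid output, trained with binary cross-entropy, and predictions are obtained by thresholding each attribute separately. This is a generic illustration, not code from the survey; the attribute names and values below are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_label_bce(logits, targets):
    """Mean binary cross-entropy over all attributes of all samples."""
    p = sigmoid(logits)
    eps = 1e-12  # guard against log(0)
    return float(np.mean(-(targets * np.log(p + eps)
                           + (1 - targets) * np.log(1 - p + eps))))

# Two pedestrians scored on three hypothetical attributes,
# e.g. [male, backpack, long_hair]
logits = np.array([[ 2.0, -1.0,  0.5],
                   [-0.5,  3.0, -2.0]])
targets = np.array([[1, 0, 1],
                    [0, 1, 0]], dtype=float)

loss = multi_label_bce(logits, targets)
# Attributes are thresholded independently, unlike single-label softmax
preds = (sigmoid(logits) > 0.5).astype(int)
```

The key difference from multi-class classification is that several attributes may fire at once for the same pedestrian, which is why per-attribute sigmoids replace a single softmax.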
Person Re-identification by Articulated Appearance Matching
Re-identification of pedestrians in video-surveillance settings can be effectively approached by treating each human figure as an articulated body, whose pose is estimated through the framework of Pictorial Structures (PS). In this way, we can focus selectively on similarities between the appearance of body parts to recognize a previously seen individual. In fact, this strategy resembles what humans employ to solve the same task in the absence of facial details or other reliable biometric information. Based on these insights, we show how to perform single-image re-identification by matching signatures coming from articulated appearances, and how to strengthen this process in multi-shot re-identification by using Custom Pictorial Structures (CPS) to produce improved body localizations and appearance signatures. Moreover, we provide a complete and detailed breakdown of the system that surrounds these core procedures, with several novel arrangements devised for efficiency and flexibility. Finally, we test our approach on several public benchmarks, obtaining convincing results.
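The core matching step described above can be sketched as follows: once a pose estimate localizes body parts, each part yields an appearance signature, and two pedestrians are compared by accumulating per-part distances. This is a hedged toy illustration, not the paper's pipeline; the histogram signature and L1 distance here are stand-ins for the actual descriptors, and all names are hypothetical.

```python
import numpy as np

def part_histogram(pixels, bins=8):
    """Normalized intensity histogram as a toy per-part signature."""
    h, _ = np.histogram(pixels, bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

def match_distance(parts_a, parts_b):
    """Sum of L1 distances between corresponding part signatures."""
    return sum(float(np.abs(part_histogram(a) - part_histogram(b)).sum())
               for a, b in zip(parts_a, parts_b))

rng = np.random.default_rng(0)
# Three synthetic "parts" (e.g. head, torso, legs) as flat pixel samples
person = [rng.random(200) for _ in range(3)]
# Same person, slightly perturbed appearance
same = [p + rng.normal(0.0, 0.01, p.shape) for p in person]
# A differently distributed impostor
other = [rng.random(200) ** 2 for _ in range(3)]

d_same = match_distance(person, same)
d_other = match_distance(person, other)
```

Comparing parts separately, rather than whole silhouettes, is what lets the approach tolerate pose changes: each part's signature is extracted from its own localized region.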
On Symbiosis of Attribute Prediction and Semantic Segmentation
In this paper, we propose to employ semantic segmentation to improve
person-related attribute prediction. The core idea lies in the fact that the
probability of an attribute appearing in an image is far from uniform in
the spatial domain. We build our attribute prediction model jointly with a deep
semantic segmentation network. This harnesses the localization cues learned by
the semantic segmentation to guide the attention of the attribute prediction to
the regions where different attributes naturally show up. Therefore, in
addition to prediction, we are able to localize the attributes despite merely
having access to image-level labels (weak supervision) during training. We
first propose semantic segmentation-based pooling and gating, respectively
denoted as SSP and SSG. In the former, the estimated segmentation masks are
used to pool the final activations of the attribute prediction network, from
multiple semantically homogeneous regions. In SSG, the same idea is applied to
the intermediate layers of the network. SSP and SSG, while effective, impose
heavy memory utilization since each channel of the activations is pooled/gated
with all the semantic segmentation masks. To circumvent this, we propose
Symbiotic Augmentation (SA), where we learn only one mask per activation
channel. SA allows the model to either pick one, or combine (weighted
superposition) multiple semantic maps, in order to generate the proper mask for
each channel. SA simultaneously applies the same mechanism to the reverse
problem by leveraging output logits of attribute prediction to guide the
semantic segmentation task. We evaluate our proposed methods on the CelebA and
LFWA datasets for facial attributes, and on WIDER Attribute and Berkeley
Attributes of People for whole-body attributes. Our proposed methods achieve
superior results compared to previous works.
Comment: Accepted for publication in PAMI. arXiv admin note: substantial text
overlap with arXiv:1704.0874
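The Symbiotic Augmentation idea described in the abstract, learning one mask per activation channel as a weighted superposition of the semantic maps instead of gating every channel with every mask, can be sketched in a few lines. This is an illustrative reconstruction from the abstract only: the shapes, the softmax weighting, and all names are assumptions, and the reverse direction (attribute logits guiding segmentation) is omitted.

```python
import numpy as np

def softmax(w):
    e = np.exp(w - w.max())
    return e / e.sum()

def per_channel_masks(semantic_maps, channel_weights):
    """semantic_maps: (K, H, W); channel_weights: (C, K) -> masks: (C, H, W).

    Each channel's softmax weights either pick one semantic map or blend
    several into a single gating mask for that channel.
    """
    alphas = np.stack([softmax(w) for w in channel_weights])   # (C, K)
    return np.einsum('ck,khw->chw', alphas, semantic_maps)

K, C, H, W = 3, 4, 5, 5
rng = np.random.default_rng(1)
maps = rng.random((K, H, W))          # K estimated segmentation maps
weights = np.zeros((C, K))
weights[0, 2] = 10.0                  # channel 0 nearly picks map 2 alone
masks = per_channel_masks(maps, weights)

activations = rng.random((C, H, W))   # stand-in attribute-network features
gated = activations * masks           # one mask per channel, as in SA
```

Note the memory saving relative to SSP/SSG as described: instead of C x K pooled/gated combinations, each channel stores only K mixing weights and applies a single (H, W) mask.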
Automatic Multilevel Feature Abstraction in Adaptable Machine Vision Systems
Vision is a complex task which can be accomplished with apparent ease by biological systems, but for which the design of artificial systems is difficult. Although machine vision systems can be successfully designed for a specific task, under certain conditions, they are likely to fail if circumstances change. This was the motivation for the research into ways in which systems can be self-designing and adaptable to new visual tasks. The research was conducted in three vital areas of concern for machine vision systems.
The first area is finding a suitable architecture for forming an appropriate representation for the current task. The research investigated the application of Hypernetworks theory to building a multilevel, generally-applicable representation, through repeated application of a fundamental 'self-similarity' principle, that parts of objects assembled under a particular relation at one level, form whole objects at the next. Results show that this is potentially a powerful approach for autonomously generating an adaptable system-architecture suitable for multiple visual tasks.
The second area is the autonomous extraction of suitable low-level features, which the research investigated through random generation of minimally-constrained pixel-configurations and algorithmic generation of homogeneous and heterogeneous polygons. The results suggest that, despite the simplicity of the features making them vulnerable to image transformations, these are promising approaches worth developing further.
The third area is automatic feature selection. The research explored management of 'dimensionality' and of 'combinatorial explosion', as well as how to locate relevant features at multiple representation levels, in the context of 'emergence' of structure. Results indicate that this approach can find useful 'intermediate-level' constructs through analysis of the connectivity of the simplices representing objects at higher levels.
The research concludes that the proposed novel approaches to tackling the above issues, in particular the application of hypernetworks to the formation of multilevel representations and the resulting emergence of higher-level structure, are fruitful.