5,875 research outputs found
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in computer vision
community due to it plays an important role in video surveillance. Many
algorithms has been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. Firstly, we introduce the background of pedestrian attributes
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criterion. Thirdly, we
analyse the concept of multi-task learning and multi-label learning, and also
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures which
have widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attributes group, part-based,
\emph{etc}. Fifthly, we shown some applications which takes pedestrian
attributes into consideration and achieve better performance. Finally, we
summarized this paper and give several possible research directions for
pedestrian attributes recognition. The project page of this paper can be found
from the following website:
\url{https://sites.google.com/view/ahu-pedestrianattributes/}.Comment: Check our project page for High Resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
Soft Biometric Analysis: MultiPerson and RealTime Pedestrian Attribute Recognition in Crowded Urban Environments
Traditionally, recognition systems were only based on human hard biometrics. However,
the ubiquitous CCTV cameras have raised the desire to analyze human biometrics from
far distances, without people attendance in the acquisition process. Highresolution
face closeshots
are rarely available at far distances such that facebased
systems cannot
provide reliable results in surveillance applications. Human soft biometrics such as body
and clothing attributes are believed to be more effective in analyzing human data collected
by security cameras.
This thesis contributes to the human soft biometric analysis in uncontrolled environments
and mainly focuses on two tasks: Pedestrian Attribute Recognition (PAR) and person reidentification
(reid).
We first review the literature of both tasks and highlight the history
of advancements, recent developments, and the existing benchmarks. PAR and person reid
difficulties are due to significant distances between intraclass
samples, which originate
from variations in several factors such as body pose, illumination, background, occlusion,
and data resolution. Recent stateoftheart
approaches present endtoend
models that
can extract discriminative and comprehensive feature representations from people. The
correlation between different regions of the body and dealing with limited learning data
is also the objective of many recent works. Moreover, class imbalance and correlation
between human attributes are specific challenges associated with the PAR problem.
We collect a large surveillance dataset to train a novel gender recognition model suitable
for uncontrolled environments. We propose a deep residual network that extracts several
posewise
patches from samples and obtains a comprehensive feature representation. In
the next step, we develop a model for multiple attribute recognition at once. Considering
the correlation between human semantic attributes and class imbalance, we respectively
use a multitask
model and a weighted loss function. We also propose a multiplication
layer on top of the backbone features extraction layers to exclude the background features
from the final representation of samples and draw the attention of the model to the
foreground area.
We address the problem of person reid
by implicitly defining the receptive fields of
deep learning classification frameworks. The receptive fields of deep learning models
determine the most significant regions of the input data for providing correct decisions.
Therefore, we synthesize a set of learning data in which the destructive regions (e.g.,
background) in each pair of instances are interchanged. A segmentation module
determines destructive and useful regions in each sample, and the label of synthesized
instances are inherited from the sample that shared the useful regions in the synthesized
image. The synthesized learning data are then used in the learning phase and help
the model rapidly learn that the identity and background regions are not correlated.
Meanwhile, the proposed solution could be seen as a data augmentation approach that
fully preserves the label information and is compatible with other data augmentation
techniques.
When reid
methods are learned in scenarios where the target person appears with identical garments in the gallery, the visual appearance of clothes is given the most
importance in the final feature representation. Clothbased
representations are not
reliable in the longterm
reid
settings as people may change their clothes. Therefore,
developing solutions that ignore clothing cues and focus on identityrelevant
features are
in demand. We transform the original data such that the identityrelevant
information of
people (e.g., face and body shape) are removed, while the identityunrelated
cues (i.e.,
color and texture of clothes) remain unchanged. A learned model on the synthesized
dataset predicts the identityunrelated
cues (shortterm
features). Therefore, we train a
second model coupled with the first model and learns the embeddings of the original data
such that the similarity between the embeddings of the original and synthesized data is
minimized. This way, the second model predicts based on the identityrelated
(longterm)
representation of people.
To evaluate the performance of the proposed models, we use PAR and person reid
datasets, namely BIODI, PETA, RAP, Market1501,
MSMTV2,
PRCC, LTCC, and MIT
and compared our experimental results with stateoftheart
methods in the field.
In conclusion, the data collected from surveillance cameras have low resolution, such
that the extraction of hard biometric features is not possible, and facebased
approaches
produce poor results. In contrast, soft biometrics are robust to variations in data quality.
So, we propose approaches both for PAR and person reid
to learn discriminative features
from each instance and evaluate our proposed solutions on several publicly available
benchmarks.This thesis was prepared at the University of Beria Interior, IT Instituto de Telecomunicações, Soft Computing and Image Analysis Laboratory (SOCIA Lab), Covilhã Delegation, and was submitted to the University of Beira Interior for defense in a public examination session
Dynamic scene understanding: Pedestrian tracking from aerial devices.
Multiple Object Tracking (MOT) is the problem that involves following the trajectory of multiple objects in a sequence, generally a video. Pedestrians are among the most interesting subjects to track and recognize for many purposes such as surveillance, and safety. In the recent years, Unmanned Aerial Vehicles (UAV’s) have been viewed as a viable option for monitoring public areas, as they provide a low-cost method of data collection while covering large and difficult-to-reach areas. In this thesis, we present an online pedestrian tracking and re-identification from aerial devices framework. This framework is based on learning a compact directional statistic distribution (von-Mises-Fisher distribution) for each person ID using a deep convolutional neural network. The distribution characteristics are trained to be invariant to clothes appearances and to transformations. In real world scenarios, during deployment, new pedestrian and objects can appear in the scene and the model should detect them as Out Of Distribution (OOD). Thus, our frameworks also includes an OOD detection adopted from [16] called Virtual Outlier Synthetic (VOS), that detects OOD based on synthesising virtual outlier in the embedding space in an online manner. To validate, analyze and compare our approach, we use a large real benchmark data that contain detection tracking and identity annotations. These targets are captured at different viewing angles, different places, and different times by a ”DJI Phantom 4” drone. We validate the effectiveness of the proposed framework by evaluating their detection, tracking and long term identification performance as well as classification performance between In Distribution (ID) and OOD. We show that the the proposed methods in the framework can learn models that achieve their objectives
Linking theories, past practices, and archaeological remains of movement through ontological reasoning
The amount of information available to archaeologists has grown dramatically during the last ten years. The rapid acquisition of observational data and creation of digital data has played a significant role in this “information explosion”. In this paper, we propose new methods for knowledge creation in studies of movement, designed for the present data-rich research context. Using three case studies, we analyze how researchers have identified, conceptualized, and linked the material traces describing various movement processes in a given region. Then, we explain how we construct ontologies that enable us to explicitly relate material elements, identified in the observed landscape, to the knowledge or theory that explains their role and relationships within the movement process. Combining formal pathway systems and informal movement systems through these three case studies, we argue that these systems are not hierarchically integrated, but rather intertwined. We introduce a new heuristic tool, the “track graph”, to record observed material features in a neutral form which can be employed to reconstruct the trajectories of journeys which follow different movement logics. Finally, we illustrate how the breakdown of implicit conceptual references into explicit, logical chains of reasoning, describing basic entities and their relationships, allows the use of these constituent elements to reconstruct, analyze, and compare movement practices from the bottom up
Towards Large-Scale Small Object Detection: Survey and Benchmarks
With the rise of deep convolutional neural networks, object detection has
achieved prominent advances in past years. However, such prosperity could not
camouflage the unsatisfactory situation of Small Object Detection (SOD), one of
the notoriously challenging tasks in computer vision, owing to the poor visual
appearance and noisy representation caused by the intrinsic structure of small
targets. In addition, large-scale dataset for benchmarking small object
detection methods remains a bottleneck. In this paper, we first conduct a
thorough review of small object detection. Then, to catalyze the development of
SOD, we construct two large-scale Small Object Detection dAtasets (SODA),
SODA-D and SODA-A, which focus on the Driving and Aerial scenarios
respectively. SODA-D includes 24828 high-quality traffic images and 278433
instances of nine categories. For SODA-A, we harvest 2513 high resolution
aerial images and annotate 872069 instances over nine classes. The proposed
datasets, as we know, are the first-ever attempt to large-scale benchmarks with
a vast collection of exhaustively annotated instances tailored for
multi-category SOD. Finally, we evaluate the performance of mainstream methods
on SODA. We expect the released benchmarks could facilitate the development of
SOD and spawn more breakthroughs in this field. Datasets and codes are
available at: \url{https://shaunyuan22.github.io/SODA}
- …