3,441 research outputs found
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in computer vision
community due to it plays an important role in video surveillance. Many
algorithms has been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. Firstly, we introduce the background of pedestrian attributes
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criterion. Thirdly, we
analyse the concept of multi-task learning and multi-label learning, and also
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures which
have widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attributes group, part-based,
\emph{etc}. Fifthly, we shown some applications which takes pedestrian
attributes into consideration and achieve better performance. Finally, we
summarized this paper and give several possible research directions for
pedestrian attributes recognition. The project page of this paper can be found
from the following website:
\url{https://sites.google.com/view/ahu-pedestrianattributes/}.Comment: Check our project page for High Resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community
In recent years, deep learning (DL), a re-branding of neural networks (NNs),
has risen to the top in numerous areas, namely computer vision (CV), speech
recognition, natural language processing, etc. Whereas remote sensing (RS)
possesses a number of unique challenges, primarily related to sensors and
applications, inevitably RS draws from many of the same theories as CV; e.g.,
statistics, fusion, and machine learning, to name a few. This means that the RS
community should be aware of, if not at the leading edge of, of advancements
like DL. Herein, we provide the most comprehensive survey of state-of-the-art
RS DL research. We also review recent new developments in the DL field that can
be used in DL for RS. Namely, we focus on theories, tools and challenges for
the RS community. Specifically, we focus on unsolved challenges and
opportunities as it relates to (i) inadequate data sets, (ii)
human-understandable solutions for modelling physical phenomena, (iii) Big
Data, (iv) non-traditional heterogeneous data sources, (v) DL architectures and
learning algorithms for spectral, spatial and temporal data, (vi) transfer
learning, (vii) an improved theoretical understanding of DL systems, (viii)
high barriers to entry, and (ix) training and optimizing the DL.Comment: 64 pages, 411 references. To appear in Journal of Applied Remote
Sensin
Enhancing Semantic Segmentation: Design and Analysis of Improved U-Net Based Deep Convolutional Neural Networks
In this research, we provide a state-of-the-art method for semantic segmentation that makes use of a modified version of the U-Net architecture, which is itself based on deep convolutional neural networks (CNNs). This research delves into the ins and outs of this cutting-edge approach to semantic segmentation in an effort to boost its precision and productivity. To perform semantic segmentation, a crucial operation in computer vision, each pixel in an image must be assigned to one of many predefined item classes. The proposed Improved U-Net architecture makes use of deep CNNs to efficiently capture complex spatial characteristics while preserving associated context. The study illustrates the efficacy of the Improved U-Net in a variety of real-world circumstances through thorough experimentation and assessment. Intricate feature extraction, down-sampling, and up-sampling are all part of the network's design in order to produce high-quality segmentation results. The study demonstrates comparative evaluations against classic U-Net and other state-of-the-art models and emphasizes the significance of hyperparameter fine-tuning. The suggested architecture shows excellent performance in terms of accuracy and generalization, demonstrating its promise for a variety of applications. Finally, the problem of semantic segmentation is addressed in a novel way. The experimental findings validate the relevance of the architecture's design decisions and demonstrate its potential to boost computer vision by enhancing segmentation precision and efficiency
Crossing Generative Adversarial Networks for Cross-View Person Re-identification
Person re-identification (\textit{re-id}) refers to matching pedestrians
across disjoint yet non-overlapping camera views. The most effective way to
match these pedestrians undertaking significant visual variations is to seek
reliably invariant features that can describe the person of interest
faithfully. Most of existing methods are presented in a supervised manner to
produce discriminative features by relying on labeled paired images in
correspondence. However, annotating pair-wise images is prohibitively expensive
in labors, and thus not practical in large-scale networked cameras. Moreover,
seeking comparable representations across camera views demands a flexible model
to address the complex distributions of images. In this work, we study the
co-occurrence statistic patterns between pairs of images, and propose to
crossing Generative Adversarial Network (Cross-GAN) for learning a joint
distribution for cross-image representations in a unsupervised manner. Given a
pair of person images, the proposed model consists of the variational
auto-encoder to encode the pair into respective latent variables, a proposed
cross-view alignment to reduce the view disparity, and an adversarial layer to
seek the joint distribution of latent representations. The learned latent
representations are well-aligned to reflect the co-occurrence patterns of
paired images. We empirically evaluate the proposed model against challenging
datasets, and our results show the importance of joint invariant features in
improving matching rates of person re-id with comparison to semi/unsupervised
state-of-the-arts.Comment: 12 pages. arXiv admin note: text overlap with arXiv:1702.03431 by
other author
- …