18,642 research outputs found
Multispectral Deep Neural Networks for Pedestrian Detection
Multispectral pedestrian detection is essential for around-the-clock
applications, e.g., surveillance and autonomous driving. We deeply analyze
Faster R-CNN for multispectral pedestrian detection task and then model it into
a convolutional network (ConvNet) fusion problem. Further, we discover that
ConvNet-based pedestrian detectors trained by color or thermal images
separately provide complementary information in discriminating human instances.
Thus there is a large potential to improve pedestrian detection by using color
and thermal images in DNNs simultaneously. We carefully design four ConvNet
fusion architectures that integrate two-branch ConvNets on different DNNs
stages, all of which yield better performance compared with the baseline
detector. Our experimental results on KAIST pedestrian benchmark show that the
Halfway Fusion model that performs fusion on the middle-level convolutional
features outperforms the baseline method by 11% and yields a missing rate 3.5%
lower than the other proposed architectures.Comment: 13 pages, 8 figures, BMVC 2016 ora
Unmanned Aerial Systems for Wildland and Forest Fires
Wildfires represent an important natural risk causing economic losses, human
death and important environmental damage. In recent years, we witness an
increase in fire intensity and frequency. Research has been conducted towards
the development of dedicated solutions for wildland and forest fire assistance
and fighting. Systems were proposed for the remote detection and tracking of
fires. These systems have shown improvements in the area of efficient data
collection and fire characterization within small scale environments. However,
wildfires cover large areas making some of the proposed ground-based systems
unsuitable for optimal coverage. To tackle this limitation, Unmanned Aerial
Systems (UAS) were proposed. UAS have proven to be useful due to their
maneuverability, allowing for the implementation of remote sensing, allocation
strategies and task planning. They can provide a low-cost alternative for the
prevention, detection and real-time support of firefighting. In this paper we
review previous work related to the use of UAS in wildfires. Onboard sensor
instruments, fire perception algorithms and coordination strategies are
considered. In addition, we present some of the recent frameworks proposing the
use of both aerial vehicles and Unmanned Ground Vehicles (UV) for a more
efficient wildland firefighting strategy at a larger scale.Comment: A recent published version of this paper is available at:
https://doi.org/10.3390/drones501001
Beyond Intra-modality: A Survey of Heterogeneous Person Re-identification
An efficient and effective person re-identification (ReID) system relieves
the users from painful and boring video watching and accelerates the process of
video analysis. Recently, with the explosive demands of practical applications,
a lot of research efforts have been dedicated to heterogeneous person
re-identification (Hetero-ReID). In this paper, we provide a comprehensive
review of state-of-the-art Hetero-ReID methods that address the challenge of
inter-modality discrepancies. According to the application scenario, we
classify the methods into four categories -- low-resolution, infrared, sketch,
and text. We begin with an introduction of ReID, and make a comparison between
Homogeneous ReID (Homo-ReID) and Hetero-ReID tasks. Then, we describe and
compare existing datasets for performing evaluations, and survey the models
that have been widely employed in Hetero-ReID. We also summarize and compare
the representative approaches from two perspectives, i.e., the application
scenario and the learning pipeline. We conclude by a discussion of some future
research directions. Follow-up updates are avaible at:
https://github.com/lightChaserX/Awesome-Hetero-reIDComment: Accepted by IJCAI 2020. Project url:
https://github.com/lightChaserX/Awesome-Hetero-reI
Prediction model of alcohol intoxication from facial temperature dynamics based on K-means clustering driven by evolutionary computing
Alcohol intoxication is a significant phenomenon, affecting many social areas, including work procedures or car driving. Alcohol causes certain side effects including changing the facial thermal distribution, which may enable the contactless identification and classification of alcohol-intoxicated people. We adopted a multiregional segmentation procedure to identify and classify symmetrical facial features, which reliably reflects the facial-temperature variations while subjects are drinking alcohol. Such a model can objectively track alcohol intoxication in the form of a facial temperature map. In our paper, we propose the segmentation model based on the clustering algorithm, which is driven by the modified version of the Artificial Bee Colony (ABC) evolutionary optimization with the goal of facial temperature features extraction from the IR (infrared radiation) images. This model allows for a definition of symmetric clusters, identifying facial temperature structures corresponding with intoxication. The ABC algorithm serves as an optimization process for an optimal cluster's distribution to the clustering method the best approximate individual areas linked with gradual alcohol intoxication. In our analysis, we analyzed a set of twenty volunteers, who had IR images taken to reflect the process of alcohol intoxication. The proposed method was represented by multiregional segmentation, allowing for classification of the individual spatial temperature areas into segmentation classes. The proposed method, besides single IR image modelling, allows for dynamical tracking of the alcohol-temperature features within a process of intoxication, from the sober state up to the maximum observed intoxication level.Web of Science118art. no. 99
Pedestrian Attribute Recognition: A Survey
Recognizing pedestrian attributes is an important task in computer vision
community due to it plays an important role in video surveillance. Many
algorithms has been proposed to handle this task. The goal of this paper is to
review existing works using traditional methods or based on deep learning
networks. Firstly, we introduce the background of pedestrian attributes
recognition (PAR, for short), including the fundamental concepts of pedestrian
attributes and corresponding challenges. Secondly, we introduce existing
benchmarks, including popular datasets and evaluation criterion. Thirdly, we
analyse the concept of multi-task learning and multi-label learning, and also
explain the relations between these two learning algorithms and pedestrian
attribute recognition. We also review some popular network architectures which
have widely applied in the deep learning community. Fourthly, we analyse
popular solutions for this task, such as attributes group, part-based,
\emph{etc}. Fifthly, we shown some applications which takes pedestrian
attributes into consideration and achieve better performance. Finally, we
summarized this paper and give several possible research directions for
pedestrian attributes recognition. The project page of this paper can be found
from the following website:
\url{https://sites.google.com/view/ahu-pedestrianattributes/}.Comment: Check our project page for High Resolution version of this survey:
https://sites.google.com/view/ahu-pedestrianattributes
Visual Prompt Multi-Modal Tracking
Visible-modal object tracking gives rise to a series of downstream
multi-modal tracking tributaries. To inherit the powerful representations of
the foundation model, a natural modus operandi for multi-modal tracking is full
fine-tuning on the RGB-based parameters. Albeit effective, this manner is not
optimal due to the scarcity of downstream data and poor transferability, etc.
In this paper, inspired by the recent success of the prompt learning in
language models, we develop Visual Prompt multi-modal Tracking (ViPT), which
learns the modal-relevant prompts to adapt the frozen pre-trained foundation
model to various downstream multimodal tracking tasks. ViPT finds a better way
to stimulate the knowledge of the RGB-based model that is pre-trained at scale,
meanwhile only introducing a few trainable parameters (less than 1% of model
parameters). ViPT outperforms the full fine-tuning paradigm on multiple
downstream tracking tasks including RGB+Depth, RGB+Thermal, and RGB+Event
tracking. Extensive experiments show the potential of visual prompt learning
for multi-modal tracking, and ViPT can achieve state-of-the-art performance
while satisfying parameter efficiency. Code and models are available at
https://github.com/jiawen-zhu/ViPT.Comment: Accepted by CVPR202
- …