9,614 research outputs found
Person Search with Natural Language Description
Searching persons in large-scale image databases with the query of natural
language description has important applications in video surveillance. Existing
methods mainly focused on searching persons with image-based or attribute-based
queries, which have major limitations for a practical usage. In this paper, we
study the problem of person search with natural language description. Given the
textual description of a person, the algorithm of the person search is required
to rank all the samples in the person database then retrieve the most relevant
sample corresponding to the queried description. Since there is no person
dataset or benchmark with textual description available, we collect a
large-scale person description dataset with detailed natural language
annotations and person samples from various sources, termed as CUHK Person
Description Dataset (CUHK-PEDES). A wide range of possible models and baselines
have been evaluated and compared on the person search benchmark. An Recurrent
Neural Network with Gated Neural Attention mechanism (GNA-RNN) is proposed to
establish the state-of-the art performance on person search
Machine Understanding of Human Behavior
A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next generation computing, which we will call human computing, should be about anticipatory user interfaces that should be human-centered, built for humans based on human models. They should transcend the traditional keyboard and mouse to include natural, human-like interactive functions including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, how far are we from enabling computers to understand human behavior
A mask-based approach for the geometric calibration of thermal-infrared cameras
Accurate and efficient thermal-infrared (IR) camera calibration is important for advancing computer vision research within the thermal modality. This paper presents an approach for geometrically calibrating individual and multiple cameras in both the thermal and visible modalities. The proposed technique can be used to correct for lens distortion and to simultaneously reference both visible and thermal-IR cameras to a single coordinate frame. The most popular existing approach for the geometric calibration of thermal cameras uses a printed chessboard heated by a flood lamp and is comparatively inaccurate and difficult to execute. Additionally, software toolkits provided for calibration either are unsuitable for this task or require substantial manual intervention. A new geometric mask with high thermal contrast and not requiring a flood lamp is presented as an alternative calibration pattern. Calibration points on the pattern are then accurately located using a clustering-based algorithm which utilizes the maximally stable extremal region detector. This algorithm is integrated into an automatic end-to-end system for calibrating single or multiple cameras. The evaluation shows that using the proposed mask achieves a mean reprojection error up to 78% lower than that using a heated chessboard. The effectiveness of the approach is further demonstrated by using it to calibrate two multiple-camera multiple-modality setups. Source code and binaries for the developed software are provided on the project Web site
Aerial Vehicle Tracking by Adaptive Fusion of Hyperspectral Likelihood Maps
Hyperspectral cameras can provide unique spectral signatures for consistently
distinguishing materials that can be used to solve surveillance tasks. In this
paper, we propose a novel real-time hyperspectral likelihood maps-aided
tracking method (HLT) inspired by an adaptive hyperspectral sensor. A moving
object tracking system generally consists of registration, object detection,
and tracking modules. We focus on the target detection part and remove the
necessity to build any offline classifiers and tune a large amount of
hyperparameters, instead learning a generative target model in an online manner
for hyperspectral channels ranging from visible to infrared wavelengths. The
key idea is that, our adaptive fusion method can combine likelihood maps from
multiple bands of hyperspectral imagery into one single more distinctive
representation increasing the margin between mean value of foreground and
background pixels in the fused map. Experimental results show that the HLT not
only outperforms all established fusion methods but is on par with the current
state-of-the-art hyperspectral target tracking frameworks.Comment: Accepted at the International Conference on Computer Vision and
Pattern Recognition Workshops, 201
- …