1,047 research outputs found
Search Tracker: Human-derived object tracking in-the-wild through large-scale search and retrieval
Humans use context and scene knowledge to easily localize moving objects in
conditions of complex illumination changes, scene clutter and occlusions. In
this paper, we present a method to leverage human knowledge in the form of
annotated video libraries in a novel search and retrieval based setting to
track objects in unseen video sequences. For every video sequence, a document
that represents motion information is generated. Documents of the unseen video
are queried against the library at multiple scales to find videos with similar
motion characteristics. This provides us with coarse localization of objects in
the unseen video. We further adapt these retrieved object locations to the new
video using an efficient warping scheme. The proposed method is validated on
in-the-wild video surveillance datasets where we outperform state-of-the-art
appearance-based trackers. We also introduce a new challenging dataset with
complex object appearance changes. Comment: Under review with the IEEE Transactions on Circuits and Systems for
Video Technology
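As a rough illustration of the search-and-retrieval idea above, a motion "document" can be a histogram over quantized flow directions, with library videos ranked by cosine similarity against the query document. This is only a minimal sketch under assumed representations; the function names and toy library are hypothetical, not the paper's actual pipeline.

```python
import math
from collections import Counter

def motion_document(flow_vectors, num_bins=8):
    """Quantize per-frame motion vectors (dx, dy) into direction bins,
    producing a bag-of-words histogram ("document") for a video."""
    doc = Counter()
    for dx, dy in flow_vectors:
        angle = math.atan2(dy, dx) % (2 * math.pi)
        doc[int(angle / (2 * math.pi) * num_bins) % num_bins] += 1
    return doc

def cosine_similarity(a, b):
    """Cosine similarity between two sparse histograms (Counters)."""
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_doc, library):
    """Rank (name, document) library entries by motion similarity to the query."""
    return sorted(library, key=lambda entry: -cosine_similarity(query_doc, entry[1]))

# toy library: one video moving right, one moving up
lib = [("right_mover", motion_document([(1, 0)] * 10)),
       ("up_mover", motion_document([(0, 1)] * 10))]
query = motion_document([(2, 0.1), (3, -0.2), (1, 0.0)])  # mostly rightward motion
best = retrieve(query, lib)[0][0]
```

The retrieved video's annotated object locations would then be warped onto the query video, a step omitted here.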
Exploring Human Vision Driven Features for Pedestrian Detection
Motivated by the center-surround mechanism in the human visual attention
system, we propose average contrast maps for pedestrian detection in street
scenes, motivated by the observation that pedestrians exhibit discriminative
contrast texture. Our main contributions are first to
design a local, statistical multi-channel descriptor in order to incorporate
both color and gradient information. Second, we introduce a multi-direction and
multi-scale contrast scheme based on grid-cells in order to integrate
expressive local variations. To select the most discriminative features for
assessment and classification, we perform extensive comparisons of statistical
descriptors, contrast measurements, and scale structures. In this way, we
obtain reasonable results under various
configurations. Empirical findings from applying our optimized detector on the
INRIA and Caltech pedestrian datasets show that our features yield
state-of-the-art performance in pedestrian detection. Comment: Accepted for publication in IEEE Transactions on Circuits and Systems
for Video Technology (TCSVT)
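The center-surround mechanism mentioned above can be illustrated with a toy grid-cell contrast map: the contrast of each cell is the absolute difference between its mean intensity and the mean of its neighbouring cells. A minimal sketch with assumed parameters, not the paper's actual multi-channel, multi-scale descriptor:

```python
def cell_means(img, cell):
    """Mean intensity of each (cell x cell) block of a grayscale image."""
    h, w = len(img), len(img[0])
    rows, cols = h // cell, w // cell
    means = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals = [img[r * cell + i][c * cell + j]
                    for i in range(cell) for j in range(cell)]
            means[r][c] = sum(vals) / len(vals)
    return means

def contrast_map(img, cell=2):
    """Center-surround contrast per cell: |cell mean - mean of neighbour cells|."""
    m = cell_means(img, cell)
    rows, cols = len(m), len(m[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [m[r + dr][c + dc]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols]
            out[r][c] = abs(m[r][c] - sum(nbrs) / len(nbrs)) if nbrs else 0.0
    return out

# a bright 2x2 block in a dark 4x4 image yields high contrast at that cell
img = [[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 9, 9],
       [0, 0, 9, 9]]
cm = contrast_map(img, cell=2)
```

A full detector would compute such maps per channel (color, gradient), over multiple directions and scales, and feed them to a classifier.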
Automatic object classification for surveillance videos.
PhD thesis. The recent popularity of surveillance video systems, especially those located in
urban scenarios, demands the development of visual techniques for monitoring purposes.
A primary step towards intelligent surveillance video systems is automatic
object classification, which remains an open research problem and the keystone
for the development of more specific applications.
Typically, object representation is based on the inherent visual features. However,
psychological studies have demonstrated that human beings can routinely categorise
objects according to their behaviour. The gap between the features a computer
can extract automatically, such as appearance-based features, and the concepts
that human beings perceive effortlessly but that remain unattainable for
machines, such as behaviour, is commonly known as the semantic gap.
Consequently, this thesis proposes to narrow the semantic gap
and bring together machine and human understanding towards object classification.
Thus, a Surveillance Media Management framework is proposed to automatically detect and
classify objects by analysing the physical properties inherent in their appearance
(machine understanding) and the behaviour patterns which require a higher level of
understanding (human understanding). Finally, a probabilistic multimodal fusion
algorithm bridges the gap performing an automatic classification considering both
machine and human understanding.
The performance of the proposed Surveillance Media Management framework
has been thoroughly evaluated on outdoor surveillance datasets. The experiments
conducted demonstrated that the combination of machine and human understanding
substantially enhanced the object classification performance. Finally, the inclusion
of human reasoning and understanding provides the essential information to bridge
the semantic gap towards smart surveillance video systems.
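The probabilistic multimodal fusion step described above can be sketched as a product-rule combination of the posteriors produced by an appearance-based (machine) classifier and a behaviour-based (human-understanding) classifier, assuming conditional independence. The class names and probabilities below are invented for illustration, not taken from the thesis:

```python
def fuse_posteriors(appearance, behaviour, prior=None):
    """Product-rule fusion of two independent classifier posteriors:
    P(class | both) is proportional to P(class|app) * P(class|beh) / P(class)."""
    classes = list(appearance)
    if prior is None:  # assume a uniform class prior unless one is given
        prior = {c: 1.0 / len(classes) for c in classes}
    raw = {c: appearance[c] * behaviour[c] / prior[c] for c in classes}
    z = sum(raw.values())
    return {c: v / z for c, v in raw.items()}  # renormalize to sum to 1

# appearance alone is unsure between person and cyclist;
# the behaviour cue (e.g. gait-like motion) disambiguates
app = {"person": 0.5, "vehicle": 0.1, "cyclist": 0.4}
beh = {"person": 0.7, "vehicle": 0.1, "cyclist": 0.2}
fused = fuse_posteriors(app, beh)
```

The fused distribution concentrates on "person", illustrating how combining machine and human understanding can sharpen an ambiguous classification.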
Biologically-inspired robust motion segmentation using mutual information
This paper presents a neuroscience inspired information theoretic approach to motion segmentation. Robust motion segmentation represents a fundamental first stage in many surveillance tasks. As an alternative to widely adopted individual segmentation approaches, which are challenged in different ways by imagery exhibiting a wide range of environmental variation and irrelevant motion, this paper presents a new biologically-inspired approach which computes the multivariate mutual information between multiple complementary motion segmentation outputs. Performance evaluation across a range of datasets and against competing segmentation methods demonstrates robust performance
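The core quantity in the approach above, mutual information between segmentation outputs, can be computed directly from binary masks. A minimal two-mask sketch (the paper uses multivariate MI over several complementary segmenters; names here are hypothetical):

```python
import math
from collections import Counter

def mutual_information(mask_a, mask_b):
    """Mutual information (in bits) between two equal-length binary masks,
    estimated from their empirical joint and marginal distributions."""
    n = len(mask_a)
    joint = Counter(zip(mask_a, mask_b))
    pa = Counter(mask_a)
    pb = Counter(mask_b)
    mi = 0.0
    for (a, b), c in joint.items():
        pab = c / n
        mi += pab * math.log2(pab / ((pa[a] / n) * (pb[b] / n)))
    return mi

# two segmenters agreeing on the same moving region share much information;
# an unrelated mask shares almost none
seg1 = [1, 1, 1, 0, 0, 0, 0, 0]
seg2 = [1, 1, 1, 0, 0, 0, 0, 0]   # identical output
seg3 = [1, 0, 1, 0, 1, 0, 1, 0]   # uncorrelated pattern
mi_same = mutual_information(seg1, seg2)
mi_diff = mutual_information(seg1, seg3)
```

High mutual agreement between complementary segmenters signals genuinely relevant motion, which is the intuition behind the robustness claim.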
An integrated model of visual attention using shape-based features
Apart from helping shed some light on human perceptual mechanisms, modeling visual attention has important applications in computer vision. It has been shown to be useful in priming object detection, pruning interest points, quantifying visual clutter as well as predicting human eye movements. Prior work has either relied on purely bottom-up approaches or top-down schemes using simple low-level features. In this paper, we outline a top-down visual attention model based on shape-based features. The same shape-based representation is used to represent both the objects and the scenes that contain them. The spatial priors imposed by the scene and the feature priors imposed by the target object are combined in a Bayesian framework to generate a task-dependent saliency map. We show that our approach can predict the location of objects as well as match eye movements (92% overlap with human observers). We also show that the proposed approach performs better than existing bottom-up and top-down computational models
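The Bayesian combination described above can be sketched pointwise: the task-dependent saliency at each location is proportional to the scene's spatial prior times the likelihood of the target's features at that location. The toy maps below are invented for illustration:

```python
def task_saliency(scene_prior, feature_likelihood):
    """Pointwise Bayesian combination over a 2D grid:
    S(x) proportional to P(x | scene) * P(features(x) | target),
    normalized so the saliency map sums to 1."""
    raw = [[s * f for s, f in zip(srow, frow)]
           for srow, frow in zip(scene_prior, feature_likelihood)]
    z = sum(v for row in raw for v in row)
    return [[v / z for v in row] for row in raw]

# scene prior favours the lower half (e.g. objects on the ground plane);
# the shape-based feature likelihood fires at two candidate locations
scene = [[0.1, 0.1],
         [0.4, 0.4]]
feats = [[0.9, 0.1],
         [0.1, 0.9]]
sal = task_saliency(scene, feats)
```

Only the candidate consistent with both the scene prior and the target features ends up salient, which is how the model predicts task-dependent fixations.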
A Study on the Detection of Salient Regions and Humans from Images
Doctoral dissertation, Kyushu Institute of Technology (degree number: 工博甲第365号; degree conferred 25 March 2014). Contents: 1. General Introduction | 2. Saliency detection using combined spatial non-redundancy and local appearance | 3. Human detection using LBP-based patterns of oriented edges | 4. General Conclusion | 5. Appendices. Kyushu Institute of Technology, academic year 2013 (平成25年)
Human pose and action recognition
This thesis focuses on detection of persons and pose recognition using neural networks.
The goal is to detect human body poses in a visual scene with multiple
persons and to use this information in order to recognize human activity. This is
achieved by first detecting persons in a scene and then by estimating their body
joints in order to infer articulated poses.
The work developed in this thesis explored neural networks and deep learning
methods. Deep learning allows to employ computational models that are composed
of multiple processing layers to learn representations of data with multiple levels
of abstraction. These methods have greatly improved the state-of-the-art in many
domains such as speech recognition and visual object detection and classification.
Deep learning discovers intricate structure in data by using the backpropagation
algorithm to indicate how a machine should change its internal parameters that are
used to compute the representation in each layer from the representation provided
by the previous one.
Person detection, in general, is a difficult task due to the large variability in
appearance caused by factors such as scale, viewpoint, and occlusion. An object
detection framework based on multi-stage convolutional features for pedestrian detection
is proposed in this thesis. This framework extends the Fast R-CNN framework
for the combination of several convolutional features from different stages of
a CNN (Convolutional Neural Network) to improve the detector's accuracy. This
provides high quality detections of persons in a visual scene, which are then used
as input in conjunction with a human pose estimation model in order to estimate
human body joint locations of multiple persons in an image.
Human pose estimation is done by a deep convolutional neural network composed
of a series of residual auto-encoders. These produce multiple predictions which are
later combined to provide a heatmap prediction of human body joints. In this network
topology, features are processed across all scales capturing the various spatial
relationships associated with the body. Repeated bottom-up and top-down processing
with intermediate supervision for each auto-encoder network is applied. This
results in very accurate 2D heatmaps of body joint predictions.
The methods presented in this thesis were benchmarked against other top-performing
methods on popular datasets for pedestrian detection and human pose estimation,
achieving good results compared with other state-of-the-art algorithms.
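The final decoding step of the heatmap-based pose estimator described above is usually the argmax of each per-joint heatmap. A minimal sketch of that step (the network that produces the heatmaps is of course the hard part and is omitted; all names here are hypothetical):

```python
def joints_from_heatmaps(heatmaps):
    """Extract one (row, col, confidence) triple per body joint by taking
    the argmax of each per-joint 2D heatmap."""
    joints = []
    for hm in heatmaps:
        # max over (value, row, col) picks the highest-confidence location
        best = max((v, r, c) for r, row in enumerate(hm)
                   for c, v in enumerate(row))
        joints.append((best[1], best[2], best[0]))
    return joints

# two toy 3x3 heatmaps: one joint peaks at (0, 2), the other at (2, 1)
hms = [
    [[0.0, 0.1, 0.9],
     [0.0, 0.2, 0.1],
     [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0],
     [0.1, 0.3, 0.0],
     [0.0, 0.8, 0.1]],
]
joints = joints_from_heatmaps(hms)
```

In the full system one such set of heatmaps is produced per detected person, yielding multi-person articulated poses.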
Visual Clutter Study for Pedestrian Using Large Scale Naturalistic Driving Data
Some pedestrian crashes are due to the driver’s late or difficult perception of the pedestrian’s appearance. Recognizing pedestrians while driving is a complex cognitive activity. Visual clutter analysis can be used to study the factors that affect human visual search efficiency and to help design advanced driver assistance systems for better decision making and user experience. In this thesis, we propose a pedestrian perception evaluation model that quantitatively analyzes pedestrian perception difficulty using naturalistic driving data. An efficient detection framework was developed to locate pedestrians within large-scale naturalistic driving data. Visual clutter analysis was used to study the factors that may affect the driver’s ability to perceive a pedestrian’s appearance. Candidate factors were explored in a designed exploratory study using naturalistic driving data, and a bottom-up, image-based pedestrian clutter metric was proposed to quantify pedestrian perception difficulty. Based on the proposed bottom-up clutter metric and a top-down pedestrian-appearance-based estimator, a Bayesian probabilistic pedestrian perception evaluation model was constructed to simulate the pedestrian perception process
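A common bottom-up clutter proxy of the kind mentioned above is edge density: the fraction of pixels whose local intensity difference exceeds a threshold. This is a generic sketch with an assumed threshold, not the thesis's actual pedestrian clutter metric:

```python
def edge_density_clutter(img, threshold=10):
    """Bottom-up clutter proxy: fraction of pixels whose horizontal or
    vertical intensity difference to the previous pixel exceeds a threshold."""
    h, w = len(img), len(img[0])
    edges = 0
    for r in range(h):
        for c in range(w):
            gx = abs(img[r][c] - img[r][c - 1]) if c else 0
            gy = abs(img[r][c] - img[r - 1][c]) if r else 0
            if max(gx, gy) > threshold:
                edges += 1
    return edges / (h * w)

# a flat patch is uncluttered; a checkerboard-like patch is highly cluttered
flat = [[50] * 4 for _ in range(4)]
busy = [[0 if (r + c) % 2 else 255 for c in range(4)] for r in range(4)]
c_flat = edge_density_clutter(flat)
c_busy = edge_density_clutter(busy)
```

A scene scoring high on such a metric around a pedestrian would be predicted to make that pedestrian harder for the driver to perceive.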