12 research outputs found
Multi-Modal Mean-Fields via Cardinality-Based Clamping
Mean Field inference is central to statistical physics. It has attracted much
interest in the Computer Vision community to efficiently solve problems
expressible in terms of large Conditional Random Fields. However, since it
models the posterior probability distribution as a product of marginal
probabilities, it may fail to properly account for important dependencies
between variables. We therefore replace the fully factorized distribution of
Mean Field by a weighted mixture of such distributions that similarly
minimizes the KL-Divergence to the true posterior. Two new ideas let us
perform this minimization efficiently: conditioning on groups of variables
instead of single ones, and selecting such groups using a parameter of the
conditional random field potentials that we identify with the temperature in
the sense of statistical physics. Our extension of the clamping
method proposed in previous works allows us to both produce a more descriptive
approximation of the true posterior and, inspired by the diverse MAP paradigms,
fit a mixture of Mean Field approximations. We demonstrate that this positively
impacts real-world algorithms that initially relied on mean fields.
Comment: Submitted for review to CVPR 201
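To illustrate why a single fully factorized approximation can miss dependencies, here is a minimal sketch of naive mean-field updates on a tiny binary pairwise model. It is not the clamping or mixture method of the abstract; the energy form, the function name `mean_field` and the parameters `temperature` and `init` are assumptions made for illustration only.

```python
import math


def mean_field(unary, pairwise, temperature=1.0, init=0.5, iters=100):
    """Naive mean field for binary variables x_i in {0, 1}.

    Assumed (illustrative) energy, with a symmetric pairwise matrix:
        E(x) = -sum_i unary[i] * x_i - sum_{i<j} pairwise[i][j] * x_i * x_j
    and target distribution p(x) proportional to exp(-E(x) / temperature).
    Returns q, where q[i] approximates the marginal P(x_i = 1).
    """
    n = len(unary)
    q = [init] * n
    for _ in range(iters):
        for i in range(n):
            # Expected local field on x_i under the current factorized q.
            field = unary[i] + sum(pairwise[i][j] * q[j] for j in range(n) if j != i)
            # Coordinate update that minimizes KL(q || p) with respect to q[i].
            q[i] = 1.0 / (1.0 + math.exp(-field / temperature))
    return q


# Two strongly coupled variables whose states (0, 0) and (1, 1) have equal energy.
coupling = [[0.0, 4.0], [4.0, 0.0]]
high_mode = mean_field([-2.0, -2.0], coupling, temperature=0.5, init=0.9)
low_mode = mean_field([-2.0, -2.0], coupling, temperature=0.5, init=0.1)
```

At this low temperature the two initializations converge to different fixed points, each describing only one mode of the distribution. A weighted mixture of such solutions is one way to represent both modes.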
Fast heuristic method to detect people in frontal depth images
This paper presents a new method for detecting people using only depth images captured by a camera in a frontal position. The approach first detects all the objects present in the scene and determines their average depth (distance to the camera). Next, for each object, a 3D Region of Interest (ROI) around it is processed to determine whether the characteristics of the object correspond to the biometric characteristics of a human head. Results are presented for three public datasets captured by three depth sensors with different spatial resolutions and different operating principles (structured light, active stereo vision and Time of Flight). These results demonstrate that our method can run in real time on a low-cost CPU platform with high accuracy: processing times are below 1 ms per frame at a 512 × 424 image resolution with a precision of 99.26%, and below 4 ms per frame at a 1280 × 720 image resolution with a precision of 99.77%.
A novel video-vibration monitoring system for walking pattern identification on floors
This is the author accepted manuscript. The final version is available from the publisher via the DOI in this record.
Walking-induced loads on office floors can generate unwanted vibrations. Current multi-person loading models are limited since they do not take into account nondeterministic factors
such as pacing rates, walking paths, obstacles in walking paths, busyness of floors, stride lengths,
and interactions among the occupants. This study proposes a novel video-vibration monitoring
system to investigate the complex human walking patterns on floors. The system is capable of
capturing occupant movements on the floor with cameras, and extracting walking trajectories
using image processing techniques. To demonstrate its capabilities, the system was installed on a
real office floor and resulting trajectories were statistically analyzed to identify the actual
walking patterns, paths, pacing rates, and busyness of the floor with respect to time. The
correlation between the vibration levels measured by the wireless sensors and the trajectories
extracted from the video recordings was also investigated. The results showed that the proposed
video-vibration monitoring system has strong potential to be used in training data-driven crowd
models, which can be used in future studies to generate realistic multi-person loading scenarios.
Qatar National Research Foundatio
Towards dense people detection with deep learning and depth images
This paper describes a novel DNN-based system, named PD3net, that detects multiple people from a single depth image in real time. The proposed neural network processes a depth image and outputs a likelihood map in image coordinates, where each detection corresponds to a Gaussian-shaped local distribution centered at each person's head. This likelihood map encodes both the number of detected people and their positions in the image, from which the 3D position can be computed. The proposed DNN includes spatially separated convolutions to increase performance, and runs in real time on low-budget GPUs. We use synthetic data for initially training the network, followed by fine-tuning with a small amount of real data. This allows the network to be adapted to different scenarios without needing large, manually labeled image datasets. As a result, the people detection system presented in this paper has numerous potential applications in different fields, such as capacity control, automatic video surveillance, analysis of the behavior of people or groups, healthcare, and monitoring and assistance of elderly people in ambient assisted living environments. In addition, depth information does not allow the people in the scene to be identified, so they can be detected while their privacy is preserved. The proposed DNN has been experimentally evaluated and compared with other state-of-the-art approaches, including both classical and DNN-based solutions, under a wide range of experimental conditions. The results show that the proposed architecture and training strategy are effective, and that the network generalizes to scenes different from those used during training. We also demonstrate that our proposal outperforms existing methods and can accurately detect people in scenes with significant occlusions.
Ministerio de Economía y Competitividad; Universidad de Alcalá; Agencia Estatal de Investigació
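The likelihood-map idea can be sketched as follows: build a map with one Gaussian bump per head, recover detections as local maxima, and back-project a detection to 3D with a pinhole camera model. This is only an illustration under stated assumptions and not the PD3net implementation; the function names, `sigma`, `threshold` and the intrinsics `fx`, `fy`, `cx`, `cy` are all hypothetical.

```python
import math


def gaussian_map(height, width, heads, sigma=2.0):
    """Map with one Gaussian bump per (row, col) head position, combined by max."""
    grid = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            for hr, hc in heads:
                d2 = (r - hr) ** 2 + (c - hc) ** 2
                grid[r][c] = max(grid[r][c], math.exp(-d2 / (2.0 * sigma ** 2)))
    return grid


def find_peaks(grid, threshold=0.5):
    """Cells at or above threshold that are strict maxima of their 8-neighbourhood."""
    h, w = len(grid), len(grid[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = grid[r][c]
            if v < threshold:
                continue
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(v > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def backproject(row, col, depth, fx, fy, cx, cy):
    """Pinhole back-projection of a pixel with known depth to camera coordinates."""
    x = (col - cx) * depth / fx
    y = (row - cy) * depth / fy
    return (x, y, depth)


heads = [(5, 5), (12, 20)]
detections = find_peaks(gaussian_map(20, 30, heads))
```

The number of peaks gives the person count and their coordinates give the image positions, which matches the way the abstract describes the map as encoding both.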
Labeling and evaluation of a new dataset for human action recognition in large vessels
The aim of this Final Degree Thesis (TFG) is the generation, labeling and evaluation of a new dataset
named Human Action Recognition on Ships (HARS), for the later training and evaluation of a system
for evacuating people from large vessels, within the framework of the project PALAEMON: A holistic
passenger ship evacuation and rescue ecosystem (H2020-PALAEMON-814962). The sequences to be labeled
include different people performing distinct activities and were recorded on a ship available at
Astilleros de Santander S.A.U. (ASTANDER). The labeling started from a tool provided by the GEINTRA
research group, which was modified and adapted to the labeling needs of the dataset, covering not
only individual actions but also group actions. In addition, criteria for labeling people and
actions have been defined. The dataset was evaluated using the YOLOv3 neural network, assessing the
person detection results obtained with this network against the labeled information. YOLOv3 was
implemented and executed in Google Colab, and its results were compared with the labeled ones using
MATLAB. The work carried out and the results obtained have validated the labeling of the dataset
and the fulfillment of the objectives of the TFG.
Grado en Ingeniería Electrónica de Comunicacione
People recognition and tracking using an RGB-D sensor on a mobile robotic platform
This project presents a system for detecting and following people from a mobile robot. The goal is to apply this kind of system in public environments, where the robot looks for people to approach in order to offer some type of service or information. To find people, a visual recognition system using an RGB-D sensor has been designed. A system for following the person selected as the target, which consists of approaching that person, has also been designed. Recognition is based on the tools provided by the OpenCV libraries. This work studies several alternatives and discusses which were chosen, why, and what configuration changes were made. After recognizing all the people who may be in the robot's field of view, one of them must be chosen to follow. To this end, an algorithm was developed to evaluate and rank the detected hypotheses. Finally, the selected person is followed using a mobile robotic platform that is given the coordinates of the person detected in the previous step. The recognition, choice and selection process is repeated until the robot manages to approach a given person; that is, if the person is moving, the robot follows them. To reach this goal, it was first necessary to become familiar with the OpenCV working environment on Eclipse and on ROS, using several tutorials provided by the official OpenCV website. Next, the different recognizers available in OpenCV and their parameters were evaluated to assess the cost and quality of the results of each option. These experiments used both public sequences from related work and our own data, captured to test tracking in the environment and with the robot available for this project.
Integration experiments were also carried out to verify the real-time operation of the whole system with the mobile robotic platform in different scenarios.
Detection of abnormal passenger behaviors on ships, using RGBD cameras
The aim of this Master's Thesis (TFM) is the design, implementation and evaluation of an intelligent video surveillance system for large ships that detects, tracks and counts people and also detects stampedes. The developed system must be portable and work in real time.
To this end, the technologies available for embedded systems were studied in order to choose those that best suit the objective of the TFM. A people detection system based on a MobileNet-SSD was developed, complemented by a bank of Kalman filters for tracking. In addition, a stampede detector based on the analysis of optical flow entropy was incorporated.
All of this was implemented and evaluated on an embedded device that includes a Vision Processing Unit (VPU). The results obtained validate the proposal.
Máster Universitario en Ingeniería de Telecomunicación (M125
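As a rough illustration of what an optical-flow entropy measure can look like, the sketch below computes the Shannon entropy of a histogram of flow directions. The abstract does not specify how the thesis defines or thresholds the entropy, so the function name `flow_direction_entropy`, the bin count and the magnitude cutoff are assumptions made for this example.

```python
import math


def flow_direction_entropy(flow_vectors, bins=8, min_magnitude=1e-3):
    """Shannon entropy (in bits) of the histogram of optical-flow directions.

    flow_vectors is an iterable of (dx, dy) displacements, for example one per
    pixel or per tracked point. Near-zero vectors are ignored as static.
    """
    counts = [0] * bins
    for dx, dy in flow_vectors:
        if math.hypot(dx, dy) < min_magnitude:
            continue
        # Map the direction to [0, 2*pi) and then to a histogram bin.
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        index = min(int(angle / (2.0 * math.pi) * bins), bins - 1)
        counts[index] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


# Everyone moving the same way versus motion spread over all directions.
aligned = [(1.0, 0.0)] * 10
spread = [
    (math.cos((k + 0.5) * math.pi / 4), math.sin((k + 0.5) * math.pi / 4))
    for k in range(8)
]
```

The entropy is lowest when all motion shares one direction and highest when directions are spread uniformly over the bins. How such a value is turned into a stampede decision is left open here, since the abstract does not describe it.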
Variational Methods for Human Modeling
A large part of computer vision research is devoted to building models
and algorithms aimed at understanding human appearance and behaviour
from images and videos. Ultimately, we want to build automated systems
that are at least as capable as people when it comes to
interpreting humans. Most of the tasks that we want these systems to
solve can be posed as a problem of inference in probabilistic
models. Although probabilistic inference in general is a very hard
problem of its own, there exists a very powerful class of inference
algorithms, variational inference, which allows us to build efficient
solutions for a wide range of problems.
In this thesis, we consider a variety of computer vision problems
targeted at modeling human appearance and behaviour, including
detection, activity recognition, semantic segmentation and facial
geometry modeling. For each of those problems, we develop novel methods
that use variational inference to improve the capabilities
of the existing systems.
First, we introduce a novel method for detecting multiple potentially
occluded people in depth images, which we call DPOM. Unlike many other
approaches, our method performs probabilistic reasoning jointly,
so that knowledge about one part of the image evidence can be
propagated to reason about the rest. This is particularly
important in crowded scenes involving many people, since it helps to
handle ambiguous situations resulting from severe occlusions. We
demonstrate that our approach outperforms existing methods on multiple
datasets.
Second, we develop a new algorithm for variational inference that
works for a large class of probabilistic models, which includes, among
others, DPOM and some of the state-of-the-art models for semantic
segmentation. We provide a formal proof that our method converges,
and demonstrate experimentally that it brings better performance than
the state-of-the-art on several real-world tasks, which include
semantic segmentation and people detection. Importantly, we show that
parallel variational inference in discrete random fields can be seen
as a special case of proximal gradient descent, which allows us to
benefit from many of the advances in gradient-based optimization.
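One way to make the proximal-gradient reading concrete is sketched below. The notation is assumed for illustration and does not come from the abstract: $q$ is the factorized distribution, $E$ the energy, $H$ the entropy and $d \ge 0$ the weight of a KL proximity term. The exact formulation in the thesis may differ.

```latex
% Mean-field objective over factorized q(x) = \prod_i q_i(x_i)
\mathcal{F}(q) = \mathbb{E}_{q}\left[E(x)\right] - \sum_i H(q_i)

% Gradient of the expected energy with respect to q_i(l), evaluated at q^t
g_i^t(l) = \mathbb{E}_{q^t_{-i}}\left[E(x) \mid x_i = l\right]

% Proximal step: linearize the expected energy, keep the entropy exact,
% and penalize the distance to the previous iterate
q^{t+1} = \arg\min_{q} \; \sum_{i,l} q_i(l)\, g_i^t(l) - \sum_i H(q_i)
          + d \sum_i \mathrm{KL}\left(q_i \,\|\, q_i^t\right)

% Closed-form solution, applied to all variables in parallel
q_i^{t+1}(l) \propto \left(q_i^t(l)\right)^{\frac{d}{1+d}}
             \exp\left(-\frac{g_i^t(l)}{1+d}\right)
```

Under these assumptions, setting $d = 0$ recovers the usual parallel mean-field update $q_i^{t+1}(l) \propto \exp(-g_i^t(l))$, while $d > 0$ damps each step towards the previous iterate.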
Third, we propose a unified framework for multi-human scene
understanding which simultaneously solves three tasks: multi-person
detection, individual action recognition and collective activity
recognition. Within our framework, we introduce a novel multi-person
detection scheme, which relies on variational inference and
jointly refines detection hypotheses instead of relying on
suboptimal post-processing. Ultimately, our model takes a frame
sequence as input and produces a comprehensive description of the
scene. Finally, we experimentally demonstrate that our method brings
better performance than the state-of-the-art.
Fourth, we propose a new approach for learning facial geometry with
deep probabilistic models and variational methods. Our model is based
on a variational autoencoder with multiple sets of hidden variables,
which capture various levels of deformation, ranging from
global to local, high-frequency ones. We experimentally demonstrate
the power of the model on a variety of fitting tasks. Our model is
completely data-driven and can be learned from a relatively small
number of individuals.