    Multi-Modal Mean-Fields via Cardinality-Based Clamping

    Mean Field inference is central to statistical physics. It has attracted much interest in the Computer Vision community as a way to efficiently solve problems expressible in terms of large Conditional Random Fields. However, since it models the posterior probability distribution as a product of marginal probabilities, it may fail to properly account for important dependencies between variables. We therefore replace the fully factorized distribution of Mean Field by a weighted mixture of such distributions, which similarly minimizes the KL-divergence to the true posterior. We introduce two new ideas that make this minimization efficient: conditioning on groups of variables instead of single ones, and selecting those groups using a parameter of the conditional random field potentials that we identify with the temperature in the sense of statistical physics. Our extension of the clamping method proposed in previous works allows us both to produce a more descriptive approximation of the true posterior and, inspired by the diverse-MAP paradigm, to fit a mixture of Mean Field approximations. We demonstrate that this positively impacts real-world algorithms that initially relied on mean fields. Comment: Submitted for review to CVPR 201
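    As a minimal sketch (with assumed notation, not taken from the paper), the mixture approximation replaces the single factorized Mean Field distribution with a weighted combination of K factorized components, fitted under the same objective:

```latex
\[
Q(x) \;=\; \sum_{k=1}^{K} w_k \prod_{i} q_i^{(k)}(x_i),
\qquad w_k \ge 0,\;\; \sum_{k} w_k = 1,
\qquad \min_{\{w_k,\, q^{(k)}\}} \mathrm{KL}\!\left(Q \,\middle\|\, P\right),
\]
where each component \(q^{(k)}\) would be obtained by clamping one group of
variables and running Mean Field on the remaining ones.
```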

    Fast heuristic method to detect people in frontal depth images

    This paper presents a new method for detecting people using only depth images captured by a camera in a frontal position. The approach first detects all the objects present in the scene and determines their average depth (distance to the camera). Next, for each object, a 3D Region of Interest (ROI) around it is processed in order to determine whether the characteristics of the object correspond to the biometric characteristics of a human head. We present results obtained on three public datasets captured by three depth sensors with different spatial resolutions and different operating principles (structured light, active stereo vision and Time of Flight). These results demonstrate that our method can run in real time on a low-cost CPU platform with high accuracy: processing times are below 1 ms per frame at 512 × 424 resolution with a precision of 99.26%, and below 4 ms per frame at 1280 × 720 resolution with a precision of 99.77%.
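    The head test can be pictured as a pinhole-camera size check scaled by the object's average depth. The sketch below is illustrative only; the focal length, head-width range, and helper name are assumptions, not values from the paper.

```python
import numpy as np

FX_PIXELS = 365.0           # horizontal focal length of the depth camera (assumed)
HEAD_WIDTH_MM = (120, 220)  # plausible adult head-width range in mm (assumed)

def looks_like_head(object_mask: np.ndarray, depth_mm: np.ndarray) -> bool:
    """Check whether a segmented object has head-like metric dimensions.

    object_mask: boolean mask of one detected object.
    depth_mm:    depth image in millimetres, same shape as the mask.
    """
    zs = depth_mm[object_mask]
    zs = zs[zs > 0]                      # drop invalid depth readings
    if zs.size == 0:
        return False
    mean_z = float(zs.mean())            # average distance to the camera

    ys, xs = np.nonzero(object_mask)
    width_px = xs.max() - xs.min() + 1   # object width in pixels

    # Pinhole model: metric size = pixel size * depth / focal length.
    width_mm = width_px * mean_z / FX_PIXELS
    lo, hi = HEAD_WIDTH_MM
    return lo <= width_mm <= hi
```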

    A novel video-vibration monitoring system for walking pattern identification on floors

    Walking-induced loads on office floors can generate unwanted vibrations. Current multi-person loading models are limited since they do not take into account nondeterministic factors such as pacing rates, walking paths, obstacles in walking paths, busyness of floors, stride lengths, and interactions among the occupants. This study proposes a novel video-vibration monitoring system to investigate complex human walking patterns on floors. The system captures occupant movements on the floor with cameras and extracts walking trajectories using image processing techniques. To demonstrate its capabilities, the system was installed on a real office floor, and the resulting trajectories were statistically analyzed to identify the actual walking patterns, paths, pacing rates, and busyness of the floor with respect to time. The correlation between the vibration levels measured by the wireless sensors and the trajectories extracted from the video recordings was also investigated. The results show that the proposed video-vibration monitoring system has strong potential for training data-driven crowd models, which can be used in future studies to generate realistic multi-person loading scenarios.
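    One simple form the trajectory/vibration correlation could take is a per-window comparison of occupancy against vibration RMS; the sketch below is a hedged illustration under assumed data layouts and window lengths, not the analysis pipeline from the paper.

```python
import numpy as np

def windowed_correlation(traj_counts, accel, fs_hz=200.0, window_s=10.0):
    """Pearson correlation between occupancy and floor-vibration RMS.

    traj_counts: people-in-motion count per time window (video side, assumed).
    accel:       raw acceleration samples from one wireless sensor (assumed).
    """
    accel = np.asarray(accel, dtype=float)
    n = int(fs_hz * window_s)                    # samples per window
    n_windows = min(len(traj_counts), len(accel) // n)
    rms = np.array([
        np.sqrt(np.mean(accel[i * n:(i + 1) * n] ** 2))
        for i in range(n_windows)
    ])
    counts = np.asarray(traj_counts[:n_windows], dtype=float)
    return float(np.corrcoef(counts, rms)[0, 1])
```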

    Towards dense people detection with deep learning and depth images

    This paper describes a novel DNN-based system, named PD3net, that detects multiple people from a single depth image in real time. The proposed neural network processes a depth image and outputs a likelihood map in image coordinates, where each detection corresponds to a Gaussian-shaped local distribution centered at each person's head. This likelihood map encodes both the number of detected people and their position in the image, from which the 3D position can be computed. The proposed DNN includes spatially separated convolutions to increase performance, and runs in real time on low-budget GPUs. We use synthetic data to initially train the network, followed by fine-tuning with a small amount of real data. This allows the network to be adapted to different scenarios without needing large, manually labeled image datasets. As a result, the people detection system presented in this paper has numerous potential applications in different fields, such as capacity control, automatic video surveillance, people or group behavior analysis, healthcare, or monitoring and assistance of elderly people in ambient assisted living environments. In addition, the use of depth information does not allow recognizing the identity of the people in the scene, thus enabling their detection while preserving their privacy. The proposed DNN has been experimentally evaluated and compared with other state-of-the-art approaches, including both classical and DNN-based solutions, under a wide range of experimental conditions. The results show that the proposed architecture and training strategy are effective, and that the network generalizes to scenes different from those used during training. We also demonstrate that our proposal outperforms existing methods and can accurately detect people in scenes with significant occlusions.
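    The Gaussian-shaped likelihood map can be built straightforwardly from annotated head positions; the sketch below shows one plausible construction (sigma and the max-combination of overlapping peaks are assumptions for illustration, not details from the paper).

```python
import numpy as np

def likelihood_map(shape, head_centers, sigma=6.0):
    """Render a per-pixel likelihood map with one Gaussian per head.

    shape:        (H, W) of the output map.
    head_centers: list of (row, col) head positions in image coordinates.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros(shape, dtype=np.float32)
    for (cy, cx) in head_centers:
        g = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))
        out = np.maximum(out, g)   # overlapping people keep distinct peaks
    return out
```

    At inference time, detections would be recovered by finding local maxima of the predicted map, so the peak count gives the number of people and the peak locations give their image positions.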

    Labeling and evaluation of a new dataset for human action recognition in large vessels

    The aim of this Final Degree Thesis (TFG) is the generation, labeling and evaluation of a new dataset named Human Action Recognition on Ships (HARS) for the later training and evaluation of a system in charge of person evacuation in large cruise ships, within the framework of the project PALAEMON: A holistic passenger ship evacuation and rescue ecosystem (H2020-PALAEMON-814962). The sequences to be labeled include different people performing distinct activities and were recorded on a ship made available by Astilleros de Santander S.A.U. (ASTANDER). The labeling is based on a tool provided by the GEINTRA research group, which has been modified and adapted to the labeling needs of the dataset, covering not only individual actions but also group actions. In addition, criteria for labeling persons and actions have been defined. The evaluation of the dataset was carried out using the YOLOv3 neural network, assessing its person-detection results against the labeled information. The implementation and execution of YOLOv3 was done in Google Colab, and the results were compared with the labels using MATLAB. The work developed and the results obtained validate the labeling of the dataset and the fulfillment of the objectives of the TFG.
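    Comparing detections against labels typically reduces to IoU-based box matching; the sketch below uses a greedy match with a 0.5 IoU threshold, which are common defaults assumed here rather than the thesis's exact protocol.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, labels, thr=0.5):
    """Greedily match each detection to its best unmatched label."""
    matched, tp = set(), 0
    for det in detections:
        best = max(range(len(labels)),
                   key=lambda j: iou(det, labels[j]), default=None)
        if best is not None and best not in matched and iou(det, labels[best]) >= thr:
            matched.add(best)
            tp += 1
    fp = len(detections) - tp
    fn = len(labels) - tp
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)
```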

    Recognition and tracking of people using an RGB-D sensor on a mobile robotic platform

    This project presents a system for detecting and following people from a mobile robot. The goal is to apply this kind of system in public environments where the robot looks for people to approach in order to offer some service or information. For the people search, a visual recognition system based on an RGB-D sensor has been designed, together with a tracking system that makes the robot approach the person selected as the target. The recognition relies on the tools provided by the OpenCV libraries. This work studies different alternatives and discusses which ones were chosen, why, and what configuration changes were made. After recognizing all the people that may be in the robot's field of view, the system must choose which of them to follow; for this, an algorithm was implemented to score and rank the detected hypotheses. Finally, the selected person is followed using a mobile robotic platform, which receives the coordinates of the person detected in the previous step. The recognition, scoring and selection process is repeated until the robot manages to approach the person; that is, if the person is moving, the robot follows them. To reach this goal it was first necessary to become familiar with the OpenCV working environment on Eclipse and on ROS, using the tutorials provided by the official OpenCV website. Next, the different detectors available in OpenCV and their parameters were evaluated, comparing the cost and quality of the results of the different options. These experiments used both public sequences from related work and our own data, captured to test tracking in the environment and with the robot available for this project. Integration experiments were also carried out to verify the real-time operation of the whole system with the mobile robotic platform in different scenarios.
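    A minimal sketch of the detect-then-rank loop, using the stock OpenCV HOG people detector (which is one of the detectors OpenCV provides; whether the project used this one is not stated). The ranking rule, preferring the closest person via the depth channel, is an assumption made for illustration.

```python
import cv2
import numpy as np

# Stock OpenCV pedestrian detector: HOG features + a pre-trained linear SVM.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def pick_target(bgr, depth_mm):
    """Detect people in the color image and rank them by distance."""
    rects, _weights = hog.detectMultiScale(bgr, winStride=(8, 8))
    best, best_z = None, np.inf
    for (x, y, w, h) in rects:
        roi = depth_mm[y:y + h, x:x + w]
        valid = roi[roi > 0]
        if valid.size == 0:
            continue
        z = float(np.median(valid))     # robust distance estimate
        if z < best_z:                  # rank hypotheses: nearest first
            best, best_z = (x + w // 2, y + h // 2, z), z
    return best   # (u, v, depth) of the person to approach, or None
```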

    Detection of abnormal passenger behaviors on ships, using RGBD cameras

    The aim of this Final Master Thesis (TFM) is the design, implementation and evaluation of an intelligent video surveillance system for large ships that allows the detection, tracking and counting of people, as well as the detection of stampedes. The system must be portable and work in real time. To this end, a study of the technologies available in embedded systems was carried out in order to choose those best suited to the objective of the TFM. A people detection system based on a MobileNet-SSD has been developed, complemented by a bank of Kalman filters for tracking. In addition, a stampede detector based on the entropy of the optical flow has been incorporated. All of this has been implemented and evaluated on an embedded device that includes a Vision Processing Unit (VPU). The results obtained validate the proposal.
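    The stampede cue can be sketched as dense optical flow, a histogram over flow directions, and its Shannon entropy; the bin count, magnitude gate, and any decision threshold below are illustrative assumptions, not the thesis's parameters.

```python
import cv2
import numpy as np

def flow_entropy(prev_gray, gray, n_bins=16, min_mag=1.0):
    """Entropy of the direction histogram of dense optical flow."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    ang = ang[mag > min_mag]            # keep directions of actual motion
    if ang.size == 0:
        return 0.0
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())   # high entropy ~ chaotic motion
```

    A sustained rise in this entropy (many people moving in many directions at once) would then be flagged as a possible stampede.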

    Variational Methods for Human Modeling

    A large part of computer vision research is devoted to building models and algorithms aimed at understanding human appearance and behaviour from images and videos. Ultimately, we want to build automated systems that are at least as capable as people when it comes to interpreting humans. Most of the tasks that we want these systems to solve can be posed as problems of inference in probabilistic models. Although probabilistic inference in general is a very hard problem in its own right, there exists a very powerful class of inference algorithms, variational inference, which allows us to build efficient solutions for a wide range of problems. In this thesis, we consider a variety of computer vision problems targeted at modeling human appearance and behaviour, including detection, activity recognition, semantic segmentation and facial geometry modeling. For each of those problems, we develop novel methods that use variational inference to improve the capabilities of the existing systems.

    First, we introduce a novel method for detecting multiple potentially occluded people in depth images, which we call DPOM. Unlike many other approaches, our method performs probabilistic reasoning jointly, and thus allows knowledge about one part of the image evidence to propagate to the rest. This is particularly important in crowded scenes involving many people, since it helps to handle ambiguous situations resulting from severe occlusions. We demonstrate that our approach outperforms existing methods on multiple datasets.

    Second, we develop a new algorithm for variational inference that works for a large class of probabilistic models, which includes, among others, DPOM and some of the state-of-the-art models for semantic segmentation. We provide a formal proof that our method converges, and demonstrate experimentally that it outperforms the state-of-the-art on several real-world tasks, including semantic segmentation and people detection. Importantly, we show that parallel variational inference in discrete random fields can be seen as a special case of proximal gradient descent, which allows us to benefit from many of the advances in gradient-based optimization.

    Third, we propose a unified framework for multi-human scene understanding which simultaneously solves three tasks: multi-person detection, individual action recognition and collective activity recognition. Within our framework, we introduce a novel multi-person detection scheme, which relies on variational inference and jointly refines detection hypotheses instead of relying on suboptimal post-processing. Ultimately, our model takes a frame sequence as input and produces a comprehensive description of the scene. We experimentally demonstrate that our method outperforms the state-of-the-art.

    Fourth, we propose a new approach for learning facial geometry with deep probabilistic models and variational methods. Our model is based on a variational autoencoder with multiple sets of hidden variables, which capture various levels of deformation, ranging from global to local, high-frequency ones. We experimentally demonstrate the power of the model on a variety of fitting tasks. The model is completely data-driven and can be learned from a relatively small number of individuals.
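    The proximal-gradient view of parallel inference mentioned above can be sketched as follows (notation assumed for illustration, not taken from the thesis): with E(q) the mean-field free energy and Δ the product of simplices of per-variable marginals, one parallel update solves an entropic proximal step,

```latex
\[
q^{t+1} \;=\; \operatorname*{arg\,min}_{q \in \Delta}\;
\big\langle \nabla E(q^t),\, q \big\rangle
\;+\; \frac{1}{\lambda}\, \mathrm{KL}\!\left(q \,\middle\|\, q^t\right)
\;\;\Longrightarrow\;\;
q_i^{t+1}(x_i) \;\propto\; q_i^{t}(x_i)\,
\exp\!\left(-\lambda \,\frac{\partial E}{\partial q_i(x_i)}\Big|_{q^t}\right),
\]
so all marginals are updated in parallel, with \(\lambda\) playing the role
of a step size that standard gradient-based machinery can tune.
```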